Rethinking deep vision interpretability

For his PhD award talk at GRETSI, Thomas Fel presented his work on the explainability of deep vision models. Among the various approaches, attribution methods (e.g. Grad-CAM) produce saliency maps indicating which parts of the input the neural network relies on to build up its response.

However, these attribution methods only inform about the where, not the what: they do not tell you about the nature of the pattern the neural network has extracted. In this work, T. Fel develops techniques to extract concepts from deep neural networks. Concepts are extracted with non-negative matrix factorization techniques (Gillis, 2024); a survey on concept-based explainable AI can be found in (Poeta et al., 2023). In deep neural networks, there is nothing like the grandmother cell, a single cell that would alone encode a feature; at least, this is not always true. Concepts are instead encoded by distributed patterns of activations (Elhage et al., 2022).
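To give a rough idea of what such a factorization looks like in practice, here is a minimal sketch, not T. Fel's exact pipeline: non-negative activations from an intermediate layer are factorized into a small dictionary of concept directions and per-sample concept coefficients. The matrix sizes and variable names below are illustrative placeholders.

```python
# Minimal sketch of NMF-based concept extraction (illustrative, not Fel's pipeline).
# "activations" stands in for a (n_patches, d) matrix of non-negative features,
# e.g. post-ReLU activations of image patches from an intermediate layer.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
activations = rng.random((1024, 512))   # placeholder for real deep features

n_concepts = 10
nmf = NMF(n_components=n_concepts, init="nndsvda", max_iter=500)
U = nmf.fit_transform(activations)      # (n_patches, n_concepts): concept presence per patch
W = nmf.components_                     # (n_concepts, d): concept directions in feature space

# Patches that most activate concept k can then be displayed to interpret it.
k = 0
top_patches = np.argsort(U[:, k])[::-1][:9]
```

Visualizing the top-activating patches for each row of W is what turns the factorization into human-readable concepts.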

In his talk, he also mentioned the Linear Representation Hypothesis, according to which features could be encoded as directions in the latent space of neural networks. However, as he pointed out, in practice this does not always hold because of the steering problem: pushing a representation along such a vector does steer the output toward the corresponding feature, but only up to a point, beyond which it fails.
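To make the steering idea concrete, here is a small illustrative sketch under assumed names, not taken from the talk: the representation is pushed along a unit-norm concept direction with increasing strength, which works for moderate strengths and eventually breaks.

```python
# Illustrative sketch of activation steering under the linear representation
# hypothesis. The representation h, the direction v and the scales are
# placeholders; in practice v would come from a probe or a concept extractor.
import numpy as np

def steer(h: np.ndarray, v: np.ndarray, alpha: float) -> np.ndarray:
    """Push the representation h along the unit-norm concept direction v."""
    v = v / np.linalg.norm(v)
    return h + alpha * v

rng = np.random.default_rng(0)
h = rng.normal(size=512)                # hidden representation of one input
v = rng.normal(size=512)                # candidate concept direction

for alpha in (0.5, 2.0, 8.0):           # moderate pushes steer; large ones fail
    h_steered = steer(h, v, alpha)      # feed h_steered back into the rest of the network
```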

In (Fel et al., 2023), he introduces a unifying framework for concept extraction techniques (e.g. K-means, PCA, or NMF): they all fall into the category of dictionary learning.
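A quick way to see the common structure, sketched here under my own assumptions rather than the paper's code: each method factorizes the activation matrix A into codes U and a dictionary W, and only the constraints on U and W differ.

```python
# Sketch of the dictionary-learning view: A is approximately U @ W, where A is
# (n_samples, d), U holds the codes and W the concept dictionary; the methods
# differ only in the constraints they impose. Data and sizes are placeholders.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import NMF, PCA

rng = np.random.default_rng(0)
A = rng.random((1024, 512))             # non-negative activations (placeholder)
k = 10

# K-means: codes are one-hot cluster assignments, dictionary = centroids.
km = KMeans(n_clusters=k, n_init=10).fit(A)
W_kmeans = km.cluster_centers_

# PCA: dictionary = orthonormal principal directions, codes = projections.
pca = PCA(n_components=k).fit(A)
W_pca, U_pca = pca.components_, pca.transform(A)

# NMF: both codes and dictionary constrained to be non-negative.
nmf = NMF(n_components=k, init="nndsvda", max_iter=500)
U_nmf, W_nmf = nmf.fit_transform(A), nmf.components_
```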

The Lens project provides illustrations of visual concepts extracted from a large vision model trained on ImageNet. A cup of coffee, for instance, activates the concepts of latte art, handle, and black coffee. It also helps explain failure cases. One example he showed was the “Man on the moon” picture, which a neural network could classify as a shovel. Using visual concepts, one can identify that this picture excites the concepts of trouser and rubble, which are indeed visible in the picture; in ImageNet, however, these two concepts are strongly correlated with people shoveling snow.

Finally, I would like to mention the explainability toolbox Xplique, which implements several explainability methods, in particular the concept-based approaches mentioned above.

References

  1. Nonnegative Matrix Factorization
    Nicolas Gillis
    Jul 2024
  2. Concept-based Explainable Artificial Intelligence: A Survey
    Eleonora Poeta, Gabriele Ciravegna, Eliana Pastor, and 2 more authors
    Jul 2023
  3. Toy Models of Superposition
    Nelson Elhage, Tristan Hume, Catherine Olsson, and 13 more authors
    Transformer Circuits Thread, Jul 2022
    https://transformer-circuits.pub/2022/toy_model/index.html
  4. A Holistic Approach to Unifying Automatic Concept Extraction and Concept Importance Estimation
    Thomas Fel, Victor Boutin, Mazda Moayeri, and 5 more authors
    Jul 2023


