Using Attribution to Understand Deep Neural Networks
There was a neural model for predicting cancer from X-rays. It had good accuracy on held-out test data. But when we attributed its predictions back to the pixels of the X-rays, we found that the network relied on barely visible pen marks that doctors had made on the training images, not on the pathology of cancer. Naturally, the model was not deployed!
I work on techniques to perform prediction attribution of this kind. The target of the attribution can be input features (pixels in the example above), interactions between input features, neurons, or training-data examples. Attributions are reductive; i.e., they abstract away most of the interactions and much of the non-linearity of neural networks. However, attributions, done systematically, are effective at uncovering bugs like the one in the anecdote above.
We will briefly discuss the theory (e.g., connections to the Taylor series, Shapley values, and Stochastic Gradient Descent) and philosophy of attribution, along with other amusing examples of bugs.
If you are a deep learning practitioner, you can easily apply attribution to your own models; all the techniques can be implemented with less than ten lines of code.
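As a rough illustration of the "ten lines" claim, here is a minimal sketch of one well-known input-feature attribution method, Integrated Gradients. The toy model `f` and its hand-written gradient `grad_f` are assumptions for the sake of a self-contained example; in practice `grad_fn` would come from your framework's automatic differentiation, and `x` would be an image:

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Attribute f(x) - f(baseline) to input features by averaging
    gradients along the straight line from baseline to x
    (a Riemann-sum approximation of the path integral)."""
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints of each step
    total = np.zeros_like(x, dtype=float)
    for a in alphas:
        total += grad_fn(baseline + a * (x - baseline))
    return (x - baseline) * total / steps

# Toy differentiable "model" with an analytic gradient (illustrative only).
f = lambda x: x[0] ** 2 + 3.0 * x[1]
grad_f = lambda x: np.array([2.0 * x[0], 3.0])

x = np.array([2.0, 1.0])
baseline = np.zeros(2)
attr = integrated_gradients(grad_f, x, baseline)
# The attributions sum to f(x) - f(baseline) (the completeness property).
```

The per-feature attributions sum to the difference between the model's output at the input and at the baseline, which is one way such methods are sanity-checked in practice.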
I am a principal research scientist/director at Google. These days, I analyze complex machine learning models. I have also worked on question-answering systems, ad auctions, security protocol analysis, privacy, and computational biology.
There once was an RS called MS,
He studies models that are a mess,
A director at Google.
Accurate and frugal,
Explanations are what he likes best.