What are Integrated Gradients?

Integrated Gradients

Integrated gradients are a technique for attributing the predictions of a classification model to input features. It can be used to visualize the relationship between input features and model predictions. It is a local method that helps account for each individual prediction. For example, in the Fashion MNIST dataset, if we take the image of a shoe, then the positive attributions are the pixels of the image which make a positive influence on the model classifying the image as a shoe. The integrated gradient method is mainly used to identify errors in the model, where corrections can be made to improve the accuracy of the model.

per-prediction-level-explainability-local

Sign up for our monthly newsletter, The Drift.

Subscribe