Question on kernel.py of SHAP package #165

roye10 · 2020-09-23T10:51:24Z

First of all, the package is great! I only have a few technical questions regarding the kernel.py code of the SHAP package.

I am trying to understand how SHAP values are computed by the Kernel SHAP method from the SHAP package. As I understand it, you solve a weighted linear regression to obtain the SHAP values, using phi = ((X'WX)^-1)X'Wy.
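For reference, the weighted least squares formula above can be written in a few lines of NumPy. This is only a minimal sketch of the formula itself, not the code in kernel.py; the function name `weighted_least_squares` and the toy data are made up for illustration.

```python
import numpy as np

def weighted_least_squares(X, y, w):
    """Solve phi = (X' W X)^-1 X' W y with W = diag(w)."""
    Xw = X * w[:, None]                      # row-scaling of X, i.e. W @ X
    # X' W X phi = X' W y, solved without forming an explicit inverse
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)

# Toy data (hypothetical): y is an exact linear function of X
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
beta = np.array([1.5, -2.0, 0.5])
y = X @ beta
w = rng.uniform(0.1, 1.0, size=50)

phi = weighted_least_squares(X, y, w)
print(phi)
```

Because the toy target is exactly linear, the weights do not change the solution here; with noisy targets they decide which rows the fit prioritises.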

At a high level, what I gather so far, especially from lines 296-343 in combination with the referenced functions:

nsamples determines how many of the M! possible feature orderings are considered
nsamples x N y-values are computed (N = number of background samples), where each N-sized set results from a different feature ordering (i.e. different features "switched on/off") applied to every row of the background samples; the empirical mean is then taken over each N-sized set, producing nsamples empirical means
a diff-in-diff is calculated to match feature contributions to individual features: ( E[y|X_s] - f_null ) - maskMatrix*( f_x - f_null )
then a WLS is solved, providing the SHAP values
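The steps above can be sketched end to end for a small number of features, where every on/off mask can be enumerated instead of sampled. This is a simplified illustration under my own assumptions, not the kernel.py implementation: `kernel_shap_exact` is a hypothetical name, there is no sampling and no link function, and the adjusted target follows the ( E[y|X_s] - f_null ) - mask*( f_x - f_null ) expression quoted above, using the mask column of the eliminated feature.

```python
import itertools
from math import comb
import numpy as np

def kernel_shap_exact(f, x, background):
    """Exhaustive Kernel SHAP sketch for a small number of features M >= 2."""
    M = x.shape[0]
    f_null = f(background).mean()        # mean output with every feature "off"
    f_x = f(x[None, :])[0]               # output with every feature "on"
    masks, ey, w = [], [], []
    for bits in itertools.product([0, 1], repeat=M):
        s = sum(bits)
        if s == 0 or s == M:
            continue                     # covered by the sum constraint below
        z = np.array(bits)
        # "on" features come from x, "off" features from each background row
        data = np.where(z == 1, x, background)
        masks.append(z)
        ey.append(f(data).mean())        # empirical mean over the N background rows
        w.append((M - 1) / (comb(M, s) * s * (M - s)))  # Shapley kernel weight
    Z, ey, w = np.array(masks, dtype=float), np.array(ey), np.array(w)
    # Eliminate the last feature via the constraint sum(phi) = f_x - f_null
    last = Z[:, -1]
    y_adj = (ey - f_null) - last * (f_x - f_null)
    Z_adj = Z[:, :-1] - last[:, None]
    Zw = Z_adj * w[:, None]
    phi_rest = np.linalg.solve(Z_adj.T @ Zw, Zw.T @ y_adj)
    phi = np.append(phi_rest, (f_x - f_null) - phi_rest.sum())
    return phi, f_null, f_x

# Toy linear model (hypothetical) so the result can be checked by hand
coef = np.array([2.0, -1.0, 0.5])

def f(data):
    return data @ coef

rng = np.random.default_rng(0)
background = rng.normal(size=(20, 3))
x = np.array([1.0, 2.0, 3.0])
phi, f_null, f_x = kernel_shap_exact(f, x, background)
print(phi, phi.sum(), f_x - f_null)
```

For a linear model the values should equal coef * (x - background mean), and in general they sum to f_x - f_null because the last value is defined through that constraint.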

Did I understand this correctly?

If so, there are a few things, however, I would like to ask:

if the feature space is fairly large (say 100+): since nsamples restricts the variety of feature permutations computed, and since the feature permutations are considered sequentially (i.e. first only subsets of size 1, then 2, ...), does this not disregard interaction effects between features? As I understand it, this is only partially accounted for by

"add random samples from what is left of the subset space"

Because from 47 features onwards, if I understand it correctly, only subsets of size 1 (and their complements) are considered fully.
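A rough way to see how quickly the subset space outgrows the sample budget is to count subsets per size. The sketch below is a count-only check under my own assumptions: the function name is hypothetical, the budget of 2*M + 2048 evaluations is assumed for illustration, and the kernel weights are ignored, so the rule actually used in kernel.py may cut off at a different point.

```python
from math import comb

def fully_enumerable_sizes(M, budget):
    """Greedily count which subset sizes (with complements) fit in the budget."""
    sizes, left = [], budget
    for s in range(1, M // 2 + 1):
        # Size s is paired with its complement size M - s, unless both coincide
        cost = comb(M, s) if 2 * s == M else 2 * comb(M, s)
        if cost > left:
            break
        sizes.append(s)
        left -= cost
    return sizes, left

for M in (10, 47, 100):
    # Budget of 2*M + 2048 is an assumption made for this illustration
    print(M, fully_enumerable_sizes(M, 2 * M + 2048))
```

Whatever subset sizes do not fit are then only covered by random samples, which is the partial coverage the question refers to.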

why is one variable eliminated later on in the solve function?

"eliminate one variable with the constraint that all features sum to the output"

what is the intuition behind multiplying the weight vector by 2?

Thank you very much in advance!

interpret-ml · 2020-10-02T22:19:58Z

Hi @roye10,

Sorry for the delay in getting back to you! @slundberg would be the best person to answer this question. We see that you opened an issue on the SHAP repo as well (slundberg/shap#1488), which might be a better place to track a response. Happy to leave this issue open in the meanwhile.

-InterpretML Team

roye10 · 2020-10-02T23:09:57Z

Hi InterpretML Team,

No worries, and thank you for your reply, it is appreciated! I'll post an update if I get a response or find an answer some other way.

-roye10

roye10 · 2020-10-07T11:17:16Z

Think I found an answer to:

what is the intuition behind multiplying the weight vector by 2?

The weight vector is multiplied by 2 in order to regain the weight for one subsample: beforehand, the weight vector is multiplied by two to account for the complement subset, so that the weight vector (including the complement subsets) sums to 1.
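One reading of this explanation can be sketched as follows: the weight of each subset size is doubled when that size also stands in for its complement size, the vector is normalised to sum to 1, and the weight of a single sample is recovered by dividing by the number of subsets and by 2 for paired sizes. The function names are hypothetical and this is a sketch of the idea, not a copy of kernel.py.

```python
from math import ceil, comb, floor
import numpy as np

def paired_size_weights(M):
    """Weights per subset size 1..ceil((M-1)/2), with complements folded in."""
    n_sizes = ceil((M - 1) / 2)      # larger sizes are reached as complements
    n_paired = floor((M - 1) / 2)    # sizes whose complement has a different size
    wv = np.array([(M - 1) / (s * (M - s)) for s in range(1, n_sizes + 1)])
    wv[:n_paired] *= 2               # size s also represents size M - s
    wv /= wv.sum()                   # whole vector, complements included, sums to 1
    return wv, n_paired

def per_sample_weight(M, s, wv, n_paired):
    """Weight of one individual subset of size s."""
    w = wv[s - 1] / comb(M, s)
    if s <= n_paired:
        w /= 2                       # split between the subset and its complement
    return w

M = 5
wv, n_paired = paired_size_weights(M)
print(wv, [per_sample_weight(M, s, wv, n_paired) for s in (1, 2)])
```

With this convention, the individual sample weights stay proportional to the Shapley kernel (M-1) / (C(M,s) * s * (M-s)), and the doubling only affects the bookkeeping per subset size.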

rodrigovssp · 2020-10-09T02:50:03Z

In development parameters

roye10 changed the title ~~Question on loss function used in SHAP~~ Question on addsample function in kernel.py of SHAP package Sep 25, 2020

roye10 changed the title ~~Question on addsample function in kernel.py of SHAP package~~ Question on kernel.py of SHAP package Sep 28, 2020

interpret-ml closed this Oct 21, 2020


interpretml / interpret
