The Wayback Machine - https://web.archive.org/web/20201109144254/https://github.com/interpretml/interpret/issues/165

Question on kernel.py of SHAP package #165

Closed
roye10 opened this issue Sep 23, 2020 · 4 comments

Comments

@roye10 roye10 commented Sep 23, 2020

First of all, the package is great! I only have a few technical questions regarding the kernel.py code of the SHAP package.

I am trying to understand how SHAP values are computed by the Kernel SHAP method from the SHAP package. As I understand it, you solve a weighted linear regression to obtain the SHAP values, using phi = ((X'WX)^-1) X'Wy.
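
For reference, the closed-form estimator quoted above can be sketched in a few lines of NumPy. This is a generic weighted least squares solve, not code taken from kernel.py; the function name `weighted_least_squares` and the toy data are made up for illustration.

```python
import numpy as np

def weighted_least_squares(X, y, w):
    """Solve phi = (X' W X)^-1 X' W y, where W = diag(w)."""
    XtW = X.T * w  # equivalent to X.T @ np.diag(w) without building W
    return np.linalg.solve(XtW @ X, XtW @ y)

# Toy data (hypothetical): binary design matrix, noiseless linear target.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(50, 3)).astype(float)
true_phi = np.array([1.5, -2.0, 0.5])
y = X @ true_phi
w = rng.uniform(0.1, 1.0, size=50)  # arbitrary positive sample weights

phi = weighted_least_squares(X, y, w)
```

With a noiseless target the weights do not change the solution, so `phi` should match `true_phi` up to floating-point error; the weights only matter once the rows cannot all be fitted exactly.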

At a high level, this is what I gather so far, especially from lines 296-343 in combination with the referenced functions:

  • nsamples determines how many permutations out of the M! possible feature orderings are considered
  • nsamples × N y-values are computed (N = number of background samples): for each of the nsamples feature orderings, a different set of features is "switched on/off" in every row of the background samples, and the empirical mean over those N rows is taken, producing nsamples empirical means
  • a diff-in-diff is calculated to match feature contributions to individual features: ( E[y|X_s] - f_null ) - maskMatrix*( f_x - f_null )
  • then a WLS is solved, which provides the SHAP values
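
The steps above can be sketched end to end for a tiny case. This is a minimal sketch under stated assumptions, not the code in kernel.py: the linear `model`, the `background` rows, and the instance `x` are hypothetical; all feature subsets of sizes 1 to M-1 are enumerated instead of sampled; and the last feature is eliminated via the constraint that the attributions sum to f_x - f_null, which is one way to read the maskMatrix*( f_x - f_null ) term in the third bullet.

```python
import itertools
import math
import numpy as np

def model(X):
    # Hypothetical toy model: linear, so the exact attributions are known.
    return X @ np.array([2.0, -1.0, 0.5])

background = np.array([[0.0, 0.0, 0.0],
                       [1.0, 2.0, 3.0],
                       [2.0, 1.0, 0.0],
                       [1.0, 1.0, 1.0]])  # N = 4 background rows
x = np.array([3.0, 1.0, 2.0])             # instance to explain
M = x.size

f_null = model(background).mean()  # expected output with no feature "on"
f_x = model(x[None, :])[0]         # output with every feature "on"

masks, weights, ey = [], [], []
for size in range(1, M):
    for subset in itertools.combinations(range(M), size):
        z = np.zeros(M)
        z[list(subset)] = 1.0
        # "On" features take the value from x; the rest keep background values.
        synth = np.where(z == 1.0, x, background)  # shape (N, M)
        masks.append(z)
        ey.append(model(synth).mean())             # empirical mean over N rows
        # Shapley kernel weight of one subset of this size.
        weights.append((M - 1) / (math.comb(M, size) * size * (M - size)))

Z, w, ey = np.array(masks), np.array(weights), np.array(ey)

# Substitute phi_last = (f_x - f_null) - sum(other phis) into the regression.
target = (ey - f_null) - Z[:, -1] * (f_x - f_null)
Z_red = Z[:, :-1] - Z[:, [-1]]

ZtW = Z_red.T * w
phi_head = np.linalg.solve(ZtW @ Z_red, ZtW @ target)
phi = np.append(phi_head, (f_x - f_null) - phi_head.sum())
```

Because the toy model is linear, the result can be checked by hand: each attribution equals the coefficient times the distance of x from the background mean, which here gives approximately [4.0, 0.0, 0.5], and the three values sum to f_x - f_null = 4.5.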

Did I understand this correctly?

If so, there are a few things, however, I would like to ask:

  • if the feature space is fairly large (say 100+ features): since nsamples restricts the variety of feature permutations computed, and since the permutations are considered sequentially (i.e. first only subsets of size 1, then size 2, ...), does this not disregard interaction effects between features? As I understand it, this is only partially accounted for by

"add random samples from what is left of the subset space"

Because from 47 features onwards, if I understand it correctly, only subsets of size 1 (and their complements) are considered fully.

  • why is one variable eliminated later on in the solve function?

"eliminate one variable with the constraint that all features sum to the output"

  • what is the intuition behind multiplying the weight vector by 2?
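
On the first question, a rough way to see how far full enumeration can reach is to count the subsets per size against the sample budget. The sketch below uses a simplified, purely count-based greedy rule (kernel.py may apply a different, weight-based criterion), and the budget of 2*M + 2048 is an assumption about the default that should be checked against the installed version. The function name `fully_enumerated_sizes` is hypothetical.

```python
import math

def fully_enumerated_sizes(M, nsamples):
    """Greedy, count-based sketch: which subset sizes fit in the budget whole?

    Each size s is taken together with its complement size M - s.
    Returns the list of fully covered sizes and the samples left over.
    """
    remaining = nsamples
    sizes = []
    for size in range(1, M // 2 + 1):
        n_subsets = math.comb(M, size)
        if size != M - size:  # size and complement are distinct: count both
            n_subsets *= 2
        if n_subsets > remaining:
            break
        remaining -= n_subsets
        sizes.append(size)
    return sizes, remaining

# Assumed budget: 2 * M + 2048 (to be verified against the library).
small = fully_enumerated_sizes(10, 2 * 10 + 2048)
large = fully_enumerated_sizes(47, 2 * 47 + 2048)
```

Under these assumptions, M = 10 is covered completely (all 2^10 - 2 = 1022 non-trivial subsets fit), while for M = 47 only size 1 and its complement fit, leaving 2048 samples to be drawn at random from the larger subset sizes.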

Thank you very much in advance!

@roye10 roye10 changed the title Question on loss function used in SHAP Question on addsample function in kernel.py of SHAP package Sep 25, 2020
@roye10 roye10 changed the title Question on addsample function in kernel.py of SHAP package Question on kernel.py of SHAP package Sep 28, 2020

@interpret-ml interpret-ml (Collaborator) commented Oct 2, 2020

Hi @roye10,

Sorry for the delay in getting back to you! @slundberg would be the best person to answer this question. We see that you opened an issue on the SHAP repo as well (slundberg/shap#1488), which might be a better place to track a response. Happy to leave this issue open in the meanwhile.

-InterpretML Team

@roye10 roye10 commented Oct 2, 2020

Hi InterpretML Team,

No worries, and thank you for your reply, it is appreciated! I'll post an update if I get a response or find an answer some other way.

-roye10

@roye10 roye10 commented Oct 7, 2020

Think I found an answer to:

what is the intuition behind multiplying the weight vector by 2?

The weight vector is multiplied by 2 to account for the complement subsets, so that the weight vector (including the complement subsets) sums to 1. The factor is undone again later to regain the weight for a single subsample.
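
The pairing of subset sizes described here can be illustrated numerically. The sketch below is an assumption about the bookkeeping, written from the Shapley kernel formula rather than copied from kernel.py: the total kernel mass of all subsets of size s is proportional to (M-1)/(s(M-s)), which is the same for s and M-s, so one entry can stand for both sizes if it is doubled. Dividing by 2 and by the number of subsets later recovers the weight of a single sample. The function name `paired_size_weights` is hypothetical.

```python
import math
import numpy as np

def paired_size_weights(M):
    """One weight per subset size 1..ceil((M-1)/2), complements folded in."""
    n_sizes = math.ceil((M - 1) / 2)
    n_paired = (M - 1) // 2  # sizes whose complement size is a different size
    w = np.array([(M - 1) / (s * (M - s)) for s in range(1, n_sizes + 1)])
    w[:n_paired] *= 2        # each paired entry also carries size M - s
    return w / w.sum(), n_paired

M = 5
paired, n_paired = paired_size_weights(M)

# Reference: normalized kernel mass of every size 1..M-1, no pairing.
full = np.array([(M - 1) / (s * (M - s)) for s in range(1, M)])
full /= full.sum()

# Weight of one individual subset of size 1, recovered from the paired entry.
single = paired[0] / 2 / math.comb(M, 1)
```

For M = 5 the paired vector is [0.6, 0.4], the unpaired reference is [0.3, 0.2, 0.2, 0.3], and the recovered single-subset weight for size 1 is 0.06, which equals 0.3 spread over the 5 subsets of that size.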

@rodrigovssp rodrigovssp commented Oct 9, 2020

In development parameters
