I'm reading through a very good text classification example from sklearn. It takes a collection of documents (newspaper articles), vectorises the corpus, then runs the resulting n x p matrix through a series of classifiers and reports some comparison metrics, using code like the following:
clf = SomeClassifier(parameters)
if hasattr(clf, 'coef_'):
    print(f"dimensionality: {clf.coef_.shape[1]}")
    print(f"density: {density(clf.coef_)}")
I know that the n x p sparse matrix has n rows (the number of newspaper articles) and p predictor columns (the number of tokens produced by the vectoriser, in my case individual words). I also know that the attribute coef_ only exists for models that fit some kind of line or boundary, and that it has shape c x p, where c is the number of classes in the target variable and p is the same number of predictors produced by the vectoriser, which is what clf.coef_.shape[1] reports in the example code. Unfortunately my matrix maths is very rusty, and I'm still learning NLP text analysis, so I'm stuck on this last bit.
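To check that I at least have the shapes straight, here's a tiny toy sketch I put together myself (the classifier and vectoriser are just my own arbitrary choices, not necessarily what the sklearn example uses):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

docs = ["the cat sat on the mat",
        "stocks fell sharply on monday",
        "the striker scored a late goal"]
labels = ["pets", "finance", "sport"]

X = TfidfVectorizer().fit_transform(docs)   # n x p sparse matrix (3 docs x p tokens)
clf = LogisticRegression().fit(X, labels)

print(X.shape)          # (n, p): n documents, p tokens
print(clf.coef_.shape)  # (c, p): c classes, same p tokens -> shape[1] == p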
Questions:
i) What does the function density() tell us about a matrix?
The example reports values of roughly 0.001 to 1.0 for the various models, so I assume it is something to do with the proportion of non-zero values in the array? I can't seem to work out which library this function even comes from, and my web searches keep returning density plots or physics articles, which I don't think are relevant.
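For what it's worth, this is my best guess at what such a function might be computing; it's just my own sketch, not the actual implementation from whatever library the example imports it from:

import numpy as np
from scipy import sparse

def my_density(w):
    # my guess: the fraction of entries in the coefficient matrix that are non-zero
    w = sparse.csr_matrix(w)
    return w.nnz / float(w.shape[0] * w.shape[1])

w = np.array([[0.0, 1.2, 0.0, 0.0],
              [0.3, 0.0, 0.0, 0.0]])
print(my_density(w))  # 0.25 -> 2 non-zero entries out of 8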
ii) What does the density of the c x p class-by-token coefficient matrix tell us about a text classifier?
If density is the proportion of non-zero values, I assume it tells us whether the model is selecting relevant parameters on a class-by-class basis, as in LASSO / L1 regularisation. If that's the case, I assume a very low value (density ~ 0.001) means that most of the parameters are dropped, and a value of 1 means the model uses every parameter to predict every class. Am I off the beaten track here? If I'm on the right lines, can anyone link to articles or write examples to help me understand this?
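To show what I mean, this is the kind of experiment I was imagining (again entirely my own sketch, with an arbitrary classifier and random stand-in data rather than the example's newspaper corpus):

import numpy as np
from scipy import sparse
from sklearn.svm import LinearSVC

# random stand-in for a vectorised corpus: 100 "documents" x 500 "tokens", 3 classes
rng = np.random.RandomState(0)
X = sparse.random(100, 500, density=0.05, random_state=rng, format="csr")
y = rng.randint(0, 3, size=100)

def my_density(w):
    return np.count_nonzero(w) / w.size

l2 = LinearSVC(penalty="l2", dual=False).fit(X, y)
l1 = LinearSVC(penalty="l1", dual=False).fit(X, y)

print(l2.coef_.shape)        # (3 classes, 500 tokens)
print(my_density(l2.coef_))  # close to 1: nearly every token keeps a weight
print(my_density(l1.coef_))  # noticeably lower: L1 zeroes out many per-class weights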