How can I combine one column and a matrix into a larger matrix with `numpy`?

Question

I'm trying to normalize a matrix by applying (X - means) / variance to each row.

Since I am implementing this with MapReduce, I first calculate the mean and variance of each column, and then map each row with:

   matrix.map(lambda X: (X - means) / variance)

But I want to ignore the first element of each row X, which belongs to my target column (it contains only 1s and 0s).

How can I do this?

Jaime · Accepted Answer · 2012-12-24 15:27:49Z

If A is a numpy array of shape (m, n + 1) and you also have arrays mu and s2 of shape (n,) holding the mean and variance of each column except the first one, you can do your normalization as follows:

A[:, 1:] = (A[:, 1:] - mu) / s2

To understand what is going on, you need to understand how broadcasting works. Since A[:, 1:] has shape (m, n) and mu and s2 have shape (n,), the latter two get 1s prepended to their shape to match the number of dimensions of the first, so they are treated as (1, n) arrays. During the arithmetic operations, the values in their first and only row are broadcast to all rows.
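A small sketch of that broadcasting, using a made-up 3×4 float array: subtracting a shape-(n,) array from the (m, n) slice gives exactly the same result as reshaping it to (1, n) by hand.

```python
import numpy as np

# Toy data (made up): column 0 is a 0/1 target, columns 1..3 are features.
A = np.array([[1.0, 2.0, 10.0, 100.0],
              [0.0, 4.0, 20.0, 300.0],
              [1.0, 6.0, 30.0, 500.0]])

mu = np.array([4.0, 20.0, 300.0])   # shape (n,) = (3,)

print(A[:, 1:].shape)   # (3, 3)  -> (m, n)
print(mu.shape)         # (3,)    -> (n,)

# Broadcasting treats mu as shape (1, n) and repeats its single row over all m rows.
implicit = A[:, 1:] - mu
explicit = A[:, 1:] - mu[np.newaxis, :]    # mu reshaped to (1, n) explicitly
print(np.array_equal(implicit, explicit))  # True
```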

If you are not already doing so, your mean and variance arrays can be calculated efficiently as

mu = A[:, 1:].mean(axis=0)
s2 = A[:, 1:].var(axis=0)
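Putting these pieces together, a minimal end-to-end sketch on random data (the sizes, seed and values are arbitrary assumptions). Note that A must have a floating-point dtype: assigning float results into a slice of an integer array silently truncates them.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility
m, n = 5, 3

# Hypothetical data: column 0 is a 0/1 target, the other n columns are float features.
target = rng.integers(0, 2, size=(m, 1)).astype(float)
features = rng.normal(loc=50.0, scale=10.0, size=(m, n))
A = np.hstack([target, features])        # shape (m, n + 1)
original_target = A[:, 0].copy()

# Column statistics over every column except the first.
mu = A[:, 1:].mean(axis=0)               # shape (n,)
s2 = A[:, 1:].var(axis=0)                # shape (n,)

# In-place normalization of columns 1..n; column 0 is left untouched.
A[:, 1:] = (A[:, 1:] - mu) / s2

print(np.array_equal(A[:, 0], original_target))  # True
print(np.allclose(A[:, 1:].mean(axis=0), 0.0))   # True
```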

For the variance, note that var accepts a ddof argument just as std does, so you can pass ddof=1 if you want the unbiased sample variance; see the docs.
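A quick check on a made-up vector, showing what ddof changes: with ddof=0 the sum of squared deviations is divided by N, with ddof=1 by N - 1. np.var takes ddof directly, and its result matches np.std squared for the same ddof.

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up data, N = 8

# Mean is 5.0; squared deviations sum to 9+1+1+1+0+0+4+16 = 32.
print(np.var(x))           # 4.0          (32 / 8, ddof=0, population variance)
print(np.var(x, ddof=1))   # 4.571428...  (32 / 7, unbiased sample variance)

# np.std squared gives the same value for the same ddof.
print(np.isclose(np.std(x, ddof=1) ** 2, np.var(x, ddof=1)))  # True
```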

On a separate note, normalization is usually done by dividing by the standard deviation, not the variance.
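A sketch of that more common variant (z-score standardization), again on arbitrary random data. After the transform each feature column has mean 0 and standard deviation 1 (with the default ddof=0), which dividing by the variance would not give in general.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
# Hypothetical data: 0/1 target in column 0, 4 feature columns on very different scales.
A = np.hstack([
    rng.integers(0, 2, size=(100, 1)).astype(float),
    rng.normal(loc=[0.0, 10.0, -5.0, 1000.0],
               scale=[1.0, 3.0, 0.5, 200.0], size=(100, 4)),
])

mu = A[:, 1:].mean(axis=0)
sigma = A[:, 1:].std(axis=0)          # standard deviation, not variance
A[:, 1:] = (A[:, 1:] - mu) / sigma

print(np.allclose(A[:, 1:].mean(axis=0), 0.0))  # True
print(np.allclose(A[:, 1:].std(axis=0), 1.0))   # True
```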

Thanks. I knew about the mean and var methods, but I think they only work for small datasets. For large datasets, I have to implement them with MapReduce. In that case, I need to map each row so that the returned array is normalized (ignoring the first column).
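Since the comment mentions computing the statistics with MapReduce, here is a minimal sketch of one common way to do it, emulated with Python's built-in map and functools.reduce instead of a real MapReduce framework (that substitution, the helper names to_stats and combine, and the data are assumptions for illustration). Each row is mapped to a partial summary (count, mean, M2), where M2 is the sum of squared deviations from the mean, and summaries are merged pairwise with the standard parallel-variance update, which avoids the cancellation problems of accumulating raw sums of squares.

```python
from functools import reduce

import numpy as np


def to_stats(row):
    """Map step (hypothetical helper): summarize one row, skipping column 0."""
    x = np.asarray(row[1:], dtype=float)
    return 1, x, np.zeros_like(x)           # (count, mean, M2) for a single row


def combine(a, b):
    """Reduce step (hypothetical helper): merge two partial summaries."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * (n_b / n)
    m2 = m2_a + m2_b + delta ** 2 * (n_a * n_b / n)
    return n, mean, m2


rows = [[1, 2.0, 10.0],
        [0, 4.0, 20.0],
        [1, 6.0, 60.0]]                      # made-up data

n, mean, m2 = reduce(combine, map(to_stats, rows))
variance = m2 / n                            # population variance (ddof=0)

full = np.array(rows, dtype=float)[:, 1:]
print(np.allclose(mean, full.mean(axis=0)))     # True
print(np.allclose(variance, full.var(axis=0)))  # True
```

Because combine is associative, the partial summaries can be reduced in any grouping, which is what lets a framework merge them across mappers before the normalization pass.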
np.concatenate((X[:1], (X[1:] - mean) / std_var)) is what I want ;)
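A sketch of such a per-row map function, again emulating the framework's map with Python's built-in map (the name normalize_row and the data are made up; mean and std_var follow the comment's naming). X[:1] is used for the first element because np.concatenate needs arrays with at least one dimension, and a bare scalar such as X[0] raises a ValueError. The result matches the vectorized slice-assignment approach.

```python
import numpy as np

# Made-up rows: element 0 is the 0/1 target, the rest are features.
A = np.array([[1.0, 2.0, 10.0],
              [0.0, 4.0, 20.0],
              [1.0, 6.0, 60.0]])

# Per-column statistics, assumed to have been computed beforehand
# (for example by a separate MapReduce pass).
mean = A[:, 1:].mean(axis=0)
std_var = A[:, 1:].std(axis=0)


def normalize_row(X):
    """Hypothetical map function: keep X[0], standardize the remaining entries."""
    # X[:1] is a length-1 array; a 0-d scalar like X[0] cannot be concatenated.
    return np.concatenate((X[:1], (X[1:] - mean) / std_var))


normalized = np.array(list(map(normalize_row, A)))  # iterating A yields its rows

expected = A.copy()
expected[:, 1:] = (A[:, 1:] - mean) / std_var      # vectorized equivalent
print(np.allclose(normalized, expected))            # True
print(np.array_equal(normalized[:, 0], A[:, 0]))    # True
```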
