41

I have the following code which is attempting to normalize the values of an m x n array (It will be used as input to a neural network, where m is the number of training examples and n is the number of features).

However, when I inspect the array in the interpreter after the script runs, I see that the values are not normalized; that is, they still have the original values. I guess this is because the assignment to the array variable inside the function is only seen within the function.

How can I do this normalization in place? Or do I have to return a new array from the normalize function?

import numpy

def normalize(array, imin = -1, imax = 1):
    """I = Imin + (Imax-Imin)*(D-Dmin)/(Dmax-Dmin)"""

    dmin = array.min()
    dmax = array.max()

    array = imin + (imax - imin)*(array - dmin)/(dmax - dmin)
    print array[0]


def main():

    array = numpy.loadtxt('test.csv', delimiter=',', skiprows=1)
    for column in array.T:
        normalize(column)

    return array

if __name__ == "__main__":
    a = main()
0

4 Answers 4

33

If you want to apply mathematical operations to a numpy array in-place, you can simply use the standard in-place operators +=, -=, /=, etc. So for example:

>>> def foo(a):
...     a += 10
... 
>>> a = numpy.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> foo(a)
>>> a
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

The in-place version of these operations is a tad faster to boot, especially for larger arrays:

>>> def normalize_inplace(array, imin=-1, imax=1):
...         dmin = array.min()
...         dmax = array.max()
...         array -= dmin
...         array *= imax - imin
...         array /= dmax - dmin
...         array += imin
...     
>>> def normalize_copy(array, imin=-1, imax=1):
...         dmin = array.min()
...         dmax = array.max()
...         return imin + (imax - imin) * (array - dmin) / (dmax - dmin)
... 
>>> a = numpy.arange(10000, dtype='f')
>>> %timeit normalize_inplace(a)
10000 loops, best of 3: 144 us per loop
>>> %timeit normalize_copy(a)
10000 loops, best of 3: 146 us per loop
>>> a = numpy.arange(1000000, dtype='f')
>>> %timeit normalize_inplace(a)
100 loops, best of 3: 12.8 ms per loop
>>> %timeit normalize_copy(a)
100 loops, best of 3: 16.4 ms per loop
Sign up to request clarification or add additional context in comments.

4 Comments

what is %timeit? That looks interesting, is it built-in?
The version I use here is only built in to ipython. But it's based on the timeit function in the timeit module.
Ah finally looked at ipython. Funny I had always associated it with ironpython, mistakenly I now see.
@User, yeah it's quite useful at times. I usually just use the regular python shell, but for timings, the %timeit "magic command" in incredibly handy, because it takes care of all the awkward setup for you.
13

This is a trick that it is slightly more general than the other useful answers here:

def normalize(array, imin = -1, imax = 1):
    """I = Imin + (Imax-Imin)*(D-Dmin)/(Dmax-Dmin)"""

    dmin = array.min()
    dmax = array.max()

    array[...] = imin + (imax - imin)*(array - dmin)/(dmax - dmin)

Here we are assigning values to the view array[...] rather than assigning these values to some new local variable within the scope of the function.

x = np.arange(5, dtype='float')
print x
normalize(x)
print x

>>> [0. 1. 2. 3. 4.]
>>> [-1.  -0.5  0.   0.5  1. ]

EDIT:

It's slower; it allocates a new array. But it may be valuable if you are doing something more complicated where builtin in-place operations are cumbersome or don't suffice.

def normalize2(array, imin=-1, imax=1):
    dmin = array.min()
    dmax = array.max()

    array -= dmin;
    array *= (imax - imin)
    array /= (dmax-dmin)
    array += imin

A = np.random.randn(200**3).reshape([200] * 3)
%timeit -n5 -r5 normalize(A)
%timeit -n5 -r5 normalize2(A)

>> 47.6 ms ± 678 µs per loop (mean ± std. dev. of 5 runs, 5 loops each)
>> 26.1 ms ± 866 µs per loop (mean ± std. dev. of 5 runs, 5 loops each)

1 Comment

and what would be the timings of it?
4
def normalize(array, imin = -1, imax = 1):
    """I = Imin + (Imax-Imin)*(D-Dmin)/(Dmax-Dmin)"""

    dmin = array.min()
    dmax = array.max()


    array -= dmin;
    array *= (imax - imin)
    array /= (dmax-dmin)
    array += imin

    print array[0]

2 Comments

Performance-wise is there any issue doing it this way? How does it compare to creating a new array?
I mean, for that you'd have to benchmark. It depends on the size of the array. For small-ish problems, I would certainly just create the new array.
1

There is a nice way to do in-place normalization when using numpy. np.vectorize is is very usefull when combined with a lambda function when applied to an array. See the example below:

import numpy as np

def normalizeMe(value,vmin,vmax):

    vnorm = float(value-vmin)/float(vmax-vmin)

    return vnorm

imin = 0
imax = 10
feature = np.random.randint(10, size=10)

# Vectorize your function (only need to do it once)
temp = np.vectorize(lambda val: normalizeMe(val,imin,imax)) 
normfeature = temp(np.asarray(feature))

print feature
print normfeature

One can compare the performance with a generator expression, however there are likely many other ways to do this.

%%timeit
temp = np.vectorize(lambda val: normalizeMe(val,imin,imax)) 
normfeature1 = temp(np.asarray(feature))
10000 loops, best of 3: 25.1 µs per loop


%%timeit
normfeature2 = [i for i in (normalizeMe(val,imin,imax) for val in feature)]
100000 loops, best of 3: 9.69 µs per loop

%%timeit
normalize(np.asarray(feature))
100000 loops, best of 3: 12.7 µs per loop

So vectorize is definitely not the fastest, but can be conveient in cases where performance is not as important.

2 Comments

It does the job, but it is very slow since it is implemented like a for-loop, according to the documentation.
Are there any benchmarks for this kind of thing? You'd hope that vectorize might help it go much faster.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.