Difference between ALL 1D points in array with python diff()?

Question

I'm looking for tips on how to write a function (or a recommendation for one that already exists) that calculates the difference between all pairs of entries in an array, i.e. an implementation of diff() that covers every combination of entries rather than just consecutive pairs.

Here is an example of what I want:

# example array
a = [3, 2, 5, 1]

Now we want to apply a function that returns the difference between all combinations of entries. Since len(a) == 4, the total number of combinations is N*(N-1)/2 = 6 for N = 4 (if a had length 5 there would be 10 combinations, and so on). So the function should return the following for a:

result = some_function(a)
print result
array([-1, 2, -2, 3, -1, -4])
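The pair count quoted above can be checked with a short sketch. The names `a`, `n` and `index_pairs` are only illustrative, and `math.comb` assumes Python 3.8 or newer:

```python
from math import comb  # available from Python 3.8 (assumption about the interpreter)

a = [3, 2, 5, 1]
n = len(a)

# Enumerate every unordered index pair (i, j) with i < j.
index_pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]

# All three counts agree: explicit enumeration, N*(N-1)/2, and "n choose 2".
print(len(index_pairs), n * (n - 1) // 2, comb(n, 2))
```

For a list of length 5 the same formula gives 10 pairs.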

So the function would be similar to pdist, but instead of calculating the Euclidean distance it should simply calculate the difference between the coordinates along one axis, e.g. the z-axis if we assume that the entries in a are coordinates. Note that I need the sign of each difference to know which side of the axis each point lies on.

Thanks.

wim · Accepted Answer · 2014-01-08 18:10:52Z

Something like this?

>>> import itertools as it
>>> a = [3, 2, 5, 1]
>>> [y - x for x, y in it.combinations(a, 2)]
[-1, 2, -2, 3, -1, -4]



3 Comments

Astrid Over a year ago

Cheers for that. Yeah, something like that. But I was wondering whether numpy or scipy has a function that does this efficiently? I need to calculate the above for several tens of thousands of rows of data, so ideally it should be fast.

Joe Kington Over a year ago

@Astrid - The vectorized numpy version will calculate each value twice, but in some cases it might still be faster. Basically, you want the lower triangle of np.subtract.outer(a, a). wim's answer is likely to be faster in a lot of cases, though.

wim Over a year ago

I didn't notice the numpy tag at first, and you used list data in your question. Joe's suggestion is good; you can then collect the values from the lower triangle using np.tril_indices.

Astrid · Accepted Answer · 2014-01-09 00:50:59Z

So I tried out the methods proposed by wim and Joe (and Joe and wim's combined suggestion), and this is what I came up with:

import itertools as it
import numpy as np

a = np.random.randint(10, size=1000)

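def cartesian_distance(x):
    # Signed differences x[i] - x[j] for all i > j, read from the strict
    # lower triangle of the outer-difference matrix.
    return np.subtract.outer(x, x)[np.tril_indices(x.shape[0], k=-1)]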

%timeit cartesian_distance(a)
%timeit [y - x for x, y in it.combinations(a, 2)]

10 loops, best of 3: 97.9 ms per loop
1 loops, best of 3: 333 ms per loop

For a smaller array:

a = np.random.randint(10, size=10)

def cartesian_distance(x):
    return np.subtract.outer(x,x)[np.tril_indices(x.shape[0],k=-1)]

%timeit cartesian_distance(a)
%timeit [y - x for x, y in it.combinations(a, 2)]

10000 loops, best of 3: 78.6 µs per loop
10000 loops, best of 3: 40.1 µs per loop
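One thing to watch with the two approaches timed above: they return the same set of signed differences, but not in the same order. For the array from the question, the lower-triangle version yields [-1, 2, 3, -2, -1, -4], whereas it.combinations yields [-1, 2, -2, 3, -1, -4]. If the order matters, a possible variant is to index with the upper triangle instead. This is only a sketch, not something that was timed here, and `pairwise_diff` is a made-up name:

```python
import itertools as it

import numpy as np


def pairwise_diff(x):
    """Hypothetical helper: x[j] - x[i] for all i < j, in it.combinations order."""
    x = np.asarray(x)
    # triu_indices walks the strict upper triangle row by row:
    # (0, 1), (0, 2), ..., (1, 2), ... which is the order combinations uses.
    i, j = np.triu_indices(x.shape[0], k=1)
    return x[j] - x[i]


a = np.array([3, 2, 5, 1])

upper = pairwise_diff(a)
lower = np.subtract.outer(a, a)[np.tril_indices(a.shape[0], k=-1)]
reference = [int(y - x) for x, y in it.combinations(a, 2)]

print(upper.tolist())  # [-1, 2, -2, 3, -1, -4]
print(lower.tolist())  # [-1, 2, 3, -2, -1, -4]
```

Both arrays hold the same values with the same signs, so if only the collection of differences is needed, either one works.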
