
I'm looking for tips on how to write a function (or a recommendation for one that already exists) that calculates the difference between all pairs of entries in an array, i.e. an implementation of diff() that covers every combination of entries, not just consecutive pairs.

Here is an example of what I want:

# example array
a = [3, 2, 5, 1]

Now we want to apply a function that returns the difference between every combination of two entries. Since len(a) == 4, the total number of combinations is N*(N-1)/2 = 6 for N = 4 (if a had length 5, there would be 10 combinations, and so on). So the function should return the following for the vector a:

result = some_function(a)
print(result)
array([-1, 2, -2, 3, -1, -4])
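
As a quick sanity check of the pair count, here is a minimal sketch. The helper name pair_count is hypothetical and only used for illustration; it simply compares the N*(N-1)/2 formula with the number of index pairs that itertools.combinations produces.

```python
from itertools import combinations

def pair_count(n):
    # Hypothetical helper: number of unordered pairs among n entries
    return n * (n - 1) // 2

a = [3, 2, 5, 1]

# Enumerate the index pairs (i, j) with i < j and count them
index_pairs = list(combinations(range(len(a)), 2))
n_pairs = pair_count(len(a))
```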

So the function would be similar to pdist, but instead of calculating the Euclidean distance it should simply calculate the signed difference along one Cartesian axis, e.g. the z-axis if we assume that the entries in a are coordinates. Note that I need the sign of each difference to know which side of the axis each point is located on.

Thanks.

2 Answers


Something like this?

>>> import itertools as it
>>> a = [3, 2, 5, 1]
>>> [y - x for x, y in it.combinations(a, 2)]
[-1, 2, -2, 3, -1, -4]

3 Comments

Cheers for that. Yeah something like that. But I was wondering if there is some function of numpy or scipy which does it efficiently? I need to calculate the above for several tens of thousands of rows of data, so ideally it should be efficient.
@Astrid - The vectorized numpy version will calculate each value twice, but in some cases it might still be faster. Basically, you want the lower triangle of np.subtract.outer(a, a). wim's answer is likely to be faster in a lot of cases, though.
Didn't notice the numpy tag at first, and you used list data in your question. Joe's suggestion is good, and then you can collect the values from the lower triangle using np.tril_indices.
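
One detail worth checking with the triangle approach is the order of the results. The strict lower triangle of np.subtract.outer(a, a) holds the same values as the itertools version but lists them row by row, so the order differs from the output in the question. A minimal sketch that reproduces the question's order, assuming that order matters, indexes the upper triangle instead; the name pairwise_differences is hypothetical and not part of numpy.

```python
import itertools as it
import numpy as np

def pairwise_differences(x):
    # Hypothetical helper: x[j] - x[i] for every pair i < j,
    # in the same order as itertools.combinations(x, 2)
    x = np.asarray(x)
    i, j = np.triu_indices(x.shape[0], k=1)
    return x[j] - x[i]

a = [3, 2, 5, 1]
result = pairwise_differences(a)

# Reference values from the pure-Python version
expected = [y - x for x, y in it.combinations(a, 2)]

# Lower-triangle variant from the comments: same values, different order
arr = np.asarray(a)
lower = np.subtract.outer(arr, arr)[np.tril_indices(arr.shape[0], k=-1)]
```

This variant also avoids building the full N-by-N difference matrix, at the cost of creating two index arrays of length N*(N-1)/2.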

So I tried out the methods proposed by wim and Joe (and Joe and wim's combined suggestion), and this is what I came up with:

import itertools as it
import numpy as np

a = np.random.randint(10, size=1000)

def cartesian_distance(x):
    # Strict lower triangle of the outer difference: x[i] - x[j] for i > j
    return np.subtract.outer(x, x)[np.tril_indices(x.shape[0], k=-1)]

%timeit cartesian_distance(a)
%timeit [y - x for x, y in it.combinations(a, 2)]

10 loops, best of 3: 97.9 ms per loop
1 loops, best of 3: 333 ms per loop

For a smaller array:

a = np.random.randint(10, size=10)

def cartesian_distance(x):
    return np.subtract.outer(x,x)[np.tril_indices(x.shape[0],k=-1)]

%timeit cartesian_distance(a)
%timeit [y - x for x, y in it.combinations(a, 2)]

10000 loops, best of 3: 78.6 µs per loop
10000 loops, best of 3: 40.1 µs per loop
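
If the "several tens of thousands of rows" mentioned in the comments are stored as a 2-D array with one set of coordinates per row (an assumption about the data layout, not something stated in the question), the index pairs can be computed once and applied to all rows at the same time. This is only a sketch, and the name pairwise_differences_rows is hypothetical:

```python
import numpy as np

def pairwise_differences_rows(data):
    # Hypothetical helper: for each row, data[r, j] - data[r, i] for all i < j
    data = np.asarray(data)
    i, j = np.triu_indices(data.shape[1], k=1)  # index pairs shared by every row
    return data[:, j] - data[:, i]

data = np.array([[3, 2, 5, 1],
                 [0, 1, 2, 3]])
diffs = pairwise_differences_rows(data)  # shape: (n_rows, N*(N-1)/2)
```

Whether this is faster than looping over rows would need to be timed on the real data, since the result array grows quadratically with the row length.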
