
I'd like to take the difference of non-adjacent values within a 2D NumPy array along axis=-1 (per row). The array can have a large number of rows.

Each row is a selection of values along a timeline from 1 to N.

For N=12, a 3x12 array could look like this:

timeline = np.array([[ 0,  0,  0,  4,  0,  6,  0,  0,  9,  0, 11,  0],
                     [ 1,  0,  3,  4,  0,  0,  0,  0,  9,  0,  0, 12],
                     [ 0,  0,  0,  4,  0,  0,  0,  0,  9,  0,  0,  0]])                                                  

The desired result should look like this (the shape of the array is preserved, and the position of each value matters):

diff = np.array([[ 0,  0,  0,  4,  0,  2,  0,  0,  3,  0,  2,  0],
                 [ 1,  0,  2,  1,  0,  0,  0,  0,  5,  0,  0,  3],
                 [ 0,  0,  0,  4,  0,  0,  0,  0,  5,  0,  0,  0]])

I am aware of the 1D solution from the question "Diff on non-adjacent values":

imask = np.flatnonzero(timeline)
diff = np.zeros_like(timeline)
diff[imask] = np.diff(timeline[imask], prepend=0)

where the last line can be replaced with

diff[imask[0]] = timeline[imask[0]]
diff[imask[1:]] = timeline[imask[1:]] - timeline[imask[:-1]]

and the first line can be replaced with

imask = np.where(timeline != 0)[0]
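To make the 1D approach concrete, here is a minimal runnable sketch applied to a single row. The names row and diff_1d are chosen only for this illustration, and the data is the first row of the example array above.

```python
import numpy as np

# A single 1D row (the first row of the example array).
row = np.array([0, 0, 0, 4, 0, 6, 0, 0, 9, 0, 11, 0])

# Indices of the non-zero entries.
imask = np.flatnonzero(row)

# Differences between consecutive non-zero values; prepend=0 keeps the
# first non-zero value unchanged.
diff_1d = np.zeros_like(row)
diff_1d[imask] = np.diff(row[imask], prepend=0)

print(diff_1d.tolist())  # [0, 0, 0, 4, 0, 2, 0, 0, 3, 0, 2, 0]
```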

Attempting to generalise the 1D solution, I can see that imask = np.flatnonzero(timeline) is unsuitable, because it returns indices into the flattened array and the rows become inter-dependent. So I tried the alternative, np.nonzero:

imask = np.nonzero(timeline)
diff = np.zeros_like(timeline)
diff[imask] = np.diff(timeline[imask], prepend=0)

However, this still connects the last non-zero value of one row to the first non-zero value of the next (the rows are inter-dependent):

array([[  0,   0,   0,   4,   0,   2,   0,   0,   3,   0,   2,   0],
       [-10,   0,   2,   1,   0,   0,   0,   0,   5,   0,   0,   3],
       [  0,   0,   0,  -8,   0,   0,   0,   0,   5,   0,   0,   0]])
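The cross-row connection can be seen by inspecting what the fancy indexing returns. The following sketch (the names rows, cols and flat_vals are illustrative only) shows that timeline[np.nonzero(timeline)] is a single 1D sequence in row-major order, so np.diff has no knowledge of where one row ends and the next begins.

```python
import numpy as np

timeline = np.array([[0, 0, 0, 4, 0, 6, 0, 0, 9, 0, 11, 0],
                     [1, 0, 3, 4, 0, 0, 0, 0, 9, 0, 0, 12],
                     [0, 0, 0, 4, 0, 0, 0, 0, 9, 0, 0, 0]])

# np.nonzero returns one index array per dimension.
rows, cols = np.nonzero(timeline)

# Indexing with both arrays yields all non-zero values as one flat array.
flat_vals = timeline[rows, cols]
print(flat_vals.tolist())
# [4, 6, 9, 11, 1, 3, 4, 9, 12, 4, 9]

# The diff therefore crosses row boundaries (11 -> 1 and 12 -> 4).
print(np.diff(flat_vals, prepend=0).tolist())
# [4, 2, 3, 2, -10, 2, 1, 5, 3, -8, 5]
```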

How can I make the "prepend" start each row with a zero?

1 Answer

Wow, I did it. (It is an interesting problem for me too.)

I wrote a function, non_adjacent_diff, that handles a single row, and applied it to every row using np.apply_along_axis.

Try this code:

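import numpy as np

timeline = np.array([[ 0,  0,  0,  4,  0,  6,  0,  0,  9,  0, 11,  0],
                     [ 1,  0,  3,  4,  0,  0,  0,  0,  9,  0,  0, 12],
                     [ 0,  0,  0,  4,  0,  0,  0,  0,  9,  0,  0,  0]])

def non_adjacent_diff(row):
    # Positions of the non-zero entries in this row.
    not_zero_index = np.where(row != 0)
    # Difference between each non-zero value and the previous non-zero value.
    diff = row[not_zero_index][1:] - row[not_zero_index][:-1]
    # Overwrite every non-zero entry except the first, which keeps its value.
    # Note: np.put modifies row in place.
    np.put(row, not_zero_index[0][1:], diff)
    return row

np.apply_along_axis(non_adjacent_diff, 1, timeline)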
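As an alternative that avoids calling a Python function once per row, here is a minimal vectorised sketch. It is not the method used in the answer above: the function name non_adjacent_diff_2d is hypothetical, and it assumes a 2D integer array. The idea is to take the flat sequence of non-zero values and subtract the previous value only when that value belongs to the same row.

```python
import numpy as np

def non_adjacent_diff_2d(timeline):
    """Row-wise difference between consecutive non-zero values (sketch)."""
    # Row and column indices of the non-zero entries, in row-major order.
    rows, cols = np.nonzero(timeline)
    vals = timeline[rows, cols]

    # Previous non-zero value, or 0 when the entry is the first in its row.
    prev = np.zeros_like(vals)
    same_row = rows[1:] == rows[:-1]
    prev[1:] = np.where(same_row, vals[:-1], 0)

    # Write the differences back at the original positions; the input
    # array is left unmodified.
    out = np.zeros_like(timeline)
    out[rows, cols] = vals - prev
    return out

timeline = np.array([[0, 0, 0, 4, 0, 6, 0, 0, 9, 0, 11, 0],
                     [1, 0, 3, 4, 0, 0, 0, 0, 9, 0, 0, 12],
                     [0, 0, 0, 4, 0, 0, 0, 0, 9, 0, 0, 0]])

result = non_adjacent_diff_2d(timeline)
print(result.tolist())
```

For the example array this should reproduce the desired result from the question, with each row starting from an implicit zero.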

4 Comments

Although np.apply_along_axis sounds really fancy, np.vstack([non_adjacent_diff(a) for a in timeline]) seems quite a bit faster in this case.
@Quang-Hoang, thanks for the input, but I cannot see this produce the same result when replacing 'np.apply_along_axis(non_adjacent_diff, 1, timeline)' with 'np.vstack([non_adjacent_diff(a) for a in timeline])' (while keeping the function non_adjacent_diff(row)).
@Jaco, remember that the function non_adjacent_diff changes your input timeline, so you can only use it once.
@Quang-Hoang, my bad, it does produce the same, a very nice suggestion, thank you.
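The in-place modification discussed in these comments can be illustrated with a short sketch. It assumes the non_adjacent_diff function from the answer and passes a copy of timeline to each approach, so that both start from the same unmodified data. The names original, via_apply and via_vstack are illustrative only.

```python
import numpy as np

timeline = np.array([[0, 0, 0, 4, 0, 6, 0, 0, 9, 0, 11, 0],
                     [1, 0, 3, 4, 0, 0, 0, 0, 9, 0, 0, 12],
                     [0, 0, 0, 4, 0, 0, 0, 0, 9, 0, 0, 0]])

def non_adjacent_diff(row):
    # Same logic as in the answer; np.put overwrites row in place.
    not_zero_index = np.where(row != 0)
    diff = row[not_zero_index][1:] - row[not_zero_index][:-1]
    np.put(row, not_zero_index[0][1:], diff)
    return row

original = timeline.copy()

# Each approach works on its own copy, so timeline itself is not changed.
via_apply = np.apply_along_axis(non_adjacent_diff, 1, timeline.copy())
via_vstack = np.vstack([non_adjacent_diff(a) for a in timeline.copy()])

print(np.array_equal(via_apply, via_vstack))  # True
print(np.array_equal(timeline, original))     # True
```

Without the copies, the second call would operate on already-differenced data, which is why the two approaches can appear to disagree.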
