I have row indices as a 1D NumPy array and a list of NumPy arrays of column indices (the list has the same length as the row-index array). I want to extract the values of A at these row/column positions. How can I do that?

Here is an example of the input and the output I want:

A = np.array([[2, 1, 1, 0, 0],
              [3, 0, 2, 1, 1],
              [0, 0, 2, 1, 0],
              [0, 3, 3, 3, 0],
              [0, 1, 2, 1, 0],
              [0, 1, 3, 1, 0],
              [2, 1, 3, 0, 1],
              [2, 0, 2, 0, 2],
              [3, 0, 3, 1, 2]])

row_ind = np.array([0,2,4])
col_ind = [np.array([0, 1, 2]), np.array([2, 3]), np.array([1, 2, 3])]

I want the output as a list of NumPy arrays (or a list of lists):

[np.array([2, 1, 1]), np.array([2, 1]), np.array([1, 2, 1])]

My biggest concern is efficiency: my array A has shape 20K x 10K.

  • you normally would use np.ix_ for that but to clarify: do you mean you want to avoid advanced indexing for performance reasons? Commented Feb 18, 2020 at 11:14
  • @MrFuppes I am fine with advanced indexing. But I do not see how can I apply advanced indexing here. Commented Feb 18, 2020 at 11:17
  • what I was relating to was that np.ix_ uses advanced indexing. see my answer below. make sure to watch memory consumption if you process large arrays... Commented Feb 18, 2020 at 11:24
  • Since col_ind vary in length, I think you'll require a loop. Commented Feb 18, 2020 at 11:36
  • In your example, shouldn't the third array of the output be np.array([1, 2, 1]) ? Commented Feb 18, 2020 at 22:19

1 Answer

As @hpaulj commented, you most likely cannot avoid a loop, because the arrays in col_ind differ in length. For example:

import numpy as np

A = np.array([[2, 1, 1, 0, 0],
              [3, 0, 2, 1, 1],
              [0, 0, 2, 1, 0],
              [0, 3, 3, 3, 0],
              [0, 1, 2, 1, 0],
              [0, 1, 3, 1, 0],
              [2, 1, 3, 0, 1],
              [2, 0, 2, 0, 2],
              [3, 0, 3, 1, 2]])


row_ind = np.array([0,2,4])
col_ind = [np.array([0, 1, 2]), np.array([2, 3]), np.array([1, 2, 3])]

# sanity check: there must be exactly one column-index array per row index
assert row_ind.shape[0] == len(col_ind)

# for each row r, pick the requested columns in a single indexing step
output = [A[r, cols] for r, cols in zip(row_ind, col_ind)]

# output
# [array([2, 1, 1]), array([2, 1]), array([1, 2, 1])] 
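
If the Python-level loop over rows turns out to be the bottleneck, one possible alternative (not part of the original answer, only a sketch) is to flatten all index pairs, do a single advanced-indexing call, and split the result back into pieces. The names lengths, rows_flat, cols_flat and output_flat are illustrative. Whether this is actually faster depends on how many rows and columns are selected, so it is worth timing on the real data.

```python
import numpy as np

A = np.array([[2, 1, 1, 0, 0],
              [3, 0, 2, 1, 1],
              [0, 0, 2, 1, 0],
              [0, 3, 3, 3, 0],
              [0, 1, 2, 1, 0],
              [0, 1, 3, 1, 0],
              [2, 1, 3, 0, 1],
              [2, 0, 2, 0, 2],
              [3, 0, 3, 1, 2]])

row_ind = np.array([0, 2, 4])
col_ind = [np.array([0, 1, 2]), np.array([2, 3]), np.array([1, 2, 3])]

# number of columns requested for each row
lengths = np.array([len(c) for c in col_ind])

# repeat each row index once per requested column and flatten the column indices
rows_flat = np.repeat(row_ind, lengths)   # [0 0 0 2 2 4 4 4]
cols_flat = np.concatenate(col_ind)       # [0 1 2 2 3 1 2 3]

# one advanced-indexing call over all (row, col) pairs
flat = A[rows_flat, cols_flat]

# cut the flat result back into one array per row
output_flat = np.split(flat, np.cumsum(lengths)[:-1])
```

Note that rows_flat and cols_flat each hold one entry per selected element, so this trades extra index memory for fewer Python-level indexing calls.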

Another option is np.ix_, which still requires a loop. Be careful with very large arrays, though: np.ix_ relies on advanced indexing, which, unlike basic slicing, returns a copy of the data rather than a view (see the NumPy indexing docs).
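
A minimal sketch of the np.ix_ variant, reusing the arrays from the question (output_ix is just an illustrative name). np.ix_([r], cols) selects a block of shape (1, len(cols)), so the leading axis is dropped with [0] to get a 1D array per row.

```python
import numpy as np

A = np.array([[2, 1, 1, 0, 0],
              [3, 0, 2, 1, 1],
              [0, 0, 2, 1, 0],
              [0, 3, 3, 3, 0],
              [0, 1, 2, 1, 0],
              [0, 1, 3, 1, 0],
              [2, 1, 3, 0, 1],
              [2, 0, 2, 0, 2],
              [3, 0, 3, 1, 2]])

row_ind = np.array([0, 2, 4])
col_ind = [np.array([0, 1, 2]), np.array([2, 3]), np.array([1, 2, 3])]

# np.ix_ builds an open mesh from the row and column index sequences;
# indexing A with it yields a 2D block, and [0] extracts its single row
output_ix = [A[np.ix_([r], cols)][0] for r, cols in zip(row_ind, col_ind)]
```

np.ix_ is most useful when several rows share the same set of columns; with a different column set for every row, as here, it does the same work as indexing each row directly.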
