0

I have a data structure like this:

  • my source arrays are sorted arrays like [2,3,4,5,7,8,9,10,11]
  • I know a priori the maximum value across this collection of arrays; in this case it’s 17

What I need to do is build a sparse matrix with 17 rows (the maximum mentioned above) and n columns, where n is the number of arrays. Each column vector should hold, at the position given by a source element’s value, that element’s index+1 in the source array, and 0 where no element maps to that position. In the mentioned example the output vector should be [0,1,2,3,4,0,5,6,7,8,9,10,11,0,0,0,0]. Is there an efficient way to do that in numpy without looping through columns and rows, which would have a dramatic computational cost?

3
  • Not a machine-learning or scikit-learn question; kindly do not spam irrelevant tags (removed and replaced with scipy). That said, have a look here: Creating a sparse matrix from numpy array. Commented Sep 24, 2020 at 16:42
  • 1
    Your example isn't clear. You tag scipy. Is that because you want to use scipy.sparse? Or are you just using sparse in the loose sense of an array with some 0s? Commented Sep 24, 2020 at 18:47
  • @hpaulj The scipy tag was added by me in an edit, not by the OP, as clearly stated in my first comment above. Commented Sep 24, 2020 at 22:26

3 Answers

1
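from scipy import sparse
import numpy as np

in_list = [2,3,4,5,7,8,9,10,11]
in_list_len = len(in_list)
max_num = 17
# np.int was removed in NumPy 1.24; the builtin int is the replacement
a = sparse.csr_matrix((max_num, in_list_len), dtype=int)

# row = element value, column = its position, stored entry = position + 1
# (element-wise assignment to a CSR matrix raises a SparseEfficiencyWarning)
for ind, val in enumerate(in_list):
    a[val, ind] = ind + 1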

Converting to a dense array with a.toarray() then shows:

Out[23]: 
array([[0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 2, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 3, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 4, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 5, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 6, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 7, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 8, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 9],
       [0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0]])

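The loop performs len(in_list) assignments, so the number of steps is O(len(in_list)). Note that each assignment changes the sparsity structure of the CSR matrix, which SciPy flags as expensive.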

Your desired output makes no sense, because you asked for a matrix but specified a list.
I am pretty sure this is what you wanted.

The closest thing to the list you describe would be the stored nonzero values:

a.data
Out[18]: array([1, 2, 3, 4, 5, 6, 7, 8, 9])
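
As a possible variation on the loop above, the same matrix can be sketched without a Python loop by handing the row indices, column indices and values to the csr_matrix constructor in one go. This assumes the same layout as above (row = element value, column = position in in_list); the names rows, cols and data are just illustrative.

```python
import numpy as np
from scipy import sparse

in_list = np.array([2, 3, 4, 5, 7, 8, 9, 10, 11])
max_num = 17

rows = in_list                    # row index = value of each element
cols = np.arange(len(in_list))    # column index = position in in_list
data = cols + 1                   # stored entry = position + 1

# (data, (row, col)) builds the matrix in a single call
a = sparse.csr_matrix((data, (rows, cols)), shape=(max_num, len(in_list)))
```

Because all entries are supplied up front, nothing is inserted into an existing CSR structure one element at a time.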

0

This functionality appears to exist in pandas:

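import numpy as np
import pandas as pd
# assumes matteos_sorted_arrays is a NumPy array; NaN needs a float dtype, and astype returns a copy
matteos_sorted_arrays_with_nans = matteos_sorted_arrays.astype(float)
matteos_sorted_arrays_with_nans[2:-2] = np.nan
sdf = pd.Series(pd.arrays.SparseArray(matteos_sorted_arrays_with_nans))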

Without further particulars, I haven't the foggiest what to recommend as a next step, though.
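
For what it's worth, here is a minimal sketch of how the example column from the question might be stored this way, assuming 0 (rather than NaN) is the "missing" value and assuming 1-based element values, so that value v lands at index v-1. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

x = np.array([2, 3, 4, 5, 7, 8, 9, 10, 11])

# Dense column: element with value v gets its 1-based position, stored at index v-1
dense = np.zeros(17, dtype=int)
dense[x - 1] = np.arange(1, len(x) + 1)

# fill_value=0 tells pandas which value to leave unstored
sparse_col = pd.arrays.SparseArray(dense, fill_value=0)
sdf = pd.Series(sparse_col)
```

Only the nine nonzero entries are kept in sparse_col.sp_values; the zeros are implied by the fill value.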


0

Using x to assign consecutive numbers to elements of an array:

In [16]: x
Out[16]: array([ 2,  3,  4,  5,  7,  8,  9, 10, 11])
In [17]: arr = np.zeros(17,int)
In [18]: arr
Out[18]: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
In [19]: arr[x-1] = np.arange(1,len(x)+1)
In [20]: arr
Out[20]: array([0, 1, 2, 3, 4, 0, 5, 6, 7, 8, 9, 0, 0, 0, 0, 0, 0])
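
The same indexing trick might be extended to several source arrays at once, giving one column per array as the question asks. This is only a sketch under assumptions: index_matrix is a hypothetical helper name, the arrays are non-empty, and values are treated as 1-based (value v goes to row v-1, as in the session above).

```python
import numpy as np

def index_matrix(arrays, max_num):
    """Hypothetical helper: one column per source array, rows indexed by value - 1."""
    lengths = np.array([len(a) for a in arrays])
    # Row of every entry: the element's value, shifted to 0-based
    rows = np.concatenate([np.asarray(a) for a in arrays]) - 1
    # Column of every entry: which source array it came from
    cols = np.repeat(np.arange(len(arrays)), lengths)
    # Stored value: 1-based position of the element inside its own array
    vals = np.concatenate([np.arange(1, n + 1) for n in lengths])

    out = np.zeros((max_num, len(arrays)), dtype=int)
    out[rows, cols] = vals   # single fancy-indexed assignment for all columns
    return out

m = index_matrix([[2, 3, 4, 5, 7, 8, 9, 10, 11], [1, 4, 17]], 17)
```

The rows, cols and vals triplets could equally be passed to a scipy.sparse constructor if a true sparse matrix is wanted instead of a dense array with zeros.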
