Fix array with rows of different lengths by filling the empty elements with zeros

Question

The functionality I am looking for looks something like this:

data = np.array([[1, 2, 3, 4],
                 [2, 3, 1],
                 [5, 5, 5, 5],
                 [1, 1]])

result = fix(data)
print(result)

[[ 1.  2.  3.  4.]
 [ 2.  3.  1.  0.]
 [ 5.  5.  5.  5.]
 [ 1.  1.  0.  0.]]

These data arrays I'm working with are really large so I would really appreciate the most efficient solution.

Edit: Data is read in from disk as a python list of lists.

Simply add the data type to the array function call, np.array(..., dtype=np.float64), or use loadtxt and savetxt from numpy.
– nickpapior, Commented Aug 16, 2015 at 17:33
@zeroth I have tried that and got ValueError: setting an array element with a sequence. Could you explain more?
– user2909415, Commented Aug 16, 2015 at 17:36
Is it likely to be a sparse matrix with most entries as zero? Can it fit in memory as a dense matrix?
– musically_ut, Commented Aug 16, 2015 at 17:54
@musically_ut No, it isn't sparse. Often there are only 1-3 elements missing at the ends.
– user2909415, Commented Aug 16, 2015 at 18:10

Divakar · Accepted Answer · 2016-11-14 13:07:03Z

28

This could be one approach -

import numpy as np

def numpy_fillna(data):
    # data: 1-D object-dtype array whose elements are the ragged rows
    # Get lengths of each row of data
    lens = np.array([len(i) for i in data])

    # Mask of valid places in each row
    mask = np.arange(lens.max()) < lens[:,None]

    # Setup output array and put elements from data into masked positions
    out = np.zeros(mask.shape, dtype=data.dtype)
    out[mask] = np.concatenate(data)
    return out

Sample input, output -

In [222]: # Input object dtype array (recent NumPy versions require dtype=object for ragged rows)
     ...: data = np.array([[1, 2, 3, 4],
     ...:                  [2, 3, 1],
     ...:                  [5, 5, 5, 5, 8, 9, 5],
     ...:                  [1, 1]], dtype=object)

In [223]: numpy_fillna(data)
Out[223]: 
array([[1, 2, 3, 4, 0, 0, 0],
       [2, 3, 1, 0, 0, 0, 0],
       [5, 5, 5, 5, 8, 9, 5],
       [1, 1, 0, 0, 0, 0, 0]], dtype=object)
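
Since the question says the data arrives as a Python list of lists and the expected output is floating point, the same masking idea can be applied without first building an object-dtype array. The following is only a sketch under those assumptions: the name pad_rows and its fill and dtype parameters are hypothetical (not part of the answer above), and it assumes the list contains at least one row.

```python
from itertools import chain

import numpy as np


def pad_rows(rows, fill=0.0, dtype=np.float64):
    """Right-pad a list of ragged rows into a 2-D array (hypothetical helper)."""
    # Length of each row
    lens = np.fromiter((len(r) for r in rows), dtype=np.intp, count=len(rows))

    # True where a real value belongs, False where padding goes
    mask = np.arange(lens.max()) < lens[:, None]

    # Start from the fill value, then drop the flattened values into the mask
    out = np.full(mask.shape, fill, dtype=dtype)
    out[mask] = np.fromiter(chain.from_iterable(rows), dtype=dtype,
                            count=int(lens.sum()))
    return out


rows = [[1, 2, 3, 4], [2, 3, 1], [5, 5, 5, 5], [1, 1]]
result = pad_rows(rows)
print(result)
```

Because the output dtype is chosen up front, the result is a plain float64 array rather than an object array.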

edited Nov 14, 2016 at 13:07

answered Aug 17, 2015 at 5:31

Divakar

3 Comments

Jerry Londergaard Over a year ago

The accepted answer is almost correct. I assume it was an oversight, but the following:

    # Mask of valid places in each row
    mask = np.arange(lens.size) < lens[:,None]

should actually be:

    # Mask of valid places in each row
    mask = np.arange(max(lens)) < lens[:,None]

The accepted answer happens to work for the tested input because lens.size == max(lens). If it's not, it no longer works...

Neil Slater Over a year ago

I think lens.size should be lens.max() - in your answer these are equal to make a square matrix. But try with a ragged row longer than the number of rows and you will get an error.
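
The difference the two comments above describe can be checked on the row lengths of the answer's own sample input (4, 3, 7 and 2). This is a small illustrative sketch; the variable names mask_size, mask_max and n_values are made up for the comparison.

```python
import numpy as np

lens = np.array([4, 3, 7, 2])   # row lengths of the sample input
n_values = int(lens.sum())      # 16 values have to be placed

# Width taken from the number of rows (4): too narrow for the 7-element row
mask_size = np.arange(lens.size) < lens[:, None]

# Width taken from the longest row (7)
mask_max = np.arange(lens.max()) < lens[:, None]

print(mask_size.shape, int(mask_size.sum()))  # (4, 4) 13
print(mask_max.shape, int(mask_max.sum()))    # (4, 7) 16
```

With lens.size the mask offers only 13 slots for 16 values, so the masked assignment cannot succeed; with lens.max() the number of True entries equals the number of values.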

WestCoastProjects Over a year ago

that mask is brilliant

Eastsun · Answer · 2015-08-17 05:39:53Z

15

You could use pandas instead of numpy:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame([[1, 2, 3, 4],
   ...:                    [2, 3, 1],
   ...:                    [5, 5, 5, 5],
   ...:                    [1, 1]], dtype=float)


In [3]: df.fillna(0.0).values
Out[3]: 
array([[ 1.,  2.,  3.,  4.],
       [ 2.,  3.,  1.,  0.],
       [ 5.,  5.,  5.,  5.],
       [ 1.,  1.,  0.,  0.]])

answered Aug 17, 2015 at 5:39

Eastsun

1 Comment

mat_dw Over a year ago

Doesn't seem to work for deeper nesting levels, though :(
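
For deeper nesting, one possible approach in plain NumPy is to find the largest length at each depth and then copy the values into a pre-filled array. This is only a sketch: max_shape and pad_nested are hypothetical helper names, and the code assumes every leaf sits at the same nesting depth and that the containers are lists or tuples.

```python
import numpy as np


def max_shape(nested):
    """Largest length found at each nesting depth (hypothetical helper)."""
    if not isinstance(nested, (list, tuple)):
        return ()
    dims = [max_shape(item) for item in nested]
    depth = max((len(d) for d in dims), default=0)
    inner = tuple(max((d[i] for d in dims if len(d) > i), default=0)
                  for i in range(depth))
    return (len(nested),) + inner


def pad_nested(nested, fill=0.0, dtype=np.float64):
    """Pad a ragged nested list with `fill` into a dense N-D array."""
    out = np.full(max_shape(nested), fill, dtype=dtype)

    def copy_into(target, source):
        if target.ndim == 1:
            # Innermost level: write the values, leave the rest as fill
            target[:len(source)] = source
        else:
            # Iterating an N-D array yields views, so writes land in `out`
            for sub_target, sub_source in zip(target, source):
                copy_into(sub_target, sub_source)

    copy_into(out, nested)
    return out


nested = [[[1, 2], [3]], [[4, 5, 6]]]
padded = pad_nested(nested)
print(padded.shape)  # (2, 2, 3)
```

The recursion runs in Python, so this trades speed for generality compared with the masking approach.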

陈家胜 · Answer · 2017-07-11 13:45:52Z

Use np.pad().

In [62]: arr
Out[62]: 
[array([0]),
 array([83, 74]),
 array([87, 61, 23]),
 array([71,  3, 81, 77]),
 array([20, 44, 20, 53, 60]),
 array([54, 36, 74, 35, 49, 54]),
 array([11, 36,  0, 98, 29, 87, 21]),
 array([ 1, 22, 62, 51, 45, 40, 36, 86]),
 array([ 7, 22, 83, 58, 43, 59, 45, 81, 92]),
 array([68, 78, 70, 67, 77, 64, 58, 88, 13, 56])]

In [63]: max_len = np.max([len(a) for a in arr])

In [64]: np.asarray([np.pad(a, (0, max_len - len(a)), 'constant', constant_values=0) for a in arr])
Out[64]: 
array([[ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [83, 74,  0,  0,  0,  0,  0,  0,  0,  0],
       [87, 61, 23,  0,  0,  0,  0,  0,  0,  0],
       [71,  3, 81, 77,  0,  0,  0,  0,  0,  0],
       [20, 44, 20, 53, 60,  0,  0,  0,  0,  0],
       [54, 36, 74, 35, 49, 54,  0,  0,  0,  0],
       [11, 36,  0, 98, 29, 87, 21,  0,  0,  0],
       [ 1, 22, 62, 51, 45, 40, 36, 86,  0,  0],
       [ 7, 22, 83, 58, 43, 59, 45, 81, 92,  0],
       [68, 78, 70, 67, 77, 64, 58, 88, 13, 56]])
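
The second argument of np.pad in the call above is a (before, after) pair for a 1-D array, so (0, max_len - len(a)) adds nothing on the left and the missing count on the right. A small sketch on a single row, with max_len fixed at 5 purely for illustration:

```python
import numpy as np

row = np.array([83, 74])
max_len = 5  # illustrative target length

# (0, n): no padding before the data, n zeros after it
right_padded = np.pad(row, (0, max_len - len(row)), 'constant', constant_values=0)

# (n, 0): the same number of zeros placed before the data instead
left_padded = np.pad(row, (max_len - len(row), 0), 'constant', constant_values=0)

print(right_padded)  # [83 74  0  0  0]
print(left_padded)   # [ 0  0  0 83 74]
```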

yourstruly · Answer · 2015-08-16 19:14:13Z

4

It would be nicer to do this in some vectorized way, but I'm still a noob, so this is all I could think of for now!

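import numpy as np, numba as nb

a = np.array([[1, 2, 3, 4],
              [2, 3, 1],
              [5, 5, 5, 5, 5],
              [1, 1]], dtype=object)  # ragged rows need an explicit object dtype

@nb.jit(forceobj=True)  # object mode: object-dtype arrays cannot be compiled in nopython mode
def f(a):
    l = len(max(a, key=len))        # length of the longest row
    a0 = np.empty(a.shape + (l,))   # float output, one row per input row
    for n, i in enumerate(a.flat):
        a0[n] = np.pad(i, (0, l - len(i)), mode='constant')  # right-pad with zeros
    return a0

print(f(a))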

answered Aug 16, 2015 at 19:14

yourstruly

General Grievance · Answer · 2020-09-27 02:13:24Z

0

import numpy as np

data = np.array([[1, 2, 3, 4],
                 [2, 3, 1],
                 [5, 5, 5, 5],
                 [1, 1]], dtype=object)  # ragged rows need an explicit object dtype
max_len = max(len(row) for row in data)
np.array([np.pad(row, (0, max_len - len(row)), 'constant', constant_values=0)
          for row in data])

The length of each row is computed, and the maximum of these lengths is stored in max_len. Each row is then padded with zeros on the right until it reaches that maximum length.
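
The same steps can also be written without np.pad, by allocating the output once and copying each row into its leading slots. This is a swapped-in variant rather than the code above, offered as a sketch: pad_rows_prealloc and its fill and dtype parameters are hypothetical, and it assumes a non-empty list of rows.

```python
import numpy as np


def pad_rows_prealloc(rows, fill=0, dtype=np.int64):
    """Right-pad ragged rows by writing into a pre-filled array (hypothetical helper)."""
    # Step 1: the maximum row length
    max_len = max(len(row) for row in rows)

    # Step 2: one output array, already holding the fill value everywhere
    out = np.full((len(rows), max_len), fill, dtype=dtype)

    # Step 3: copy each row into the left part of its output row
    for i, row in enumerate(rows):
        out[i, :len(row)] = row
    return out


rows = [[1, 2, 3, 4], [2, 3, 1], [5, 5, 5, 5], [1, 1]]
padded = pad_rows_prealloc(rows)
print(padded)
```

This avoids creating one temporary padded array per row, although the loop over rows still runs in Python.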

edited Sep 27, 2020 at 2:13

General Grievance

answered Sep 27, 2020 at 1:51

Prasaanth Selvakumar
