How to replace zero-dimensional arrays in a pandas DataFrame?

Question

I have a pandas dataframe that looks like this:

index	array	array2
1	`['group1']`	`['group1', 'group3']`
2	`['group1', 'group4']`	`['group1', 'group3']`
3	`[]`	`['group2', 'group3']`
4	`[]`	`['group2', 'group4']`

As you can see, some of these arrays are zero-dimensional (to be specific, they are array([], dtype=object)).

Now, because zero-dimensional arrays cannot be concatenated, I want to replace them with np.nan so that I can concatenate them.

But if I do

data['array'].replace(np.array([]).astype(object), 0, inplace= True)

Nothing happens! The dataframe stays the same and nothing changes. In fact, even if I do it manually:

data['array'].replace(data['array'][3], 0, inplace = True)

my resulting dataframe is not altered at all...

My question then is, how can we create a function to replace all of these zero-dimensional arrays in a data frame for concatenation?

Experimenting with replace, it looks like it can match strings and numbers, but doesn't seem to work with a list or array. Your array has shape (0,), so it is actually 1d. Doing an equality test on such an array (or on any array) is tricky. Also beware that the list [] and the string "[]" display the same as your array.
– hpaulj, Commented Apr 6, 2021 at 4:06
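
To make the points in this comment concrete, here is a small sketch that inspects an empty object array of the kind described in the question. The variable names are illustrative only, and the remark about why an equality-based match is awkward is an interpretation of the comment, not a statement about how replace is implemented.

```python
import numpy as np

empty = np.array([], dtype=object)

# The empty array is one-dimensional with zero elements, not zero-dimensional.
ndim = empty.ndim    # 1
shape = empty.shape  # (0,)

# Comparing two empty arrays is elementwise, so the result is itself an
# empty boolean array rather than a single True or False.
comparison = empty == np.array([], dtype=object)
comparison_size = comparison.size  # 0

# The array, an empty list and the string "[]" all display identically.
displays = [str(empty), str([]), "[]"]
```

Because the comparison never yields a single truth value, checking the length of each element tends to be a more reliable way to find the empty entries.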

Quang Hoang · Accepted Answer · 2021-04-06 02:38:07Z

1

You can try:

data['array'] = [x if len(x) else 0 for x in data['array']]
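
A minimal, self-contained sketch of the same idea follows. It assumes a sample frame shaped like the one in the question (the data below is made up for illustration) and fills with np.nan, which is what the question asks for, instead of 0.

```python
import numpy as np
import pandas as pd

# Assumed sample data: each cell holds a 1-d object array, two of them empty.
data = pd.DataFrame({
    "array": [
        np.array(["group1"], dtype=object),
        np.array(["group1", "group4"], dtype=object),
        np.array([], dtype=object),
        np.array([], dtype=object),
    ],
})

# len(x) is 0 for an empty array, which is falsy, so those cells become NaN.
data["array"] = [x if len(x) else np.nan for x in data["array"]]

# isna() is True only for the replaced cells; array cells count as non-missing.
replaced = data["array"].isna()
```

Note that once NaN values are in the column, running the same comprehension again would fail, because len() is not defined for a float.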


Pablo C · Accepted Answer · 2021-04-06 03:02:23Z

0

You can access its length for masking:

>>> df.loc[df.array.str.len().eq(0), "array"] = np.nan
>>> df

   index             array            array2
0      1          [group1]  [group1, group3]
1      2  [group1, group4]  [group1, group3]
2      3               NaN  [group2, group3]
3      4               NaN  [group2, group4]
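
For a version that can be run end to end, here is a sketch of the masking approach on assumed sample data. It swaps in Series.map(len) for the .str.len() accessor to build the mask and uses df["array"] rather than attribute access; both are substitutions made for this illustration, and the frame itself is made up.

```python
import numpy as np
import pandas as pd

# Assumed sample data modelled on the question's frame.
df = pd.DataFrame({
    "index": [1, 2, 3, 4],
    "array": [
        np.array(["group1"], dtype=object),
        np.array(["group1", "group4"], dtype=object),
        np.array([], dtype=object),
        np.array([], dtype=object),
    ],
})

# Boolean mask: True where the stored array has no elements.
is_empty = df["array"].map(len).eq(0)

# Overwrite only the masked rows of the "array" column.
df.loc[is_empty, "array"] = np.nan
```

Building the mask separately makes it easy to inspect which rows will be changed before assigning.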


Pygirl · Accepted Answer · 2021-04-06 03:03:14Z

0

You can also use mask:

data = data.mask(data.applymap(str).eq('[]'))

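Or you can keep only the entries whose length is greater than 0 (if the cells held the strings '[]' rather than arrays, the threshold would be 2 instead):

data['array'] = data['array'].where(data['array'].str.len() > 0, np.nan)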

   index             array            array2
0      1          [group1]  [group1, group3]
1      2  [group1, group4]  [group1, group3]
2      3               NaN  [group2, group3]
3      4               NaN  [group2, group4]
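
Since the question asks for a function that handles the whole data frame, here is one possible sketch. The names replace_empty_arrays and _is_empty_sequence are hypothetical helpers written for this illustration, not pandas functions, and the sample frame is assumed. It checks element sizes directly instead of comparing string representations, so a genuine string "[]" is left alone.

```python
import numpy as np
import pandas as pd


def _is_empty_sequence(value):
    """Return True for an empty list, tuple or NumPy array; False otherwise."""
    return isinstance(value, (list, tuple, np.ndarray)) and np.size(value) == 0


def replace_empty_arrays(frame, fill_value=np.nan):
    """Return a copy of `frame` with empty sequences in object columns replaced."""
    result = frame.copy()
    for column in result.columns:
        # Only object columns can hold arrays or lists as cell values.
        if result[column].dtype == object:
            result[column] = [
                fill_value if _is_empty_sequence(v) else v for v in result[column]
            ]
    return result


# Assumed sample data modelled on the question's frame.
data = pd.DataFrame({
    "index": [1, 2, 3, 4],
    "array": [
        np.array(["group1"], dtype=object),
        np.array(["group1", "group4"], dtype=object),
        np.array([], dtype=object),
        np.array([], dtype=object),
    ],
    "array2": [
        np.array(["group1", "group3"], dtype=object),
        np.array(["group1", "group3"], dtype=object),
        np.array(["group2", "group3"], dtype=object),
        np.array(["group2", "group4"], dtype=object),
    ],
})

cleaned = replace_empty_arrays(data)
```

The helper returns a new frame and leaves the input untouched, and because it tests the type before measuring size, it can be applied again to a frame that already contains NaN.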
