Imputing row values in a pandas DataFrame based on values in other rows of a different column

Question

I have the DataFrame below:

import numpy as np
import pandas as pd

df = pd.DataFrame({
 'Market': {0: 'Zone1',
  1: 'Zone1',
  2: 'Zone1',
  3: 'Zone1',
  4: 'Zone2',
  5: 'Zone2',
  6: 'Zone2',
  7: 'Zone2'},
  'col1': {0: 'v1',
  1: 'v2',
  2: 'v3',
  3: 'v4',
  4: 'v1',
  5: 'v2',
  6: 'v3',
  7: 'v4'},
 'col2': {0: np.nan,
  1: 1,
  2: 6,
  3: 2,
  4: np.nan,
  5: 2,
  6: 1,
  7: 2,},
 'col3': {0: np.nan,
  1: 9,
  2: 5,
  3: 2,
  4: np.nan,
  5: 0,
  6: 9,
  7: 1,}})

For each value of Market (i.e. Zone1 and Zone2), I want to replace the NaN values in the row where col1 is v1 with the sum of the values in the rows where col1 is v2 and v4. The output should look like this:

  Market col1  col2  col3
0  Zone1   v1     3    11
1  Zone1   v2     1     9
2  Zone1   v3     6     5
3  Zone1   v4     2     2
4  Zone2   v1     4     1
5  Zone2   v2     2     0
6  Zone2   v3     1     9
7  Zone2   v4     2     1

Henry Ecker · Accepted Answer · 2021-05-06 03:32:26Z


Another option using groupby:

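value_cols = ['col2', 'col3']

# Per-Market sums of the 'v2' and 'v4' rows (one row per Market, sorted by Market).
sums = df[df.col1.isin(['v2', 'v4'])].groupby('Market')[value_cols].sum()

# Positional assignment: assumes one 'v1' row per Market, in sorted Market order.
df.loc[df.col1.eq('v1'), value_cols] = sums.values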

df[value_cols] = df[value_cols].astype(int)
print(df)

Output:

  Market col1  col2  col3
0  Zone1   v1     3    11
1  Zone1   v2     1     9
2  Zone1   v3     6     5
3  Zone1   v4     2     2
4  Zone2   v1     4     1
5  Zone2   v2     2     0
6  Zone2   v3     1     9
7  Zone2   v4     2     1
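
The positional `.values` assignment above works here because each Market has exactly one v1 row and those rows appear in sorted Market order. As a minimal sketch of a variant that does not depend on row order, the per-Market sums can be looked up by Market label with `Series.map`. The DataFrame is rebuilt in compact form so the snippet runs on its own; `sums` and `mask` are illustrative names, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same data as in the question, written compactly.
df = pd.DataFrame({
    'Market': ['Zone1'] * 4 + ['Zone2'] * 4,
    'col1': ['v1', 'v2', 'v3', 'v4'] * 2,
    'col2': [np.nan, 1, 6, 2, np.nan, 2, 1, 2],
    'col3': [np.nan, 9, 5, 2, np.nan, 0, 9, 1],
})
value_cols = ['col2', 'col3']

# Per-Market sums of the 'v2' and 'v4' rows, indexed by Market.
sums = df[df['col1'].isin(['v2', 'v4'])].groupby('Market')[value_cols].sum()

# Look up each 'v1' row's Market in `sums`, so the fill is aligned by label
# rather than by position.
mask = df['col1'].eq('v1')
for col in value_cols:
    df.loc[mask, col] = df.loc[mask, 'Market'].map(sums[col])

# No NaN values remain, so the columns can be cast back to integers.
df[value_cols] = df[value_cols].astype(int)
print(df)
```

Because the lookup is by label, this version should also cope with shuffled rows or several v1 rows per Market, at the cost of a short loop over the value columns.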

edited May 6, 2021 at 3:32

answered May 6, 2021 at 3:24

Henry Ecker♦


BENY · Accepted Answer · 2021-05-06 03:23:12Z

We can just use a simple for loop:

for x in df.Market.unique():
    df.loc[df.Market.eq(x) & df.col1.eq('v1'), ['col2', 'col3']] = \
        df.loc[df.Market.eq(x) & df.col1.isin(['v2', 'v4']), ['col2', 'col3']].sum().values

df
Out[69]: 
  Market col1  col2  col3
0  Zone1   v1   3.0  11.0
1  Zone1   v2   1.0   9.0
2  Zone1   v3   6.0   5.0
3  Zone1   v4   2.0   2.0
4  Zone2   v1   4.0   1.0
5  Zone2   v2   2.0   0.0
6  Zone2   v3   1.0   9.0
7  Zone2   v4   2.0   1.0
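
The loop above builds a pair of boolean masks for every Market. A loop-free sketch of the same idea, assuming the same data, uses `groupby(...).transform('sum')` to broadcast each Market's v2 + v4 total to every row of that Market and then copies it into the v1 rows. The names `is_donor`, `is_v1`, `donors` and `totals` are illustrative only.

```python
import numpy as np
import pandas as pd

# Same data as in the question, written compactly.
df = pd.DataFrame({
    'Market': ['Zone1'] * 4 + ['Zone2'] * 4,
    'col1': ['v1', 'v2', 'v3', 'v4'] * 2,
    'col2': [np.nan, 1, 6, 2, np.nan, 2, 1, 2],
    'col3': [np.nan, 9, 5, 2, np.nan, 0, 9, 1],
})
value_cols = ['col2', 'col3']

is_donor = df['col1'].isin(['v2', 'v4'])
is_v1 = df['col1'].eq('v1')

# Keep only the donor rows, then reindex so all other rows become NaN.
donors = df.loc[is_donor, value_cols].reindex(df.index)

# transform('sum') skips NaN and returns one total per row, grouped by Market.
totals = donors.groupby(df['Market']).transform('sum')

# Copy the totals into the 'v1' rows; assignment aligns on index and columns.
df.loc[is_v1, value_cols] = totals.loc[is_v1, value_cols]
print(df)
```

As in the loop version, the value columns stay as floats unless they are cast with `astype(int)` afterwards.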

simpleApp · Accepted Answer · 2021-05-06 04:10:37Z


Another way is to limit each fill to one row at a time, using the limit argument of fillna:

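# Sum only the numeric columns of the 'v2'/'v4' rows, per Market.
summary = df.query('col1 == "v2" or col1 == "v4"').groupby('Market')[['col2', 'col3']].sum()
for ind, row in summary.iterrows():
    # limit=1 fills only the first remaining NaN in each column, so this relies on
    # the NaN rows appearing in the same Market order as `summary`.
    # Pass inplace=True instead of reassigning if memory is a concern.
    df = df.fillna({'col2': row['col2'], 'col3': row['col3']}, limit=1)
df.head(10)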

answered May 6, 2021 at 4:10

simpleApp

