
I have two DataFrames:

  • df - the core DataFrame whose columns/cells I want to fill in
  • maptable - a lookup DataFrame that maps id values to period values

An example:

maptable:

id | period
A  | winter
B  | summer
A  | summer
nan | summer
B  | nan

df:

id | period  | other_col
A  | None    | X
B  | summer  | Y
C  | None    | Z
D  | spring  | D
D  | NaN

How can I fill only the cells of df that are None/empty/NaN, using maptable and the identifier column id?

  • What would be the expected output you are looking for? Commented Feb 7, 2020 at 15:33
    df['period'].fillna(df['id'].map(maptable.set_index('id')['period']))? Commented Feb 7, 2020 at 15:37

2 Answers


Use Series.map to look up each id in maptable, then fill only the missing values with Series.fillna:

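import pandas as pd
# Map each id to its period (ids in maptable must be unique for Series.map)
# and use the mapped value only where df['period'] is missing
df['period'] = df['period'].fillna(df['id'].map(maptable.set_index('id')['period']))
#alternative
#df['period'] = (df['id'].map(maptable.set_index('id')['period'])
#                        .where(df['period'].isnull(), df['period']))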

Output

  id other_col  period
0  A         X  winter
1  B         Y  summer
2  C         Z     NaN
3  D         D  spring

EDIT: alternatively, with DataFrame.merge

new_df= (df.merge(maptable,on = 'id',how = 'left')
           .assign(period = lambda x: x['period_x'].fillna(x['period_y']))
           .loc[:,df.columns])
print(new_df)

Output

  id  period other_col
0  A  winter         X
1  A  summer         X
2  B  summer         Y
3  C     NaN         Z
4  D  spring         D
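The merge output has two rows for A because A appears twice in maptable. The question's maptable also contains a row with a missing id and a row with a missing period, and since Series.map needs each id to appear only once in the lookup, the Series.map line is expected to raise an error about a non-unique index with that maptable. If you want exactly one row per original df row, a minimal sketch is to clean the lookup first. Dropping incomplete maptable rows and letting the first listed period win are assumptions, not something the question specifies:

```python
import numpy as np
import pandas as pd

# The maptable from the question: "A" and "B" each appear twice,
# and one row has a missing id while another has a missing period.
maptable = pd.DataFrame({
    "id": ["A", "B", "A", np.nan, "B"],
    "period": ["winter", "summer", "summer", "summer", np.nan],
})
# Assumption: the incomplete last row of the question's df is left out.
df = pd.DataFrame({
    "id": ["A", "B", "C", "D"],
    "period": [None, "summer", None, "spring"],
    "other_col": ["X", "Y", "Z", "D"],
})

# Build a unique id -> period lookup: drop rows without an id or a period,
# then keep the first period listed for each id (assumed tie-break rule).
lookup = (
    maptable.dropna(subset=["id", "period"])
    .drop_duplicates(subset="id", keep="first")
    .set_index("id")["period"]
)

# Fill only the missing periods; existing values are left untouched.
df["period"] = df["period"].fillna(df["id"].map(lookup))
print(df)
```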

Comments

What is the difference between this and the alternative?
Then we need merge.
Does this also work with x['period_x']? My column name might be dynamic.
It is the same; you can choose the suffixes.
I am getting the error "['period'] not in index" on my dataset.
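On the comments about dynamic column names and the "['period'] not in index" error: one option is to pass `suffixes` to merge so the maptable's copy of the column gets a predictable name, and to check up front that the column exists in both frames (that error generally means a label being selected is not among the frame's columns). The helper `fill_from_maptable`, the `_map` suffix and the example `season` column below are hypothetical, illustrative choices:

```python
import pandas as pd


def fill_from_maptable(df: pd.DataFrame, maptable: pd.DataFrame, key: str, col: str) -> pd.DataFrame:
    """Hypothetical helper: fill missing df[col] values from maptable[col], matched on `key`.

    Assumes `key` is unique in maptable; the maptable's copy of `col` is
    renamed with the suffix "_map", so no hard-coded period_x/period_y names are needed.
    """
    missing = [c for c in (key, col) if c not in df.columns or c not in maptable.columns]
    if missing:
        raise KeyError(f"columns not found in both frames: {missing}")
    merged = df.merge(maptable[[key, col]], on=key, how="left", suffixes=("", "_map"))
    merged[col] = merged[col].fillna(merged[col + "_map"])
    # Return only the original columns, in their original order.
    return merged[df.columns]


# Example with a column name chosen at runtime.
df = pd.DataFrame({"id": ["A", "B", "C"], "season": [None, "summer", None], "other_col": list("XYZ")})
maptable = pd.DataFrame({"id": ["A", "B"], "season": ["winter", "winter"]})
result = fill_from_maptable(df, maptable, key="id", col="season")
print(result)
```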
import pandas as pd

# Creating your dataframes
maptable = pd.DataFrame([{"id":"A","period":"winter"},{"id":"B","period":"summer"}])
df = pd.DataFrame({"id":["A","B","C","D"], "period":[None, "summer", None, "spring"], "other_col":list('XYZD')})

# Merging both dataframes on the "id" key
df1 = pd.merge(left=df, right=maptable, on="id", how="left")
# Keep the original period if present, otherwise take the one from maptable
df1["period"] = [x if not pd.isnull(x) else y for x, y in zip(df1["period_x"], df1["period_y"])]
df1.drop(["period_x", "period_y"], axis=1, inplace=True)
print(df1)

Output:

  id other_col  period
0  A         X  winter
1  B         Y  summer
2  C         Z     NaN
3  D         D  spring
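The list comprehension over `period_x`/`period_y` can also be written as a single vectorised call. A sketch using Series.combine_first with the same example data (`df1["period_x"].fillna(df1["period_y"])` would give the same values here):

```python
import pandas as pd

maptable = pd.DataFrame([{"id": "A", "period": "winter"}, {"id": "B", "period": "summer"}])
df = pd.DataFrame({"id": ["A", "B", "C", "D"], "period": [None, "summer", None, "spring"], "other_col": list("XYZD")})

df1 = pd.merge(left=df, right=maptable, on="id", how="left")
# combine_first takes period_x where it is present and period_y otherwise,
# which is what the list comprehension does row by row.
df1["period"] = df1["period_x"].combine_first(df1["period_y"])
df1 = df1.drop(columns=["period_x", "period_y"])
print(df1)
```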

Comment

How does this work with duplicate indices in the maptable?
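On duplicate ids in the maptable: a left merge returns one output row per matching maptable row, so a duplicated id duplicates the corresponding df row. The `validate` argument of merge can turn that into an explicit error instead. A minimal sketch with a small made-up maptable in which `A` appears twice; keeping the first row per id is only one possible fix:

```python
import pandas as pd
from pandas.errors import MergeError

df = pd.DataFrame({"id": ["A", "B", "C"], "period": [None, "summer", None]})
# Made-up maptable: "A" appears twice, as in the question.
maptable = pd.DataFrame({"id": ["A", "B", "A"], "period": ["winter", "summer", "summer"]})

# A plain left merge yields one row per matching maptable row, so "A" is duplicated.
merged = pd.merge(df, maptable, on="id", how="left")
print(len(merged))

# validate="m:1" makes pandas check that the keys are unique in the right frame.
duplicate_keys_detected = False
try:
    pd.merge(df, maptable, on="id", how="left", validate="m:1")
except MergeError as err:
    duplicate_keys_detected = True
    print("maptable has duplicate ids:", err)

# Assumed fix: keep the first row per id, after which the check passes.
merged_unique = pd.merge(df, maptable.drop_duplicates(subset="id"), on="id", how="left", validate="m:1")
print(len(merged_unique))
```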
