
I have a CSV file in which each property row is followed by a variable number of rows describing the rooms in that property. I want to create a column that, for each property, sums the floor areas of its rooms to give a gross floor area. Because the rows are not grouped by any explicit key, I am finding this difficult to do in pandas. Here is an example of the table I have at the moment:

id  ba  store_desc      floor_area
0   1   Toy Shop        NaN
1   2   Retail Zone A   29.42
2   2   Retail Zone B   31.29
3   1   Grocery Store   NaN
4   2   Retail Zone A   68.00
5   2   Outside Garden  83.50
6   2   Office          7.30

Here is the table I am trying to create:

id  ba  store_desc      floor_area   gross_floor_area
0   1   Toy Shop        NaN          60.71
3   1   Grocery Store   NaN          158.8

Does anybody have any pointers on how to achieve this result? I'm totally lost.

Sam

2 Answers

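IIUC, the property rows are the ones with a missing floor_area, so a cumulative sum over isnull() gives each property and its rooms a shared group number. Take a copy of the property rows first so that adding the new column does not raise SettingWithCopyWarning:

df1 = df[df['floor_area'].isnull()].copy()

df1['gross_floor_area'] = df.groupby(df['floor_area'].isnull().cumsum())['floor_area'].sum().values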

df1
Out[463]: 
   id  ba    store_desc  floor_area  gross_floor_area
0   0   1       ToyShop         NaN             60.71
3   3   1  GroceryStore         NaN            158.80
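
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same cumulative-sum idea. The sample data is retyped from the question, and the names raw, is_property, group_id, totals and result are my own, not from the answer. It maps the totals back by group number rather than by position, on the assumption that this is safer if the file happens to start with a room row.

```python
import io

import pandas as pd

# Sample data retyped from the question so the snippet runs on its own.
raw = """id,ba,store_desc,floor_area
0,1,Toy Shop,
1,2,Retail Zone A,29.42
2,2,Retail Zone B,31.29
3,1,Grocery Store,
4,2,Retail Zone A,68.00
5,2,Outside Garden,83.50
6,2,Office,7.30
"""
df = pd.read_csv(io.StringIO(raw))

# Property rows are the ones with no floor_area of their own.
is_property = df["floor_area"].isna()

# Each property row bumps the counter, so a property and the rooms
# that follow it end up with the same group number.
group_id = is_property.cumsum()

# Sum the room areas per group; sum() skips the NaN on the property row.
totals = df.groupby(group_id)["floor_area"].sum()

# Keep only the property rows and look up each one's total by group number.
result = df[is_property].copy()
result["gross_floor_area"] = group_id[is_property].map(totals)

print(result)
```

Note that this relies on the property rows being the only rows with a missing floor_area; a room with a missing area would be treated as the start of a new property.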

I first made a temporary column named category holding the store_desc of each property row, forward-filled it so every room row carries its property's name, grouped by that column to get the sums, and then mapped those sums back to the matching store_desc values.

df['category'] = df.loc[df.floor_area.isnull(), 'store_desc']

df['category'] = df['category'].ffill()

df['gross_floor_area'] = df.store_desc.map(df.groupby('category')['floor_area'].sum())

df.drop('category',axis=1,inplace=True)

df[df.gross_floor_area.notnull()]
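
One caveat with mapping on store_desc is that it assumes every property name is unique and never matches a room name. A possible variation, sketched below under the assumption that the id column is unique per row, is to forward-fill the property's id instead and use transform so that no lookup by name is needed. The names is_property, owner_id and out are hypothetical, and the data is retyped from the question.

```python
import numpy as np
import pandas as pd

# Sample data retyped from the question.
df = pd.DataFrame({
    "id": range(7),
    "ba": [1, 2, 2, 1, 2, 2, 2],
    "store_desc": ["Toy Shop", "Retail Zone A", "Retail Zone B",
                   "Grocery Store", "Retail Zone A", "Outside Garden", "Office"],
    "floor_area": [np.nan, 29.42, 31.29, np.nan, 68.00, 83.50, 7.30],
})

# Property rows are the ones without a floor_area.
is_property = df["floor_area"].isna()

# Keep the id only on property rows, then carry it down over the room rows.
owner_id = df["id"].where(is_property).ffill()

# transform("sum") returns one value per original row, aligned with df,
# so every row receives the total of the property it belongs to.
df["gross_floor_area"] = df.groupby(owner_id)["floor_area"].transform("sum")

# Keep only the property rows, as in the desired output.
out = df[is_property]
print(out)
```

Because the grouping key is the id rather than the name, two properties that share a store_desc would still get separate totals.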
