I am working with a dataset stored in the following DataFrame:
# print(old_df)
   col1  col2  col3
0     1    10   1.5
1     1    11   2.5
2     1    12   5.6
3     2    10   7.8
4     2    24   2.1
5     3    10   3.2
6     4    10  22.1
7     4    11   1.3
8     4    89   0.5
9     4    91   3.3
I am trying to generate another DataFrame that uses selected col1 values as the index and selected col2 values as the columns, with each cell holding the corresponding col3 value.
For example:
selected_col1 = [1,2]
selected_col2 = [10,11,24]
The new DataFrame should look like this:
# print(selected_df)
    10   11   24
1  1.5  2.5  NaN
2  7.8  NaN  2.1
I have tried the following method:
import pandas as pd

selected_col1 = [1, 2]
selected_col2 = [10, 11, 24]
# Empty frame with the selected col1 values as index and col2 values as columns
selected_df = pd.DataFrame(index=selected_col1, columns=selected_col2)
for col1_value in selected_col1:
    for col2_value in selected_col2:
        # Look up the rows matching this (col1, col2) pair
        qry = 'col1 == {} & col2 == {}'.format(col1_value, col2_value)
        col3_value = old_df.query(qry).col3.values
        if len(col3_value) > 0:
            selected_df.at[col1_value, col2_value] = col3_value[0]
But because my DataFrame has around 20 million rows, this brute-force approach takes a long time. Is there a better way to do this?
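One direction that might avoid the nested loop, sketched here under assumptions rather than as a benchmarked solution: filter the rows once with isin, then reshape with pivot and reindex so that missing combinations become NaN. The sample data is copied from the printout above, with the 5,6 entry assumed to mean 5.6. The sketch also assumes that when a (col1, col2) pair occurs more than once, keeping the first occurrence is acceptable, which mirrors the col3_value[0] lookup in the loop.

```python
import pandas as pd

# Sample data copied from the question (the "5,6" entry is assumed to be 5.6).
old_df = pd.DataFrame({
    "col1": [1, 1, 1, 2, 2, 3, 4, 4, 4, 4],
    "col2": [10, 11, 12, 10, 24, 10, 10, 11, 89, 91],
    "col3": [1.5, 2.5, 5.6, 7.8, 2.1, 3.2, 22.1, 1.3, 0.5, 3.3],
})

selected_col1 = [1, 2]
selected_col2 = [10, 11, 24]

# Keep only rows whose col1 and col2 are both in the selections (one vectorized pass).
mask = old_df["col1"].isin(selected_col1) & old_df["col2"].isin(selected_col2)
subset = old_df.loc[mask]

# Assumption: for repeated (col1, col2) pairs, keep the first row, like col3_value[0] in the loop.
subset = subset.drop_duplicates(subset=["col1", "col2"], keep="first")

# Reshape to col1 x col2, then reindex so every selected label appears (absent pairs become NaN).
selected_df = (
    subset.pivot(index="col1", columns="col2", values="col3")
    .reindex(index=selected_col1, columns=selected_col2)
)
print(selected_df)
```

The drop_duplicates step is there because pivot raises an error on duplicate index/column pairs; if the pairs are known to be unique, it can be dropped.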