I'm looking to grab values from one dataframe and append them to another based on index, and to do it fast. Let's say we have the following two dataframes:
df has a bunch of random numbers with cities as the column headers
df2 is a dataframe full of cities with (unimportant) letters as the headers
In df2 at position df2.loc[0, 'c'] we have Los Angeles. I want to go to df, find the Los Angeles column, take the value at index 0, and append it to that df2 cell so that it reads Los Angeles : 544.
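For a single cell the lookup is just two scalar accesses: read the city name from df2, then use it as the column label in df at the same row. A minimal sketch, using a tiny hand-built frame that copies the row-0 values printed below instead of the random setup data:

```python
import pandas as pd

# Hand-built excerpt of row 0 (values are strings, as after df.astype(str))
df = pd.DataFrame({'Minneapolis': ['103'], 'Los Angeles': ['544']})
df2 = pd.DataFrame({'c': ['Los Angeles']})

city = df2.at[0, 'c']        # 'Los Angeles'
value = df.at[0, city]       # same row, column named by the city
label = f"{city} : {value}"
print(label)                 # Los Angeles : 544
```

Doing this once per cell is exactly what makes the loop further down slow; the faster versions have to do the same lookup for many cells at once.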
import pandas as pd
import numpy as np
np.random.seed(42)
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
cities = ['Minneapolis', 'Boston', 'Chicago', 'San Francisco', 'Los Angeles', 'Bozeman', 'New York', 'Austin',
          'Dallas', 'New Orleans']
df = pd.DataFrame({city:np.random.randint(1, 1000, 10000) for city in cities})
df = df.astype(str)
df2 = pd.DataFrame({k:[np.random.choice(cities) for i in range(10000)] for k in letters})
df
Minneapolis Boston Chicago San Francisco Los Angeles
0 103 720 444 335 544
1 436 833 813 252 418
2 861 140 428 919 339
3 271 792 83 804 801
4 107 546 765 825 320
df2
a b c d e
0 San Francisco Minneapolis Los Angeles Chicago Los Angeles
1 Minneapolis San Francisco Los Angeles Chicago San Francisco
2 Minneapolis Los Angeles Boston Chicago Boston
3 Chicago Chicago Chicago Boston Los Angeles
4 San Francisco San Francisco San Francisco Boston Los Angeles
Expected Result:
df3
a b c \
0 San Francisco : 335 Minneapolis : 103 Los Angeles : 544
1 Minneapolis : 436 San Francisco : 252 Los Angeles : 418
2 Minneapolis : 861 Los Angeles : 339 Boston : 140
3 Chicago : 83 Chicago : 83 Chicago : 83
4 San Francisco : 825 San Francisco : 825 San Francisco : 825
I solved this problem with a double for loop, but it takes too long: my real dataframe has shape (280k, 260) and I have to run this function twice. I'm looking for a way to do this in NumPy. Ideally I could create a new NumPy array that holds the numbers as strings at the correct index locations, and then use np.char.add on df2 and that array to concatenate the words and numbers element-wise. I'm just trying to figure out a quick way to index these values.
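import time

s = time.time()
for col in df2.columns:
    for i in range(len(df2)):
        try:
            # df holds strings (astype(str) above), so the value is used directly;
            # np.round(...) would raise TypeError on a str
            df2.loc[i, col] += ' : ' + str(df.loc[i, df2.loc[i, col]])
        except KeyError:
            # the city in df2 is not a column of df: leave the cell unchanged
            pass
print(time.time() - s)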
16.95679211616516
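A vectorized sketch of the np.char.add idea, assuming df and df2 have the same number of rows in the same order: pandas Index.get_indexer turns every city name in df2 into a column position in df, NumPy fancy indexing pulls df.iloc[i, pos] for all cells at once, and np.char.add builds the "city : value" strings. The frames below are a hand-built excerpt of the five rows printed in the question (not the random data), and cells whose city is not a column of df are left unchanged, mirroring the try/except in the loop.

```python
import numpy as np
import pandas as pd

# Hand-built excerpt of the first five rows shown above (df values are strings,
# as after df.astype(str)); with the real data you would use df and df2 directly.
df = pd.DataFrame({
    'Minneapolis': ['103', '436', '861', '271', '107'],
    'Boston': ['720', '833', '140', '792', '546'],
    'Chicago': ['444', '813', '428', '83', '765'],
    'San Francisco': ['335', '252', '919', '804', '825'],
    'Los Angeles': ['544', '418', '339', '801', '320'],
})
df2 = pd.DataFrame({
    'a': ['San Francisco', 'Minneapolis', 'Minneapolis', 'Chicago', 'San Francisco'],
    'b': ['Minneapolis', 'San Francisco', 'Los Angeles', 'Chicago', 'San Francisco'],
    'c': ['Los Angeles', 'Los Angeles', 'Boston', 'Chicago', 'San Francisco'],
})

names = df2.to_numpy().astype(str)            # (n_rows, n_cols) city names
# position of each name among df's columns, -1 where the city is not a column
col_idx = df.columns.get_indexer(names.ravel()).reshape(names.shape)
found = col_idx >= 0
rows = np.arange(len(df2))[:, None]           # broadcasts against col_idx
# values[i, j] = df.iloc[i, col_idx[i, j]]; -1 is clamped to 0 and masked out below
values = df.to_numpy().astype(str)[rows, np.where(found, col_idx, 0)]

labels = np.char.add(np.char.add(names, ' : '), values)   # element-wise "city : value"
df3 = pd.DataFrame(np.where(found, labels, names), index=df2.index, columns=df2.columns)
print(df3)
```

The only Python-level work left is building the frames; the lookup itself is a single fancy-indexing step over the whole array.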
Is len(df) == len(df2)? And do you expect len(df3) == len(df)?
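Converting all of df2 into one fixed-width NumPy string array at once can be memory-hungry at (280k, 260), since NumPy unicode arrays store 4 bytes per character of the longest string for every cell. A possible middle ground is to loop over the 260 columns only and vectorize within each column. add_values_by_column is a hypothetical helper name; like the loop, it assumes len(df) == len(df2) with rows in the same order, and the small frames are made-up illustration data (Bozeman is deliberately not a column of df).

```python
import numpy as np
import pandas as pd


def add_values_by_column(df, df2, sep=' : '):
    """Hypothetical helper: for each cell of df2, append df's value from the
    same row and the column named by that cell. Assumes df and df2 share the
    same row order (len(df) == len(df2)); cells whose city is not a column
    of df are left unchanged."""
    values = df.to_numpy()
    rows = np.arange(len(df2))
    out = {}
    for col in df2.columns:  # loop over columns only, not over every cell
        idx = df.columns.get_indexer(df2[col])   # -1 where the city is missing
        found = idx >= 0
        # clamp -1 to 0 so the fancy index is valid; those cells are masked below
        picked = pd.Series(values[rows, np.where(found, idx, 0)], index=df2.index).astype(str)
        out[col] = df2[col].where(~found, df2[col] + sep + picked)
    return pd.DataFrame(out, index=df2.index)


df = pd.DataFrame({'Chicago': ['444', '813'], 'Los Angeles': ['544', '418']})
df2 = pd.DataFrame({'a': ['Los Angeles', 'Chicago'], 'b': ['Chicago', 'Bozeman']})
df3 = add_values_by_column(df, df2)
print(df3)
```

Each iteration handles a whole column (280k cells in the real data) with one get_indexer call and one fancy-indexing step, so peak memory stays at roughly one column's worth of strings.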