I have four pandas DataFrames (A, B, C, and D). A has a Timestamp column and a Source column that names one of the other DataFrames:
A
Timestamp    Source
-----------  ------
2012-4-3     B
2013-12-20   C
2012-3-5     C
2014-12-7    D
2012-7-10    B
...
The other DataFrames hold more data:
B
Timestamp    Foo   Bar
-----------  ----  ----
2012-1-1     1.5   1.3
2012-1-2     2.3   5.6
2012-1-3     3.4   3.3
...
2014-3-31    0.8   2.1
C
Timestamp    Foo   Bar
-----------  ----  ----
2012-1-1     9.2   5.6
2012-1-2     4.8   7.6
2012-1-3     2.7   6.4
...
2014-3-31    7.0   6.5
D
Timestamp    Foo   Bar
-----------  ----  ----
2012-1-1     6.8   4.2
2012-1-2     4.2   9.3
2012-1-3     5.5   0.7
...
2014-3-31    6.3   2.0
I want to build a single DataFrame from A, B, C, and D with three columns (Timestamp, Foo, and Bar), where each row's Foo and Bar values are taken from the row with the matching Timestamp in the DataFrame named in A's Source column.
Not every Timestamp in A appears in its Source DataFrame; in that case, Foo and Bar should be np.nan. Conversely, timestamps in B, C, and D that do not appear in A should simply be left out of the final DataFrame.
My current approach is to loop over the rows of A and look up the values in the corresponding Source DataFrame:
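import numpy as np

srcs = {'B': B, 'C': C, 'D': D}
A['Foo'] = np.nan
A['Bar'] = np.nan
cols = A.columns.get_indexer(['Foo', 'Bar'])  # positions of the target columns
for i in range(len(A)):
    ts = A['Timestamp'].iloc[i]
    src = srcs[A['Source'].iloc[i]]
    match = src.loc[src['Timestamp'] == ts, ['Foo', 'Bar']]
    if not match.empty:
        # Write through A itself; A.iloc[i].Foo = ... only sets a temporary copy
        A.iloc[i, cols] = match.iloc[0].to_numpy()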
There has to be a more efficient, more idiomatic pandas way to do this. What would it be?
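One direction that might work, shown here only as a minimal sketch: stack B, C, and D into one DataFrame with an added Source column, then left-merge A against it on (Timestamp, Source). The toy data below is hypothetical and stands in for the real frames. The sketch also assumes that the Timestamp columns share the same datetime dtype and that each source has at most one row per Timestamp (`validate="many_to_one"` raises if that does not hold).

```python
import pandas as pd

# Hypothetical toy data standing in for the real A, B, C, and D.
A = pd.DataFrame({
    "Timestamp": pd.to_datetime(
        ["2012-01-02", "2012-01-01", "2012-01-03", "2013-06-01"]
    ),
    "Source": ["B", "C", "D", "B"],
})
B = pd.DataFrame({
    "Timestamp": pd.to_datetime(["2012-01-01", "2012-01-02"]),
    "Foo": [1.5, 2.3],
    "Bar": [1.3, 5.6],
})
C = pd.DataFrame({
    "Timestamp": pd.to_datetime(["2012-01-01", "2012-01-02"]),
    "Foo": [9.2, 4.8],
    "Bar": [5.6, 7.6],
})
D = pd.DataFrame({
    "Timestamp": pd.to_datetime(["2012-01-01", "2012-01-03"]),
    "Foo": [6.8, 5.5],
    "Bar": [4.2, 0.7],
})

srcs = {"B": B, "C": C, "D": D}

# Tag every row with the name of the frame it came from, then stack them.
stacked = pd.concat(
    [df.assign(Source=name) for name, df in srcs.items()],
    ignore_index=True,
)

# A left merge keeps every row of A (in A's order); rows of A with no
# match get NaN in Foo and Bar, and unmatched source rows are dropped.
result = A.merge(
    stacked,
    on=["Timestamp", "Source"],
    how="left",
    validate="many_to_one",
)[["Timestamp", "Foo", "Bar"]]

print(result)
```

With the toy data, the last row of A (2013-06-01 from B) has no match and ends up with NaN in both Foo and Bar, while the other rows pick up the values from their respective sources. Whether this beats the loop on the real data would need to be measured.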