8

I have two DataFrames, something like this:

import pandas as pd

dates = pd.Index(['2016-10-03', '2016-10-04', '2016-10-05'], name='Date')

close = pd.DataFrame( {'AAPL': [112.52,  113., 113.05],
                       'CSCO': [  31.5, 31.35, 31.59 ],
                       'MSFT': [ 57.42, 57.24, 57.64 ] }, index = dates )

volume= pd.DataFrame( {'AAPL': [21701800, 29736800, 21453100] ,
                       'CSCO': [14070500, 18460400, 11808600] ,
                       'MSFT': [19189500, 20085900, 16726400] }, index = dates )

The output of DataFrame 'close' looks like this:

              AAPL   CSCO   MSFT
Date
2016-10-03  112.52  31.50  57.42
2016-10-04  113.00  31.35  57.24
2016-10-05  113.05  31.59  57.64

And the output of DataFrame 'volume' looks like this:

                AAPL      CSCO      MSFT
Date
2016-10-03  21701800  14070500  19189500
2016-10-04  29736800  18460400  20085900
2016-10-05  21453100  11808600  16726400

I would like to combine these two DataFrames into a single DataFrame with MultiIndex COLUMNS so that it looks like this:

              AAPL                CSCO                MSFT
             Close     Volume    Close     Volume    Close     Volume  
Date
2016-10-03  112.52   21701800    31.50   14070500    57.42   19189500
2016-10-04  113.00   29736800    31.35   18460400    57.24   20085900
2016-10-05  113.05   21453100    31.59   11808600    57.64   16726400

Can anyone give me an idea how to do that? I've been playing with pd.concat and pd.merge, but it's not clear to me how to get it to line up on the date index and allow me to provide names for the sub-index ('Close" and 'Volume') on the columns.

1 Answer 1

15

You can use the keys kwarg of concat:

In [11]: res = pd.concat([close, volume], axis=1, keys=["close", "volume"])

In [12]: res
Out[12]:
             close                  volume
              AAPL   CSCO   MSFT      AAPL      CSCO      MSFT
Date
2016-10-03  112.52  31.50  57.42  21701800  14070500  19189500
2016-10-04  113.00  31.35  57.24  29736800  18460400  20085900
2016-10-05  113.05  31.59  57.64  21453100  11808600  16726400

With a little rearrangement:

In [13]: res.swaplevel(0, 1, axis=1).sort_index(axis=1)
Out[13]:
              AAPL             CSCO             MSFT
             close    volume  close    volume  close    volume
Date
2016-10-03  112.52  21701800  31.50  14070500  57.42  19189500
2016-10-04  113.00  29736800  31.35  18460400  57.24  20085900
2016-10-05  113.05  21453100  31.59  11808600  57.64  16726400
Sign up to request clarification or add additional context in comments.

3 Comments

Excellent. Thanks!
Can you please clarify, trying to understand the kwarg 'keys' ... the documentation is a little sparse: "keys : sequence, default None If multiple levels passed, should contain tuples. Construct hierarchical index using the passed keys as the outermost level" It seems to me that it implies that there should be the same number of keys as there are objects that you are concatenating. Is that correct? And if so, how does that relate to the 'multilevel' comment in the documentation? Trying to better understand how kwarg 'keys' can be used in various use-cases. Thanks.
@DanielGoldfarb yes, they "become" the top level of the resulting MI. I think the docs suggest you can pass a list of (same sized) tuples and it could make a larger resulting MI. Usually keys will be a flat list (corresponding to concat objects). In this example you end up with a 2-level MI, but potentially you could end up with an (M+N)-level MI.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.