Pandas: Storing Dataframe in Dataframe

Question

I am rather new to pandas and am currently running into a problem when trying to insert a DataFrame inside a DataFrame.

What I want to do: I have multiple simulations with corresponding signal files, and I want all of them in one big DataFrame. So I want a DataFrame that holds all my simulation parameters and also my signals as nested DataFrames. It should look something like this:

SimName | Date | Parameter 1 | Parameter 2 |  Signal 1 |  Signal 2 |
Name 1  | 123  | XYZ         | XYZ         | DataFrame | DataFrame |
Name 2  | 456  | XYZ         | XYZ         | DataFrame | DataFrame |

Here SimName is the index of the big DataFrame, and every entry in Signal 1 and Signal 2 is an individual DataFrame.

My idea was to implement this like this:

big_DataFrame['Signal 1'].loc['Name 1']

But this results in a ValueError:

Incompatible indexer with DataFrame

Is it possible to have such nested DataFrames in pandas?

Nico

What do you mean by initial data? For now I create the DataFrame with a list of all simulations as indices and then add each simulation's data one after another.
– Nico Hertel, Commented Oct 9, 2017 at 14:01
Why would you want to store a df in a df? Look into pandas Panel.
– Parfait, Commented Oct 9, 2017 at 14:02

LW001 · Accepted Answer · 2017-12-29 06:52:59Z

1

The 'pointers' referred to at the end of ns63sr's answer could be implemented as a class, e.g.:

Definition:

class df_holder:
    def __init__(self, df): 
        self.df = df

Set:

df.loc[0,'df_holder'] = df_holder(df)

Get:

df.loc[0].df_holder.df
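A minimal, self-contained sketch of the same idea, under some assumptions: the class name DataFrameHolder, the column names, and the example data are hypothetical, and the target column is created up front with object dtype so that the assignment does not depend on how pandas enlarges a frame. Because the holder is not list-like, pandas appears to treat it as a single opaque value instead of trying to align it the way it does with a raw DataFrame.

```python
import pandas as pd


class DataFrameHolder:
    """Thin wrapper so pandas sees one opaque object, not a DataFrame."""

    def __init__(self, df):
        self.df = df


# Hypothetical signal data for illustration
signal_df = pd.DataFrame({"X": [1, 2, 3], "Y": [10, 20, 30]})

# Outer frame indexed by simulation name, as described in the question
big_df = pd.DataFrame(
    {"Parameter 1": ["XYZ", "XYZ"]},
    index=pd.Index(["Name 1", "Name 2"], name="SimName"),
)

# Create the target column with object dtype before assigning into it
big_df["Signal 1"] = pd.Series([None, None], index=big_df.index, dtype=object)

# Set: wrap the DataFrame, then assign the wrapper to a single cell
big_df.loc["Name 1", "Signal 1"] = DataFrameHolder(signal_df)

# Get: read the cell back and unwrap it
recovered = big_df.loc["Name 1", "Signal 1"].df
```

The cost of this design is that every access needs the extra .df step, and column-wise pandas operations cannot see inside the wrapped frames.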

edited Dec 29, 2017 at 6:52

LW001


answered Dec 27, 2017 at 19:45

rbinnun


ns63sr · Accepted Answer · 2017-10-09 14:46:02Z

The docs say that only Series can be stored within a DataFrame. However, passing DataFrames seems to work as well. Here is an example, assuming that none of the columns is in a MultiIndex:

import pandas as pd

# A small example signal table, reused for every cell below
signal_df = pd.DataFrame({'X': [1, 2, 3],
                          'Y': [10, 20, 30]})

big_df = pd.DataFrame({'SimName': ['Name 1', 'Name 2'],
                       'Date': [123, 456],
                       'Parameter 1': ['XYZ', 'XYZ'],
                       'Parameter 2': ['XYZ', 'XYZ'],
                       'Signal 1': [signal_df, signal_df],
                       'Signal 2': [signal_df, signal_df]})

big_df.loc[0, 'Signal 1']       # out1: the nested DataFrame
big_df.loc[0, 'Signal 1']['X']  # out2: column X of the nested DataFrame

This results in:

out1:    X   Y
      0  1  10
      1  2  20
      2  3  30

out2: 0    1
      1    2
      2    3
      Name: X, dtype: int64

If nested DataFrames do not work properly, you could instead store some sort of pointers in big_df that let you access the signal DataFrames stored elsewhere.
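One possible reading of the 'pointers' idea, sketched under assumptions: the signal DataFrames live in a plain dict, and big_df stores only the dict keys. The names signal_store and get_signal, the key format, and the example data are all hypothetical.

```python
import pandas as pd

# Signals are kept outside the big frame, keyed by a string "pointer"
signal_store = {
    "Name 1/Signal 1": pd.DataFrame({"X": [1, 2, 3], "Y": [10, 20, 30]}),
    "Name 2/Signal 1": pd.DataFrame({"X": [4, 5, 6], "Y": [40, 50, 60]}),
}

# The big frame holds only scalar parameters and the pointer keys
big_df = pd.DataFrame(
    {
        "Date": [123, 456],
        "Signal 1": ["Name 1/Signal 1", "Name 2/Signal 1"],
    },
    index=pd.Index(["Name 1", "Name 2"], name="SimName"),
)


def get_signal(sim_name, column):
    """Follow the pointer stored in big_df to the actual signal DataFrame."""
    key = big_df.loc[sim_name, column]
    return signal_store[key]


x_values = get_signal("Name 2", "Signal 1")["X"].tolist()
```

This keeps big_df limited to ordinary scalar columns, at the price of having to keep the dict and the frame in sync by hand.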

Ioannis Nasios · Accepted Answer · 2017-10-09 15:00:51Z

0

Instead of big_DataFrame['Signal 1'].loc['Name 1'] you should use

big_DataFrame.loc['Name 1','Signal 1']
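A small sketch comparing the two access forms when reading a nested DataFrame, assuming SimName is the index as in the question. The example data is hypothetical, and the sketch covers reading only, not assigning a DataFrame into an existing cell.

```python
import pandas as pd

# Hypothetical signal data for illustration
signal_df = pd.DataFrame({"X": [1, 2, 3], "Y": [10, 20, 30]})

# Nested DataFrames are passed in at construction time
big_DataFrame = pd.DataFrame(
    {"Date": [123, 456], "Signal 1": [signal_df, signal_df.copy()]},
    index=pd.Index(["Name 1", "Name 2"], name="SimName"),
)

# Chained form from the question: select the column, then the row
chained = big_DataFrame["Signal 1"].loc["Name 1"]

# Single .loc call with (row label, column label)
direct = big_DataFrame.loc["Name 1", "Signal 1"]
```

Both forms return the stored DataFrame here; the single .loc call does it in one indexing step.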

answered Oct 9, 2017 at 15:00

Ioannis Nasios


1 Comment

rbinnun Over a year ago

While this might be syntactically more concise it doesn't avoid the issue.
