I am rather new to Pandas and am currently running into a problem when trying to insert a DataFrame inside a DataFrame.

What I want to do: I have multiple simulations with corresponding signal files, and I want all of them in one big DataFrame. So I want a DataFrame that holds all my simulation parameters and also my signals as nested DataFrames. It should look something like this:

SimName | Date | Parameter 1 | Parameter 2 |  Signal 1 |  Signal 2 |
Name 1  | 123  | XYZ         | XYZ         | DataFrame | DataFrame |
Name 2  | 456  | XYZ         | XYZ         | DataFrame | DataFrame |

Here SimName is the index of the big DataFrame, and every entry in Signal 1 and Signal 2 is an individual DataFrame.

My idea was to implement it as follows:

big_DataFrame['Signal 1'].loc['Name 1']

But this results in a ValueError:

Incompatible indexer with DataFrame

Is it possible to have such nested DataFrames in Pandas?

Nico

4
  • You should show your initial data... Commented Oct 9, 2017 at 13:58
  • What do you mean by initial data? For now I create the DataFrame with a list of all simulations as indices and then add each simulation's data one after another. Commented Oct 9, 2017 at 14:01
  • Why would you want to store a df in a df? Look into pandas panel. Commented Oct 9, 2017 at 14:02
  • @Parfait Panel is being deprecated Commented Dec 27, 2017 at 19:20

3 Answers

1

The 'pointers' referred to at the end of ns63sr's answer could be implemented as a class, e.g...

Definition:

class df_holder:
    def __init__(self, df): 
        self.df = df

Set:

df.loc[0,'df_holder'] = df_holder(df)

Get:

df.loc[0].df_holder.df
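
A slightly fuller sketch of the same wrapper idea, in case it helps. The names SignalHolder, signal_a, signal_b and sims are made up for illustration, and the column is filled in one go from a list rather than cell by cell, which is an assumption on my part and not part of the snippet above:

```python
import pandas as pd


class SignalHolder:
    """Thin wrapper so pandas stores the DataFrame as one opaque object."""

    def __init__(self, df):
        self.df = df

    def __repr__(self):
        # Keep the printed big DataFrame readable
        return f"SignalHolder(shape={self.df.shape})"


# Hypothetical signal data for two simulations
signal_a = pd.DataFrame({"X": [1, 2, 3], "Y": [10, 20, 30]})
signal_b = pd.DataFrame({"X": [4, 5, 6], "Y": [40, 50, 60]})

# One row per simulation, indexed by simulation name
sims = pd.DataFrame({"Date": [123, 456]}, index=["Name 1", "Name 2"])

# Assigning a list of wrapper objects creates an object-dtype column
sims["Signal 1"] = [SignalHolder(signal_a), SignalHolder(signal_b)]

# Get the wrapped DataFrame back out of a single cell
recovered = sims.loc["Name 2", "Signal 1"].df
```

Because each cell holds a plain Python object, pandas does not try to align or broadcast the inner DataFrame.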


0

The docs say that only Series can be within a DataFrame. However, passing DataFrames seems to work as well. Here is an example, assuming that none of the columns is in a MultiIndex:

import pandas as pd

# Small example signal, reused in every cell
signal_df = pd.DataFrame({'X': [1, 2, 3],
                          'Y': [10, 20, 30]})

# Lists of DataFrames become object-dtype columns
big_df = pd.DataFrame({'SimName': ['Name 1', 'Name 2'],
                       'Date': [123, 456],
                       'Parameter 1': ['XYZ', 'XYZ'],
                       'Parameter 2': ['XYZ', 'XYZ'],
                       'Signal 1': [signal_df, signal_df],
                       'Signal 2': [signal_df, signal_df]})

big_df.loc[0, 'Signal 1']       # the nested DataFrame (out1)
big_df.loc[0, 'Signal 1']['X']  # one column of it (out2)

This results in:

out1:    X   Y
      0  1  10
      1  2  20
      2  3  30

out2: 0    1
      1    2
      2    3
      Name: X, dtype: int64

If nested DataFrames do not work properly for you, you could instead store some sort of pointers in big_df that let you look up the signal DataFrames stored elsewhere.
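
One possible way to sketch that pointer idea is to keep the signals in a plain dict and store only the dict keys in big_df. Everything here (signal_store, the key format, get_signal) is a hypothetical naming choice, not an established pandas pattern:

```python
import pandas as pd

# Signals live outside the big DataFrame, keyed by an arbitrary string
signal_store = {
    "Name 1/Signal 1": pd.DataFrame({"X": [1, 2, 3], "Y": [10, 20, 30]}),
    "Name 2/Signal 1": pd.DataFrame({"X": [4, 5, 6], "Y": [40, 50, 60]}),
}

# big_df only stores the keys ("pointers"), so every cell is a plain string
big_df = pd.DataFrame(
    {
        "Date": [123, 456],
        "Signal 1": ["Name 1/Signal 1", "Name 2/Signal 1"],
    },
    index=pd.Index(["Name 1", "Name 2"], name="SimName"),
)


def get_signal(sim_name, column):
    """Follow the pointer stored in big_df to the actual signal DataFrame."""
    key = big_df.loc[sim_name, column]
    return signal_store[key]


signal = get_signal("Name 2", "Signal 1")
```

This keeps big_df limited to scalar values, at the cost of having to keep the dict and the DataFrame in sync yourself.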


0

Instead of big_DataFrame['Signal 1'].loc['Name 1'] you should use

big_DataFrame.loc['Name 1','Signal 1']
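
A minimal sketch of that lookup for reading a nested cell, assuming SimName has been set as the index and the nested DataFrames were put into the column when big_DataFrame was constructed. The data is invented for illustration, and this only covers reading, not assigning into an existing cell:

```python
import pandas as pd

# Hypothetical signal data
signal_df = pd.DataFrame({"X": [1, 2, 3], "Y": [10, 20, 30]})

# Build the big DataFrame with the nested frames already in place,
# then use the simulation name as the row label
big_DataFrame = pd.DataFrame(
    {
        "SimName": ["Name 1", "Name 2"],
        "Date": [123, 456],
        "Signal 1": [signal_df, signal_df.copy()],
    }
).set_index("SimName")

# Row label and column label in a single .loc call
cell = big_DataFrame.loc["Name 1", "Signal 1"]
x_values = cell["X"].tolist()
```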

1 Comment

While this might be syntactically more concise, it doesn't avoid the issue.
