
I want to store a DataFrame object as the value of a column in a single row. Here's a simplified example of what I want to achieve.

>>> df = pd.DataFrame([[1,2,3],[2,4,6]], columns=list('DEF'))
>>> df
   D  E  F
0  1  2  3
1  2  4  6

I create a second DataFrame and try to add a new column on the fly by assigning that DataFrame as the new column's value for one row. Please refer to the code:

>>> df_in_df = pd.DataFrame([[11,13,17],[19, 23, 31]], columns=list('XYZ'))
>>> df.loc[df['F'] == 6, 'G'] = df_in_df
>>> df
   D  E  F   G
0  1  2  3 NaN
1  2  4  6 NaN
>>> df.loc[df['F'] == 6, 'G'].item()
    nan
>>> # But the below works fine, i.e. when I insert an integer
>>> df.loc[df['F'] == 6, 'G'] = 4
>>> df
   D  E  F    G
0  1  2  3  NaN
1  2  4  6  4.0
>>> # and to verify 
>>> df.loc[df['F'] == 6, 'G'].item()
    4.0

By the way, I found a workaround by pickling the DataFrame into a string, but I'm not happy with it:

>>> import pickle
>>> df.loc[df['F'] == 6, 'G'] = pickle.dumps(df_in_df)
>>> df
   D  E  F                                                  G
0  1  2  3                                                NaN
1  2  4  6  ccopy_reg\n_reconstructor\np0\n(cpandas.core.f...

>>> revive_df_from_df = pickle.loads(df.loc[df['F'] == 6, 'G'].item())
>>> revive_df_from_df
    X   Y   Z
0  11  13  17
1  19  23  31

I started using pandas only today, after going through "10 Minutes to pandas", so I don't know the conventions. Any better ideas? Thanks!

  • It's difficult to understand what you are trying to achieve. Are you talking about panels? Commented Jun 21, 2016 at 16:44
  • I want to insert a DataFrame object into a column of a particular row. Commented Jun 21, 2016 at 16:47
  • And why would you want to do that? Pandas is supposed to be a fast table query framework. Commented Jun 21, 2016 at 17:08

3 Answers

You are on shaky ground relying on this behavior. pandas does a lot of work trying to infer what you mean or want when you pass array-like things to its constructors and assignment functions. This is pressing on those boundaries, seemingly intentionally.

It seems that direct assignment via loc doesn't work. This is a workaround I've found. Again, I would not expect this behavior to be robust across pandas versions.

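import numpy as np
import pandas as pd

df = pd.DataFrame([[1,2,3],[2,4,6]], columns=list('DEF'))

df_in_df = pd.DataFrame([[11,13,17],[19, 23, 31]], columns=list('XYZ'))

# Create column G first, then replace each selected cell with the DataFrame object.
# Note: applymap is deprecated since pandas 2.1 in favor of DataFrame.map.
df.loc[df['F'] == 6, 'G'] = np.nan
df.loc[df['F'] == 6, 'G'] = df.loc[df['F'] == 6, ['G']].applymap(lambda x: df_in_df)

df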

4 Comments

Why? Is it wrong by convention to insert a DataFrame into another DataFrame?
Because on construction pandas calls NumPy to create an array, and now it's getting a sequence. @wolframalpha, your use case is not what pandas was designed for.
I'm not an authority on the issue. But I'd say yes. Not wrong, but wrong by convention (I'm guessing what this means). The advantages pandas provides come in many forms, including its inference. Placing a general object inside a DataFrame shouldn't be an issue. Expecting this code to continue functioning in future versions is. I'd guess the devs might very well change how this works in an attempt to better infer what people might mean when they attempt such a thing. Placing a high-dimensional structure in a high-dimensional structure is better handled with MultiIndex.
@piRSquared Okay, that's fine, but isn't it cool to map a single row onto multiple rows of another DataFrame (as we do with junction tables in databases)? I'm asking so that next time I'll know not to use pandas for this!
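
The MultiIndex suggestion from the comments above could look roughly like the sketch below. It assumes the nested rows belong to the row where F == 6; the names `parent_label`, `children`, `parent`, and `child` are made up for illustration and do not appear in the original posts.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [2, 4, 6]], columns=list("DEF"))
df_in_df = pd.DataFrame([[11, 13, 17], [19, 23, 31]], columns=list("XYZ"))

# Index label of the parent row that the nested data belongs to (F == 6).
parent_label = int(df.index[df["F"] == 6][0])

# Keep the nested rows in their own frame, keyed by the parent row label.
# pd.concat on a dict uses the dict keys as the outer index level.
children = pd.concat({parent_label: df_in_df}, names=["parent", "child"])

# Look up every child row that belongs to one parent row.
rows_for_parent = children.loc[parent_label]
```

This keeps both tables as ordinary numeric frames, much like a junction table in a database, instead of putting Python objects into cells.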

Create a dict first:

import pandas as pd

x = pd.DataFrame()

y = {'a':[5,4,5],'b':[6,9,7], 'c':[7,3,x]}

# {'a': [5, 4, 5], 'b': [6, 9, 7], 'c': [7, 3, Empty DataFrame
#   Columns: []
#   Index: []]}

z = pd.DataFrame(y)

#   a  b                                      c
# 0  5  6                                      7
# 1  4  9                                      3
# 2  5  7  Empty DataFrame
# Columns: []
# Index: []

(Or convert the DataFrame to a dict and insert that. A lot happens when pandas creates objects, and you are torturing pandas. Your use case implies nested dicts, so I would use those.)
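
As a minimal sketch of the dict idea, one option is to keep the nested frames in a plain dict next to the outer DataFrame, keyed by row label. The name `nested_frames` is hypothetical, and this is only one possible arrangement, not necessarily the exact layout meant above.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [2, 4, 6]], columns=list("DEF"))
df_in_df = pd.DataFrame([[11, 13, 17], [19, 23, 31]], columns=list("XYZ"))

# Side table: outer row label -> nested DataFrame.
nested_frames = {}

# Attach the nested frame to every row where F == 6.
for label in df.index[df["F"] == 6]:
    nested_frames[label] = df_in_df

# Retrieve the nested frame for the matching row.
label = df.index[df["F"] == 6][0]
found = nested_frames.get(label)

# Rows without a nested frame simply have no entry.
missing = nested_frames.get(df.index[0])
```

The outer DataFrame stays purely numeric, so pandas never has to guess how to interpret an object-valued cell.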

4 Comments

Yes, right, Thanks but I want to create a new column and then insert a DataFrame into the row! Any idea?
Yes that would be better indeed!
Are you planning to access the inserted DataFrame via pandas methods? That likely will not work. Use a linked list, or a dict alongside pandas, or just use SQLite. Torturing pandas this way will lead to future rewrites.
That won't happen! Use a dict!

First create the column where you want to insert the dictionary. Then convert your dictionary to a string using the repr function and insert that string into the column. To query it, first select the string, then use eval on it to convert it back to a dictionary.
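
The steps above might look like the following sketch. Two things here are assumptions not taken from the answer: the nested DataFrame is turned into a dict with `to_dict(orient="list")`, and `ast.literal_eval` is swapped in for `eval` because it only parses Python literals and does not run arbitrary code.

```python
import ast

import pandas as pd

df = pd.DataFrame([[1, 2, 3], [2, 4, 6]], columns=list("DEF"))
df_in_df = pd.DataFrame([[11, 13, 17], [19, 23, 31]], columns=list("XYZ"))

# Serialize the nested frame as the repr of a plain dict of lists.
text = repr(df_in_df.to_dict(orient="list"))

# Create the column first, then store the string in the matching row.
df["G"] = None
df.loc[df["F"] == 6, "G"] = text

# Select the string and parse it back into a dict, then into a DataFrame.
stored = df.loc[df["F"] == 6, "G"].item()
restored = pd.DataFrame(ast.literal_eval(stored))
```

Note that a dict of lists drops the nested frame's index, so this round trip only preserves it when the index is the default 0..n-1 range.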
