Aim

I am trying to manipulate data from some video tracking experiments using python pandas. I placed a number of point markers on a structure, and tracked the points' XY coordinates over time. Together these data describe the shape of the structure over the course of the test. I am having trouble arranging my data into a hierarchical/nested DataFrame object.

Importing the data

My tracking method outputs each point's X,Y coordinates (and time) for each frame of video. This data is stored in csv files with a column for each variable, and a row for each video frame:

t,x,y
0.000000000E0,-4.866015168E2,-2.116143012E0
1.000000000E-1,-4.866045511E2,-2.123012558E0
2.000000000E-1,-4.866092436E2,-2.129722560E0

Using pandas.read_csv I am able to read these csv files into DataFrames, with the same columns/rows format:

In [1]: pd.read_csv("point_a.csv")
Out[17]: 
     t           x         y
0  0.0 -486.601517 -2.116143
1  0.1 -486.604551 -2.123013
2  0.2 -486.609244 -2.129723

No problem so far.

Creating a hierarchical structure

I would like to merge several of the above DataFrames (one for each point), and create a large DataFrame with hierarchical columns, where all variables share one index (video frames). See the below columns point_a, point_b etc, with subcolumns for x, y, t. The shape column represents useful vectors for plotting the shape of the structure.

        |   point_a     |   point_b     |   point_c     |   shape
frames  |   x   y   t   |   x   y   t   |   x   y   t   |   x               y
-----------------------------------------------------------------------------------
0       |   xa0 ya0 ta0 |   xb0 yb0 tb0 |   xc0 yc0 tc0 |   [xa0,xb0,xc0]   [ya0,yb0,yc0]
1       |   xa1 ya1 ta1 |   xb1 yb1 tb1 |   xc1 yc1 tc1 |   [xa1,xb1,xc1]   [ya1,yb1,yc1]
2       |   xa2 ya2 ta2 |   xb2 yb2 tb2 |   xc2 yc2 tc2 |   [xa2,xb2,xc2]   [ya2,yb2,yc2]
3       |   xa3 ya3 ta3 |   xb3 yb3 tb3 |   xc3 yc3 tc3 |   [xa3,xb3,xc3]   [ya3,yb3,yc3]

I would like to specify a video frame and grab a variable's value for that frame, e.g. something like df[1].point_b.y returning yb1.

What I have tried so far

Nested dicts as input

My previous approach to handling this kind of thing is to use nested dicts:

nested_dicts = {
    "point_a": {
        "x": [xa0, xa1, xa2], 
        "y": [ya0, ya1, ya2], 
        "t": [ta0, ta1, ta2],
        },
    "point_b": {
        "x": [xb0, xb1, xb2], 
        "y": [yb0, yb1, yb2], 
        "t": [tb0, tb1, tb2],
        },
    "point_c": {
        "x": [xc0, xc1, xc2], 
        "y": [yc0, yc1, yc2], 
        "t": [tc0, tc1, tc2],
        },
    }

This does everything I need except for slicing the data by frame number. When I try to use this nested dict as an input to a DataFrame, I get the following:

In [1]: pd.DataFrame(nested_dicts)
Out[2]:
           point_a          point_b          point_c
t  [ta0, ta1, ta2]  [tb0, tb1, tb2]  [tc0, tc1, tc2]
x  [xa0, xa1, xa2]  [xb0, xb1, xb2]  [xc0, xc1, xc2]
y  [ya0, ya1, ya2]  [yb0, yb1, yb2]  [yc0, yc1, yc2]

Problem: there is no shared frames index. The DataFrame has taken t,x,y as the index.

Specifying an index for nested dict input

If I try to specify an index:

In [1]: pd.DataFrame(nested_dicts, index=range(number_of_frames)) 

Then I get a DataFrame with the correct number of rows, but no subcolumns, and full of NaNs:

Out[2]:
    point_a point_b point_c
0   NaN     NaN     NaN    
1   NaN     NaN     NaN  
2   NaN     NaN     NaN  
3   NaN     NaN     NaN  
4   NaN     NaN     NaN  
5   NaN     NaN     NaN  
6   NaN     NaN     NaN  
7   NaN     NaN     NaN  
8   NaN     NaN     NaN 

Adding each DataFrame individually

If I create a DataFrame for each point:

point_a =               point_b =
    t    x    y             t    x    y
0   ta0  xa0  ya0       0   tb0  xb0  yb0
1   ta1  xa1  ya1       1   tb1  xb1  yb1
2   ta2  xa2  ya2       2   tb2  xb2  yb2

and pass these to a DataFrame, indicating the index to be shared, as follows:

In [1]: pd.DataFrame({"point_a":point_a,"point_b":point_b},index=point_a.index)

then I get the following, which just contains x,y,t as strings:

Out[2]:
    point_a point_b
0   (t,)    (t,)
1   (x,)    (x,)
2   (y,)    (y,)

1 Answer

I think you can use a dict comprehension with concat, and then reshape the DataFrame with stack and unstack:

df = (pd.concat({key: pd.DataFrame(nested_dicts[key]) for key in nested_dicts.keys()})
        .stack()
        .unstack([0, 2]))

print(df)
  point_a           point_b           point_c          
        t    x    y       t    x    y       t    x    y
0     ta0  xa0  ya0     tb0  xb0  yb0     tc0  xc0  yc0
1     ta1  xa1  ya1     tb1  xb1  yb1     tc1  xc1  yc1
2     ta2  xa2  ya2     tb2  xb2  yb2     tc2  xc2  yc2
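
To see how the reshape works, it may help to look at each intermediate step. The sketch below uses made-up numeric values in place of the tracked coordinates (the numbers and the names step1 and step2 are illustrative assumptions, not part of the original data):

```python
import pandas as pd

# Illustrative stand-in values; real data would come from the tracking CSVs.
nested_dicts = {
    "point_a": {"x": [1.0, 1.1, 1.2], "y": [2.0, 2.1, 2.2], "t": [0.0, 0.1, 0.2]},
    "point_b": {"x": [3.0, 3.1, 3.2], "y": [4.0, 4.1, 4.2], "t": [0.0, 0.1, 0.2]},
}

# Step 1: concat with a dict stacks the per-point frames vertically.
# The dict keys become the outer level of a row MultiIndex: (point, frame).
step1 = pd.concat({key: pd.DataFrame(val) for key, val in nested_dicts.items()})

# Step 2: stack moves the column labels (x, y, t) into the innermost row level,
# giving a Series indexed by (point, frame, variable).
step2 = step1.stack()

# Step 3: unstack levels 0 (point) and 2 (variable) back into the columns,
# leaving only the frame number as the row index.
df = step2.unstack([0, 2])

print(step1.index.nlevels, step2.index.nlevels, df.columns.nlevels)
```

After the last step the rows are the shared frame index and the columns are a two-level MultiIndex of (point, variable).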

Another solution unstacks the outer level, then uses swaplevel and sorts the first level of the column MultiIndex with sort_index:

df = (pd.concat({key: pd.DataFrame(nested_dicts[key]) for key in nested_dicts.keys()})
        .unstack(0))

df.columns = df.columns.swaplevel(0, 1)
df = df.sort_index(level=0, axis=1)
print(df)
  point_a           point_b           point_c          
        t    x    y       t    x    y       t    x    y
0     ta0  xa0  ya0     tb0  xb0  yb0     tc0  xc0  yc0
1     ta1  xa1  ya1     tb1  xb1  yb1     tc1  xc1  yc1
2     ta2  xa2  ya2     tb2  xb2  yb2     tc2  xc2  yc2
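
A possibly simpler variant, not part of the solutions above, is to concatenate along the columns directly, which should give the same two-level columns without any reshaping. This is a minimal sketch with made-up values; per-point DataFrames read with pd.read_csv could be passed in the dict the same way:

```python
import pandas as pd

# Illustrative stand-in values; real data would come from the tracking CSVs.
nested_dicts = {
    "point_a": {"x": [1.0, 1.1, 1.2], "y": [2.0, 2.1, 2.2], "t": [0.0, 0.1, 0.2]},
    "point_b": {"x": [3.0, 3.1, 3.2], "y": [4.0, 4.1, 4.2], "t": [0.0, 0.1, 0.2]},
}

# axis=1 places the frames side by side; the dict keys become the
# outer column level, and the shared row index is the frame number.
df = pd.concat(
    {key: pd.DataFrame(val) for key, val in nested_dicts.items()},
    axis=1,
)

print(df)
```

The frames are aligned on their row index, so this assumes every point was tracked over the same frame numbers.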

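Or, on old pandas versions, you can use Panel with transpose and to_frame (Panel was deprecated in pandas 0.20 and later removed, so this does not run on current releases):

df = pd.Panel(nested_dicts).transpose(0,1,2).to_frame().unstack()
print(df)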
      point_a           point_b           point_c          
minor       t    x    y       t    x    y       t    x    y
major                                                      
0         ta0  xa0  ya0     tb0  xb0  yb0     tc0  xc0  yc0
1         ta1  xa1  ya1     tb1  xb1  yb1     tc1  xc1  yc1
2         ta2  xa2  ya2     tb2  xb2  yb2     tc2  xc2  yc2
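
For the shape vectors mentioned in the question, one option is to take a cross-section of the hierarchical columns rather than storing lists in cells. The sketch below is an assumption about what is wanted, with made-up values; shape_x and shape_y are hypothetical names:

```python
import pandas as pd

# Illustrative stand-in values; real data would come from the tracking CSVs.
nested_dicts = {
    "point_a": {"x": [1.0, 1.1, 1.2], "y": [2.0, 2.1, 2.2], "t": [0.0, 0.1, 0.2]},
    "point_b": {"x": [3.0, 3.1, 3.2], "y": [4.0, 4.1, 4.2], "t": [0.0, 0.1, 0.2]},
}
df = pd.concat(
    {key: pd.DataFrame(val) for key, val in nested_dicts.items()},
    axis=1,
)

# xs selects one label from the inner column level across all points:
# each result has one row per frame and one column per point.
shape_x = df.xs("x", axis=1, level=1)
shape_y = df.xs("y", axis=1, level=1)

# The x and y coordinates of every point at frame 1, e.g. for plotting.
frame1_x = shape_x.loc[1].tolist()
frame1_y = shape_y.loc[1].tolist()
print(frame1_x, frame1_y)
```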

1 Comment

This appears to do the trick, and I am able to grab frame ii using df.values[ii]. Thanks! Now I just need to figure out how it works...
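
Note that df.values[ii] returns a bare NumPy array without labels. Label-based selection with loc keeps the point and variable names, which may be closer to the df[1].point_b.y access asked for in the question. A minimal sketch with made-up values:

```python
import pandas as pd

# Illustrative stand-in values; real data would come from the tracking CSVs.
nested_dicts = {
    "point_a": {"x": [1.0, 1.1, 1.2], "y": [2.0, 2.1, 2.2], "t": [0.0, 0.1, 0.2]},
    "point_b": {"x": [3.0, 3.1, 3.2], "y": [4.0, 4.1, 4.2], "t": [0.0, 0.1, 0.2]},
}
df = pd.concat(
    {key: pd.DataFrame(val) for key, val in nested_dicts.items()},
    axis=1,
)

# Everything recorded at frame 1, as a Series indexed by (point, variable).
frame1 = df.loc[1]

# A single value: frame 1, point_b, y.
yb1 = df.loc[1, ("point_b", "y")]

# Equivalent chained form: select the point, then the variable, then the frame.
yb1_chained = df["point_b"]["y"].loc[1]

print(yb1, yb1_chained)
```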