
I would like to split a column of my data frame into multiple columns. The values are separated by commas.

I would like to apply something like the 'Text to Columns' function in Excel.

I will give the new columns my own headings after the split. 'Turnstile' is the name of my column. Each row holds data like this:

(A006, R079, 00-00-04, 5 AVE-59 ST)

In the end I would like to have:

A006    R079   00-00-04   5 AVE-59 ST

with the headings I will create.

I lastly tried:

df.Turnstile.str.split().tolist()

But all I get is 'nan'.

When I check the type of the 'Turnstile' column, it says 'object'. I tried to convert that series to string with:

df['Turnstile'] = df[['Turnstile'].astype(str)]

but it gives me:

AttributeError: 'list' object has no attribute 'astype'

Please advise.

Thank you.

  • What do you get when you do type(df.Turnstile.values[0])? Commented Sep 27, 2015 at 22:45
  • It says tuple. @maxymoo Commented Sep 28, 2015 at 1:03
  • Can you check the type of each of the tuple entries? i.e. [type(df.Turnstile.values[0][i]) for i in range(4)] Commented Sep 28, 2015 at 1:43
  • It says each one is a string: <type 'str'> @maxymoo Commented Sep 28, 2015 at 3:06
  • Can you please post the results of df.head()? Commented Sep 28, 2015 at 5:39

3 Answers


Maybe another way of looking at this is converting a column of tuples to a DataFrame, like so:

In [10]: DataFrame(df['Turnstile'].tolist())
Out[10]:
      0     1         2            3
0  A006  R079  00-00-04  5 AVE-59 ST
1  A006  R079  00-00-04  5 AVE-59 ST
2  A006  R079  00-00-04  5 AVE-59 ST
3  A006  R079  00-00-04  5 AVE-59 ST
4  A006  R079  00-00-04  5 AVE-59 ST
5  A006  R079  00-00-04  5 AVE-59 ST
6  A006  R079  00-00-04  5 AVE-59 ST
7  A006  R079  00-00-04  5 AVE-59 ST
8  A006  R079  00-00-04  5 AVE-59 ST
9  A006  R079  00-00-04  5 AVE-59 ST

If that's the case, here's an example that converts the column of tuples to a DataFrame and adds it back to the original dataframe:

import numpy as np
import pandas as pd
from pandas import Series, DataFrame

# create a fake dataframe, repeating the tuple given in the example
In [2]: df = DataFrame(data={'Observations': np.random.randn(10) * np.arange(10),
...:     'Turnstile': (('A006', 'R079', '00-00-04', '5 AVE-59 ST'),)*10})

In [3]: df.head()
Out[3]:
   Observations                            Turnstile
0     -0.000000  (A006, R079, 00-00-04, 5 AVE-59 ST)
1     -0.022668  (A006, R079, 00-00-04, 5 AVE-59 ST)
2     -2.380515  (A006, R079, 00-00-04, 5 AVE-59 ST)
3     -4.209983  (A006, R079, 00-00-04, 5 AVE-59 ST)
4      3.932902  (A006, R079, 00-00-04, 5 AVE-59 ST)

# all at once turn the column of tuples into a dataframe and concat that with the original df
In [4]: df = pd.concat([df,DataFrame(df['Turnstile'].tolist())], axis=1, join='outer')

In [5]: df.head()
Out[5]:
   Observations                            Turnstile     0     1         2  \
0     -0.000000  (A006, R079, 00-00-04, 5 AVE-59 ST)  A006  R079  00-00-04
1     -0.022668  (A006, R079, 00-00-04, 5 AVE-59 ST)  A006  R079  00-00-04
2     -2.380515  (A006, R079, 00-00-04, 5 AVE-59 ST)  A006  R079  00-00-04
3     -4.209983  (A006, R079, 00-00-04, 5 AVE-59 ST)  A006  R079  00-00-04
4      3.932902  (A006, R079, 00-00-04, 5 AVE-59 ST)  A006  R079  00-00-04

         3
0  5 AVE-59 ST
1  5 AVE-59 ST
2  5 AVE-59 ST
3  5 AVE-59 ST
4  5 AVE-59 ST

# I assume you don't need this column anymore
In [6]: del df['Turnstile']

If that works, you can of course name the new columns as needed.
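As a rough sketch of that last step, the headings can be passed straight to the DataFrame constructor when the tuples are expanded. The heading names below (control_area, unit, scp, station) and the Observations values are made up for illustration; substitute your own.

```python
import pandas as pd

# Small stand-in for the real data: one column of 4-element tuples
df = pd.DataFrame({
    "Observations": [0.1, 0.2, 0.3],
    "Turnstile": [("A006", "R079", "00-00-04", "5 AVE-59 ST")] * 3,
})

# Hypothetical headings, not taken from the question
new_cols = ["control_area", "unit", "scp", "station"]

# Expand the tuples into named columns, keeping the original row index
parts = pd.DataFrame(df["Turnstile"].tolist(), columns=new_cols, index=df.index)

# Drop the tuple column and attach the new columns side by side
result = pd.concat([df.drop(columns="Turnstile"), parts], axis=1)
print(result)
```

Passing index=df.index keeps the rows aligned in the concat even if the original frame does not use the default 0..n-1 index.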


2 Comments

Thank you @measureallthethings
@measureallthethings This is a much better answer than mine; I didn't realise that you can create a data frame from a list of tuples.

A couple of options here. If your data is in true CSV format, say as an export from Excel, you can use pandas.read_csv to read in the file, and it will automatically be split into columns based on the column delimiter.

If your data is a string column with commas, you can use str.split to split it into columns, but as far as I know, you need to dump the result to a raw Python list and then recast it as a dataframe:

import pandas as pd
df = pd.DataFrame([["A006, R079, 00-00-04, 5 AVE-59 ST"]])
df2 = pd.DataFrame(df[0].str.split(',').tolist())
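For what it's worth, reasonably recent pandas versions also accept expand=True in str.split, which returns a DataFrame directly and may save the round trip through a Python list. The following is only a sketch for the string case; it assumes the column really holds comma-separated strings, and the column name is taken from the question.

```python
import pandas as pd

# Assumes the cells are plain comma-separated strings (not tuples)
df = pd.DataFrame({"Turnstile": ["A006, R079, 00-00-04, 5 AVE-59 ST"] * 2})

# expand=True spreads the split pieces across columns instead of returning lists
parts = df["Turnstile"].str.split(",", expand=True)

# Strip the space left behind after each comma
parts = parts.apply(lambda col: col.str.strip())
print(parts)
```

The resulting columns are labelled 0 to 3 and can be renamed by assigning to parts.columns.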

1 Comment

It gives me a KeyError and doesn't work. @maxymoo As I mentioned, when I run df.Turnstile.str.split().tolist(), it gives me all 'nan'.

Try df.Turnstile.str.split(',')
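Note that, according to the comments under the question, the cells hold tuples rather than strings, and the .str methods only operate on string values, which would be consistent with the all-'nan' result reported there. A minimal sketch that checks the cell type first and picks the approach accordingly; split_turnstile is a hypothetical helper name, and it assumes every cell has the same type as the first one.

```python
import pandas as pd

def split_turnstile(col):
    """Split a column of tuples or comma-separated strings into columns.

    Hypothetical helper; assumes all cells share the type of the first cell.
    """
    first = col.iloc[0]
    if isinstance(first, (tuple, list)):
        # Already split: just expand each tuple into its own row of columns
        return pd.DataFrame(col.tolist(), index=col.index)
    # Plain strings: split on commas and trim surrounding whitespace
    parts = col.str.split(",", expand=True)
    return parts.apply(lambda c: c.str.strip())

tuples = pd.Series([("A006", "R079", "00-00-04", "5 AVE-59 ST")] * 2)
strings = pd.Series(["A006, R079, 00-00-04, 5 AVE-59 ST"] * 2)

from_tuples = split_turnstile(tuples)
from_strings = split_turnstile(strings)
print(from_tuples)
print(from_strings)
```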

1 Comment

When answering a question, please provide explanation associated with your code. Some people might not understand your code or don't see how it answers the question. See how to write a good answer
