Convert txt file to dataframe using pandas

Question

I am trying to convert my txt file to a pandas DataFrame. The first few lines look like this:

['Tue Sep 12 15:13:56 +0000 2017', 'text. ', 0, 'en', 390, 529, 7138, 15727, False, -84.395235, 33.771232]
['Tue Sep 12 15:13:59 +0000 2017', "text", 0, 'en', 648, 891, 2087, 5801, False, -84.321948, 33.752879]
['Tue Sep 12 15:14:01 +0000 2017', 'text', 0, 'en', 217, 222, 959, 958, False, -82.849182, 27.865251]
['Tue Sep 12 15:14:06 +0000 2017', 'text', 0, 'en', 71, 85, 2357, 1290, False, -82.29976, 27.857254]

The elements of each list are, in order:

time, text, retweet_count, language, friends_count, followers_count, favourites_count, status_count, verified

I used pandas, but it does not do what I intended:

df = pd.read_csv("second.txt", sep=',')

This gives me almost 100,000 columns and 0 rows. How can I convert this file to a DataFrame successfully? Thanks!

I could not understand how the txt is organized, but I would check the options on here
– Filipe Lemos, Commented Nov 5, 2017 at 1:57
@FilipeLemos My bad. I edited. I had mixed my code and my txt file together.
– jayko03, Commented Nov 5, 2017 at 2:02
Why does your file have square brackets? Can you fix whatever generated that file to make a regular csv?
– OneCricketeer, Commented Nov 5, 2017 at 2:06

Andy Hayden · Accepted Answer · 2017-11-05 02:02:20Z

1

I would read in each line as a list and then pass to the DataFrame constructor:

In [10]: import pandas as pd

In [11]: import ast

In [12]: pd.DataFrame([ast.literal_eval(line) for line in open("second.txt")])
Out[12]:
                               0       1   2   3    4    5     6      7      8          9          10
0  Tue Sep 12 15:13:56 +0000 2017  text.    0  en  390  529  7138  15727  False -84.395235  33.771232
1  Tue Sep 12 15:13:59 +0000 2017    text   0  en  648  891  2087   5801  False -84.321948  33.752879
2  Tue Sep 12 15:14:01 +0000 2017    text   0  en  217  222   959    958  False -82.849182  27.865251
3  Tue Sep 12 15:14:06 +0000 2017    text   0  en   71   85  2357   1290  False -82.299760  27.857254

ast.literal_eval converts each string to the corresponding Python list:

In [21]: line = "['Tue Sep 12 15:13:56 +0000 2017', 'text. ', 0, 'en', 390, 529, 7138, 15727, False, -84.395235, 33.771232]"

In [22]: ast.literal_eval(line)
Out[22]:
['Tue Sep 12 15:13:56 +0000 2017',
 'text. ',
 0,
 'en',
 390,
 529,
 7138,
 15727,
 False,
 -84.395235,
 33.771232]
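For a real file it may be safer to parse line by line and skip empty lines, since a trailing blank line would make literal_eval fail. The following is only a minimal sketch of that idea: the helper name parse_list_lines and the inline sample are made up for illustration and are not part of the original answer.

```python
import ast


def parse_list_lines(lines):
    """Parse an iterable of text lines, each holding one Python list literal.

    Hypothetical helper for illustration; blank lines are skipped.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore empty lines, e.g. at the end of the file
        rows.append(ast.literal_eval(line))
    return rows


# Inline sample standing in for the contents of second.txt
sample = [
    "['Tue Sep 12 15:13:56 +0000 2017', 'text. ', 0, 'en', 390, 529, 7138, 15727, False, -84.395235, 33.771232]\n",
    "\n",
    "['Tue Sep 12 15:13:59 +0000 2017', \"text\", 0, 'en', 648, 891, 2087, 5801, False, -84.321948, 33.752879]\n",
]

rows = parse_list_lines(sample)
```

With an actual file, the same helper could be fed an open file object (opened in a with block so it gets closed), and the resulting rows passed to pd.DataFrame.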


6 Comments

Andy Hayden Over a year ago

The problem is that it's not valid JSON, so you won't be able to read it as JSON. That said, it can be more performant to do some string manipulation first and then read it as JSON or CSV...

grovina Over a year ago

It would also be nice to have the argument names with the list (time, text, retweet_count...) in the dataframe constructor, so that you have named columns.

jayko03 Over a year ago

@AndyHayden Yeah, it is the best I have so far. By the way, I got this error: ValueError: malformed node or string: <_ast.Subscript object at 0x118535908>. Do I have to convert all elements to str?

Andy Hayden Over a year ago

@jaykodeveloper I guess the question is how you got this "csv" in the first place; perhaps there's a better way to clean it (at source). As mentioned, you can pass

columns=['time', 'text', 'retweet_count', 'language', 'friends_count', 'followers_count', 'favourites_count', 'status_count', 'verified']

to the DataFrame constructor.
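One caveat when naming the columns: the sample rows hold 11 values, while the list above has 9 names, and pandas raises an error when the counts differ. The sketch below assumes the last two values are a longitude and a latitude; those two names are a guess based on the sample numbers, not something stated in the question.

```python
import ast

import pandas as pd

# The first nine names come from the question; the last two are ASSUMED
# labels for the trailing pair of floats in each row.
COLUMNS = [
    "time", "text", "retweet_count", "language", "friends_count",
    "followers_count", "favourites_count", "status_count", "verified",
    "longitude", "latitude",
]

# Inline sample standing in for the lines of second.txt
sample_lines = [
    "['Tue Sep 12 15:13:56 +0000 2017', 'text. ', 0, 'en', 390, 529, 7138, 15727, False, -84.395235, 33.771232]",
    "['Tue Sep 12 15:14:06 +0000 2017', 'text', 0, 'en', 71, 85, 2357, 1290, False, -82.29976, 27.857254]",
]

rows = [ast.literal_eval(line) for line in sample_lines]
df = pd.DataFrame(rows, columns=COLUMNS)
```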

jayko03 Over a year ago

@AndyHayden yeah. Tried to convert txt file to csv file but no luck. That's why I just try to get dataframe. Do you have any advice for converting?


jayko03 · Accepted Answer · 2017-11-05 03:11:34Z

1

I figured out the issue. In my Python code, I now add \n after each inner list is inserted into the outer list. Then @AndyHayden's solution works.
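A small sketch of what was probably going on, assuming the lists had been written back to back without line breaks: two adjacent list literals parse as an indexing expression, which literal_eval rejects with a ValueError like the one quoted in the comments. The temporary file path and the way the rows are written here are illustrative only, not the original code.

```python
import ast
import os
import tempfile

rows = [
    ['Tue Sep 12 15:13:56 +0000 2017', 'text. ', 0, 'en', 390, 529, 7138, 15727, False, -84.395235, 33.771232],
    ['Tue Sep 12 15:13:59 +0000 2017', 'text', 0, 'en', 648, 891, 2087, 5801, False, -84.321948, 33.752879],
]

# Without a newline, "[...][...]" is read as "index the first list with
# the second", an ast.Subscript node that literal_eval refuses to evaluate.
glued = repr(rows[0]) + repr(rows[1])
try:
    ast.literal_eval(glued)
    glued_error = None
except ValueError as exc:
    glued_error = exc

# Writing "\n" after every list keeps exactly one list per line.
path = os.path.join(tempfile.mkdtemp(), "second.txt")  # illustrative path
with open(path, "w") as f:
    for row in rows:
        f.write(repr(row) + "\n")

# Each line can now be parsed on its own.
with open(path) as f:
    parsed = [ast.literal_eval(line) for line in f]
```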


1 Comment

cs95 Over a year ago

If you used an answerer's solution to solve your problem, please don't forget to vote on it, and accept the answer.
