I am trying to load a txt file into a pandas DataFrame. The first few lines look like this:

['Tue Sep 12 15:13:56 +0000 2017', 'text. ', 0, 'en', 390, 529, 7138, 15727, False, -84.395235, 33.771232]
['Tue Sep 12 15:13:59 +0000 2017', "text", 0, 'en', 648, 891, 2087, 5801, False, -84.321948, 33.752879]
['Tue Sep 12 15:14:01 +0000 2017', 'text', 0, 'en', 217, 222, 959, 958, False, -82.849182, 27.865251]
['Tue Sep 12 15:14:06 +0000 2017', 'text', 0, 'en', 71, 85, 2357, 1290, False, -82.29976, 27.857254]

The elements of each list are, in order:

time, text, retweet_count, language, friends_count, followers_count, favourites_count, status_count, verified

I tried pandas, but it does not do what I intended:

df = pd.read_csv("second.txt", sep=',')

This gives me almost 100,000 columns and 0 rows. How can I convert this file to a DataFrame successfully? Thanks!

  • I could not understand how the txt is organized, but I would check the options on here Commented Nov 5, 2017 at 1:57
  • @FilipeLemos My bad. I have edited the question. I had mixed my code and the txt file contents together. Commented Nov 5, 2017 at 2:02
  • Why does your file have square brackets? Can you fix whatever generated that file to make a regular csv? Commented Nov 5, 2017 at 2:06

2 Answers

I would read in each line as a list and then pass to the DataFrame constructor:

In [11]: import ast

In [12]: pd.DataFrame([ast.literal_eval(line) for line in open("second.txt")])
Out[12]:
                               0       1   2   3    4    5     6      7      8          9          10
0  Tue Sep 12 15:13:56 +0000 2017  text.    0  en  390  529  7138  15727  False -84.395235  33.771232
1  Tue Sep 12 15:13:59 +0000 2017    text   0  en  648  891  2087   5801  False -84.321948  33.752879
2  Tue Sep 12 15:14:01 +0000 2017    text   0  en  217  222   959    958  False -82.849182  27.865251
3  Tue Sep 12 15:14:06 +0000 2017    text   0  en   71   85  2357   1290  False -82.299760  27.857254

ast.literal_eval converts each line's string into the corresponding Python list:

In [21]: line = "['Tue Sep 12 15:13:56 +0000 2017', 'text. ', 0, 'en', 390, 529, 7138, 15727, False, -84.395235, 33.771232]"

In [22]: ast.literal_eval(line)
Out[22]:
['Tue Sep 12 15:13:56 +0000 2017',
 'text. ',
 0,
 'en',
 390,
 529,
 7138,
 15727,
 False,
 -84.395235,
 33.771232]
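The same approach can also give the columns names. The sketch below is only an illustration: the first nine names are taken from the question, while "longitude" and "latitude" are assumed labels for the two trailing floats, which the question does not name. The helper read_list_lines is a hypothetical name, and an in-memory sample stands in for second.txt.

```python
import ast
import io

import pandas as pd

# The first nine names come from the question. "longitude" and "latitude"
# are assumed labels for the two trailing floats, which the question leaves unnamed.
COLUMNS = [
    "time", "text", "retweet_count", "language", "friends_count",
    "followers_count", "favourites_count", "status_count", "verified",
    "longitude", "latitude",
]


def read_list_lines(handle, columns=COLUMNS):
    """Parse one Python list literal per line into a DataFrame."""
    # Skip blank lines, which literal_eval would reject.
    rows = [ast.literal_eval(line) for line in handle if line.strip()]
    return pd.DataFrame(rows, columns=columns)


# In-memory stand-in for the file, using two lines from the question.
sample = io.StringIO(
    "['Tue Sep 12 15:13:56 +0000 2017', 'text. ', 0, 'en', 390, 529, 7138, 15727, False, -84.395235, 33.771232]\n"
    "['Tue Sep 12 15:13:59 +0000 2017', \"text\", 0, 'en', 648, 891, 2087, 5801, False, -84.321948, 33.752879]\n"
)
df = read_list_lines(sample)
print(df.dtypes)
```

For the real file, something like `with open("second.txt") as f: df = read_list_lines(f)` should work, and the with block also closes the file. The number of names has to match the number of elements per line (11 here), otherwise the DataFrame constructor raises a ValueError.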

6 Comments

The problem is that it's not valid JSON, so you won't be able to read it as JSON. That said, it can be more performant to do some string manipulation first and then read the result as JSON or CSV...
It would also be nice to have the argument names with the list (time, text, retweet_count...) in the dataframe constructor, so that you have named columns.
@AndyHayden Yeah, it is the best I have so far. By the way, I got this error: ValueError: malformed node or string: <_ast.Subscript object at 0x118535908>. Do I have to convert all elements to str?
@jaykodeveloper I guess the question is how you got this "csv" in the first place; perhaps there's a better way to clean it (at the source). As mentioned, you can pass columns=['time', 'text', 'retweet_count', 'language', 'friends_count', 'followers_count', 'favourites_count', 'status_count', 'verified'] to the DataFrame constructor.
@AndyHayden Yeah. I tried to convert the txt file to a csv file but had no luck. That's why I am just trying to get a DataFrame. Do you have any advice for converting?

I figured out the issue. In the Python code that builds the file, I now add \n each time an inner list is inserted into the outer list. With that change, @AndyHayden's solution works.
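A small sketch of why the missing newline matters, assuming the lists were originally written back to back with no separator (the exact writing code is not shown in the thread, and the records here are made up). Two adjacent list literals parse as one list being subscripted by another, which ast.literal_eval rejects with the "malformed node or string" ValueError from the comments above.

```python
import ast

# Made-up records standing in for the inner lists.
records = [["a", 1, False], ["b", 2, True]]

# Without a separator the text is "['a', 1, False]['b', 2, True]",
# which Python parses as a subscript expression, not as two lists.
joined = "".join(repr(r) for r in records)
try:
    ast.literal_eval(joined)
    failed = False
except ValueError:
    failed = True

# With one list per line, each line is a plain list literal again.
text = "\n".join(repr(r) for r in records) + "\n"
parsed = [ast.literal_eval(line) for line in text.splitlines()]

print(failed)
print(parsed == records)
```

Writing one list per line is what lets the line-by-line literal_eval in the accepted approach work.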

1 Comment

If you used an answerer's solution to solve your problem, please don't forget to vote on it, and accept the answer.
