I have a text file with data displayed like this:

{"created_at":"Mon Jun 02 00:04:00 +0000 2018","id":870430762953920,"id_str":"87043076220","text":"Hello there","source":"\u003ca href=\"http:\/\/tapbots.com\/software\/tweetbot\/mac\" rel=\"nofollow\"\u003eTweetbot for Mac\u003c\/a\u003e","truncated":false,"in_reply_to_status_id"}

The data consists of Twitter posts, and I have hundreds of these in one text file. I want to extract the key/value pair "text":"Hello there" and turn it into its own dataframe with a third column named target. I don't need any of the other columns. I'm doing some sensitivity analysis.

What would be the most Pythonic way to go about this? I thought about using df = pd.read_csv('test.txt', sep=r'"'), but then I don't know how to get rid of all the other columns I don't need and select the column with the text in it.

Any help would be much appreciated!

  • Just a heads-up: there are some errors in the data. The value false will need to be capitalized (if it is evaluated as Python) and the last key doesn't have a value. This should raise errors when you try to process it. Commented Nov 26, 2019 at 18:57
  • You are aware that your text file is a JSON file? Do note: Pandas can read JSON files. Commented Nov 26, 2019 at 20:08
  • I did not! How can I create a dataframe with 2 columns from the JSON file? Commented Nov 26, 2019 at 21:13
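
One possible way to do this is sketched below, assuming the file holds one complete, valid JSON object per line (the sample in the question is cut off, so real records may need repair first). The two sample records, the in-memory buffer standing in for test.txt, and the empty placeholder value for target are illustrative assumptions, not part of the original data.

```python
import io

import pandas as pd

# Two hypothetical, well-formed records standing in for the contents of test.txt
sample = io.StringIO(
    '{"id":1,"text":"Hello there","truncated":false}\n'
    '{"id":2,"text":"Second tweet","truncated":false}\n'
)

# lines=True tells pandas to parse one JSON object per line
df = pd.read_json(sample, lines=True)

# Keep only the text column and add an empty target column to fill in later
result = df[["text"]].copy()
result["target"] = None

print(result)
```

With a real file, the buffer could be swapped for the path, for example pd.read_json('test.txt', lines=True), provided every line parses as JSON.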

1 Answer

I had to modify the last two key/value pairs in your data to get it to work. You may want to check whether you're retrieving the data correctly, or whether you copied and pasted it properly, because the data as displayed in your post should raise errors.

"truncated":False,"in_reply_to_status_id":1

Then this worked well for me:

import ast
import pandas as pd

# Parse the file contents as a Python literal (safer than eval, which runs arbitrary code)
with open('test.txt', 'r') as inf1:
    d = ast.literal_eval(inf1.read())

# Wrap the single dict in a list so pandas builds one row from the scalar values
df = pd.DataFrame([d])
text = df['text']  # keep only the text column
print(text)

Returns

0    Hello there
Name: text, dtype: object

4 Comments

Is there a way to just ignore those two key:value pairs? I don't need them
Not that I know of. Because of them, reading in the data did not work for me with either read_csv or eval.
If that's the case, I'd need to change these key:value pairs for hundreds of records. Could you suggest the best way?
Nothing immediately comes to mind. But that would require another post / round of research and testing.
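
One approach that might sidestep the malformed pairs is to skip full parsing and pull out only the "text" value with a regular expression. The sketch below assumes one record per line and that the first "text" key on each line is the one wanted; the helper name extract_texts, the sample records, and the empty target placeholder are hypothetical.

```python
import json
import re

import pandas as pd

# Matches "text":"..." and captures the JSON string literal, including escaped quotes
TEXT_PATTERN = re.compile(r'"text":("(?:[^"\\]|\\.)*")')


def extract_texts(lines):
    """Return the decoded text value from each line that contains one."""
    texts = []
    for line in lines:
        match = TEXT_PATTERN.search(line)
        if match:
            # json.loads decodes escapes such as \" and \u003c in the captured literal
            texts.append(json.loads(match.group(1)))
    return texts


# Hypothetical records; each ends with a key that has no value, as in the question
raw_lines = [
    '{"id":1,"text":"Hello there","truncated":false,"in_reply_to_status_id"}',
    '{"id":2,"text":"She said \\"hi\\"","truncated":false,"in_reply_to_status_id"}',
]

df = pd.DataFrame({"text": extract_texts(raw_lines)})
df["target"] = None  # placeholder column to fill in later
print(df)
```

For a real file, an open file handle can be passed in place of raw_lines, since iterating over it yields one line at a time. Records that nest another "text" key before the top-level one would need a stricter pattern.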
