I have a text file with data displayed like this:
{"created_at":"Mon Jun 02 00:04:00 +0000 2018","id":870430762953920,"id_str":"87043076220","text":"Hello there","source":"\u003ca href=\"http:\/\/tapbots.com\/software\/tweetbot\/mac\" rel=\"nofollow\"\u003eTweetbot for Mac\u003c\/a\u003e","truncated":false,"in_reply_to_status_id"}
The data is Twitter posts, and I have hundreds of these in one text file. I want to take the key-value pair "text":"Hello there" and turn it into its own dataframe with an additional column named target. I don't need any of the other columns. I'm doing some sentiment analysis.
What would be the most pythonic way to go about this? I thought about using
df = pd.read_csv('test.txt', sep=r'"')
but then I don't know how to get rid of all the other columns I don't need and select just the column with the text in it.
Any help would be much appreciated!
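Since each line is (mostly) a JSON object rather than CSV, one common approach is to parse each line with the json module and keep only the "text" field. A minimal sketch, assuming one tweet per line and a file name of test.txt as in the question (the helper name texts_to_dataframe is made up for illustration):

```python
import json
import pandas as pd

def texts_to_dataframe(lines, target=0):
    """Parse one JSON tweet per line, keeping only the "text" field."""
    texts = []
    for line in lines:
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if "text" in tweet:
            texts.append(tweet["text"])
    df = pd.DataFrame({"text": texts})
    df["target"] = target  # extra column named "target", as requested
    return df

# Usage with the file from the question:
# with open("test.txt") as fh:
#     df = texts_to_dataframe(fh, target=1)
```

If the file were fully valid line-delimited JSON, pd.read_json("test.txt", lines=True) followed by selecting the "text" column would also work, but parsing line by line lets you skip the malformed records.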
Note that if you try to evaluate these lines as Python literals, false will need to be capitalized to False, and the last key ("in_reply_to_status_id") has no value, so strict JSON parsing should raise errors on lines like the sample.
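To illustrate the second point: the sample line ends with a key that has no value, so json.loads raises json.JSONDecodeError on it, which is why the line-by-line parser above catches that exception. A small sketch (the bad string is shortened from the sample in the question):

```python
import json

# Trailing key with no value, as in the sample line.
bad = '{"truncated":false,"in_reply_to_status_id"}'

try:
    json.loads(bad)
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False  # strict JSON parsing rejects the line
```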