I've downloaded a big stream of Twitter data in JSON format and saved it to a text file. I now want to read it in, line by line, and decode each line into a dictionary using json.loads().

My only problem is that it throws an error on the first line, which I assume means the function doesn't think the data is JSON. I have added the line I want to decode at the bottom of this post. When I just print the lines, the code works fine; it's only the json.loads() call that throws an error.

Here is the code:

import json

def decodeJSON(tweet_data):
    for line in tweet_data:
        parsedJSON = json.loads(line)
        print(parsedJSON)  # I just want to print for now to confirm it works.

Here is the error:

  File "/Users/cc756/Dropbox/PythonProjects/TwitterAnalysisAssignment/tweet_sentiment.py", line 17, in analyseSentiment
    parsedJSON = json.loads(line)
  File "/Users/cc756/anaconda/envs/tensorflow/lib/python3.5/json/__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "/Users/cc756/anaconda/envs/tensorflow/lib/python3.5/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Users/cc756/anaconda/envs/tensorflow/lib/python3.5/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Here is the first string:

'b\\'{"delete":{"status":{"id":805444624881811457,"id_str":"805444624881811457","user_id":196129140,"user_id_str":"196129140"},"timestamp_ms":"1500994305560"}}\\''

I feel like it should work; I've been staring at this for an hour with no improvement!

  • Is that exactly what the output of the first line looks like? json.loads isn't going to be able to parse that. Commented Jul 25, 2017 at 15:32
  • @CoryMadden b'{"delete":{"status":{"id":805444624881811457,"id_str":"805444624881811457","user_id":196129140,"user_id_str":"196129140"},"timestamp_ms":"1500994305560"}}' Commented Jul 25, 2017 at 15:41
  • That's exactly what I pulled from the Twitter feed and saved :s It seems to have changed slightly when I copied it from PyCharm to the question, but I just got that from the text file. Commented Jul 25, 2017 at 15:41
  • @ChrisCollins Don't put code/data in comments. Edit your question and format it accordingly to make it easier to read. Commented Jul 25, 2017 at 15:41
  • @ChrisCollins with that string I was able to use json.loads just fine. See my answer. Commented Jul 25, 2017 at 15:42

1 Answer

Your strings are in the wrong format. I'm not sure what you need to do to get rid of the 'b\\' at the beginning (which doesn't really make sense), but manually typing the string into the shell gives me this:

In [119]: json.loads(b'{"delete":{"status":{"id":805444624881811457,"id_str":"80
     ...: 5444624881811457","user_id":196129140,"user_id_str":"196129140"},"time
     ...: stamp_ms":"1500994305560"}}')
Out[119]: 
{u'delete': {u'status': {u'id': 805444624881811457,
   u'id_str': u'805444624881811457',
   u'user_id': 196129140,
   u'user_id_str': u'196129140'},
  u'timestamp_ms': u'1500994305560'}}

Sorry, I'd make a comment, but imagine this post in a comment... :)

I'm not sure what's up with your pasting of the string into the question, but it's following an invalid format for Python, so you may want to correct that.

UPDATE: The issue was that the data was bytes rather than text; it just needed to be decoded with data.decode('utf-8').
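
A minimal sketch of what that decode step might look like. It assumes each tweet arrives from the stream as a bytes object and that the file was written from the text form of those bytes, which would explain the b'...' wrapper on every line. The names raw, saved_line, tweet and recovered are hypothetical and do not come from the asker's code.

```python
import ast
import json

# Assumption: each tweet arrives from the stream as bytes.
raw = b'{"delete":{"status":{"id":805444624881811457,"id_str":"805444624881811457","user_id":196129140,"user_id_str":"196129140"},"timestamp_ms":"1500994305560"}}'

# The text form of a bytes object is its literal, b'...', which is not JSON.
# repr(raw) gives the same text that str(raw) produces for bytes.
saved_line = repr(raw)

# Decoding first yields plain JSON text that json.loads accepts.
tweet = json.loads(raw.decode('utf-8'))

# For a line that was already saved with the b'...' wrapper, the literal
# can be turned back into bytes and then decoded.
recovered = json.loads(ast.literal_eval(saved_line).decode('utf-8'))
```

Decoding before the data is written to the file avoids the problem at its source; the ast.literal_eval step is only a way to salvage lines that were already saved with the wrapper.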

8 Comments

Thanks for the response! So that's odd. You say json.loads() worked fine on the string I posted in the comment on my original question... Does that mean it's loading into Python incorrectly? I'm using open() to open the text file and then a for loop that iterates through each line... confusing!
Yeah, it could be the \\ if that's actually part of the string that loads. I recommend printing the line before you call json.loads on it, to make sure that it is indeed a valid JSON line and that you're not passing an empty string to the function. You could get rid of the backslash with line.replace("\\", "")
I'm getting somewhere if I delete the first and last 2 characters from the string... It's odd that they're there.
It's probably just meant as a separator so you can open the entire file and split it on the slashes. For example: lines = file.read(); tweet_data = lines.split('\\')
Awesome. I'm glad you got it. I updated my post to reflect that for the future people.
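
The suggestion in these comments to print each line before parsing it can be sketched roughly as follows. The helper name parse_lines is hypothetical, and skipping blank lines is an assumption about how the file should be treated rather than something the thread confirms.

```python
import json

def parse_lines(lines):
    """Parse each non-empty line as JSON and collect the lines that fail."""
    parsed, failed = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines instead of passing '' to json.loads
        try:
            parsed.append(json.loads(line))
        except json.JSONDecodeError:
            # repr() makes stray prefixes, quotes, or backslashes visible
            print('Could not parse:', repr(line))
            failed.append(line)
    return parsed, failed

parsed, failed = parse_lines(['{"a": 1}', '', 'b\'{"a": 1}\''])
```

Here the first line parses, the empty line is skipped, and the line wrapped in b'...' is reported and collected in failed.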