I am working with a big, valid JSON file and trying to parse it with pandas. Reading it the normal way, data = pd.read_json(filename), works. But when I pass lines=True, as in data = pd.read_json(filename, lines=True), it throws ValueError: Expected object or value

I want to read this file in chunks, but I get the same error if I use the chunksize parameter.

Can someone point out what I am doing wrong here?

import pandas as pd

filename = 'data/tinyTwitter.json'
data = pd.read_json(filename, lines=True, chunksize=100)

Data

{
   "total_rows":3877777,
   "offset":805584,
   "rows":[
      {
         "id":"570379215192727552",
         "key":[
            "r1r01cdn8nb4",
            2015,
            2,
            25
         ],
         "value":{
            "type":"Feature",
            "geometry":{
               "type":"Point",
               "coordinates":[
                  144.92340088,
                  -37.95935781
               ]
            },
            "properties":{
               "created_at":"Wed Feb 25 00:26:16 +0000 2015",
               "text":"For the Oscars, Lady Gaga trained with a vocal coach DAILY for 6 months httmelbourne htto/ZSu8FifNUK",
               "location":"melbourne"
            }
         },
         "doc":{
            "_id":"570379215192727552",
            "_rev":"1-fa6a485cb4fe0575781b6c29286af554",
            "contributors":null,
            "truncated":false,
            "text":"For the Oscars, Lady Gaga trained with a vocal coach DAILY for 6 months htDIIS5EtsW9 #melbourne ho/ZSu8FifNUK",
            "in_reply_to_status_id":null,
            "favorite_count":0,
            "source":"",
            "retweeted":false,
            "coordinates":{
               "type":"Point",
               "coordinates":[
                  144.92340088,
                  -37.95935781
               ]
            },
            "entities":{
               "symbols":[

               ],
               "user_mentions":[

               ],
               "hashtags":[
                  {
                     "indices":[
                        95,
                        105
                     ],
                     "text":"melbourne"
                  }
               ],
               "urls":[
                  {
                     "url":"",
                     "indices":[
                        72,
                        94
                     ],
                     "expanded_url":"",
                     "display_url":"j.mp/1ag2Quk"
                  }
               ],
               "media":[
                  {
                     "expanded_url":"",
                     "display_url":"pir.FifNUK",
                     "url":"http/ZSu8FifNUK",
                     "media_url_https":"",
                     "id_str":"570379215142457344",
                     "sizes":{
                        "large":{
                           "h":380,
                           "resize":"fit",
                           "w":380
                        },
                        "small":{
                           "h":340,
                           "resize":"fit",
                           "w":340
                        },
                        "medium":{
                           "h":380,
                           "resize":"fit",
                           "w":380
                        },
                        "thumb":{
                           "h":150,
                           "resize":"crop",
                           "w":150
                        }
                     },
                     "indices":[
                        106,
                        128
                     ],
                     "type":"photo",
                     "id":570379215142457340,
                     "media_url":""
                  }
               ]
            },
            "in_reply_to_screen_name":null,
            "in_reply_to_user_id":null,
            "retweet_count":0,
            "id_str":"570379215192727552",
            "favorited":false,
            "user":{
               "follow_request_sent":false,
               "profile_use_background_image":true,
               "profile_text_color":"333333",
               "default_profile_image":false,
               "id":2543131938,
               "profile_background_image_url_https":"",
               "verified":false,
               "profile_location":null,
               "profile_image_url_https":"",
               "profile_sidebar_fill_color":"DDEEF6",
               "entities":{
                  "url":{
                     "urls":[
                        {
                           "url":"",
                           "indices":[
                              0,
                              22
                           ],
                           "expanded_url":"",
                           "display_url":"youthsnews.com.au"
                        }
                     ]
                  },
                  "description":{
                     "urls":[

                     ]
                  }
               },
               "followers_count":68313,
               "profile_sidebar_border_color":"C0DEED",
               "id_str":"2543131938",
               "profile_background_color":"C0DEED",
               "listed_count":6,
               "is_translation_enabled":false,
               "utc_offset":36000,
               "statuses_count":1390,
               "description":"media network",
               "friends_count":788,
               "location":"pacific, oceania",
               "profile_link_color":"042A38",
               "profile_image_url":"",
               "following":false,
               "geo_enabled":true,
               "profile_banner_url":"h8",
               "profile_background_image_url":"htng",
               "name":"ynnmedia™",
               "lang":"en",
               "profile_background_tile":false,
               "favourites_count":765,
               "screen_name":"ynnmedianetwork",
               "notifications":false,
               "url":"htxq",
               "created_at":"Tue Jun 03 09:27:23 +0000 2014",
               "contributors_enabled":false,
               "time_zone":"Yakutsk",
               "protected":false,
               "default_profile":false,
               "is_translator":false
            },
            "geo":{
               "type":"Point",
               "coordinates":[
                  -37.95935781,
                  144.92340088
               ]
            },
            "in_reply_to_user_id_str":null,
            "possibly_sensitive":false,
            "lang":"en",
            "created_at":"Wed Feb 25 00:26:16 +0000 2015",
            "in_reply_to_status_id_str":null,
            "place":null,
            "metadata":{
               "iso_language_code":"en",
               "result_type":"recent"
            },
            "location":"melbourne"
         }
      },
      {
         "id":"570379220146200576",
         "key":[
            "r1r01cdn8nb4",
            2015,
            2,
            25
         ],
         "value":{
            "type":"Feature",
            "geometry":{
               "type":"Point",
               "coordinates":[
                  144.92340088,
                  -37.95935781
               ]
            },
            "properties":{
               "created_at":"Wed Feb 25 00:26:17 +0000 2015",
               "text":"Abuses in AIB Roast were dubbed: Rakhi Sawant Ka",
               "location":"melbourne"
            }
         },
         "doc":{
            "_id":"570379220146200576",
            "_rev":"1-61252163c64f6f548cab2b8eb4cbd045",
            "contributors":null,
            "truncated":false,
            "text":"Abuses in AIB Roast were dubbed: Rakhi Sawant ourne htco/MbglBYEAKa",
            "in_reply_to_status_id":null,
            "favorite_count":0,
            "source":"t</a>",
            "retweeted":false,
            "coordinates":{
               "type":"Point",
               "coordinates":[
                  144.92340088,
                  -37.95935781
               ]
            },
            "entities":{
               "symbols":[

               ],
               "user_mentions":[

               ],
               "hashtags":[
                  {
                     "indices":[
                        69,
                        79
                     ],
                     "text":"melbourne"
                  }
               ],
               "urls":[
                  {
                     "url":"htKiAELeMO6",
                     "indices":[
                        46,
                        68
                     ],
                     "expanded_url":"/1ag2Omb",
                     "display_url":"j.mp/1ag2Omb"
                  }
               ],
               "media":[
                  {
                     "expanded_url":"h79220146200576/photo/1",
                     "display_url":"pglBYEAKa",
                     "url":"rr",
                     "media_url":"pk4O5UIAAI0l",
                     "id_str":"570379220049731584",
                     "sizes":{
                        "large":{
                           "h":380,
                           "resize":"fit",
                           "w":380
                        },
                        "small":{
                           "h":340,
                           "resize":"fit",
                           "w":340
                        },
                        "medium":{
                           "h":380,
                           "resize":"fit",
                           "w":380
                        },
                        "thumb":{
                           "h":150,
                           "resize":"crop",
                           "w":150
                        }
                     },
                     "indices":[
                        80,
                        102
                     ],
                     "type":"photo",
                     "id":570379220049731600,
                     "media_urrl":"htpk4O5UIAAI0l1.jpg"
                  }
               ]
            },
            "in_reply_to_screen_name":null,
            "in_reply_to_user_id":null,
            "retweet_count":0,
            "id_str":"570379220146200576",
            "favorited":false,
            "user":{
               "follow_request_sent":false,
               "profile_use_background_image":true,
               "profile_text_color":"333333",
               "default_profile_image":false,
               "id":2543131938,
               "profile_background_image_url_https":"h/images/themes/theme1/bg.png",
               "verified":false,
               "profile_location":null,
               "profile_image_url_https":"htt/567602629937606657/ZCcCDFzr_normal.jpeg",
               "profile_sidebar_fill_color":"DDEEF6",
               "entities":{
                  "url":{
                     "urls":[
                        {
                           "url":"htAxq",
                           "indices":[
                              0,
                              22
                           ],
                           "expanded_url":"hws.com.au",
                           "display_url":"youth.au"
                        }
                     ]
                  },
                  "description":{
                     "urls":[

                     ]
                  }
               },
               "followers_count":68313,
               "profile_sidebar_border_color":"C0DEED",
               "id_str":"2543131938",
               "profile_background_color":"C0DEED",
               "listed_count":6,
               "is_translation_enabled":false,
               "utc_offset":36000,
               "statuses_count":1390,
               "description":"media network",
               "friends_count":788,
               "location":"pacific, oceania",
               "profile_link_color":"042A38",
               "profile_image_url":"htes/567602629937606657/ZCcCDFzr_normal.jpeg",
               "following":false,
               "geo_enabled":true,
               "profile_banner_url":"httpanners/2543131938/1424079798",
               "profile_background_image_url":"http/themes/theme1/bg.png",
               "name":"ynnmedia™",
               "lang":"en",
               "profile_background_tile":false,
               "favourites_count":765,
               "screen_name":"ynnmedianetwork",
               "notifications":false,
               "url":"httgeAxq",
               "created_at":"Tue Jun 03 09:27:23 +0000 2014",
               "contributors_enabled":false,
               "time_zone":"Yakutsk",
               "protected":false,
               "default_profile":false,
               "is_translator":false
            },
            "geo":{
               "type":"Point",
               "coordinates":[
                  -37.95935781,
                  144.92340088
               ]
            },
            "in_reply_to_user_id_str":null,
            "possibly_sensitive":false,
            "lang":"en",
            "created_at":"Wed Feb 25 00:26:17 +0000 2015",
            "in_reply_to_status_id_str":null,
            "place":null,
            "metadata":{
               "iso_language_code":"en",
               "result_type":"recent"
            },
            "location":"melbourne"
         }
      }
   ]
}
  • Is the data confidential? If not, could you share the first 200 rows? Commented Apr 5, 2019 at 6:48
  • Okay, let me share it. Commented Apr 5, 2019 at 7:07
  • In your input file, do you have one valid JSON record per line? Commented Apr 5, 2019 at 7:23
  • pandas.read_json only accepts JSON input in prespecified formats. See the valid formats in the documentation (look at the examples with different orient arguments). According to the documentation, if you select lines=True, pandas.read_json expects one valid JSON record per line. You get the error because your input does not adhere to this format. Commented Apr 5, 2019 at 7:36
  • @Sina Is there a way to change my JSON so that it matches the format lines=True expects? Commented Apr 5, 2019 at 7:45
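
One possible way to get a file that lines=True accepts is to rewrite the rows array as JSON Lines, one record per line. The sketch below assumes the file has the layout shown in the question (a single top-level object with a "rows" list) and that it fits in memory once; the helper names rows_to_json_lines and iter_chunks and the output path are made up for illustration.

```python
import json

import pandas as pd


def rows_to_json_lines(src_path, dst_path, rows_key="rows"):
    """Write each element of the top-level `rows_key` list as one JSON line.

    Assumption: the source file is a single JSON object that fits in memory.
    Returns the number of records written.
    """
    with open(src_path, encoding="utf-8") as src:
        document = json.load(src)
    rows = document[rows_key]
    with open(dst_path, "w", encoding="utf-8") as dst:
        for row in rows:
            # json.dumps emits no raw newlines, so each record stays on one line.
            dst.write(json.dumps(row) + "\n")
    return len(rows)


def iter_chunks(jsonl_path, chunksize=100):
    """Yield DataFrames of up to `chunksize` rows from a JSON Lines file."""
    reader = pd.read_json(jsonl_path, lines=True, chunksize=chunksize)
    for chunk in reader:
        yield chunk
```

After calling rows_to_json_lines('data/tinyTwitter.json', 'data/tinyTwitter.jsonl') once, looping over iter_chunks('data/tinyTwitter.jsonl') should give one DataFrame per 100 records. The nested "value" and "doc" objects would stay as dict-valued columns.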

1 Answer

I added the link above in the comments. I believe the issue is that the Twitter response puts multiple JSON documents into one file without separating them. The solution that worked for me was to read the whole file, split it into a list of pieces, and then parse each piece individually.

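import json

filename = 'data/tinyTwitter.json'

data = []
with open(filename) as json_file:
    data_str = json_file.read()
    # Keep only what lies between the first '[' and the last ']'.
    data_str = data_str.split('[', 1)[-1]
    data_str = data_str.rsplit(']', 1)[0]
    # Split back-to-back arrays ("...][...") into separate strings.
    data_str = data_str.split('][')

for jsonStr in data_str:
    # Restore the brackets removed above so each piece is a valid JSON array.
    jsonStr = '[' + jsonStr + ']'

    temp_data = json.loads(jsonStr)
    # Collect the records from every array into one flat list.
    data.extend(temp_data)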
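
If the file is a single JSON document like the sample in the question (one top-level object with a "rows" list), a shorter route might be to parse it with json.load and flatten the rows with pd.json_normalize. This is only a sketch under that assumption, and it loads the whole file into memory; the helper name load_rows is made up for illustration.

```python
import json

import pandas as pd


def load_rows(path, rows_key="rows"):
    """Return a DataFrame with one row per element of the top-level `rows_key` list.

    Assumption: the file is a single JSON object that fits in memory.
    Nested objects become dotted column names, e.g. "value.properties.text";
    lists (such as "key") are kept as list-valued cells.
    """
    with open(path, encoding="utf-8") as json_file:
        document = json.load(json_file)
    return pd.json_normalize(document[rows_key])
```

Calling load_rows('data/tinyTwitter.json') should then give columns such as id, key, value.properties.text and doc.user.screen_name. For a file too large to load at once, a streaming parser (for example the third-party ijson package) may be a better fit than this approach.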