How to extract desired sections from a JSON string

Question

I want to know how to clean up my data so that I can understand it and sift through it more easily. So far I have been able to download a public Google Sheets document and save it as a CSV file. But when I print the data it is messy and hard to read. The data came from a website, and when I open the browser's developer tools I can see that it is neatly organized.

Like this: Website data on inspect page mode

But when I print it in a Jupyter notebook it looks messy, like this:

b'/O_o/\ngoogle.visualization.Query.setResponse({"version":"0.6","reqId":"0output=csv","status":"ok","sig":"1241529276","table":{"cols":[{"id":"A","label":"Entity","type":"string"},{"id":"B","label":"Week","type":"number","pattern":"General"},{"id":"C","label":"Day","type":"date","pattern":"yyyy-mm-dd"},{"id":"D","label":"Flights 2019 (Reference)","type":"number","pattern":"General"},{"id":"E","label":"Flights","type":"number","pattern":"General"},{"id":"F","label":"% vs 2019 (Daily)","type":"number","pattern":"General"},{"id":"G","label":"Flights (7-day moving average)","type":"number","pattern":"General"},{"id":"H","label":"% vs 2019 (7-day Moving Average)","type":"number","pattern":"General"},{"id":"I","label":"Day 2019","type":"date","pattern":"yyyy-mm-dd"},{"id":"J","label":"Day Previous Year","type":"date","pattern":"yyyy-mm-dd"},{"id":"K","label":"Flights Previous Year","type":"number","pattern":"General"}],"rows":[{"c":[{"v":"Albania"},{"v":36.0,"f":"36"},{"v":"Date(2020,8,1)","f":"2020-09-01"},{"v":129.0,"f":"129"},{"v":64.0,"f":"64"},{"v":-0.503875968992248,"f":"-0,503875969"},{"v":71.5714285714286,"f":"71,57142857"},{"v":-0.291371994342291,"f":"-0,2913719943"},{"v":"Date(2019,8,3)","f":"2019-09-03"},{"v":"Date(2019,8,3)","f":"2019-09-03"},{"v":129.0,"f":"129"}]},{"c":[{"v":"Albania"},{"v":36.0,"f":"36"},{"v":"Date(2020,8,2)","f":"2020-09-02"},{"v":92.0,"f":"92"},{"v":59.0,"f":"59"},{"v":-0.358695652173913,"f":"-0,3586956522"},{"v":70.0,"f":"70"},{"v":-0.300998573466476,"f":"-0,3009985735"},{"v":"Date(2019,8,4)","f":"2019-09-04"},{"v":"Date(2019,8,4)","f":"2019-09-04"},{"v":92.0,"f":"92"}]},{"c":[{"v":"Albania"},{"v":36.0,"f":"36"},{"v":"Date(2020,8,3)","f":"2020-09-03"},{"v":96.0,"f":"96"},{"v":67.0,"f":"67"},{"v":-0.302083333333333,"f":"-0,3020833333"},

Is there a pandas way to clean this data up?

Essentially what I am trying to do is extract three variables from the data: country, date, and a number.

Here you can see that the data starts with the key "rows":

Code in Jupyter showing how the code starts out

Essentially it gives a country, date, then a bunch of associated numbers.

What I want to get is the country name, a specific date, and a specific number.

For example, here is a sample section; this sequence is repeated throughout the data:

{"c":[{"v":"Albania"},{"v":36.0,"f":"36"},{"v":"Date(2020,8,1)","f":"2020-09-01"},{"v":129.0,"f":"129"},{"v":64.0,"f":"64"},{"v":-0.503875968992248,"f":"-0,503875969"},{"v":71.5714285714286,"f":"71,57142857"},{"v":-0.291371994342291,"f":"-0,2913719943"},{"v":"Date(2019,8,3)","f":"2019-09-03"},{"v":"Date(2019,8,3)","f":"2019-09-03"},{"v":129.0,"f":"129"}]},

From this section of the data I only want the country name (Albania), the date ("2020-09-01"), and the number -0.5038.

Here is the code I used to grab the Google spreadsheet data and save it as a CSV:

import requests
import pandas as pd 

r = requests.get('https://docs.google.com/spreadsheets/d/1GJ6CvZ_mgtjdrUyo3h2dU3YvWOahbYvPHpGLgovyhtI/gviz/tq?usp=sharing&tqx=reqId%3A0output=csv')

data = r.content

print(data)

Any and all advice would be much appreciated.

Thank you

Trenton McKinney · Accepted Answer · 2021-01-27 22:23:15Z

1

I'm not sure how you arrived at this CSV file, but the easiest way would be to get the JSON directly with requests, load it as a dict, and process it. Nonetheless, a solution for the current file would be:

import requests
import pandas as pd 
import json

r = requests.get('https://docs.google.com/spreadsheets/d/1GJ6CvZ_mgtjdrUyo3h2dU3YvWOahbYvPHpGLgovyhtI/gviz/tq?usp=sharing&tqx=reqId%3A0output=jspn')

data = r.content
# Strip the google.visualization.Query.setResponse(...) wrapper so only the JSON is left
data = json.loads(data.decode('utf-8').split("(", 1)[1].rsplit(")", 1)[0])
# For each row keep cell 0 (country), the formatted date in cell 2, and the raw value in cell 5
d = [[i['c'][0]['v'], i['c'][2]['f'], i['c'][5]['v']] for i in data['table']['rows']]
df = pd.DataFrame(d, columns=['country', 'date', 'number'])

Output:
|    | country   | date       |        number |
|---:|:----------|:-----------|--------------:|
|  0 | Albania   | 2020-09-01 |     -0.503876 |
|  1 | Albania   | 2020-09-02 |     -0.358696 |
|  2 | Albania   | 2020-09-03 |     -0.302083 |
|  3 | Albania   | 2020-09-04 |     -0.135922 |
|  4 | Albania   | 2020-09-05 |     -0.43617  |
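The same unwrap-and-pick steps can be tried without a network call. The following is only a sketch: `raw` is a trimmed copy of the payload shown in the question (two rows, first six cells each), the prefix is assumed to be `/*O_o*/`, and `unwrap_gviz` and `cell` are hypothetical helper names, not part of any library. The `None` guard assumes that other rows may contain null or missing cells, which this sample does not show.

```python
import json

# Trimmed sample of the payload from the question: two rows, first six cells each.
raw = (
    b'/*O_o*/\ngoogle.visualization.Query.setResponse({"status":"ok","table":{"rows":['
    b'{"c":[{"v":"Albania"},{"v":36.0,"f":"36"},{"v":"Date(2020,8,1)","f":"2020-09-01"},'
    b'{"v":129.0,"f":"129"},{"v":64.0,"f":"64"},{"v":-0.503875968992248,"f":"-0,503875969"}]},'
    b'{"c":[{"v":"Albania"},{"v":36.0,"f":"36"},{"v":"Date(2020,8,2)","f":"2020-09-02"},'
    b'{"v":92.0,"f":"92"},{"v":59.0,"f":"59"},{"v":-0.358695652173913,"f":"-0,3586956522"}]}'
    b']}});'
)


def unwrap_gviz(payload: bytes) -> dict:
    """Parse the JSON between the first '(' and the last ')' of the wrapper."""
    text = payload.decode("utf-8")
    start = text.index("(") + 1
    end = text.rindex(")")
    return json.loads(text[start:end])


def cell(row: dict, index: int, key: str):
    """Return row["c"][index][key], or None if the cell is missing or null."""
    cells = row.get("c", [])
    if index >= len(cells) or cells[index] is None:
        return None
    return cells[index].get(key)


# Country is the raw value of cell 0, date the formatted value of cell 2,
# and the number the raw value of cell 5, as in the answer above.
records = [
    {"country": cell(r, 0, "v"), "date": cell(r, 2, "f"), "number": cell(r, 5, "v")}
    for r in unwrap_gviz(raw)["table"]["rows"]
]
```

Passing `records` to `pd.DataFrame(records)` should give the same three columns as the answer's DataFrame.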

edited Jan 27, 2021 at 22:23

Trenton McKinney


answered Jan 27, 2021 at 22:21

RJ Adriaansen



6 Comments

Trenton McKinney Over a year ago

You can also slice data to data = json.loads(data[47:-2])
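The fixed offsets in that slice correspond to the length of the wrapper around the JSON. As a rough sketch, assuming the response starts with `/*O_o*/`, a newline, and `google.visualization.Query.setResponse(` and ends with `);`, the prefix is 47 bytes and the suffix 2 bytes. The `body` below is a placeholder payload, not the real data. Locating the parentheses instead of hard-coding offsets gives the same result and keeps working if the prefix length changes.

```python
import json

# Assumed wrapper around the JSON body; the body itself is a tiny placeholder.
PREFIX = b'/*O_o*/\ngoogle.visualization.Query.setResponse('
SUFFIX = b');'
body = b'{"status":"ok","table":{"rows":[]}}'
raw = PREFIX + body + SUFFIX

# Fixed-offset slice from the comment: drop 47 leading bytes and 2 trailing bytes.
by_slice = json.loads(raw[47:-2])

# Marker-based alternative: take everything between the first '(' and the last ')'.
by_markers = json.loads(raw[raw.index(b"(") + 1 : raw.rindex(b")")])
```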

JohnReese23487 Over a year ago

@RJ Adriaansen, thank you! Is there a way to have it pull out a specific country's name and then grab its date and number? I need to extract specific countries and their associated data points.

JohnReese23487 Over a year ago

@RJ Adriaansen, also here is the website I am scraping the data from: eurocontrol.int/Economics/DailyTrafficVariation-States.html. I inspect the page, open the XHR tab, and see where the GET requests are coming from. I'm not sure how to get it as JSON and would love to know how.

RJ Adriaansen Over a year ago

No, now I see that the site itself loads the data in this format from a Google spreadsheet, so you can stick with my code. Filtering by country is easy in pandas: df[df['country'] == 'France']
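A small sketch of that filter, plus two common variations, might look like the following. The two Albania rows are taken from the output table in the answer; the "Exampleland" row is a made-up placeholder added only so the filter has something to exclude.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["Albania", "2020-09-01", -0.503876],
        ["Albania", "2020-09-02", -0.358696],
        ["Exampleland", "2020-09-01", -0.1],  # placeholder row, not real data
    ],
    columns=["country", "date", "number"],
)

# All rows for one country, using a boolean mask as in the comment above.
albania = df[df["country"] == "Albania"]

# One country on one date: combine masks with & and parenthesize each condition.
one_day = df[(df["country"] == "Albania") & (df["date"] == "2020-09-01")]

# Several countries at once.
several = df[df["country"].isin(["Albania", "Exampleland"])]
```

Here the dates are compared as strings, which works because the formatted `f` values are already in `yyyy-mm-dd` form.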

JohnReese23487 Over a year ago

@RJ Adriaansen, sorry for my question; I now see that pandas prints only the first 5 rows. This is an amazing answer, thank you so much.
