Hello, I need to transform the Covid hospitalisation JSON data from the government webpage: https://onemocneni-aktualne.mzcr.cz/covid-19#panel3-hospitalization

I inspected the webpage and identified the table in the HTML code shown below.

I used the following Python code and got the outcome below:

import bs4 as bs
import urllib.request
import json

source = urllib.request.urlopen("https://onemocneni-aktualne.mzcr.cz/covid-19#panel3-hospitalization")
soup = bs.BeautifulSoup(source)
js_test = soup.find("div", id="js-hospitalization-table-data")

# Parse the JSON string stored in the data-table attribute into a Python dict
jsonData = json.loads(js_test.attrs["data-table"])
print(jsonData['body'])

Thank you.

  • What's a "data-table"? Commented Dec 21, 2020 at 18:10
  • I was thinking of a .csv file ... Commented Dec 21, 2020 at 18:20
  • From the output you're getting, it looks like data-table is in JSON format, so you would need to convert data in that format into CSV, which may or may not be possible because CSV doesn't support nested data structures while JSON does. Commented Dec 21, 2020 at 18:31
  • I tried to import this JSON data into XLS or Power BI, but without any success. So my idea was to extract the text behind "body": and transform the data in [ xxx ] into a .csv file. Any idea? Thank you. Commented Dec 21, 2020 at 18:59
  • Sorry, I don't know how to use BeautifulSoup, but if you can get just the value of data-table from it or what it's returning, then I might be able to help convert it into CSV format. Commented Dec 21, 2020 at 19:36

1 Answer

The data you want is in JSON format. You can convert it to a Python dictionary (dict) with the built-in json module and then read the rows stored under the body key.

import json
import bs4 as bs
import urllib.request

source = urllib.request.urlopen(
    "https://onemocneni-aktualne.mzcr.cz/covid-19#panel3-hospitalization"
)
soup = bs.BeautifulSoup(source, "html.parser")

json_data = json.loads(
    soup.find("div", id="js-hospitalization-table-data")["data-table"]
)

print(type(json_data))
print(*json_data["body"])

Output (partial):

<class 'dict'>
['01.03.2020', 0, 0, 0, 0, 0] ['02.03.2020', 0, 0, 0, 0, 0] ... ['20.12.2020', 4398, 588, 0.1337, 34796, 0.7152]
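
Since the question comments mention a .csv file as the goal, here is a minimal sketch of how rows like the ones printed above could be written out with the built-in csv module. It assumes every entry under body is a flat list (as in the partial output); the helper name rows_to_csv and the file name in the comment are made up for illustration, and the sample rows are copied from the output rather than fetched from the site.

```python
import csv
import io

# Sample rows copied from the partial output above.
# With the live page, this list would be json_data["body"].
rows = [
    ["01.03.2020", 0, 0, 0, 0, 0],
    ["02.03.2020", 0, 0, 0, 0, 0],
    ["20.12.2020", 4398, 588, 0.1337, 34796, 0.7152],
]


def rows_to_csv(rows, header=None):
    """Serialise a list of flat rows to CSV text (hypothetical helper).

    header is optional because the thread does not show the column names.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    if header is not None:
        writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(rows)
print(csv_text)

# To save to disk instead (the file name is an assumption):
# with open("hospitalization.csv", "w", newline="", encoding="utf-8") as f:
#     f.write(csv_text)
```

The resulting file can then be opened in Excel or loaded into Power BI as an ordinary CSV source.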
2 Comments

Thank you very much. However, I have not figured out how to transform this dictionary into a table or dataframe. Could you please add this for me? Thank you.
@Jara That's a different question. Please see "How to convert JSON File to Dataframe". If you are still stuck, consider asking a new question here on Stack Overflow.
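
As a rough illustration of the table step asked about above, the rows can be turned into a list of records using only the standard library. This is a sketch under assumptions: the column names below are placeholders (the thread never says what the five numeric columns mean), the helper rows_to_records is hypothetical, and the dates are assumed to follow the DD.MM.YYYY pattern seen in the output.

```python
from datetime import datetime

# Sample rows copied from the answer's partial output.
rows = [
    ["01.03.2020", 0, 0, 0, 0, 0],
    ["20.12.2020", 4398, 588, 0.1337, 34796, 0.7152],
]

# Placeholder column names (assumption): replace with the real headings.
columns = ["date", "col_1", "col_2", "col_3", "col_4", "col_5"]


def rows_to_records(rows, columns):
    """Pair each row with the column names and parse the date string."""
    records = []
    for row in rows:
        record = dict(zip(columns, row))
        # Dates in the output look like DD.MM.YYYY.
        record["date"] = datetime.strptime(record["date"], "%d.%m.%Y").date()
        records.append(record)
    return records


records = rows_to_records(rows, columns)
print(records[-1])
```

If pandas is installed, pandas.DataFrame(records) or pandas.DataFrame(rows, columns=columns) should build a dataframe from the same data.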
