
I am using Python to scrape a URL, as in the code below:

import requests
from bs4 import BeautifulSoup
import json

n_index = 10
base_link = 'http://xxx.xxx./getinfo?range=10&district_id=1&index='
for i in range (1,n_index+1):
    link = base_link+str(i)
    r = requests.get(link)
    pid = r.json()
    print (pid)

It returns ten results, like the ones below:

{'product_info': [{'pid': '1', 'product_type': '2'}]}
{'product_info': [{'pid': '2', 'product_type': '2'}]}
{'product_info': [{'pid': '3', 'product_type': '2'}]}
{'product_info': [{'pid': '4', 'product_type': '2'}]}
{'product_info': [{'pid': '5', 'product_type': '2'}]}
{'product_info': [{'pid': '6', 'product_type': '2'}]}
{'product_info': [{'pid': '7', 'product_type': '2'}]}
{'product_info': [{'pid': '8', 'product_type': '2'}]}
{'product_info': [{'pid': '9', 'product_type': '2'}]}
{'product_info': [{'pid': '10', 'product_type': '2'}]}

and then I want to save the resulting 10 lines into a JSON file, as in the code below:

with open('sylist.json', 'w') as outfile:
    json.dump(r.json(), outfile, indent=4)

but only one result is saved into the local JSON file. Can anyone help me resolve this? Thanks a lot.

  • Use append mode instead of write mode: `with open('sylist.json', 'a') as outfile:` – Commented Jan 9, 2018 at 5:52

2 Answers


Typically, you would write the results line by line inside the loop, without opening and closing the file each time:

with open('sylist.json', 'a+') as outfile:
    for i in range(1, n_index + 1):
        link = base_link + str(i)
        r = requests.get(link)
        # json.dumps returns the JSON string; json.dump writes to a file
        # and returns None, so it cannot be used inside str.format here
        outfile.write(json.dumps(r.json()) + "\n")
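Writing one JSON object per line like this produces a "JSON Lines" style file rather than a single JSON document, so it should be read back with `json.loads` per line, not `json.load`. A minimal, self-contained sketch, using hypothetical sample data in place of the live requests:

```python
import json

# Hypothetical sample responses standing in for r.json() (no network needed).
responses = [{'product_info': [{'pid': str(i), 'product_type': '2'}]}
             for i in range(1, 11)]

# One JSON object per line, as the loop above produces.
with open('sylist.json', 'w') as outfile:
    for item in responses:
        outfile.write(json.dumps(item) + "\n")

# Read the file back line by line with json.loads.
with open('sylist.json') as infile:
    records = [json.loads(line) for line in infile]

print(len(records))  # 10
```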

1 Comment

@slackware hey, if it worked for you, please accept it:)

Let me extend Frank's answer a bit. You are sending the request inside the for loop, which means that on every iteration the value of pid is overwritten. As a result, when you dump its content to an output file, pid holds only the content from the very last iteration/request. I would suggest applying one of the following to address your issue:

  1. Include the writing step inside the for loop (or vice versa, as suggested in the answer by Frank AK).
  2. Instead of overwriting the content of pid each time, append it to a list inside the for loop as follows:

    my_list = []
    for i in range(1, n_index + 1):
        link = base_link + str(i)
        r = requests.get(link)
        pid = r.json()
        my_list.append(pid)  # collect every response instead of overwriting

    with open('sylist.json', 'w') as outfile:
        json.dump(my_list, outfile, indent=4)
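Since option 2 writes a single JSON array, the file can be read back in one call with `json.load` to confirm all ten results were saved. A self-contained sketch, with hypothetical sample data in place of the live requests:

```python
import json

# Hypothetical sample responses standing in for pid = r.json().
my_list = [{'product_info': [{'pid': str(i), 'product_type': '2'}]}
           for i in range(1, 11)]

# Dump the whole list as one JSON array, as in option 2.
with open('sylist.json', 'w') as outfile:
    json.dump(my_list, outfile, indent=4)

# Read it back in one call to verify the count.
with open('sylist.json') as infile:
    data = json.load(infile)

print(len(data))  # 10
```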
    

