I would like to run the function repeatedly using a while loop, but it runs only once, and after that the program just hangs.

from selenium import webdriver
from multiprocessing import Process


def browse(url):
    driver = webdriver.Chrome()
    driver.get(url)
    print(driver.page_source)
    driver.__exit__()

Pros = []
urls = open('urls.txt')

if __name__ == '__main__':
    while True:
        for url_item in urls:
            print(url_item)
            p = Process(target=browse, args=(url_item,))
            Pros.append(p)
            p.start()
        for t in Pros:
            t.join()
  • You are not reading anything from the urls.txt file you're opening, so I don't think the variable urls gets populated properly. Commented Jul 21, 2021 at 9:40
  • You're appending to the Pros list in every iteration, but you're not clearing it before the next pass. That might be the problem. Try resetting it (Pros = []) inside the while loop. Commented Jul 21, 2021 at 9:43

1 Answer

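The main problem lies in how the file's contents are read. A file object is exhausted after the first pass of the for loop, so on every later pass of the while loop there is nothing left to iterate over and no new processes are started. Reading the lines into a list with readlines() lets each pass see all the URLs again. The code below also starts each process through the process list (Pros[-1], which refers to the same object as p) and resets Pros after each batch so that finished processes are not joined again. Here is the working code.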

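from selenium import webdriver
from multiprocessing import Process


def browse(url):
    # Each process opens its own browser, prints the page, then closes it.
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        print(driver.page_source)
    finally:
        driver.quit()  # quit() is the public method for closing the browser


if __name__ == '__main__':
    # Read the URLs into a list once; a list can be iterated on every pass.
    with open('urls.txt') as f:
        urls = [line.strip() for line in f.readlines() if line.strip()]

    Pros = []
    while True:
        print("testing")
        for url_item in urls:
            print(url_item)
            p = Process(target=browse, args=(url_item,))
            Pros.append(p)
            Pros[-1].start()  # same object as p
        # Wait for the whole batch before starting the next one.
        for t in Pros:
            t.join()
        Pros = []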
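
The difference between iterating a file object and iterating a list can be seen without Selenium. The following is only a minimal sketch: the temporary file and the example.com URLs are made up for illustration and stand in for urls.txt.

```python
import os
import tempfile

# Write a small, hypothetical URL list to a temporary file (stand-in for urls.txt).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("https://example.com/a\nhttps://example.com/b\n")
    path = tmp.name

# A file object is a one-shot iterator: the second pass yields nothing.
with open(path) as handle:
    first_pass = [line.strip() for line in handle]
    second_pass = [line.strip() for line in handle]

# A list built from the file can be iterated as many times as needed.
with open(path) as handle:
    urls = [line.strip() for line in handle if line.strip()]
list_first = [u for u in urls]
list_second = [u for u in urls]

os.remove(path)

print(first_pass)
print(second_pass)
print(list_second)
```

In the question's code, the second pass of the while loop behaves like second_pass here: the for loop body never runs, so no new process is created.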

7 Comments

It's not working. If I use the above code, the multiprocessing does not work as intended. It runs the processes one by one instead of in parallel.
I just noticed that. I have edited the answer; you can try it again.
No, you can check on your side as well.
On my side, each process prints its page_source fine. After the edit, all the page sources are printed simultaneously as well.
Can you elaborate more on what the error is about? Show the traceback from the console when the program is running.