Timeline for Multi-Page Web Scraping Code Using Selenium with Multithreading
Current License: CC BY-SA 4.0
4 events
| when toggle format | what | by | license | comment | |
|---|---|---|---|---|---|
| Dec 27, 2024 at 15:07 | comment | added | Booboo |
@C.Nivs If a robots.txt file were present and had a Crawl-delay value, why not just use that? Sleeping unnecessarily is certainly not going to help performance. Also this sleeping is done for each invocation of scrap_blog_content after the URL is fetched. What if the URL is either the only one or the last one for a given website. This after-the-fact sleeping is totally unnecessary.
|
|
| Dec 27, 2024 at 13:56 | comment | added | C.Nivs |
time.sleep could be to comply with robots.txt/scraping rules and be generally more polite
|
|
| Dec 24, 2024 at 22:35 | history | edited | Booboo | CC BY-SA 4.0 |
Added performance-related question.
|
| Dec 24, 2024 at 19:21 | history | answered | Booboo | CC BY-SA 4.0 |
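
The comments above discuss honoring a site's `Crawl-delay` directive instead of sleeping a fixed amount after every fetch. The following is a minimal sketch of that idea, not code from the answer itself: it uses Python's standard `urllib.robotparser`, and the URL list and user-agent string are assumptions for illustration.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "my-scraper"  # hypothetical user-agent name

# Read the site's robots.txt (example.com is a placeholder domain).
robots = RobotFileParser()
robots.set_url("https://example.com/robots.txt")
robots.read()

# crawl_delay() returns None when no Crawl-delay directive is present.
delay = robots.crawl_delay(USER_AGENT) or 0

urls = [
    "https://example.com/blog/page-1",
    "https://example.com/blog/page-2",
]

for i, url in enumerate(urls):
    if robots.can_fetch(USER_AGENT, url):
        ...  # fetch and scrape the page here (e.g. with Selenium)
    # Sleep only between requests to the same site, not after the last one,
    # addressing the "after-the-fact sleeping" point raised in the comments.
    if delay and i < len(urls) - 1:
        time.sleep(delay)
```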