Questions tagged [web-scraping]
Web scraping is the use of a program to simulate human interaction with a web server or to extract specific information from a web page.
609 questions
3 votes, 1 answer, 125 views
Multi-Page Web Scraping Code Using Selenium with Multithreading
I have written a web scraping script using Selenium to crawl blog content from multiple URLs. The script processes URLs in batches of 1000 and uses multithreading with the ThreadPoolExecutor to ...
5 votes, 2 answers, 697 views
Readability and error handling improvements for Python web scraping class
Description
I recently wrote a Python script to download files from the Library of Congress (LOC) based on a search query. The code fetches metadata, extracts file ...
4 votes, 1 answer, 100 views
Scraping the calendar of some public libraries from their websites
I've been learning some Haskell as an amateur (to be precise: I started programming with this language, and it has been a year or less since I started seriously). So far, I have realised only small ...
2 votes, 1 answer, 88 views
Scrapy Spider for fetching product data from multiple pages of a website
I have written a Scrapy spider to scrape product data from a website. The spider navigates through multiple pages to reach a specific product and extracts details such as the product name, price, ...
3 votes, 2 answers, 99 views
Validating a web crawler's page visits with a decorator
I am writing a crawler that will end up in production, and I was trying to come up with a way to validate its page visits. It scrapes ASP.NET pages, so each scraping process involves a few ...
5 votes, 3 answers, 839 views
Code format and steps for web scraping using Beautiful Soup
I've done some simple web scraping and want to make sure all my steps are correct. Is it considered clean code? Is there a better way to use the multi-page scraping feature?
...
3 votes, 1 answer, 108 views
Scraping website with Python and Selenium to collect data from dynamic website
Summary:
The code scrapes the website and collects the data to store it in CSV. It also downloads selected information that is available for download in PDF format. The details and the entire code are ...
0 votes, 2 answers, 178 views
Drayage Webscraper: Limited to table structure
This is my first working scraper. I'm sure a lot can be improved. My biggest question is how I can better specify which data to pull. All the data I'm currently grabbing is needed, but I couldn't ...
2 votes, 1 answer, 78 views
A Selenium web scraper to package NBA data
I'm building a Selenium web scraper for basketball-reference.com that takes a player name and returns data either in JSON format or as a pandas DataFrame object. The class in question is one of many that ...
4 votes, 1 answer, 121 views
Java classes for downloading all incoming/outgoing links of an article in the Wikipedia article graph
(The entire project is on GitHub.)
Introduction
This project provides facilities for generating the incoming or outgoing links of a given Wikipedia page.
Code
...
5 votes, 1 answer, 212 views
Scraping Divar.ir
I've written code to scrape Divar, which is the Iranian equivalent of eBay. I have a few questions:
Am I doing the error handling and logging ok?
Is there a better way to optimize this code? (note ...
1 vote, 2 answers, 202 views
Web scraping spider
I'm currently working on my first web scraping project, and I need to scrape a lot of websites. With my current code, a run takes more than a day, but for my project I need to scan the same websites every 5 ...
4 votes, 2 answers, 209 views
Enum to deserialize HTML sizes from JSON with serde
I added an enum for my web scraper to deserialize data from a JSON field that represents an HTML image size, which can be either an unsigned int like 1080 or a ...
2 votes, 1 answer, 108 views
Automatically extract useful cars from a car site
I am using Puppeteer to extract data and see when a car that meets my requirements shows up. This is what I have done so far. I would like some basic syntax advice, or more advanced tips as well.
I tried to ...
2 votes, 0 answers, 74 views
Simplified HTML parsing for LEGO features
The goal is to extract the Features section from a LEGO product page. In the Features section, usually there's a header (...