Questions tagged [web-scraping]
Web scraping is the use of a program to simulate human interaction with a web server or to extract specific information from a web page.
609 questions
            
3 votes, 1 answer, 125 views
Multi-Page Web Scraping Code Using Selenium with Multithreading
I have written a web scraping script using Selenium to crawl blog content from multiple URLs. The script processes URLs in batches of 1000 and uses multithreading with the ThreadPoolExecutor to ...
            
5 votes, 2 answers, 697 views
Readability and error handling improvements for Python web scraping class
Description
I recently wrote a Python script to download files from the Library of Congress (LOC) based on a search query. The code fetches metadata, extracts file ...
            
4 votes, 1 answer, 101 views
Scraping the calendar of some public libraries from their websites
I've been learning some Haskell as an amateur (to be precise: I started programming with this language, and it has been a year or less since I started seriously). So far, I have realised only small ...
            
2 votes, 1 answer, 90 views
Scrapy Spider for fetching product data from multiple pages of a website
I have written a Scrapy spider to scrape product data from a website. The spider navigates through multiple pages to reach a specific product and extracts details such as the product name, price, ...
            
3 votes, 2 answers, 99 views
Validating a web crawler's page visits with a decorator
I am writing a crawler that is going to end up in production and I was trying to come up with a way to validate its page visits. It scrapes ASP.NET pages so each scraping process involves a few ...
            
5 votes, 3 answers, 841 views
Code format and steps for web scraping using Beautiful Soup
I've done some simple web scraping and want to make sure all my steps are correct. Is it considered clean code? Is there a better way to use the multi-page scraping feature?
...
            
3 votes, 1 answer, 109 views
Scraping website with Python and Selenium to collect data from dynamic website
Summary:
The code scrapes the website and collects the data to store it in CSV. It also downloads selected information that is available for download in PDF format. The details and the entire code are ...
            
0 votes, 2 answers, 179 views
Drayage Webscraper: Limited to table structure
This is my first working scraper. I'm sure a lot can be improved. My biggest question is: how can I better specify what data to pull? All the data I'm currently grabbing is needed, but I couldn't ...
            
2 votes, 1 answer, 80 views
A Selenium web scraper to package NBA data
I'm building a Selenium web scraper for basketball-reference.com that takes a player name and returns data in either a JSON format or Pandas DataFrame object. The class in question is one of many that ...
            
4 votes, 1 answer, 127 views
Java classes for downloading all in-coming/out-going links of an article in the Wikipedia article graph
(The entire project is on GitHub.)
Introduction
This project provides facilities for generating the in-coming or out-going links of a given Wikipedia page.
Code
...
            
5 votes, 1 answer, 214 views
Scraping the Divar.ir
I've written code to scrape Divar, which is an equivalent of eBay in Iran. I have a few questions:
Am I doing the error handling and logging ok?
Is there a better way to optimize this code? (note ...
            
1 vote, 2 answers, 203 views
Web scraping spider
I'm currently working on my first web scraping project and I need to scrape a lot of websites. With my current code it takes more than a day, but for my project I need to scan the same websites every 5 ...
            
4 votes, 2 answers, 209 views
Enum to deserialize HTML sizes from JSON with serde
I added an enum for my web scraper to deserialize data from a JSON field that represents an HTML image size, which can either be an unsigned int like 1080 or a ...
            
2 votes, 1 answer, 108 views
Automatically extract useful cars from car site
I am using Puppeteer to extract data and see when a car that meets my requirements shows up. This is what I have done so far. I would like some basic syntax advice, or more advanced tips as well.
I tried to ...
            
2 votes, 0 answers, 76 views
Simplified HTML parsing for LEGO features
The goal is to extract the Features section from a LEGO product page. In the Features section, usually there's a header (...