7
votes
Accepted
Locating contact information from a CSV spreadsheet
You would do well to use generator functions to break this down into parts that might be re-usable for other steps in your pipeline.
For example, the first thing you do in your loop is read a row ...
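The generator-based decomposition suggested here can be sketched roughly like this (the column names and the filtering rule are hypothetical, not taken from the original question):

```python
import csv
import io

def read_rows(fileobj):
    """Yield each CSV row as a dict -- a reusable first pipeline stage."""
    yield from csv.DictReader(fileobj)

def with_email(rows):
    """Yield only rows with a non-empty 'email' field (hypothetical filter)."""
    for row in rows:
        if row.get("email"):
            yield row

# Stages compose lazily: nothing is read until the result is consumed.
data = io.StringIO("name,email\nAda,ada@example.com\nBob,\n")
contacts = list(with_email(read_rows(data)))
```

Each stage can be swapped out or reused in a different pipeline without touching the others.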
5
votes
Accepted
small web scraper to read product highlight from given urls
Splitting lines
Don't call splitlines() here:
...
5
votes
Accepted
Parsing a slow-loading webpage with scrapy in combination with selenium
The spider is readable and understandable. I would only extract some of the things into separate methods for readability. For example, the "infinite scroll" should probably be just defined in a ...
4
votes
Accepted
Parsing an XML tree of categories
You can start by just extracting the bit that clearly is repeated into a standalone function:
...
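As a hedged illustration of pulling the repeated piece into a standalone function (the element and attribute names here are invented for the sketch, not from the original tree):

```python
import xml.etree.ElementTree as ET

def category_names(parent):
    """The bit that was repeated inline: collect the name of each child category."""
    return [child.get("name") for child in parent.findall("category")]

tree = ET.fromstring(
    '<root><category name="books"/><category name="music"/></root>'
)
names = category_names(tree)
```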
4
votes
Accepted
Scraping a dynamic website with Scrapy (or Requests) and Selenium
the search field for this site is dynamically generated
That doesn't matter, since - if you bypass the UI - the form field name itself is not dynamic. Even if you were to keep using Selenium, it ...
4
votes
Accepted
Webscraping from yellowbook
First: a general observation - simple nested if statements are equivalent to a single if with statements joined by ...
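The equivalence mentioned above (nested ifs versus one condition joined by `and`) looks like this in Python; the record shape is a made-up example:

```python
def is_valid_nested(record):
    # Simple nested ifs...
    if record:
        if "phone" in record:
            if record["phone"]:
                return True
    return False

def is_valid_flat(record):
    # ...are equivalent to a single if with the conditions joined by `and`.
    return bool(record and "phone" in record and record["phone"])

sample = {"phone": "555-0100"}
```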
4
votes
Accepted
Scraping a webpage copying with the logic of scrapy
Here are some of the things I would improve in the code:
put each import on a separate line, there is not much point in saving space in this case:
...
4
votes
Scraping a webpage copying with the logic of scrapy
Even though you are trying to mimic what a Scrapy spider might look like, there is a major high-level difference between how your code is executed and how a Scrapy spider runs.
Scrapy is entirely ...
3
votes
Writing to a csv file in a customized way using scrapy
I'd like to mention that there is a special way of producing output files in Scrapy: item pipelines. So, to do it right, you should write your own pipeline (or modify a standard one via ...
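A minimal sketch of such an item pipeline, assuming a CSV target; the method names follow Scrapy's pipeline protocol, while the filename and item fields are assumptions for the example (the class needs no Scrapy import to be defined):

```python
import csv

class CsvWriterPipeline:
    """Scrapy calls open_spider/process_item/close_spider on registered pipelines."""

    def open_spider(self, spider):
        self.file = open("items.csv", "w", newline="")
        self.writer = csv.writer(self.file)

    def process_item(self, item, spider):
        # Write one row per scraped item (field names are hypothetical).
        self.writer.writerow([item.get("title"), item.get("price")])
        return item

    def close_spider(self, spider):
        self.file.close()
```

It would then be enabled via the `ITEM_PIPELINES` setting rather than called from the spider.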
2
votes
Scrapy Spider for fetching product data from multiple pages of a website
Overall, it looks great.
deprecated class
Doing $ scrapy runspider *.py reveals that the OP code chooses to use the 2.6 ...
2
votes
Accepted
Save data in item within Scrapy and Python
Have you tried using a collections.namedtuple or a dataclasses.dataclass rather than a dict?
...
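The dataclass alternative suggested above might look like this; the field names are assumptions for the sketch, not taken from the original item:

```python
from dataclasses import dataclass

@dataclass
class Product:
    """Fixed, named fields instead of free-form dict keys: typos in a field
    name fail loudly, and the fields are self-documenting."""
    name: str
    price: float

item = Product(name="Widget", price=9.99)
```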
2
votes
Crawling thesaurus for a synonym
I don't see $this->syn_src_limit or $path changing anywhere in your code. For this reason, these are good candidates for ...
2
votes
Crawling thesaurus for a synonym
Base Review
It may not work on every server, depending on its OS (I'm running Windows 10)
One thing to make it more portable would be to use the Predefined Constant ...
2
votes
Website parser for advertisements
While this answer comes a long time after the question was asked, it seems like a solid question, so I'll give answering it a shot.
Small things
I'll start off with a few small points that aren't all ...
2
votes
Creating a csv file using scrapy
By putting the CSV exporting logic into the spider itself, you are re-inventing the wheel and not using all the advantages of Scrapy and its components and, also, making the crawling slower as you are ...
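Scrapy's built-in feed exports are the usual replacement for hand-rolled CSV code: enabling them is a settings change rather than spider logic. A sketch using the `FEEDS` setting (available since Scrapy 2.1; the output path is an assumption):

```python
# settings.py -- the spider then simply yields items and writes no files itself.
FEEDS = {
    "output/products.csv": {
        "format": "csv",
        "overwrite": True,
    },
}
```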
1
vote
Accepted
Sourcing data format from multiple different structures
Don't abuse inner lists
This:
self.name = ' '.join([data.get('name').get(key) for key in ['first_name', 'last_name']])
should be
...
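The excerpt cuts off before the suggested replacement; one plausible reading of the advice (assuming the usual fix) is to drop the throwaway inner list, or to name the two fields directly:

```python
data = {"name": {"first_name": "Ada", "last_name": "Lovelace"}}

# Joining over a temporary list of keys works, but hides what is being built...
name_via_keys = " ".join(data["name"][key] for key in ("first_name", "last_name"))

# ...spelling the two fields out reads more directly:
name = f"{data['name']['first_name']} {data['name']['last_name']}"
```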
1
vote
Creating a csv file using scrapy
Although I'm not an expert on this, I thought I'd share a solution I've been following for quite some time.
Making use of signals might be a wise attempt ...
1
vote
Accepted
Writing to a csv file in a customized way using scrapy
You should opt for the closed() method, as I've tried below. This method will be called automatically once your spider is closed. This method provides a shortcut to signals.connect() for the spider_closed ...
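A minimal sketch of the closed() approach; the class would subclass scrapy.Spider in real code (omitted here so the sketch stands alone), and the filename and row contents are assumptions:

```python
import csv

class ProductsSpider:  # in real code: class ProductsSpider(scrapy.Spider)
    name = "products"

    def __init__(self):
        self.rows = []

    def parse(self, response):
        # Collect rows during the crawl (values are hypothetical).
        self.rows.append(["Widget", "9.99"])

    def closed(self, reason):
        # Scrapy calls closed() automatically when the spider finishes;
        # it is a shortcut for connecting to the spider_closed signal.
        with open("products.csv", "w", newline="") as f:
            csv.writer(f).writerows(self.rows)
```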
1
vote
Writing to a csv file in a customized way using scrapy
You should ensure that the file is closed. In addition, you should avoid creating a new writer object on every loop iteration; use the with statement:
...
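A sketch of that pattern, with invented rows and filename: one writer, created once, and the with block guarantees the file is closed even on error:

```python
import csv

rows = [["Ada", "ada@example.com"], ["Bob", "bob@example.com"]]

with open("contacts.csv", "w", newline="") as f:
    writer = csv.writer(f)          # created once, outside the loop
    for row in rows:
        writer.writerow(row)
# The file is closed here automatically.
```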
1
vote
Parsing different categories using Scrapy from a webpage
this scraper is very repetitive
Yeah, I get that motivation.
I'm glad that's a thing you're worried about.
Offhand? No, I don't see any obvious improvements to the code.
But I will observe that
bs4
...
1
vote
Extracting certain products from a webpage using Scrapy
I don't think you should reinvent the wheel and provide your own CSV export. The following works for me as is (note the addition of .strip() calls - though I don't ...
1
vote
Scraping a webpage copying with the logic of scrapy
Things to improve with requests library usage:
timeouts: without a timeout, your code may hang for minutes or more.
source
User-Agent: some websites do not accept making requests if you are using bad ...
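Both points can be bundled into one helper; the timeout value and User-Agent string below are assumptions for the sketch, not values from the original answer:

```python
DEFAULT_TIMEOUT = 10  # seconds; without a timeout, requests can hang indefinitely
HEADERS = {
    # A browser-like User-Agent; some sites reject the default python-requests one.
    "User-Agent": "Mozilla/5.0 (compatible; review-bot/1.0)",
}

def request_kwargs(timeout=DEFAULT_TIMEOUT, headers=None):
    """Build the keyword arguments to pass to requests.get/requests.post."""
    return {"timeout": timeout, "headers": headers or HEADERS}

# usage: response = requests.get(url, **request_kwargs())
```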