7 votes
Accepted

Locating contact information from a CSV spreadsheet

You would do well to use generator functions to break this down into parts that might be re-usable for other steps in your pipeline. For example, the first thing you do in your loop is read a row ...
aghast • 12.6k
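The snippet above recommends generator functions as reusable pipeline stages. A minimal sketch of that idea, using hypothetical sample data and illustrative stage names standing in for the spreadsheet from the question:

```python
import csv
import io

# Hypothetical sample data standing in for the CSV spreadsheet in the question.
CSV_TEXT = """name,email,phone
Alice,alice@example.com,555-0100
Bob,bob@example.com,555-0101
"""

def read_rows(fileobj):
    """One pipeline stage: yield each row of the CSV as a dict."""
    yield from csv.DictReader(fileobj)

def with_contact(rows):
    """Another reusable stage: keep only rows that have an email address."""
    for row in rows:
        if row["email"]:
            yield row

def contacts(fileobj):
    """Compose the stages into a pipeline; each stage stays testable on its own."""
    return with_contact(read_rows(fileobj))

results = list(contacts(io.StringIO(CSV_TEXT)))
```

Because each stage consumes and yields plain iterables, any stage can be reused in a different pipeline without change.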
5 votes
Accepted

Small web scraper to read product highlights from given URLs

Splitting lines: don't call splitlines() here: ...
Reinderien • 71.1k
5 votes
Accepted

Parsing a slow-loading webpage with scrapy in combination with selenium

The spider is readable and understandable. I would only extract some of the things into separate methods for readability. For example, the "infinite scroll" should probably be just defined in a ...
alecxe • 17.5k
4 votes
Accepted

Parsing an XML tree of categories

You can start by just extracting the bit that clearly is repeated into a standalone function: ...
ades • 1,391
4 votes
Accepted

Scraping a dynamic website with Scrapy (or Requests) and Selenium

"the search field for this site is dynamically generated" That doesn't matter, since, if you bypass the UI, the form field name itself is not dynamic. Even if you were to keep using Selenium, it ...
Reinderien • 71.1k
4 votes
Accepted

Web scraping from yellowbook

First: a general observation - simple nested if statements are equivalent to a single if with statements joined by ...
match • 626
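The observation above — that nested if statements collapse into a single condition joined by a boolean operator — can be sketched as follows; the listing dict and field name are illustrative, not from the original code:

```python
def is_valid_listing(listing: dict) -> bool:
    # Nested version, as in the pattern the reviewer is describing:
    #     if "name" in listing:
    #         if listing["name"]:
    #             return True
    #     return False
    # Equivalent single condition, with the tests joined by `and`:
    return "name" in listing and bool(listing["name"])

ok = is_valid_listing({"name": "Acme Plumbing"})
empty = is_valid_listing({"name": ""})
missing = is_valid_listing({})
```

Because `and` short-circuits, the second test never runs when the first fails, preserving the nested version's behavior exactly.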
4 votes
Accepted

Scraping a webpage copying with the logic of scrapy

Here are some of the things I would improve in the code: put each import on a separate line, there is not much point in saving space in this case: ...
alecxe • 17.5k
4 votes

Scraping a webpage copying with the logic of scrapy

Even though you are trying to mimic what a Scrapy spider might look like, there is a major high-level difference between how your code is executed and how a Scrapy spider is. Scrapy is entirely ...
alecxe • 17.5k
3 votes
Accepted

Web Scraping Dynamically Generated Content Python

Combined imports ...
Reinderien • 71.1k
3 votes

Writing to a csv file in a customized way using scrapy

I'd like to mention that there is a special way of making output files in Scrapy: item pipelines. So, in order to do it right, you should write your own pipeline (or modify a standard one via ...
Dmitry Arkhipenko
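An item pipeline, as suggested above, is just a plain class whose hooks Scrapy calls for you once it is listed in the ITEM_PIPELINES setting — no base class is required. A minimal sketch (class name, file name, and field names are illustrative), exercised directly here the way Scrapy would call it:

```python
import csv

class CustomCsvPipeline:
    """Sketch of a Scrapy item pipeline that owns the CSV output.

    Scrapy calls open_spider/process_item/close_spider itself when the
    class is registered in ITEM_PIPELINES; no scrapy import is needed here.
    """

    def open_spider(self, spider):
        self.file = open("output.csv", "w", newline="")
        self.writer = csv.writer(self.file)
        self.writer.writerow(["title", "price"])  # customized header row

    def process_item(self, item, spider):
        self.writer.writerow([item["title"], item["price"]])
        return item  # pass the item on to any later pipelines

    def close_spider(self, spider):
        self.file.close()

# Driving the hooks by hand, in the order Scrapy would:
pipeline = CustomCsvPipeline()
pipeline.open_spider(None)
pipeline.process_item({"title": "Widget", "price": "9.99"}, None)
pipeline.close_spider(None)

with open("output.csv", newline="") as f:
    rows = list(csv.reader(f))
```

Keeping the export logic in a pipeline leaves the spider free to only yield items.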
2 votes

Scrapy Spider for fetching product data from multiple pages of a website

Overall, it looks great. Deprecated class: doing $ scrapy runspider *.py reveals that the OP code chooses to use the 2.6 ...
J_H • 42.1k
2 votes
Accepted

Save data in item within Scrapy and Python

Have you tried using a collections.namedtuple or a dataclasses.dataclass rather than a dict? ...
DeathIncarnate
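The suggestion above — a dataclass (or namedtuple) instead of a bare dict — gives items named, type-hinted fields and catches typos at construction time. A minimal sketch with illustrative field names:

```python
from dataclasses import dataclass

# Hypothetical item shape; a collections.namedtuple would work similarly
# when immutability is preferred.
@dataclass
class Product:
    title: str
    price: float
    url: str

item = Product(title="Widget", price=9.99, url="https://example.com/widget")
```

Unlike `item["titel"]`, a mistyped attribute such as `item.titel` fails loudly with an AttributeError rather than silently inserting a new key.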
2 votes

Crawling thesaurus for a synonym

I don't see $this->syn_src_limit or $path changing anywhere in your code. For this reason, these are good candidates for ...
mickmackusa • 8,802
2 votes

Crawling thesaurus for a synonym

Base review: "It may not work on every server, depending on its OS (I'm running Windows 10)." One thing to make it more portable would be to use the Predefined Constant ...
Sᴀᴍ Onᴇᴌᴀ
2 votes

Website parser for advertisements

While this answer is a long time after the question was asked, it seems like a solid question, so I'll give answering it a shot. Small things I'll start off with a few small points that aren't all ...
spyr03 • 3,052
2 votes

Creating a csv file using scrapy

By putting the CSV exporting logic into the spider itself, you are re-inventing the wheel and not using all the advantages of Scrapy and its components and, also, making the crawling slower as you are ...
alecxe • 17.5k
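Instead of CSV logic inside the spider, as the answer above notes, Scrapy's built-in feed exports can do the writing. A minimal settings sketch — the feed path and field names are illustrative; the FEEDS dict requires Scrapy 2.1+ (the overwrite option 2.4+), while older versions use FEED_URI/FEED_FORMAT:

```python
# settings.py (or custom_settings on the spider): let Scrapy's feed
# exports produce the CSV instead of hand-rolled file handling.
FEEDS = {
    "items.csv": {
        "format": "csv",
        "fields": ["title", "price"],  # illustrative field names
        "overwrite": True,
    },
}
```

With this in place the spider only yields items; serialization, ordering, and file lifecycle stay in framework code.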
1 vote
Accepted

Sourcing data format from multiple different structures

Don't abuse inner lists. This: self.name = ' '.join([data.get('name').get(key) for key in ['first_name', 'last_name']]) should be ...
Reinderien • 71.1k
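The snippet is truncated before the reviewer's preferred form, so the following is one plausible simplification, not necessarily theirs: for exactly two known keys the comprehension (and its inner list) can disappear entirely, and the data here is hypothetical:

```python
data = {"name": {"first_name": "Ada", "last_name": "Lovelace"}}

# Original shape:
#     ' '.join([data.get('name').get(key) for key in ['first_name', 'last_name']])
# The inner list is unnecessary (str.join accepts any iterable), and with
# only two fixed keys the join can be written out directly:
name = data["name"]
full_name = f"{name['first_name']} {name['last_name']}"
```

If the keys really were variable, dropping just the square brackets to pass a generator expression to join would already remove the needless intermediate list.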
1 vote

Creating a csv file using scrapy

Although I'm not an expert on this, I thought I'd offer a solution that I've been using for quite some time. Making use of signals might be a wise attempt ...
SIM • 2,501
1 vote
Accepted

Writing to a csv file in a customized way using scrapy

You should opt for the closed() method, as I've tried below. It is called automatically once your spider is closed, and provides a shortcut to signals.connect() for the spider_closed ...
SIM • 2,501
1 vote

Writing to a csv file in a customized way using scrapy

You should ensure that the file is closed; the with statement does this automatically. In addition, you should avoid creating a new writer object on every loop iteration: ...
Graipher • 41.7k
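The two points above — guaranteed file closing via `with` and a single writer reused across rows — can be sketched as follows, with hypothetical rows standing in for the items the spider yields:

```python
import csv

# Hypothetical scraped rows standing in for items yielded by the spider.
rows = [("Widget", "9.99"), ("Gadget", "19.99")]

# One writer, created once; `with` closes the file even if writing raises.
with open("items.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(rows)

# Reading back to confirm what was written:
with open("items.csv", newline="") as f:
    written = list(csv.reader(f))
```

Creating the writer inside the loop would work but wastes an object per row and makes it easy to reopen the file in a mode that clobbers earlier rows.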
1 vote

Parsing different categories using Scrapy from a webpage

"this scraper is very repetitive" Yeah, I get that motivation. I'm glad that's a thing you're worried about. Offhand? No, I don't see any obvious improvements to the code. But I will observe that bs4 ...
J_H • 42.1k
1 vote

Extracting certain products from a webpage using Scrapy

I don't think you should reinvent the wheel and provide your own CSV export. The following works for me as is (note the addition of .strip() calls - though I don't ...
alecxe • 17.5k
1 vote

Scraping a webpage copying with the logic of scrapy

Things to improve in the requests library usage. Timeouts: without a timeout, your code may hang for minutes or more. User-Agent: some websites do not accept requests if you are using a bad ...
Artem Rys
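Both points above fit in a small wrapper; the User-Agent string, timeout value, and function name below are illustrative choices, not prescriptions:

```python
import requests

# Illustrative values; tune the UA string and timeout for the target site.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; my-scraper/0.1)"}
TIMEOUT = 10  # seconds; without a timeout, requests can block indefinitely

def fetch(url: str) -> str:
    """GET a page with an explicit User-Agent and a hard timeout."""
    response = requests.get(url, headers=HEADERS, timeout=TIMEOUT)
    response.raise_for_status()  # surface HTTP errors instead of parsing them
    return response.text
```

The timeout covers connect and read phases; a (connect, read) tuple can be passed instead when the two need different limits.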

Only top-scored, non-community-wiki answers of a minimum length are eligible