7
votes
Accepted
Locating contact information from a CSV spreadsheet
You would do well to use generator functions to break this down into parts that might be re-usable for other steps in your pipeline.
For example, the first thing you do in your loop is read a row ...
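The generator-based decomposition suggested here can be sketched roughly like this (the column names and the filtering rule are hypothetical, not taken from the original question):

```python
import csv
import io

def read_rows(fileobj):
    """Yield each CSV row as a dict -- a reusable first pipeline stage."""
    yield from csv.DictReader(fileobj)

def with_email(rows):
    """Yield only rows with a non-empty 'email' field (hypothetical filter)."""
    for row in rows:
        if row.get("email"):
            yield row

# Stages compose lazily: nothing is read until the result is consumed.
data = io.StringIO("name,email\nAda,ada@example.com\nBob,\n")
contacts = list(with_email(read_rows(data)))
```

Each stage can be swapped out or reused in a different pipeline without touching the others.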
5
votes
Accepted
small web scraper to read product highlight from given urls
Splitting lines
Don't call splitlines() here:
...
5
votes
Accepted
Parsing a slow-loading webpage with scrapy in combination with selenium
The spider is readable and understandable. I would only extract some of the things into separate methods for readability. For example, the "infinite scroll" should probably be just defined in a ...
4
votes
Accepted
Parsing an XML tree of categories
You can start by just extracting the bit that clearly is repeated into a standalone function:
...
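As a hedged illustration of pulling the repeated piece into a standalone function (the element and attribute names here are invented for the sketch, not from the original tree):

```python
import xml.etree.ElementTree as ET

def category_names(parent):
    """The bit that was repeated inline: collect the name of each child category."""
    return [child.get("name") for child in parent.findall("category")]

tree = ET.fromstring(
    '<root><category name="books"/><category name="music"/></root>'
)
names = category_names(tree)
```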
4
votes
Accepted
Scraping a dynamic website with Scrapy (or Requests) and Selenium
the search field for this site is dynamically generated
That doesn't matter, since - if you bypass the UI - the form field name itself is not dynamic. Even if you were to keep using Selenium, it ...
4
votes
Accepted
Webscraping from yellowbook
First: a general observation - simple nested if statements are equivalent to a single if with statements joined by ...
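The equivalence mentioned above (nested ifs versus one condition joined by `and`) looks like this in Python; the record shape is a made-up example:

```python
def is_valid_nested(record):
    # Simple nested ifs...
    if record:
        if "phone" in record:
            if record["phone"]:
                return True
    return False

def is_valid_flat(record):
    # ...are equivalent to a single if with the conditions joined by `and`.
    return bool(record and "phone" in record and record["phone"])

sample = {"phone": "555-0100"}
```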
4
votes
Accepted
Scraping a webpage copying with the logic of scrapy
Here are some of the things I would improve in the code:
put each import on a separate line, there is not much point in saving space in this case:
...
4
votes
Scraping a webpage copying with the logic of scrapy
Even though you are trying to mimic what a Scrapy spider might look like, there is a major high-level difference between how your code is executed and how a Scrapy spider runs.
Scrapy is entirely ...
3
votes
Writing to a csv file in a customized way using scrapy
I'd like to mention that there is a special way of producing output files in Scrapy: item pipelines. So, to do it right, you should write your own pipeline (or modify a standard one via ...
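A minimal sketch of such an item pipeline, assuming a CSV target; the method names follow Scrapy's pipeline protocol, while the filename and item fields are assumptions for the example (the class needs no Scrapy import to be defined):

```python
import csv

class CsvWriterPipeline:
    """Scrapy calls open_spider/process_item/close_spider on registered pipelines."""

    def open_spider(self, spider):
        self.file = open("items.csv", "w", newline="")
        self.writer = csv.writer(self.file)

    def process_item(self, item, spider):
        # Write one row per scraped item (field names are hypothetical).
        self.writer.writerow([item.get("title"), item.get("price")])
        return item

    def close_spider(self, spider):
        self.file.close()
```

It would then be enabled via the `ITEM_PIPELINES` setting rather than called from the spider.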
2
votes
Scrapy Spider for fetching product data from multiple pages of a website
Overall, it looks great.
deprecated class
Doing $ scrapy runspider *.py reveals that the OP code chooses to use the 2.6 ...
2
votes
Accepted
Save data in item within Scrapy and Python
Have you tried using a collections.namedtuple or a dataclasses.dataclass rather than a dict?
...
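The dataclass alternative suggested above might look like this; the field names are assumptions for the sketch, not taken from the original item:

```python
from dataclasses import dataclass

@dataclass
class Product:
    """Fixed, named fields instead of free-form dict keys: typos in a field
    name fail loudly, and the fields are self-documenting."""
    name: str
    price: float

item = Product(name="Widget", price=9.99)
```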
2
votes
Crawling thesaurus for a synonym
I don't see $this->syn_src_limit or $path changing anywhere in your code. For this reason, these are good candidates for ...
2
votes
Crawling thesaurus for a synonym
Base Review
It may not work on every server, depending on its OS (I'm running Windows 10)
One thing to make it more portable would be to use the Predefined Constant ...
2
votes
Website parser for advertisements
While this answer comes a long time after the question was asked, it seems like a solid question, so I'll give answering it a shot.
Small things
I'll start off with a few small points that aren't all ...
2
votes
Creating a csv file using scrapy
By putting the CSV exporting logic into the spider itself, you are re-inventing the wheel and not using all the advantages of Scrapy and its components and, also, making the crawling slower as you are ...
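Scrapy's built-in feed exports are the usual replacement for hand-rolled CSV code: enabling them is a settings change rather than spider logic. A sketch using the `FEEDS` setting (available since Scrapy 2.1; the output path is an assumption):

```python
# settings.py -- the spider then simply yields items and writes no files itself.
FEEDS = {
    "output/products.csv": {
        "format": "csv",
        "overwrite": True,
    },
}
```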
1
vote
Accepted
Sourcing data format from multiple different structures
Don't abuse inner lists
This:
self.name = ' '.join([data.get('name').get(key) for key in ['first_name', 'last_name']])
should be
...
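The excerpt cuts off before the suggested replacement; one plausible reading of the advice (assuming the usual fix) is to drop the throwaway inner list, or to name the two fields directly:

```python
data = {"name": {"first_name": "Ada", "last_name": "Lovelace"}}

# Joining over a temporary list of keys works, but hides what is being built...
name_via_keys = " ".join(data["name"][key] for key in ("first_name", "last_name"))

# ...spelling the two fields out reads more directly:
name = f"{data['name']['first_name']} {data['name']['last_name']}"
```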
1
vote
Creating a csv file using scrapy
Although I'm not an expert on this, I thought I'd share a solution I've been following for quite some time.
Making use of signals might be a wise attempt ...
1
vote
Accepted
Writing to a csv file in a customized way using scrapy
You should opt for the closed() method, as I've tried below. This method will be called automatically once your spider is closed. This method provides a shortcut to signals.connect() for the spider_closed ...
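A minimal sketch of the closed() approach; the class would subclass scrapy.Spider in real code (omitted here so the sketch stands alone), and the filename and row contents are assumptions:

```python
import csv

class ProductsSpider:  # in real code: class ProductsSpider(scrapy.Spider)
    name = "products"

    def __init__(self):
        self.rows = []

    def parse(self, response):
        # Collect rows during the crawl (values are hypothetical).
        self.rows.append(["Widget", "9.99"])

    def closed(self, reason):
        # Scrapy calls closed() automatically when the spider finishes;
        # it is a shortcut for connecting to the spider_closed signal.
        with open("products.csv", "w", newline="") as f:
            csv.writer(f).writerows(self.rows)
```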
1
vote
Writing to a csv file in a customized way using scrapy
You should ensure that the file is closed. In addition, you should avoid creating a new writer object on every loop iteration; use the with statement:
...
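A sketch of that pattern, with invented rows and filename: one writer, created once, and the with block guarantees the file is closed even on error:

```python
import csv

rows = [["Ada", "ada@example.com"], ["Bob", "bob@example.com"]]

with open("contacts.csv", "w", newline="") as f:
    writer = csv.writer(f)          # created once, outside the loop
    for row in rows:
        writer.writerow(row)
# The file is closed here automatically.
```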
1
vote
Parsing different categories using Scrapy from a webpage
this scraper is very repetitive
Yeah, I get that motivation.
I'm glad that's a thing you're worried about.
Offhand? No, I don't see any obvious improvements to the code.
But I will observe that
bs4
...
1
vote
Extracting certain products from a webpage using Scrapy
I don't think you should reinvent the wheel and provide your own CSV export. The following works for me as is (note the addition of .strip() calls - though I don't ...
1
vote
Scraping a webpage copying with the logic of scrapy
Things to improve with requests library usage:
timeouts: without a timeout, your code may hang for minutes or more.
source
User-Agent: some websites do not accept making requests if you are using bad ...
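Both points can be bundled into one helper; the timeout value and User-Agent string below are assumptions for the sketch, not values from the original answer:

```python
DEFAULT_TIMEOUT = 10  # seconds; without a timeout, requests can hang indefinitely
HEADERS = {
    # A browser-like User-Agent; some sites reject the default python-requests one.
    "User-Agent": "Mozilla/5.0 (compatible; review-bot/1.0)",
}

def request_kwargs(timeout=DEFAULT_TIMEOUT, headers=None):
    """Build the keyword arguments to pass to requests.get/requests.post."""
    return {"timeout": timeout, "headers": headers or HEADERS}

# usage: response = requests.get(url, **request_kwargs())
```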