
I have a script that runs multiple instances of Python Scrapy crawlers. The script is /root/crawler/batchscript.py

and the Scrapy crawler project is also in /root/crawler/.

Crawlers are working perfectly fine.

batchscript.py looks like this (posting only the relevant code):

from scrapy.crawler import CrawlerProcess
from scrapy.settings import Settings
from scrapy.utils.project import get_project_settings
from amazon_crawler.spiders.amazon_scraper import MySpider

# Build the crawler process from the project's settings
process = CrawlerProcess(get_project_settings())

When I run batchscript.py from inside the /root/crawler/ directory, the scraper runs fine.

But when I run it from outside this directory using python /root/crawler/batchscript.py, it does not run as intended: the settings are not imported correctly and get_project_settings() returns empty settings.

I have also tried creating a Bash script, called batchinit.sh:

#!/bin/bash
alias batchscript="cd /root/crawler/"
python batchscript.py

and the behaviour is the same :(

When I run batchinit.sh from inside the /root/crawler/ directory, the scraper runs fine.

But when I run it from outside this directory using bash /root/crawler/batchinit.sh, it does not run as intended: the settings are not imported correctly and get_project_settings() returns empty settings.

Why am I doing this? What is the ultimate goal?

I want to create a cron job for this script. I tried to schedule cron jobs using the commands above, but I ran into the issues described above.

  • What are you trying to do by defining the alias in the shell script? Why not just put cd /root/crawler/ on that line instead of aliasing it to batchscript? Commented Nov 17, 2016 at 18:42
  • Where are the scrapy and amazon_crawler modules? Are they in a virtual env? Commented Nov 17, 2016 at 18:42
  • This may help: stackoverflow.com/a/22466264/2874789 Commented Nov 17, 2016 at 18:44
  • @ChristopherShroba I am a newbie to shell scripting ... I wrote a simple cd command but it didn't work ... Commented Nov 17, 2016 at 18:46
  • See: How do I “cd” in Python? Commented Nov 17, 2016 at 19:25

1 Answer

Using bash, you could always do:

cd /root/crawler && python batchscript.py
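The cd matters because, as far as I know, get_project_settings() locates the project by looking for a scrapy.cfg file in the current working directory and its parents (unless the SCRAPY_SETTINGS_MODULE environment variable is already set). The sketch below is not Scrapy's own code; find_scrapy_cfg is a hypothetical helper that imitates that upward search so you can check what a given starting directory would find:

```python
from pathlib import Path


def find_scrapy_cfg(start="."):
    """Return the nearest scrapy.cfg at or above `start`, or None.

    Hypothetical diagnostic helper: it imitates an upward search from
    the working directory and is not part of Scrapy.
    """
    current = Path(start).resolve()
    # Check the starting directory first, then each parent up to the root.
    for directory in (current, *current.parents):
        candidate = directory / "scrapy.cfg"
        if candidate.is_file():
            return candidate
    return None
```

Called with no argument from /root/crawler/ it should return the project's scrapy.cfg. From an unrelated directory it would typically return None, which matches the empty settings described in the question.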

It's always good policy to use absolute paths to programs/executables referenced in cron jobs.
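If you would rather not depend on the caller's working directory at all, a minimal sketch is to have the script change into its own directory before calling get_project_settings(). This assumes scrapy.cfg sits next to batchscript.py, which the question suggests but does not state outright; chdir_to_script_dir is a hypothetical helper name, not part of Scrapy:

```python
import os
from pathlib import Path


def chdir_to_script_dir(script_file):
    """Make the directory containing `script_file` the working directory.

    Hypothetical helper: assumes the Scrapy project files live in the
    same directory as the script.
    """
    script_dir = Path(script_file).resolve().parent
    os.chdir(script_dir)
    return script_dir
```

In batchscript.py you would call chdir_to_script_dir(__file__) near the top, before get_project_settings(), so that python /root/crawler/batchscript.py should behave the same from any directory, including when started by cron.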
