scrapy定时执行抓取任务

2022-12-02 10:11:02

在ubuntu环境下，使用scrapy定时执行抓取任务，由于scrapy本身没有提供定时执行的功能，所以采用了crontab的方式进行定时执行：

首先编写要执行的命令脚本cron.sh

#! /bin/sh                                                                                                                                            

export PATH=$PATH:/usr/local/bin

cd /home/zhangchao/CVS/testCron

nohup scrapy crawl example >> example.log 2>&1 &

执行，crontab -e，规定crontab要执行的命令和要执行的时间频率，这里我需要每一分钟就执行scrapy crawl example这条爬取命令：

# Edit this file to introduce tasks to be run by cron.

#

# Each task to run has to be defined through a single line

# indicating with different fields when the task will be run

# and what command to run for the task

#

# To define the time you can provide concrete values for

# minute (m), hour (h), day of month (dom), month (mon),

# and day of week (dow) or use '*' in these fields (for 'any').#

# Notice that tasks will be started based on the cron's system

# daemon's notion of time and timezones.

#

# Output of the crontab jobs (including errors) is sent through

# email to the user the crontab file belongs to (unless redirected).

#

# For example, you can run a backup of all your user accounts

# at 5 a.m every week with:

# 0 5 * * 1 tar -zcf /var/backups/home.tgz /home/

#

# For more information see the manual pages of crontab(5) and cron(8)

#

# m h  dom mon dow   command

*/1 * * * *  sh /home/zhangchao/CVS/testCron/cron.sh