Python -- scrapy

 

@ Deploying with scrapyd and python-scrapyd-api

Source articles:

scrapyd installation: https://cuiqingcai.com/31049.html

python-scrapyd-api installation: https://cuiqingcai.com/31052.html

My approach
  - Use the Ubuntu subsystem on Windows 10 (WSL)

  - Create a virtual environment in Ubuntu and activate it

  - pip3 install scrapyd

  - Start scrapyd by typing scrapyd on the command line and pressing Enter

  - Back on Windows 10, create a scrapy project: scrapy startproject scrapyd_demo

  - Enter the project directory: cd scrapyd_demo

  - Generate a spider: scrapy genspider baidu baidu.com

  - Create a setup.py file in the project directory. Note: entry_points may need to be configured here, otherwise deployment can fail with an error

from setuptools import setup, find_packages

setup(
    name='scrapyd_demo',
    version='0.0.1',
    packages=find_packages(),
    entry_points={'scrapy': ['settings = scrapyd_demo.settings']}, 
)

''' If entry_points is not configured, deploying via the scrapyd-api add_version
method may raise the following error:
settings_module = d.get_entry_info('scrapy', 'settings').module_name
AttributeError: 'NoneType' object has no attribute 'module_name'
'''
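The error above comes from scrapyd looking up the project's scrapy settings entry point in the egg metadata; when none is declared, get_entry_info returns None and the .module_name access fails. A minimal sketch of that lookup using pkg_resources, with the same entry-point string as the setup.py above:

```python
from pkg_resources import EntryPoint

# Parse the same entry-point declaration used in setup.py above.
ep_map = EntryPoint.parse_map({'scrapy': ['settings = scrapyd_demo.settings']})
ep = ep_map['scrapy']['settings']
print(ep.module_name)  # scrapyd_demo.settings

# Without the declaration, the lookup yields nothing -- the origin of the
# 'NoneType' AttributeError when .module_name is read on the result.
empty_map = EntryPoint.parse_map({})
print(empty_map.get('scrapy'))  # None
```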


  - Package the project into an egg: https://www.cnblogs.com/yarightok/p/15567642.html
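The packaging step from that link can also be sketched programmatically. This illustrative example builds a throwaway copy of the project in a temporary directory and runs setuptools' bdist_egg command (it assumes setuptools is installed; the project layout is the minimal one needed to build):

```python
import glob
import os
import subprocess
import sys
import tempfile

# Illustrative sketch: build an egg for a throwaway project skeleton
# using setuptools' bdist_egg, mirroring the packaging step above.
with tempfile.TemporaryDirectory() as root:
    pkg = os.path.join(root, 'scrapyd_demo')
    os.makedirs(pkg)
    open(os.path.join(pkg, '__init__.py'), 'w').close()
    with open(os.path.join(root, 'setup.py'), 'w') as f:
        f.write(
            "from setuptools import setup, find_packages\n"
            "setup(name='scrapyd_demo', version='0.0.1', packages=find_packages(),\n"
            "      entry_points={'scrapy': ['settings = scrapyd_demo.settings']})\n"
        )
    subprocess.run([sys.executable, 'setup.py', 'bdist_egg'],
                   cwd=root, check=True, capture_output=True)
    eggs = glob.glob(os.path.join(root, 'dist', '*.egg'))
    print(eggs)  # e.g. [.../dist/scrapyd_demo-0.0.1-py3.x.egg]
```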

  - In the generated dist directory, create a deployment script, e.g. scrapyd_api_deploy.py, with the following code:

import os
from scrapyd_api import ScrapydAPI

scrapyd = ScrapydAPI('http://localhost:6800')
egg_file = open(os.path.join(os.path.dirname(__file__), 'scrapyd_demo-0.0.1-py3.6.egg'), 'rb')
scrapyd.add_version('scrapyd_demo', version='0.0.1', egg=egg_file)
print('projects: ', scrapyd.list_projects())
print('spiders: ', scrapyd.list_spiders('scrapyd_demo'))

''' Output:
projects:  ['scrapyd_demo']
spiders:  ['baidu']
'''
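Once the version is deployed, the spider can be started through scrapyd's schedule.json endpoint (ScrapydAPI's schedule method wraps the same call). A stdlib-only sketch, assuming scrapyd is listening on localhost:6800; it returns None if the daemon is not reachable:

```python
import json
import urllib.error
import urllib.parse
import urllib.request

def schedule_spider(project, spider, base='http://localhost:6800'):
    """POST to scrapyd's schedule.json endpoint; return the parsed JSON,
    or None when the scrapyd daemon is not reachable."""
    data = urllib.parse.urlencode({'project': project, 'spider': spider}).encode()
    try:
        with urllib.request.urlopen(f'{base}/schedule.json', data=data, timeout=5) as resp:
            return json.loads(resp.read())
    except (urllib.error.URLError, OSError):
        return None

result = schedule_spider('scrapyd_demo', 'baidu')
print(result)  # e.g. {'status': 'ok', 'jobid': '...'} when scrapyd is running
```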


  - Run the script above

  - Open http://127.0.0.1:6800/listprojects.json in a browser

  - The response: {"node_name": "DESKTOP-MM3IOUH", "status": "ok", "projects": ["scrapyd_demo"]}

 
