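Compared with other languages, file handling in Python is both concise and convenient. Common storage formats include TXT, JSON, and CSV. This article covers storing data in CSV files.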
Writing:
Let's start with the simplest example.
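    import csv

    # newline='' stops the csv module's own line endings from being translated again
    with open('./data.csv', mode='w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(['id', 'name', 'age'])
        writer.writerow(['1', 'ccdjun', '20'])
        writer.writerow(['2', 'bob', '33'])
        writer.writerow(['3', 'alex', '22'])

First we open data.csv with the mode set to w, then create a writer object by passing in the file handle csvfile, and finally call writerow() once per row. After the script finishes, a data.csv file is generated with the following content:

    id,name,age
    1,ccdjun,20
    2,bob,33
    3,alex,22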
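You can also write several rows at once with writerows(). In that case the argument must be a two-dimensional list, that is, a list of rows:

    import csv

    with open('./data.csv', mode='w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(['id', 'name', 'age'])
        writer.writerows([['1', 'ccdjun', '20'], ['2', 'bob', '33'], ['3', 'alex', '22']])

The result is the same as in the previous example.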
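To check that claim without touching the disk, here is a minimal sketch that writes the same rows both ways into an in-memory buffer. The helper names write_one_by_one and write_all_at_once are made up for this illustration and are not part of the csv library.

```python
import csv
import io

rows = [['1', 'ccdjun', '20'], ['2', 'bob', '33'], ['3', 'alex', '22']]


def write_one_by_one(rows):
    """Write each row with a separate writerow() call and return the CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


def write_all_at_once(rows):
    """Write all rows with a single writerows() call and return the CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerows(rows)
    return buffer.getvalue()


# Both helpers should produce exactly the same text.
print(write_one_by_one(rows) == write_all_at_once(rows))
```

writerows() is mostly a convenience when the rows are already collected in a list. When rows arrive one at a time, as in a crawler loop, writerow() fits more naturally.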
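In practice, though, a crawler usually collects structured data, which we typically represent as dictionaries. The csv library provides a dictionary-based way of writing as well: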
    import csv

    with open('./content.csv', mode='w', newline='') as csvfile:
        fieldnames = ['id', 'name', 'age']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerow({'id': 1, 'name': 'ccdjun', 'age': 22})
        writer.writerow({'id': 2, 'name': 'alex', 'age': 25})
        writer.writerow({'id': 3, 'name': 'bob', 'age': 32})

Here we first define three fields in fieldnames and pass them to DictWriter to create a dictionary writer. We then call writeheader() to write the header row, and call writerow() with one dictionary per row. The resulting file has the same layout as before.
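One detail worth knowing about DictWriter is that the dictionary keys must match the field names. The sketch below, written against an in-memory buffer purely for illustration, shows the default behavior as I understand it: a missing key is written as an empty string, while an unknown key such as the misspelling 'namet' raises ValueError.

```python
import csv
import io

fieldnames = ['id', 'name', 'age']
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerow({'id': 1, 'name': 'ccdjun', 'age': 22})

# A missing key ('age' here) is filled with restval, which defaults to ''.
writer.writerow({'id': 2, 'name': 'alex'})

# A key that is not in fieldnames is rejected by default (extrasaction='raise').
try:
    writer.writerow({'id': 3, 'namet': 'bob', 'age': 32})
    misspelled_key_rejected = False
except ValueError:
    misspelled_key_rejected = True

lines = buffer.getvalue().splitlines()
print(lines)
print(misspelled_key_rejected)
```

If silently dropping unknown keys is preferable, DictWriter accepts extrasaction='ignore'. Whether that is a good idea depends on how much you trust the incoming data.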
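To append to an existing file instead of overwriting it, set mode to 'a'. To write Chinese text, you also need to specify the encoding by adding encoding='utf-8' after mode:

    with open('./data.csv', mode='a', encoding='utf-8', newline='') as csvfile:
        writer = csv.writer(csvfile)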
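When appending, a common pitfall is writing the header again on every run. The following is one possible way to avoid that, assuming it is acceptable to decide based on whether the file already exists and is non-empty. The helper append_rows and the temporary demo path are hypothetical names introduced only for this sketch.

```python
import csv
import os
import tempfile


def append_rows(path, fieldnames, rows):
    """Append dict rows to a CSV file, writing the header only for a new or empty file."""
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, mode='a', encoding='utf-8', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        if needs_header:
            writer.writeheader()
        writer.writerows(rows)


# Demo in a temporary directory so that no real data file is touched.
demo_path = os.path.join(tempfile.mkdtemp(), 'data.csv')
append_rows(demo_path, ['id', 'name'], [{'id': 1, 'name': '张三'}])
append_rows(demo_path, ['id', 'name'], [{'id': 2, 'name': '李四'}])

with open(demo_path, encoding='utf-8', newline='') as csvfile:
    demo_rows = list(csv.reader(csvfile))
print(demo_rows)
```

The two calls leave a single header row followed by both data rows, and the Chinese names survive the round trip because the file is written and read with the same encoding.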
Reading:
The csv library can read CSV files too:

    import csv

    with open('./data.csv', mode='r', newline='') as csvfile:
        reader = csv.reader(csvfile)
        for row in reader:
            print(row)  # each row is a list of strings
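The reading counterpart of DictWriter is csv.DictReader, which uses the first row as keys. The sketch below reads from an in-memory string rather than a file, and the sample data is made up for illustration. Note that every value comes back as a string, so numeric columns have to be converted explicitly.

```python
import csv
import io

# Sample CSV text standing in for the contents of a file.
sample = "id,name,age\r\n1,ccdjun,20\r\n2,bob,33\r\n"

reader = csv.DictReader(io.StringIO(sample, newline=''))
records = list(reader)

# Values are strings, so convert them when numbers are needed.
ages = [int(record['age']) for record in records]

print(records[0]['name'])
print(ages)
```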
Now let's look at an example of a crawler that stores its results in a CSV file:
    import csv

    import requests
    from lxml import etree

    headers = {
        'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.81 Safari/537.36',
    }
    fieldnames = ['author', 'content']

    # Open the file once and write the header once, rather than once per row.
    with open('./content.csv', mode='w', encoding='utf-8', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        for i in range(1, 11):
            # Build the URLs of the first 10 Qiushibaike pages.
            url = f'https://www.qiushibaike.com/text/page/{i}/'
            response = requests.get(url=url, headers=headers).text
            tree = etree.HTML(response)
            div_list = tree.xpath('//*[@id="content"]/div/div[2]/div')
            for div in div_list:
                content = div.xpath('./a/div/span//text()')[0]
                author = div.xpath('./div/a[2]/h2/text()')[0]
                # print(author, content)
                writer.writerow({'author': author, 'content': content})
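Scraped text often contains commas, quotation marks, and line breaks. The csv module quotes such fields on its own, so there is no need to clean them by hand. The sketch below uses made-up records and an in-memory buffer to show that the data survives a write and read round trip unchanged.

```python
import csv
import io

# Hypothetical scraped records containing a line break, quotes, and a comma.
scraped = [
    {'author': 'alice', 'content': 'first line\nsecond line'},
    {'author': 'bob', 'content': 'he said "hi", then left'},
]

buffer = io.StringIO(newline='')
writer = csv.DictWriter(buffer, fieldnames=['author', 'content'])
writer.writeheader()
writer.writerows(scraped)

# Rewind the buffer and read everything back as dictionaries.
buffer.seek(0)
restored = list(csv.DictReader(buffer))

print(restored == scraped)
```

The same holds for real files, provided they are opened with newline='' for both writing and reading.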
The crawler example is for learning and reference only and must not be used for any other purpose.