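17.1 Using a Web API
A Web API is a part of a website designed to interact with programs that use very specific URLs to request certain information. A request of this kind is called an API call. The requested data is returned in an easily processed format, such as JSON or CSV.
17.1.1 Requesting Data Using an API Call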
https://api.github.com/search/repositories?q=language:python&sort=stars
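This call returns the number of Python projects currently hosted on GitHub, along with information about the most popular Python repositories. The first part (https://api.github.com/) directs the request to the part of GitHub that responds to API calls. The next part (search/repositories) tells the API to search through all repositories on GitHub.
The question mark after repositories signals that a query parameter follows. The q stands for query, and the equals sign lets us begin specifying the query. With language:python we indicate that we only want information about repositories whose primary language is Python. The final part (&sort=stars) sorts the projects by the number of stars they have received.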
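The parts of the URL described above can also be assembled in code rather than typed by hand. The following is a minimal sketch using only the standard library; the names BASE_URL and build_search_url are illustrative assumptions and are not part of the GitHub API or of the program in these notes.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical constant name; the URL itself is the one discussed above.
BASE_URL = "https://api.github.com/search/repositories"


def build_search_url(language, sort="stars"):
    """Build a repository-search URL for one language (illustrative helper)."""
    # safe=":" keeps the colon in "language:python" instead of encoding it as %3A.
    query = urlencode({"q": f"language:{language}", "sort": sort}, safe=":")
    return f"{BASE_URL}?{query}"


url = build_search_url("python")
print(url)

# Splitting the URL again recovers the parts described in the text.
parts = urlsplit(url)
print(parts.netloc)           # host that answers API calls
print(parts.path)             # the search/repositories endpoint
print(parse_qs(parts.query))  # the q and sort parameters
```

Building the query from a dictionary makes it easier to change the language or the sort order later without editing the string by hand.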
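The first few lines of the response are shown below. As the response makes clear, this URL is not really meant to be entered by hand.
{
  "total_count": 4046902,
  "incomplete_results": false,
  "items": [
    {
      "id": 21289110,
      "node_id": "MDEwOlJlcG9zaXRvcnkyMTI4OTExMA==",
      "name": "awesome-python",
      "full_name": "vinta/awesome-python",
The second line of the output shows that GitHub hosted a total of 4,046,902 Python projects at the time. The value of "incomplete_results" is false, which tells us the request succeeded and the results are complete. Next comes "items", which holds detailed information about the most popular Python projects on GitHub.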
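To make the structure of the response concrete, here is a minimal sketch that parses a shortened copy of the excerpt above with the standard json module. The sample string is closed by hand so that it is valid JSON, and the fields after full_name are left out; it is an illustration, not the full response.

```python
import json

# A shortened, hand-closed copy of the response excerpt shown above.
sample = """
{
  "total_count": 4046902,
  "incomplete_results": false,
  "items": [
    {
      "id": 21289110,
      "name": "awesome-python",
      "full_name": "vinta/awesome-python"
    }
  ]
}
"""

response_dict = json.loads(sample)

# JSON objects become dictionaries, arrays become lists, false becomes False.
print(response_dict["total_count"])
print(response_dict["incomplete_results"])
print(response_dict["items"][0]["full_name"])
```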
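17.1.2 Installing requests
17.1.3 Processing an API Response
Now let's write a program that makes the API call and processes the results, to find the most-starred Python projects on GitHub: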
import requests

# Make the API call and store the response.
url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
r = requests.get(url)
print("Status code:", r.status_code)

# Store the API response in a variable.
response_dict = r.json()

# Process the results.
print(response_dict.keys())
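The response object has an attribute named status_code, which tells us whether the request succeeded (a status code of 200 indicates success).
The API returns its data as JSON, so we call json() to convert it to a Python dictionary.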
Status code: 200
dict_keys(['total_count', 'incomplete_results', 'items'])
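The status code printed above is just a number. As a small sketch of how it could be checked before the response is processed, the following uses the standard http.HTTPStatus enum; the helper names describe_status and request_succeeded are assumptions made for illustration and are not part of requests.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a readable label such as '200 OK' for a numeric status code."""
    # HTTPStatus(code) raises ValueError for codes it does not know.
    return f"{code} {HTTPStatus(code).phrase}"


def request_succeeded(code):
    """Treat only 200 as success, matching the check described in the text."""
    return code == HTTPStatus.OK


for code in (200, 403, 404):
    print(describe_status(code), "->", request_succeeded(code))
```

In the program above, such a check could be applied to r.status_code before calling r.json().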
17.1.4 Working with the Response Dictionary
import requests

# Make the API call and store the response.
url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
r = requests.get(url)
print("Status code:", r.status_code)

# Store the API response in a variable.
response_dict = r.json()
print("Total repositories:", response_dict['total_count'])

# Explore information about the repositories.
repo_dicts = response_dict['items']
print("Repositories returned:", len(repo_dicts))

# Examine the first repository.
repo_dict = repo_dicts[0]
print("\nKeys:", len(repo_dict))
for key in sorted(repo_dict.keys()):
    print(key)
Status code: 200
Total repositories: 3909058
Repositories returned: 30

Keys: 74
archive_url
archived
assignees_url
--snip--
url
watchers
watchers_count
Next, let's pull out the values associated with some of the keys in repo_dict:
import requests

# Make the API call and store the response.
url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
r = requests.get(url)
print("Status code:", r.status_code)

# Store the API response in a variable.
response_dict = r.json()
print("Total repositories:", response_dict['total_count'])

# Explore information about the repositories.
repo_dicts = response_dict['items']
print("Repositories returned:", len(repo_dicts))

# Examine the first repository.
repo_dict = repo_dicts[0]

print("\nSelect information about first repository:")
# The name of the project.
print("Name", repo_dict['name'])
# Use the key 'owner' to reach the owner's dictionary, then the key 'login'
# to get the owner's login name.
print('Owner', repo_dict['owner']['login'])
# The number of stars the project has received.
print('Stars', repo_dict['stargazers_count'])
# The URL of the project's repository on GitHub.
print('Repository:', repo_dict['html_url'])
# When the project was created.
print('Created:', repo_dict['created_at'])
# When the project was last updated.
print('Updated:', repo_dict['updated_at'])
# The description of the repository.
print('Description:', repo_dict['description'])
Status code: 200
Total repositories: 4047023
Repositories returned: 30

Select information about first repository:
Name awesome-python
Owner vinta
Stars 70375
Repository: https://github.com/vinta/awesome-python
Created: 2014-06-27T21:00:06Z
Updated: 2019-07-26T05:59:59Z
Description: A curated list of awesome Python frameworks, libraries, software and resources
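The output shows that, at the time of this run, the most-starred Python project on GitHub was awesome-python, owned by the user vinta, and that more than 70,000 users had starred it. It was created in June 2014 and had been updated very recently.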
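The created_at and updated_at values are plain strings. As a rough sketch, they can be turned into datetime objects so that statements like "created in June 2014" come from the data itself. The helper name parse_github_time is an assumption for illustration, and the format string assumes the timestamps always look like the two shown in the output above.

```python
from datetime import datetime, timezone


def parse_github_time(stamp):
    """Parse a timestamp such as '2014-06-27T21:00:06Z' into a UTC datetime."""
    # The trailing Z marks UTC, so the timezone is attached explicitly.
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


# Values copied from the output above.
created = parse_github_time("2014-06-27T21:00:06Z")
updated = parse_github_time("2019-07-26T05:59:59Z")

print("Created:", created.date())
print("Updated:", updated.date())
print("Days between creation and last update:", (updated - created).days)
```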
17.1.5 Summarizing the Top Repositories
import requests

# Make the API call and store the response.
url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
r = requests.get(url)
print("Status code:", r.status_code)

# Store the API response in a variable.
response_dict = r.json()
print("Total repositories:", response_dict['total_count'])

# Explore information about the repositories.
repo_dicts = response_dict['items']
print("Repositories returned:", len(repo_dicts))

print("\nSelect information about each repository:")
for repo_dict in repo_dicts:
    # The name of the project.
    print("Name", repo_dict['name'])
    # Use the key 'owner' to reach the owner's dictionary, then the key
    # 'login' to get the owner's login name.
    print('Owner', repo_dict['owner']['login'])
    # The number of stars the project has received.
    print('Stars', repo_dict['stargazers_count'])
    # The URL of the project's repository on GitHub.
    print('Repository:', repo_dict['html_url'])
    # When the project was created.
    print('Created:', repo_dict['created_at'])
    # When the project was last updated.
    print('Updated:', repo_dict['updated_at'])
    # The description of the repository.
    print('Description:', repo_dict['description'])
Status code: 200
Total repositories: 4047042
Repositories returned: 30

Select information about each repository:
Name awesome-python
Owner vinta
Stars 70376
Repository: https://github.com/vinta/awesome-python
Created: 2014-06-27T21:00:06Z
Updated: 2019-07-26T06:19:01Z
Description: A curated list of awesome Python frameworks, libraries, software and resources
Name system-design-primer
Owner donnemartin
Stars 69771
Repository: https://github.com/donnemartin/system-design-primer
Created: 2017-02-26T16:15:28Z
Updated: 2019-07-26T06:18:39Z
Description: Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
--snip--
Name sentry
Owner getsentry
Stars 21590
Repository: https://github.com/getsentry/sentry
Created: 2010-08-30T22:06:41Z
Updated: 2019-07-26T02:36:53Z
Description: Sentry is cross-platform application monitoring, with a focus on error reporting.
Name python-patterns
Owner faif
Stars 21422
Repository: https://github.com/faif/python-patterns
Created: 2012-06-06T21:02:35Z
Updated: 2019-07-26T06:03:16Z
Description: A collection of design patterns/idioms in Python
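17.1.6 Monitoring API Rate Limits
Most APIs are rate limited, meaning there is a cap on how many requests you can make in a given amount of time. To see whether you are approaching GitHub's limits, enter https://api.github.com/rate_limit in a browser. You should see a response similar to the following: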
{
  "resources": {
    "core": {
      "limit": 60,
      "remaining": 60,
      "reset": 1564126347
    },
    "search": {
      "limit": 10,
      "remaining": 10,
      "reset": 1564122807
    },
    "graphql": {
      "limit": 0,
      "remaining": 0,
      "reset": 1564126347
    },
    "integration_manifest": {
      "limit": 5000,
      "remaining": 5000,
      "reset": 1564126347
    }
  },
  "rate": {
    "limit": 60,
    "remaining": 60,
    "reset": 1564126347
  }
}
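The "search" entry is the one that applies to the search API used here. It shows a limit of 10 requests per minute, with 10 requests still remaining for the current minute. The reset value is the time, in Unix (epoch) time, at which the quota will reset. Once you use up your quota, you receive a short response telling you that you have reached the API limit.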
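The reset value is hard to read as a raw number. The following minimal sketch converts it to a readable UTC time and checks whether any requests remain. The dictionary is copied from the "search" entry shown above; the helper names reset_time and can_call are assumptions for illustration.

```python
from datetime import datetime, timezone

# The "search" entry copied from the rate-limit response shown above.
search_limit = {"limit": 10, "remaining": 10, "reset": 1564122807}


def reset_time(entry):
    """Convert the Unix timestamp in 'reset' to an aware UTC datetime."""
    return datetime.fromtimestamp(entry["reset"], tz=timezone.utc)


def can_call(entry):
    """Return True while at least one request is left in the current window."""
    return entry["remaining"] > 0


print("Quota resets at:", reset_time(search_limit).isoformat())
print("Requests left:", search_limit["remaining"])
print("OK to call:", can_call(search_limit))
```

For the sample entry, the reset time works out to 2019-07-26T06:33:27+00:00, which lines up with the update times in the repository output earlier in these notes.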
17.2 Visualizing Repositories Using Pygal
Let's create an interactive bar chart: the height of each bar represents the number of stars a project has received, and clicking a bar takes you to that project's home page on GitHub.
import requests
import pygal
from pygal.style import LightColorizedStyle as LCS
from pygal.style import LightenStyle as LS

# Make the API call and store the response.
url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
r = requests.get(url)
print("Status code:", r.status_code)

# Store the API response in a variable.
response_dict = r.json()
print("Total repositories:", response_dict['total_count'])

# Explore information about the repositories.
repo_dicts = response_dict['items']

names, stars = [], []
for repo_dict in repo_dicts:
    names.append(repo_dict['name'])
    stars.append(repo_dict['stargazers_count'])

# Make the visualization.
# Define a style with the LS class and set its base color to dark blue.
my_style = LS('#333366', base_style=LCS)
chart = pygal.Bar(style=my_style, x_label_rotation=45, show_legend=False)
chart.title = 'Most-Starred Python Projects on GitHub'
chart.x_labels = names

chart.add('', stars)
chart.render_to_file('python_repos.svg')
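I could have kicked myself here: be very careful with the imports. The class is LightenStyle, but I imported LightStyle by mistake and spent about two hours chasing the errors that followed.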
17.2.1 Refining Pygal Charts
import requests
import pygal
from pygal.style import LightColorizedStyle as LCS
from pygal.style import LightenStyle as LS

# Make the API call and store the response.
url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
r = requests.get(url)
print("Status code:", r.status_code)

# Store the API response in a variable.
response_dict = r.json()
print("Total repositories:", response_dict['total_count'])

# Explore information about the repositories.
repo_dicts = response_dict['items']

names, stars = [], []
for repo_dict in repo_dicts:
    names.append(repo_dict['name'])
    stars.append(repo_dict['stargazers_count'])

# Make the visualization.
my_style = LS('#333366', base_style=LCS)

my_config = pygal.Config()            # holds the chart's appearance settings
my_config.x_label_rotation = 45       # rotate the x-axis labels by 45 degrees
my_config.show_legend = False         # hide the legend
my_config.title_font_size = 24        # font size of the chart title
my_config.label_font_size = 14        # font size of the minor labels
my_config.major_label_font_size = 18  # font size of the major labels
my_config.truncate_label = 15         # shorten long labels to 15 characters
my_config.show_y_guides = False       # hide the horizontal guide lines
my_config.width = 1000                # custom chart width

chart = pygal.Bar(my_config, style=my_style)
chart.title = 'Most-Starred Python Projects on GitHub'
chart.x_labels = names

chart.add('', stars)
chart.render_to_file('python_repos.svg')
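17.2.2 Adding Custom Tooltips
In Pygal, hovering the mouse over a bar shows the information that the bar represents. This is commonly called a tooltip.
Next, let's create a custom tooltip that also shows each project's description. To do this, pass add() a list of dictionaries rather than a list of values.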
import pygal
from pygal.style import LightColorizedStyle as LCS, LightenStyle as LS

my_style = LS('#333366', base_style=LCS)
chart = pygal.Bar(style=my_style, x_label_rotation=45, show_legend=False)

chart.title = 'Python Projects'
chart.x_labels = ['httpie', 'django', 'flask']

# Each dictionary supplies the bar height ('value') and tooltip text ('label').
plot_dicts = [
    {'value': 16101, 'label': 'Description of httpie.'},
    {'value': 15028, 'label': 'Description of django.'},
    {'value': 14798, 'label': 'Description of flask.'},
]

chart.add('', plot_dicts)
chart.render_to_file('bar_descriptions.svg')
17.2.3 Plotting the Data
import requests
import pygal
from pygal.style import LightColorizedStyle as LCS
from pygal.style import LightenStyle as LS

# Make the API call and store the response.
url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
r = requests.get(url)
print("Status code:", r.status_code)

# Store the API response in a variable.
response_dict = r.json()
print("Total repositories:", response_dict['total_count'])

# Explore information about the repositories.
repo_dicts = response_dict['items']

names, plot_dicts = [], []
for repo_dict in repo_dicts:
    names.append(repo_dict['name'])

    plot_dict = {
        'value': repo_dict['stargazers_count'],
        # A repository without a description returns None, which Pygal may
        # fail to render as a tooltip, so fall back to an empty string.
        'label': repo_dict['description'] or '',
    }
    plot_dicts.append(plot_dict)

# Make the visualization.
my_style = LS('#333366', base_style=LCS)

my_config = pygal.Config()            # holds the chart's appearance settings
my_config.x_label_rotation = 45       # rotate the x-axis labels by 45 degrees
my_config.show_legend = False         # hide the legend
my_config.title_font_size = 24        # font size of the chart title
my_config.label_font_size = 14        # font size of the minor labels
my_config.major_label_font_size = 18  # font size of the major labels
my_config.truncate_label = 15         # shorten long labels to 15 characters
my_config.show_y_guides = False       # hide the horizontal guide lines
my_config.width = 1000                # custom chart width

chart = pygal.Bar(my_config, style=my_style)
chart.title = 'Most-Starred Python Projects on GitHub'
chart.x_labels = names

chart.add('', plot_dicts)
chart.render_to_file('python_repos.svg')
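17.2.4 Adding Clickable Links to the Graph
Pygal also lets you use each bar in the chart as a link to a website. This takes only one extra line of code: in the dictionary created for each project, add a key-value pair whose key is 'xlink'.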
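for repo_dict in repo_dicts:
    names.append(repo_dict['name'])

    plot_dict = {
        'value': repo_dict['stargazers_count'],
        # Fall back to an empty string when a repository has no description.
        'label': repo_dict['description'] or '',
        # Clicking the bar opens the project's page on GitHub.
        'xlink': repo_dict['html_url'],
    }
    plot_dicts.append(plot_dict)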
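The dictionary-building step can be tried out without calling the API or installing Pygal. In the sketch below, the helper name build_plot_dict is an assumption for illustration. The first sample repository uses values from the output earlier in these notes; the second is a made-up entry whose description is None, included to show the fallback.

```python
def build_plot_dict(repo_dict):
    """Turn one repository dictionary into the dictionary passed to add()."""
    return {
        "value": repo_dict["stargazers_count"],
        # Assumption: a missing description arrives as None; use "" instead.
        "label": repo_dict["description"] or "",
        "xlink": repo_dict["html_url"],
    }


sample_repos = [
    {
        "name": "awesome-python",
        "stargazers_count": 70376,
        "description": "A curated list of awesome Python frameworks, "
                       "libraries, software and resources",
        "html_url": "https://github.com/vinta/awesome-python",
    },
    {
        # Hypothetical repository, invented for this example only.
        "name": "example-repo",
        "stargazers_count": 0,
        "description": None,
        "html_url": "https://github.com/example/example-repo",
    },
]

names = [repo["name"] for repo in sample_repos]
plot_dicts = [build_plot_dict(repo) for repo in sample_repos]

for name, plot_dict in zip(names, plot_dicts):
    print(name, plot_dict)
```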
17.3 The Hacker News API
Next, let's make an API call that returns the IDs of the current top stories on Hacker News, and then look at each of those top stories:
import requests
from operator import itemgetter

# Make the API call and store the response.
url = 'https://hacker-news.firebaseio.com/v0/topstories.json'
r = requests.get(url)
print("Status code:", r.status_code)

# Process information about each submission.
submission_ids = r.json()
submission_dicts = []
for submission_id in submission_ids[:30]:
    # Make a separate API call for each submission.
    url = ('https://hacker-news.firebaseio.com/v0/item/' +
           str(submission_id) + '.json')
    submission_r = requests.get(url)
    print(submission_r.status_code)
    response_dict = submission_r.json()

    submission_dict = {
        'title': response_dict['title'],
        'link': 'http://news.ycombinator.com/item?id=' + str(submission_id),
        'comments': response_dict.get('descendants', 0),
    }
    submission_dicts.append(submission_dict)

# Sort the submissions by number of comments, most discussed first.
submission_dicts = sorted(submission_dicts, key=itemgetter('comments'),
                          reverse=True)

for submission_dict in submission_dicts:
    print("\nTitle:", submission_dict['title'])
    print("Discussion link:", submission_dict['link'])
    print("Comments:", submission_dict['comments'])
The dict.get() method returns the value associated with the given key if that key exists, and returns the value you specify (here 0) if it does not.
Status code: 200
200
200
--snip--

Title: A "cure" for baldness could be around the corner
Discussion link: http://news.ycombinator.com/item?id=20531394
Comments: 231

Title: Square’s Growth Framework for Engineers and Engineering Managers
Discussion link: http://news.ycombinator.com/item?id=20530046
Comments: 204

Title: Photographers, Instagrammers: Stop Being So Selfish and Disrespectful
Discussion link: http://news.ycombinator.com/item?id=20530350
Comments: 161

Title: Photos and fingerprints of all EU citizens copied from the UK to the US
Discussion link: http://news.ycombinator.com/item?id=20533576
Comments: 108
--snip--
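The dict.get() fallback and the sort by comment count used in the program can be seen in isolation with a few hand-written items. The sketch below uses made-up titles; only the key name 'descendants' and the comment counts 231 and 108 come from the program and output above.

```python
from operator import itemgetter

# Hypothetical item responses; the second one has no 'descendants' key.
items = [
    {"title": "Story with many comments", "descendants": 231},
    {"title": "Story with no comments yet"},
    {"title": "Story with some comments", "descendants": 108},
]

# get() supplies 0 where 'descendants' is missing instead of raising KeyError.
summaries = [
    {"title": item["title"], "comments": item.get("descendants", 0)}
    for item in items
]

# itemgetter('comments') pulls out the sort key; reverse=True puts the
# most-discussed story first.
summaries = sorted(summaries, key=itemgetter("comments"), reverse=True)

for summary in summaries:
    print(summary["comments"], summary["title"])
```

Writing item["descendants"] instead would stop the program with a KeyError at the second item, which is why the program uses get() for this field.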