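Project 2: Data Visualization (Chapter 17: Working with APIs)

17.1 Using a Web API

  A Web API is a part of a website designed to interact with programs that use very specific URLs to request certain information. This kind of request is called an API call. The requested data is returned in an easily processed format such as JSON or CSV.

17.1.1 Requesting Data Using an API Call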

https://api.github.com/search/repositories?q=language:python&sort=stars

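  This call returns the number of Python projects currently hosted on GitHub, along with information about the most popular Python repositories. The first part (https://api.github.com/) directs the request to the part of GitHub's website that responds to API calls. The next part (search/repositories) tells the API to search through all repositories on GitHub.

  The question mark after repositories signals that we're about to pass an argument. The q stands for query, and the equals sign lets us begin specifying the query. By using language:python, we indicate that we want information only on repositories whose primary language is Python. The final part (&sort=stars) sorts the projects by the number of stars they've been given.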
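  To make the structure of this URL concrete, the following sketch takes it apart and rebuilds it with Python's standard urllib.parse module. The variable names are chosen for illustration only and are not part of the original program.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

url = "https://api.github.com/search/repositories?q=language:python&sort=stars"

# Split the URL into its components.
parts = urlsplit(url)
print(parts.netloc)  # the host that answers API calls
print(parts.path)    # the search/repositories endpoint

# Decode the query string into a dictionary of parameter lists.
params = parse_qs(parts.query)
print(params)

# Rebuild the same URL from its pieces; safe=":" keeps the colon
# in language:python from being percent-encoded.
query = urlencode({"q": "language:python", "sort": "stars"}, safe=":")
rebuilt = f"{parts.scheme}://{parts.netloc}{parts.path}?{query}"
print(rebuilt == url)
```

  Building the query from a dictionary like this makes it easier to change the language or the sort order without editing the URL string by hand.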

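  The first few lines of the response are shown below. As the response makes clear, this URL is not primarily intended to be entered by humans.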

{
  "total_count": 4046902,
  "incomplete_results": false,
  "items": [
    {
      "id": 21289110,
      "node_id": "MDEwOlJlcG9zaXRvcnkyMTI4OTExMA==",
      "name": "awesome-python",
      "full_name": "vinta/awesome-python",

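  The second line of the output shows that GitHub hosts 4,046,902 Python projects in total. Because the value of "incomplete_results" is false, we know the request was successful (it's not incomplete). Next comes "items", a list containing details about the most popular Python projects on GitHub.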
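  A minimal sketch of reading these fields in Python, using only the standard json module. The sample text below is a shortened, hand-completed version of the response shown above (the real response contains many more keys and items), so treat it as illustrative data.

```python
import json

# Shortened stand-in for the API response; the real one is much longer.
sample_response = """
{
  "total_count": 4046902,
  "incomplete_results": false,
  "items": [
    {
      "id": 21289110,
      "name": "awesome-python",
      "full_name": "vinta/awesome-python"
    }
  ]
}
"""

data = json.loads(sample_response)

# JSON false becomes Python False, and the items array becomes a list.
print("Total repositories:", data["total_count"])
print("Complete results:", not data["incomplete_results"])
print("First repository:", data["items"][0]["full_name"])
```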

17.1.2 Installing requests

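  The requests package lets a Python program request information from a website and examine the response. It can be installed with pip, for example: pip install requests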

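17.1.3 Processing an API Response

  Now let's write a program that issues an API call and processes the results, identifying the most-starred Python projects on GitHub: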

import requests

# Make an API call and store the response.
url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
r = requests.get(url)
print("Status code:", r.status_code)

# Store the API response in a variable.
response_dict = r.json()

# Process the results.
print(response_dict.keys())

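  The response object has an attribute called status_code, which tells us whether the request was successful (a status code of 200 indicates success).

  The API returns its information in JSON format, so we use the json() method to convert it to a Python dictionary.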

Status code: 200
dict_keys(['total_count', 'incomplete_results', 'items'])
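
  Status codes other than 200 are worth handling explicitly. The sketch below uses the standard http.HTTPStatus enumeration to turn a numeric code into its standard reason phrase. Note that describe_status is a hypothetical helper introduced here for illustration; it is not part of requests or of the original program.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short human-readable description of an HTTP status code."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Codes that are not in the standard enumeration end up here.
        phrase = "Unknown status"
    outcome = "success" if 200 <= code < 300 else "failure"
    return f"{code} {phrase} ({outcome})"


for code in (200, 403, 404):
    print(describe_status(code))
```

  In a program built on requests, the value passed in would be r.status_code.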

17.1.4 Working with the Response Dictionary

import requests

# Make an API call and store the response.
url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
r = requests.get(url)
print("Status code:", r.status_code)

# Store the API response in a variable.
response_dict = r.json()
print("Total repositories:", response_dict['total_count'])

# Explore information about the repositories.
repo_dicts = response_dict['items']
print("Repositories returned:", len(repo_dicts))

# Examine the first repository.
repo_dict = repo_dicts[0]
print("\nKeys:", len(repo_dict))
for key in sorted(repo_dict.keys()):
    print(key)

Status code: 200
Total repositories: 3909058
Repositories returned: 30

Keys: 74
archive_url
archived
assignees_url
--snip--
url
watchers
watchers_count

  Now let's pull out the values associated with some of the keys in repo_dict:

import requests

# Make an API call and store the response.
url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
r = requests.get(url)
print("Status code:", r.status_code)

# Store the API response in a variable.
response_dict = r.json()
print("Total repositories:", response_dict['total_count'])

# Explore information about the repositories.
repo_dicts = response_dict['items']
print("Repositories returned:", len(repo_dicts))

# Examine the first repository.
repo_dict = repo_dicts[0]

print("\nSelect information about first repository:")
# Print the name of the project.
print("Name", repo_dict['name'])
# Use the key 'owner' to access the dictionary representing the owner,
# then the key 'login' to get the owner's login name.
print('Owner', repo_dict['owner']['login'])
# Print how many stars the project has earned.
print('Stars', repo_dict['stargazers_count'])
# Print the URL of the project's GitHub repository.
print('Repository:', repo_dict['html_url'])
# Show when the project was created.
print('Created:', repo_dict['created_at'])
# Show when it was last updated.
print('Updated:', repo_dict['updated_at'])
# Print the repository's description.
print('Description:', repo_dict['description'])

Status code: 200
Total repositories: 4047023
Repositories returned: 30

Select information about first repository:
Name awesome-python
Owner vinta
Stars 70375
Repository: https://github.com/vinta/awesome-python
Created: 2014-06-27T21:00:06Z
Updated: 2019-07-26T05:59:59Z
Description: A curated list of awesome Python frameworks, libraries, software and resources

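  The output shows that the most-starred Python project on GitHub at the moment is awesome-python, owned by the user vinta, and that 70,375 users have starred it. The project was created in June 2014 and was updated recently.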
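
  The created_at and updated_at values are ISO 8601 timestamps in UTC (the trailing Z). As a small sketch, they can be parsed with the standard datetime module to work out how much time separates the two. The two strings are copied from the output above; parse_github_time and the other names are illustrative.

```python
from datetime import datetime, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_github_time(text):
    """Parse a timestamp such as 2014-06-27T21:00:06Z into an aware datetime."""
    return datetime.strptime(text, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)


created = parse_github_time("2014-06-27T21:00:06Z")
updated = parse_github_time("2019-07-26T05:59:59Z")

# Time between creation and the most recent update.
age = updated - created
print("Created:", created.date())
print("Days between creation and last update:", age.days)
```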

17.1.5 Summarizing the Top Repositories

import requests

# Make an API call and store the response.
url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
r = requests.get(url)
print("Status code:", r.status_code)

# Store the API response in a variable.
response_dict = r.json()
print("Total repositories:", response_dict['total_count'])

# Explore information about the repositories.
repo_dicts = response_dict['items']
print("Repositories returned:", len(repo_dicts))

print("\nSelect information about each repository:")
for repo_dict in repo_dicts:
    # Print the name of the project.
    print("Name", repo_dict['name'])
    # Use the key 'owner' to access the dictionary representing the owner,
    # then the key 'login' to get the owner's login name.
    print('Owner', repo_dict['owner']['login'])
    # Print how many stars the project has earned.
    print('Stars', repo_dict['stargazers_count'])
    # Print the URL of the project's GitHub repository.
    print('Repository:', repo_dict['html_url'])
    # Show when the project was created.
    print('Created:', repo_dict['created_at'])
    # Show when it was last updated.
    print('Updated:', repo_dict['updated_at'])
    # Print the repository's description.
    print('Description:', repo_dict['description'])

Status code: 200
Total repositories: 4047042
Repositories returned: 30

Select information about each repository:
Name awesome-python
Owner vinta
Stars 70376
Repository: https://github.com/vinta/awesome-python
Created: 2014-06-27T21:00:06Z
Updated: 2019-07-26T06:19:01Z
Description: A curated list of awesome Python frameworks, libraries, software and resources
Name system-design-primer
Owner donnemartin
Stars 69771
Repository: https://github.com/donnemartin/system-design-primer
Created: 2017-02-26T16:15:28Z
Updated: 2019-07-26T06:18:39Z
Description: Learn how to design large-scale systems. Prep for the system design interview.  Includes Anki flashcards.

--snip--
Name sentry
Owner getsentry
Stars 21590
Repository: https://github.com/getsentry/sentry
Created: 2010-08-30T22:06:41Z
Updated: 2019-07-26T02:36:53Z
Description: Sentry is cross-platform application monitoring, with a focus on error reporting.
Name python-patterns
Owner faif
Stars 21422
Repository: https://github.com/faif/python-patterns
Created: 2012-06-06T21:02:35Z
Updated: 2019-07-26T06:03:16Z
Description: A collection of design patterns/idioms in Python
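
  The loop above indexes every key directly, which raises a KeyError if a key is missing and prints None when a repository has no description. One possible defensive variant is sketched below. Here summarize_repo is a hypothetical helper, and the sample dictionary is trimmed from the output above.

```python
def summarize_repo(repo_dict):
    """Build a one-line summary of a repository dictionary."""
    name = repo_dict.get("name", "(unknown)")
    owner = repo_dict.get("owner", {}).get("login", "(unknown)")
    stars = repo_dict.get("stargazers_count", 0)
    # The description may be missing or None, so fall back to a placeholder.
    description = repo_dict.get("description") or "(no description)"
    return f"{owner}/{name}: {stars} stars. {description}"


sample_repo = {
    "name": "python-patterns",
    "owner": {"login": "faif"},
    "stargazers_count": 21422,
    "description": "A collection of design patterns/idioms in Python",
}

print(summarize_repo(sample_repo))
print(summarize_repo({"name": "example", "description": None}))
```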

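17.1.6 Monitoring API Rate Limits

  Most APIs are rate limited, which means there's a limit to how many requests you can make in a certain amount of time. To see whether you're approaching GitHub's limits, enter https://api.github.com/rate_limit into a web browser. You should see a response like the following: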

{
  "resources": {
    "core": {
      "limit": 60,
      "remaining": 60,
      "reset": 1564126347
    },
    "search": {
      "limit": 10,
      "remaining": 10,
      "reset": 1564122807
    },
    "graphql": {
      "limit": 0,
      "remaining": 0,
      "reset": 1564126347
    },
    "integration_manifest": {
      "limit": 5000,
      "remaining": 5000,
      "reset": 1564126347
    }
  },
  "rate": {
    "limit": 60,
    "remaining": 60,
    "reset": 1564126347
  }
}

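  The "search" entry above shows that the limit is 10 requests per minute and that we have 10 requests remaining for the current minute. The reset value is the Unix, or epoch, time (the number of seconds since midnight on January 1, 1970) at which the quota will reset. If you exhaust your quota, you'll receive a short response indicating that you've reached the API limit.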
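
  The reset value is easier to interpret once it is converted to a calendar time. The sketch below does this with the standard datetime module for the "search" entry shown above. Here describe_quota is a hypothetical helper name, and the dictionary is copied by hand from that response.

```python
from datetime import datetime, timezone

# The "search" entry from the response shown above.
search_limits = {"limit": 10, "remaining": 10, "reset": 1564122807}


def describe_quota(limits):
    """Return the remaining requests and the UTC time at which the quota resets."""
    reset_time = datetime.fromtimestamp(limits["reset"], tz=timezone.utc)
    return limits["remaining"], reset_time


remaining, reset_time = describe_quota(search_limits)
print(f"{remaining} of {search_limits['limit']} search requests left")
print("Quota resets at", reset_time.isoformat())
```

  A program that makes many calls could compare reset_time with the current time and pause until the quota resets once remaining reaches 0.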

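17.2 Visualizing Repositories Using Pygal

  Let's create an interactive bar chart: the height of each bar represents the number of stars the project has earned, and clicking a bar takes you to that project's home page on GitHub.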

import requests
import pygal
from pygal.style import LightColorizedStyle as LCS
from pygal.style import LightenStyle as LS

# Make an API call and store the response.
url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
r = requests.get(url)
print("Status code:", r.status_code)

# Store the API response in a variable.
response_dict = r.json()
print("Total repositories:", response_dict['total_count'])

# Explore information about the repositories.
repo_dicts = response_dict['items']
names, stars = [], []
for repo_dict in repo_dicts:
    names.append(repo_dict['name'])
    stars.append(repo_dict['stargazers_count'])

# Make the visualization.
# Define a style with the LS class, using dark blue as the base color.
my_style = LS('#333366', base_style=LCS)
chart = pygal.Bar(style=my_style, x_label_rotation=45, show_legend=False)
chart.title = 'Most-Starred Python Projects on GitHub'
chart.x_labels = names

chart.add('', stars)
chart.render_to_file('python_repos.svg')

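  I could kick myself over this one: be very careful with the imports. I imported LightStyle instead of LightenStyle and spent about two hours chasing the resulting errors.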

17.2.1 Refining Pygal Charts

import requests
import pygal
from pygal.style import LightColorizedStyle as LCS
from pygal.style import LightenStyle as LS

# Make an API call and store the response.
url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
r = requests.get(url)
print("Status code:", r.status_code)

# Store the API response in a variable.
response_dict = r.json()
print("Total repositories:", response_dict['total_count'])

# Explore information about the repositories.
repo_dicts = response_dict['items']
names, stars = [], []
for repo_dict in repo_dicts:
    names.append(repo_dict['name'])
    stars.append(repo_dict['stargazers_count'])

# Make the visualization.
my_style = LS('#333366', base_style=LCS)
my_config = pygal.Config()  # holds the chart's appearance settings
my_config.x_label_rotation = 45  # rotate the x-axis labels 45 degrees
my_config.show_legend = False  # hide the legend
my_config.title_font_size = 24  # font size of the chart title
my_config.label_font_size = 14  # font size of the minor labels
my_config.major_label_font_size = 18  # font size of the major labels
my_config.truncate_label = 15  # shorten long labels to 15 characters
my_config.show_y_guides = False  # hide the horizontal guide lines
my_config.width = 1000  # custom chart width

chart = pygal.Bar(my_config, style=my_style)
chart.title = 'Most-Starred Python Projects on GitHub'
chart.x_labels = names

chart.add('', stars)
chart.render_to_file('python_repos.svg')

17.2.2 添加自定义工具提示

   在Pygal中,将鼠标指向条形显示它表示的信息,这通常称为工具提示。

  下面来创建一个自定义工具提示,以同时显示项目的描述。向add()传递一个字典列表,而不是列表。

import pygal
from pygal.style import LightColorizedStyle as LCS, LightenStyle as LS

my_style = LS('#333366', base_style=LCS)
chart = pygal.Bar(style=my_style, x_label_rotation=45, show_legend=False)
chart.title = 'Python Projects'
chart.x_labels = ['httpie', 'django', 'flask']

# Each dictionary holds the bar's height (value) and its tooltip text (label).
plot_dicts = [
    {'value': 16101, 'label': 'Description of httpie.'},
    {'value': 15028, 'label': 'Description of django.'},
    {'value': 14798, 'label': 'Description of flask.'},
]
chart.add('', plot_dicts)
chart.render_to_file('bar_descriptions.svg')

17.2.3 Plotting the Data

import requests
import pygal
from pygal.style import LightColorizedStyle as LCS
from pygal.style import LightenStyle as LS

# Make an API call and store the response.
url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
r = requests.get(url)
print("Status code:", r.status_code)

# Store the API response in a variable.
response_dict = r.json()
print("Total repositories:", response_dict['total_count'])

# Explore information about the repositories.
repo_dicts = response_dict['items']
names, plot_dicts = [], []
for repo_dict in repo_dicts:
    names.append(repo_dict['name'])
    plot_dict = {
        'value': repo_dict['stargazers_count'],
        'label': repo_dict['description'],
    }
    plot_dicts.append(plot_dict)

# Make the visualization.
my_style = LS('#333366', base_style=LCS)
my_config = pygal.Config()  # holds the chart's appearance settings
my_config.x_label_rotation = 45  # rotate the x-axis labels 45 degrees
my_config.show_legend = False  # hide the legend
my_config.title_font_size = 24  # font size of the chart title
my_config.label_font_size = 14  # font size of the minor labels
my_config.major_label_font_size = 18  # font size of the major labels
my_config.truncate_label = 15  # shorten long labels to 15 characters
my_config.show_y_guides = False  # hide the horizontal guide lines
my_config.width = 1000  # custom chart width

chart = pygal.Bar(my_config, style=my_style)
chart.title = 'Most-Starred Python Projects on GitHub'
chart.x_labels = names

chart.add('', plot_dicts)
chart.render_to_file('python_repos.svg')

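17.2.4 Adding Clickable Links to the Chart

  Pygal also lets you use each bar in the chart as a link to a website. This takes just one extra line of code: in the dictionary created for each project, add a key-value pair whose key is 'xlink'.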

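for repo_dict in repo_dicts:
    names.append(repo_dict['name'])
    plot_dict = {
        'value': repo_dict['stargazers_count'],
        'label': repo_dict['description'],
        # Clicking the bar opens the project's page on GitHub.
        'xlink': repo_dict['html_url'],
    }
    plot_dicts.append(plot_dict)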
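
  Putting the pieces together, the sketch below builds the list of dictionaries passed to add() from two hand-written repository dictionaries, so it runs without a network connection or Pygal. Here build_plot_dicts is a hypothetical helper, and the star counts and URLs come from the output earlier in this chapter. The empty-string fallback for a missing description is an assumption about how one might guard against repositories whose description is None.

```python
def build_plot_dicts(repo_dicts):
    """Convert repository dictionaries into (names, plot_dicts) for a bar chart."""
    names, plot_dicts = [], []
    for repo_dict in repo_dicts:
        names.append(repo_dict["name"])
        plot_dicts.append({
            "value": repo_dict["stargazers_count"],
            # Assumed safeguard: use an empty label when there is no description.
            "label": repo_dict["description"] or "",
            "xlink": repo_dict["html_url"],
        })
    return names, plot_dicts


sample_repos = [
    {"name": "awesome-python", "stargazers_count": 70376,
     "description": "A curated list of awesome Python frameworks, "
                    "libraries, software and resources",
     "html_url": "https://github.com/vinta/awesome-python"},
    {"name": "sentry", "stargazers_count": 21590,
     "description": None,  # description removed here to exercise the fallback
     "html_url": "https://github.com/getsentry/sentry"},
]

names, plot_dicts = build_plot_dicts(sample_repos)
print(names)
print(plot_dicts[1])
```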

17.3 The Hacker News API

  Next, let's make an API call that returns the IDs of the current top articles on Hacker News, and then examine each of the top articles:

import requests
from operator import itemgetter

# Make an API call and store the response.
url = 'https://hacker-news.firebaseio.com/v0/topstories.json'
r = requests.get(url)
print("Status code:", r.status_code)

# Process information about each submission.
submission_ids = r.json()
submission_dicts = []
for submission_id in submission_ids[:30]:
    # Make a separate API call for each submission.
    url = ('https://hacker-news.firebaseio.com/v0/item/' +
           str(submission_id) + '.json')
    submission_r = requests.get(url)
    print(submission_r.status_code)
    response_dict = submission_r.json()

    submission_dict = {
        'title': response_dict['title'],
        'link': 'http://news.ycombinator.com/item?id=' + str(submission_id),
        'comments': response_dict.get('descendants', 0),
    }
    submission_dicts.append(submission_dict)

# Sort the submissions from most to fewest comments.
submission_dicts = sorted(submission_dicts, key=itemgetter('comments'),
                          reverse=True)
for submission_dict in submission_dicts:
    print("\nTitle:", submission_dict['title'])
    print("Discussion link:", submission_dict['link'])
    print("Comments:", submission_dict['comments'])

  The dict.get() method returns the value associated with the given key if the key exists, and the value you specify (here, 0) if it doesn't.
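
  The following sketch shows dict.get() and itemgetter() working together on hand-written stand-ins for the API responses, so it runs without a network connection. The titles and comment counts are made up for illustration.

```python
from operator import itemgetter

# Hand-written stand-ins for API responses; the second has no "descendants" key.
responses = [
    {"id": 1, "title": "Story with comments", "descendants": 12},
    {"id": 2, "title": "Story without comments"},
    {"id": 3, "title": "Busy story", "descendants": 40},
]

# get() supplies 0 when a story has no "descendants" key.
submissions = [
    {"title": response["title"], "comments": response.get("descendants", 0)}
    for response in responses
]

# Sort from most to fewest comments.
submissions.sort(key=itemgetter("comments"), reverse=True)
for submission in submissions:
    print(submission["comments"], submission["title"])
```

  Indexing with response["descendants"] instead would raise a KeyError on the second story.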

Status code: 200
200
200
--snip--
Title: A "cure" for baldness could be around the corner
Discussion link: http://news.ycombinator.com/item?id=20531394
Comments: 231

Title: Square’s Growth Framework for Engineers and Engineering Managers
Discussion link: http://news.ycombinator.com/item?id=20530046
Comments: 204

Title: Photographers, Instagrammers: Stop Being So Selfish and Disrespectful
Discussion link: http://news.ycombinator.com/item?id=20530350
Comments: 161

Title: Photos and fingerprints of all EU citizens copied from the UK to the US
Discussion link: http://news.ycombinator.com/item?id=20533576
Comments: 108

--snip--
