Python: saving a requests or BeautifulSoup object locally

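I have some long code that takes a long time to run. I would like to save either the requests object (here, "name") or the BeautifulSoup object (here, "soup") locally so that later runs are faster. Here is the code: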

from bs4 import BeautifulSoup
import requests

url = 'SOMEURL'
name = requests.get(url)
soup = BeautifulSoup(name.content)

Solution:

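Since name.content is just the HTML (as bytes), you can dump it to a file and read it back later.

Usually the bottleneck is not the parsing but the network latency of making the request.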

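from bs4 import BeautifulSoup
import requests

url = 'https://google.com'
name = requests.get(url)

# name.content is bytes, so open the file in binary mode
with open("/tmp/A.html", "wb") as f:
  f.write(name.content)


# read it back in
with open("/tmp/A.html", "rb") as f:
  # name the parser explicitly so the result does not depend on which parsers are installed
  soup = BeautifulSoup(f, "html.parser")
  # do something with soup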
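
If the script runs many times, the dump-and-reload steps above can be folded into one helper that only hits the network when no saved copy exists. This is a minimal sketch, not part of the original answer: the names cached_html and fetch_with_requests, the html_cache directory, and the 30-second timeout are all assumptions made for illustration.

```python
import hashlib
import os
import tempfile


def fetch_with_requests(url):
    """Hypothetical default downloader; requests is imported lazily
    so the caching logic can be exercised without network access."""
    import requests
    response = requests.get(url, timeout=30)  # timeout value is an assumption
    response.raise_for_status()
    return response.content  # raw bytes of the response body


def cached_html(url, cache_dir=None, fetch=fetch_with_requests):
    """Return the page body as bytes, downloading only when no saved copy exists."""
    cache_dir = cache_dir or os.path.join(tempfile.gettempdir(), "html_cache")
    os.makedirs(cache_dir, exist_ok=True)
    # Hash the URL so that any URL maps to a safe file name.
    file_name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html"
    path = os.path.join(cache_dir, file_name)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    html = fetch(url)
    with open(path, "wb") as f:
        f.write(html)
    return html
```

The returned bytes can then be parsed as before, for example soup = BeautifulSoup(cached_html(url), "html.parser"). Note that this sketch never expires the saved copy, so the cached file has to be deleted by hand when the page changes.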

Here is some anecdotal evidence that the bottleneck is in the network.

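from bs4 import BeautifulSoup
import requests
import time

url = 'https://google.com'

# time.clock() was removed in Python 3.8; perf_counter() measures elapsed time
t1 = time.perf_counter()
name = requests.get(url)
t2 = time.perf_counter()
soup = BeautifulSoup(name.content, "html.parser")
t3 = time.perf_counter()

# request time, then parse time, in seconds
print(t2 - t1, t3 - t2)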

The output below (request time, then parse time) comes from a Thinkpad X1 Carbon on a fast campus network.

0.11 0.02