I have some code that takes a long time to run, so I want to save the requests response object ("name" in this example) or the BeautifulSoup object ("soup" in this example) locally, so that next time I can save that time. Here is the code:
from bs4 import BeautifulSoup
import requests

url = 'SOMEURL'
name = requests.get(url)
soup = BeautifulSoup(name.content, 'html.parser')
Solution:
Since name.content is just HTML, you can dump it to a file and read it back in later.
Usually, the bottleneck is not the parsing, but the network latency of making the request.
from bs4 import BeautifulSoup
import requests

url = 'https://google.com'
name = requests.get(url)

# name.content is bytes, so the file must be opened in binary mode
with open("/tmp/A.html", "wb") as f:
    f.write(name.content)

# read it back in
with open("/tmp/A.html", "rb") as f:
    soup = BeautifulSoup(f, 'html.parser')

# do something with soup
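To actually skip the request on later runs, you can check whether the cached file already exists before fetching. A minimal sketch, assuming this caching pattern; the helper name `get_html` and the `fetch` callable are illustrative, not part of the original answer:

```python
import os

def get_html(url, cache_path, fetch):
    """Return the HTML for url, caching it in cache_path.

    `fetch` is any callable that returns the page text,
    e.g. lambda u: requests.get(u).text (hypothetical usage).
    """
    # if a cached copy exists, read it instead of hitting the network
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as f:
            return f.read()
    # otherwise fetch once and write the cache for next time
    html = fetch(url)
    with open(cache_path, "w", encoding="utf-8") as f:
        f.write(html)
    return html
```

You would then build the soup from the returned string, e.g. `BeautifulSoup(get_html(url, "/tmp/A.html", lambda u: requests.get(u).text), 'html.parser')`, and the network request only happens the first time.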
Here is some anecdotal evidence that the bottleneck is in the network.
from bs4 import BeautifulSoup
import requests
import time

url = 'https://google.com'
t1 = time.perf_counter()  # time.clock() was removed in Python 3.8
name = requests.get(url)
t2 = time.perf_counter()
soup = BeautifulSoup(name.content, 'html.parser')
t3 = time.perf_counter()
print(t2 - t1, t3 - t2)
Output from a Thinkpad X1 Carbon on a fast campus network:
0.11 0.02