Web Crawlers
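Ways to fetch web resources in Python 3
1. The simplest way:
- Import urllib.request, open 'http://www.baidu.com/' directly, and print the response.
2. Via a Request object: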
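- Import urllib.request, build a Request for 'http://www.baidu.com', open it, and print the response.
1. Import urllib.request and pass the query parameters 'wd' = 'python', 'opt-webpage' = 'on', and 'ie' = 'gbk' with the request.
GET and POST requests differ in that POST requests usually have "side effects".
To identify the client to the server, set the 'User-Agent' header, for example to 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'.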
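The request-building steps above can be sketched roughly as follows. This is a minimal illustration, not the original author's code: the helper names `build_get_request`, `build_post_request`, and `fetch` are hypothetical, and using 'http://www.baidu.com/' as the target for the query parameters is an assumption. Nothing is sent over the network unless `fetch` is called.

```python
import urllib.parse
import urllib.request

# Values taken from the notes above; using this URL as the query target is an assumption.
BASE_URL = "http://www.baidu.com/"
PARAMS = {"wd": "python", "opt-webpage": "on", "ie": "gbk"}
USER_AGENT = "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"


def build_get_request(base_url, params, user_agent):
    """Hypothetical helper: encode params into the query string of a GET request."""
    query = urllib.parse.urlencode(params)
    return urllib.request.Request(
        base_url + "?" + query,
        headers={"User-Agent": user_agent},
    )


def build_post_request(base_url, params, user_agent):
    """Hypothetical helper: send the same params as a form-encoded POST body."""
    data = urllib.parse.urlencode(params).encode("ascii")  # body must be bytes
    return urllib.request.Request(
        base_url,
        data=data,  # supplying data makes urllib issue a POST
        headers={"User-Agent": user_agent},
    )


def fetch(request, timeout=10):
    """Hypothetical helper: open a Request (or plain URL string) and return the body bytes."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


get_req = build_get_request(BASE_URL, PARAMS, USER_AGENT)
post_req = build_post_request(BASE_URL, PARAMS, USER_AGENT)

# Inspect the requests without touching the network.
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data)
```

Calling `fetch(BASE_URL)` would correspond to the simplest approach, and `fetch(get_req)` to the Request-based one. Because a POST may change state on the server, it is usually worth checking `get_method()` before sending anything.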
    import urllib.request
    from urllib.error import URLError, HTTPError

    req = urllib.request.Request('http://www.baidu.com')
    try:
        urllib.request.urlopen(req)
    except URLError as e:
        print(e.reason)

HTTPError
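Since HTTPError is a subclass of URLError, one common pattern is to catch HTTPError first and fall back to URLError. The sketch below assumes that ordering; the names `describe_failure` and `try_fetch` are hypothetical and only illustrate the idea.

```python
import urllib.request
from urllib.error import HTTPError, URLError


def describe_failure(error):
    """Hypothetical helper: turn a urllib error into a short message."""
    # Check HTTPError first: it is a subclass of URLError.
    if isinstance(error, HTTPError):
        return f"server returned HTTP {error.code}"
    if isinstance(error, URLError):
        return f"could not reach server: {error.reason}"
    raise TypeError("expected a URLError or HTTPError")


def try_fetch(url, timeout=10):
    """Hypothetical helper: return (body, None) on success or (None, message) on failure."""
    req = urllib.request.Request(url)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as response:
            return response.read(), None
    except HTTPError as e:
        # The server answered, but with an error status code.
        return None, describe_failure(e)
    except URLError as e:
        # No usable answer at all (DNS failure, refused connection, ...).
        return None, describe_failure(e)
```

A call such as `body, problem = try_fetch('http://www.baidu.com')` then yields either the page bytes or a readable explanation, without an exception escaping to the caller.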
1. Openers:
2. Handlers:

    import urllib.request

    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    top_level_url = "http://example.com/foo/"
    password_mgr.add_password(None, top_level_url, 'why', '1223')
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    opener = urllib.request.build_opener(handler)
    a_url = 'http://www.baidu.com/'
    opener.open(a_url)
    urllib.request.install_opener(opener)

The latter includes the port number.
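To see how the opener and its handler fit together without sending a request, the setup can be wrapped in a function and inspected. This is only a sketch: `make_auth_opener` is a hypothetical name, and the URL and credentials are the placeholder values from the snippet above.

```python
import urllib.request


def make_auth_opener(top_level_url, user, password):
    """Hypothetical helper: build an opener that answers HTTP basic auth challenges."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None means "use these credentials for any realm under this URL".
    password_mgr.add_password(None, top_level_url, user, password)
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    # build_opener adds the default handlers and chains ours in with them.
    opener = urllib.request.build_opener(handler)
    return opener, password_mgr


opener, password_mgr = make_auth_opener("http://example.com/foo/", "why", "1223")

# URLs below the registered prefix resolve to the stored credentials.
print(password_mgr.find_user_password(None, "http://example.com/foo/bar"))

# The custom handler sits in the opener's handler list alongside the defaults.
print(any(isinstance(h, urllib.request.HTTPBasicAuthHandler) for h in opener.handlers))
```

Calling `opener.open(url)` uses this opener for a single request, while `urllib.request.install_opener(opener)` makes it the default for every later `urllib.request.urlopen` call.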