Scraping web pages with Python 3

2024-02-14 16:33:34

Web crawlers

Ways to fetch web resources with Python 3

1. The simplest way:

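Import urllib.request, open 'http://www.baidu.com/' with urllib.request.urlopen(), and print what the response's read() method returns.

2. Via a Request object:

Import urllib.request, wrap 'http://www.baidu.com' in a urllib.request.Request, pass that object to urlopen(), and print the page it returns.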
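The two ways of fetching a page, directly with urlopen() and through a Request object, can be sketched roughly as follows. This is a minimal sketch, not the original listing: the helper names fetch and fetch_via_request are hypothetical, and the demo uses a data: URL (an assumption, chosen so the example runs without network access) in place of 'http://www.baidu.com/'.

```python
import urllib.request


def fetch(url):
    """Simplest form: hand the URL string straight to urlopen()."""
    with urllib.request.urlopen(url) as response:
        return response.read()  # raw bytes of the response body


def fetch_via_request(url):
    """Same fetch, but through an explicit Request object."""
    req = urllib.request.Request(url)
    with urllib.request.urlopen(req) as response:
        return response.read()


if __name__ == "__main__":
    # Hypothetical offline stand-in; swap in 'http://www.baidu.com/' for a real site.
    demo_url = "data:text/plain,hello"
    print(fetch(demo_url))
    print(fetch_via_request(demo_url))
```

Both helpers return bytes; call .decode() with the page's encoding to get text. The Request form is the one to reach for once data or headers need to be attached.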
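  To send query parameters with a GET request, import urllib.request and urllib.parse, encode the parameters 'wd': 'python', 'opt-webpage': 'on' and 'ie': 'gbk' with urllib.parse.urlencode(), and append the result to the URL after a '?'.

  GET and POST requests differ in that a POST request usually has "side effects": it changes the state of the system in some way, for example by submitting a form.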
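To make the GET/POST distinction concrete, here is a small sketch that encodes the same parameters both ways. The endpoint 'http://www.baidu.com/s' and the helper names are assumptions for illustration; nothing is sent over the network.

```python
import urllib.parse
import urllib.request

# Query parameters taken from the text above.
PARAMS = {"wd": "python", "opt-webpage": "on", "ie": "gbk"}

# Assumed endpoint, for illustration only.
SEARCH_URL = "http://www.baidu.com/s"


def build_get_url(base_url, params):
    """GET: the encoded parameters travel in the URL itself."""
    return base_url + "?" + urllib.parse.urlencode(params)


def build_post_request(url, params):
    """POST: the encoded parameters travel in the request body as bytes."""
    body = urllib.parse.urlencode(params).encode("ascii")
    return urllib.request.Request(url, data=body)


if __name__ == "__main__":
    print(build_get_url(SEARCH_URL, PARAMS))
    post_req = build_post_request(SEARCH_URL, PARAMS)
    print(post_req.get_method())  # supplying data switches the method to POST
```

Passing either result to urllib.request.urlopen() would perform the actual request.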
  
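  Some sites serve different content to different clients, so a request can identify itself as a browser by setting the 'User-Agent' header, here to 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'.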
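A minimal sketch of attaching that header to a Request follows. The helper name build_request is hypothetical, and the request is only constructed, not sent.

```python
import urllib.request

USER_AGENT = "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"


def build_request(url, user_agent=USER_AGENT):
    """Create a Request that carries a browser-like User-Agent header."""
    headers = {"User-Agent": user_agent}
    return urllib.request.Request(url, headers=headers)


if __name__ == "__main__":
    req = build_request("http://www.baidu.com/")
    # Request stores header names capitalized, hence 'User-agent'.
    print(req.get_header("User-agent"))
```

The same header can also be added after construction with req.add_header('User-Agent', ...).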
  
  import urllib.request
  from urllib.error import URLError, HTTPError

  req = urllib.request.Request('http://www.baidu.com')

  try:
      urllib.request.urlopen(req)
  except URLError as e:
      # e.reason explains why the server could not be reached
      print(e.reason)
  
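  HTTPError: a subclass of URLError, raised when the server answers with an HTTP error status. The status code is available in its code attribute.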
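Because HTTPError is a subclass of URLError, it has to be caught first when both are handled. The sketch below shows one way to arrange that; the helper name fetch_or_explain is hypothetical, and the demo uses a data: URL as an offline stand-in for a real site.

```python
import urllib.request
from urllib.error import HTTPError, URLError


def fetch_or_explain(url):
    """Return (body, None) on success, or (None, message) on failure."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.read(), None
    except HTTPError as e:
        # Must come first: HTTPError is a subclass of URLError.
        return None, f"server returned HTTP status {e.code}"
    except URLError as e:
        return None, f"failed to reach the server: {e.reason}"


if __name__ == "__main__":
    # Hypothetical offline stand-in for a real URL.
    print(fetch_or_explain("data:text/plain,ok"))
```

Swapping the two except clauses would make the HTTPError branch unreachable, since the URLError clause would match first.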
  
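  1. Openers: objects that fetch URLs. urlopen() uses a default opener, and a custom one can be created with build_opener().

  2. Handlers: the objects an opener delegates to. Each handler knows how to open one URL scheme (http, ftp, ...) or deal with one aspect of opening, such as redirects or authentication.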
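The relationship between an opener and its handlers can be seen in a short sketch. The helper name make_opener and the agent string are hypothetical; the demo opens a data: URL so that it runs without network access.

```python
import urllib.request


def make_opener(user_agent):
    """Build an opener with the default handlers and a custom default header."""
    opener = urllib.request.build_opener()  # installs the standard handlers
    # Headers listed here are sent with every HTTP request made through this opener.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


if __name__ == "__main__":
    opener = make_opener("demo-agent/0.1")  # hypothetical agent string
    # Each handler covers one scheme or one aspect of opening a URL.
    for handler in opener.handlers:
        print(type(handler).__name__)
    print(opener.open("data:text/plain,hi").read())
```

Extra handlers, such as an authentication handler, are passed as arguments to build_opener() and are added alongside the defaults.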
  
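  import urllib.request

  # Password manager that falls back to these credentials for any realm
  password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()

  # Register the username and password for URLs under top_level_url
  top_level_url = "http://example.com/foo/"
  password_mgr.add_password(None, top_level_url, 'why', '1223')

  handler = urllib.request.HTTPBasicAuthHandler(password_mgr)

  # Build an opener that uses the authentication handler
  opener = urllib.request.build_opener(handler)

  # Fetch a URL through the opener
  a_url = 'http://www.baidu.com/'
  opener.open(a_url)

  # Install the opener so that later urllib.request.urlopen() calls use it too
  urllib.request.install_opener(opener)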
  
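  For top_level_url, add_password() accepts either a full URL or an authority such as "example.com" or "example.com:8080"; the latter includes the port number.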
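How the port number affects which credentials are used can be checked offline by querying the password manager directly. In this sketch the second set of credentials ('other', 'secret') and the helper name build_password_manager are made up for illustration; the first entry is the one from the listing above.

```python
import urllib.request


def build_password_manager():
    """Register credentials under a full URL and under an authority with a port."""
    mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # Full URL form, as in the listing above.
    mgr.add_password(None, "http://example.com/foo/", "why", "1223")
    # Authority form that includes a port number (hypothetical credentials).
    mgr.add_password(None, "example.com:8080", "other", "secret")
    return mgr


if __name__ == "__main__":
    mgr = build_password_manager()
    # Under the registered URL on the default port: first credentials apply.
    print(mgr.find_user_password(None, "http://example.com/foo/bar.html"))
    # Same host on port 8080: the authority entry applies instead.
    print(mgr.find_user_password(None, "http://example.com:8080/foo/"))
    # A different host has no stored credentials.
    print(mgr.find_user_password(None, "http://example.org/"))
```

HTTPBasicAuthHandler performs this same lookup when a server answers with an authentication challenge.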