Some notes on urllib2 - Python 2.7

1. urlopen takes a Request object and returns a response object. Calling read() on the response returns its content, so print(the_page) outputs the HTML of the page.

import urllib2

req = urllib2.Request('http://www.voidspace.org.uk')
response = urllib2.urlopen(req)
the_page = response.read()

print(the_page)

 

2. A Request object can carry data to the server, and it can also carry extra information (metadata) about the request, such as HTTP "headers".

 

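3. As we know, a request can send data to the server with POST. The data has to be encoded in the standard way before it is sent; here the urlencode function (from urllib, not urllib2) does the encoding.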

import urllib
import urllib2

url = 'http://www.someserver.com/cgi-bin/register.cgi'

values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'}

# Encode the form fields; passing data makes this a POST request
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)

the_page = response.read()
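
To see what urlencode actually produces, here is a small sketch that needs no network access. The try/except import is an assumption added so the snippet also runs on Python 3 (where urlencode lives in urllib.parse); the post itself targets Python 2.7 only. A list of pairs is used instead of a dict so that the field order is predictable.

```python
try:
    from urllib import urlencode          # Python 2.7, as in this post
except ImportError:
    from urllib.parse import urlencode    # Python 3 fallback (assumption)

# A sequence of (key, value) pairs keeps the output order fixed
fields = [('name', 'Michael Foord'),
          ('location', 'Northampton'),
          ('language', 'Python')]

encoded = urlencode(fields)
print(encoded)  # name=Michael+Foord&location=Northampton&language=Python
```

Spaces become "+" and the pairs are joined with "&", which is the standard form encoding the server expects.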

 

 

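  Data can also be sent with GET, which is what you get by default when no data argument is given. POST sends the encoded data in the body of the request, whereas GET appends the encoded data to the end of the URL.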

import urllib2
import urllib

values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'}

data = urllib.urlencode(values)
print(data)  # encoded data

url = 'http://www.example.com/example.cgi'
full_url = url + '?' + data  # use '?' to append the data to the URL
req = urllib2.Request(full_url)
response = urllib2.urlopen(req)

the_page = response.read()
print(the_page)
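
The difference between the two modes can be checked without contacting a server, because a Request object reports the method it would use. This is only a sketch: the Python 3 fallback import and the single-field query are assumptions added for illustration.

```python
try:
    import urllib2                         # Python 2.7
    from urllib import urlencode
except ImportError:
    import urllib.request as urllib2       # Python 3 fallback (assumption)
    from urllib.parse import urlencode

url = 'http://www.example.com/example.cgi'
query = urlencode([('language', 'Python')])
# encode() leaves a Python 2 str unchanged and gives Python 3 the bytes it needs
data = query.encode('ascii')

post_req = urllib2.Request(url, data)         # data supplied: POST
get_req = urllib2.Request(url + '?' + query)  # no data: GET

print(post_req.get_method())
print(get_req.get_method())
print(get_req.get_full_url())
```

Nothing is sent here; the requests are only built and inspected.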

 

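4. Headers

  Some servers only serve browsers, while the code above identifies itself as Python-urllib/2.7 by default, so the request has to "disguise" itself by sending a browser's name in the User-Agent header.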

 

import urllib
import urllib2

url = 'http://www.someserver.com/cgi-bin/register.cgi'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'

values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'}

# Present a browser-like User-Agent instead of the default one
headers = {'User-Agent': user_agent}
data = urllib.urlencode(values)

req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
the_page = response.read()
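
One detail worth knowing when inspecting a request: Request stores header names capitalized, so 'User-Agent' is kept as 'User-agent'. The sketch below builds a request without sending it; the Python 3 fallback import is an assumption added so it runs on either version.

```python
try:
    import urllib2                         # Python 2.7
except ImportError:
    import urllib.request as urllib2       # Python 3 fallback (assumption)

url = 'http://www.someserver.com/cgi-bin/register.cgi'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'

req = urllib2.Request(url, headers={'User-Agent': user_agent})

# The stored key is 'User-agent', so look it up with that spelling
print(req.header_items())
print(req.has_header('User-agent'))
print(req.get_header('User-agent'))
```

The header is still sent correctly, since HTTP header names are case-insensitive; only the lookup on the Request object is case-sensitive.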

 

 

5. URLError with a "reason" attribute

import urllib2
from urllib2 import URLError

req = urllib2.Request('http://www.pretend_server.org')

try:
    urllib2.urlopen(req)
except URLError as e:
    print e.reason  # why the server could not be reached

 

 

6. HTTPError with a "code" attribute. The default handlers follow redirects (codes in the 300 range), and codes in the 100-299 range indicate success, so you will usually only see error codes in the 400-599 range.

import urllib2

req = urllib2.Request('http://www.python.org/fish.html')

try:
    urllib2.urlopen(req)
except urllib2.HTTPError as e:
    print e.code    # numeric HTTP status code
    print e.read()  # body of the error page returned by the server
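
To turn a numeric code into a readable phrase, the standard library already ships a lookup table. The helper name describe below is hypothetical, and the Python 3 fallback import is an assumption; treat this as a sketch rather than part of the original notes.

```python
try:
    from BaseHTTPServer import BaseHTTPRequestHandler   # Python 2.7
except ImportError:
    from http.server import BaseHTTPRequestHandler      # Python 3 fallback

def describe(code):
    """Return the short reason phrase for an HTTP status code."""
    # Each entry is a (short message, long message) pair
    return BaseHTTPRequestHandler.responses[code][0]

for code in (404, 500):
    print('%d %s' % (code, describe(code)))
```

Inside an except HTTPError block, describe(e.code) would give the phrase for whatever code the server returned.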

 

 

7. Two basic approaches to handling both HTTPError and URLError

# Approach 1: catch HTTPError first, because it is a subclass of URLError
from urllib2 import Request, urlopen, URLError, HTTPError

req = Request(someurl)  # someurl stands for the URL to fetch

try:
    response = urlopen(req)
except HTTPError as e:
    print 'The server couldn\'t fulfill the request.'
    print 'Error code: ', e.code
except URLError as e:
    print 'We failed to reach a server.'
    print 'Reason: ', e.reason
else:
    print('everything is fine')

# Approach 2: catch URLError only and inspect the exception's attributes
from urllib2 import Request, urlopen, URLError

req = Request(someurl)
try:
    response = urlopen(req)
except URLError as e:
    # Test for 'code' first: an HTTPError is also a URLError and may
    # carry a 'reason' attribute as well
    if hasattr(e, 'code'):
        print 'The server couldn\'t fulfill the request.'
        print 'Error code: ', e.code
    elif hasattr(e, 'reason'):
        print 'We failed to reach a server.'
        print 'Reason: ', e.reason
else:
    pass  # everything is fine
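
The order of the checks matters because HTTPError is a subclass of URLError. The sketch below shows this with hand-built exceptions, so no network is needed; the classify helper and the Python 3 fallback import are assumptions made for illustration.

```python
try:
    from urllib2 import URLError, HTTPError        # Python 2.7
except ImportError:
    from urllib.error import URLError, HTTPError   # Python 3 fallback

def classify(exc):
    """Test for HTTPError before URLError, most specific class first."""
    if isinstance(exc, HTTPError):
        return 'http error %d' % exc.code
    if isinstance(exc, URLError):
        return 'unreachable: %s' % exc.reason
    return 'other'

# Hand-built exceptions stand in for real network failures
http_exc = HTTPError('http://www.python.org/fish.html', 404, 'Not Found', None, None)
url_exc = URLError('no such host')

print(issubclass(HTTPError, URLError))
print(classify(http_exc))
print(classify(url_exc))
```

If the URLError test came first, it would also match every HTTPError and the status code would never be reported.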

 

8. Basic Authentication
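  When authentication is required, the server sends a header requesting it, for example WWW-Authenticate: Basic realm="cPanel Users". The client can then retry the request with the username and password included in a header.
If you do not need to care about the realm, you can use HTTPPasswordMgrWithDefaultRealm directly to set the username and password for a URL.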

import urllib2

# Create a password manager
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()

username = 'Prime'
password = 'Bee'

# None as the realm means these credentials apply to any realm
top_level_url = "http://example.com/foo/"
password_mgr.add_password(None, top_level_url, username, password)

handler = urllib2.HTTPBasicAuthHandler(password_mgr)

# Use the opener directly to fetch a URL (someurl stands for that URL)
opener = urllib2.build_opener(handler)
opener.open(someurl)

# Optional: install the opener so that urllib2.urlopen uses it as well
urllib2.install_opener(opener)
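
The password manager can be queried on its own, which makes it easy to see which URLs the stored credentials cover. This sketch reuses the username and password from the listing above; the two lookup URLs and the Python 3 fallback import are assumptions added for illustration.

```python
try:
    import urllib2                         # Python 2.7
except ImportError:
    import urllib.request as urllib2       # Python 3 fallback (assumption)

password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, 'http://example.com/foo/', 'Prime', 'Bee')

# Any realm falls back to the default (None) entry; URLs below /foo/ match
inside = password_mgr.find_user_password(
    'cPanel Users', 'http://example.com/foo/bar.html')
# A different host has no stored credentials
outside = password_mgr.find_user_password(
    'cPanel Users', 'http://other.example.org/')

print(inside)
print(outside)
```

Credentials registered for a top-level URL apply to everything "deeper" than it, which is why a single add_password call is usually enough for one site.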

 

 

9. Setting the default socket timeout

import socket

timeout = 10  # timeout in seconds
socket.setdefaulttimeout(timeout)
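
The default timeout is process-wide, so it affects every socket created afterwards. A cautious sketch of reading, changing, and restoring it is shown below; the variable names are illustrative only.

```python
import socket

previous = socket.getdefaulttimeout()   # None means sockets block indefinitely
socket.setdefaulttimeout(10)            # seconds
current = socket.getdefaulttimeout()
print(current)

# Restore the earlier setting so other code is not affected
socket.setdefaulttimeout(previous)
```

When only a single request needs a limit, urlopen also accepts a timeout argument, for example urllib2.urlopen(req, timeout=10), which leaves the global default untouched.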

 
