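Back when I wrote crawlers, I used to edit /etc/hosts directly to avoid repeated DNS lookups. I recently came across a more elegant solution; my modified version is recorded below: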
# Python 2 (print statement, urllib.urlopen)
import socket

_dnscache = {}


def _setDNSCache():
    """
    Makes a cached version of socket._getaddrinfo to avoid subsequent DNS requests.
    """

    def _getaddrinfo(*args, **kwargs):
        global _dnscache
        # The cache key is the tuple of positional arguments only.
        if args in _dnscache:
            print str(args) + " in cache"
            return _dnscache[args]
        else:
            print str(args) + " not in cache"
            _dnscache[args] = socket._getaddrinfo(*args, **kwargs)
            return _dnscache[args]

    # Save the original resolver once, then swap in the cached wrapper.
    if not hasattr(socket, '_getaddrinfo'):
        socket._getaddrinfo = socket.getaddrinfo
        socket.getaddrinfo = _getaddrinfo


def test():
    _setDNSCache()
    import urllib
    urllib.urlopen('http://www.baidu.com')
    urllib.urlopen('http://www.baidu.com')


test()
The output is:
('www.baidu.com', 80, 0, 1) not in cache
('www.baidu.com', 80, 0, 1) in cache
This approach is nice, but it has some drawbacks, listed below:
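1. It only patches socket.getaddrinfo; socket.gethostbyname and socket.gethostbyname_ex still follow their original lookup path.
2. It only takes effect inside this program, whereas editing /etc/hosts affects every program, including ping.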