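While working on a challenge I needed to enumerate directories, so I wrote a simple scanning script. I haven't learned multithreading yet, so this will have to do for now; I'll post a complete version later.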
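import urllib.request as req
import urllib.error as er

file_hou = ['.tar', '.tar.gz', '.zip', '.rar', '.bak']  # candidate suffixes
file_name = ['web', 'website', 'backup', 'back', 'www', 'wwwroot', 'temp']  # candidate base names
web_success = []  # URLs that answered without an error
n = len(file_name) * len(file_hou)  # total number of requests
nn = 0  # requests finished so far
for x in file_name:
    for y in file_hou:
        url = 'http://taobao.com/' + x + y  # the slash after the host is required
        try:
            webpage = req.urlopen(url)  # request the candidate URL
            webpage.close()
            # print(url + " ok")
            web_success.append(url)
        except er.HTTPError as e:
            # print(url + " ", e)  # show the HTTP error (404 and so on)
            pass
        except er.URLError as e:
            # print(url + " ", e)  # show the connection error
            pass
        nn += 1
        print("%.2f %%" % (nn / n * 100))  # progress display
# Append the hits as UTF-8, one URL per line; the file is closed automatically
with open("D:/lenovo/desktop/ctfhub.txt", 'a', encoding='utf-8') as f:
    for x in web_success:
        f.write(x + '\n')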
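
Since the multithreaded version is still to come, here is a minimal sketch of what it might look like, assuming the standard `concurrent.futures` thread pool is an acceptable way to do it. The function names `build_urls`, `probe` and `scan`, the 5-second timeout and the worker count are all illustrative choices, not part of the script above.

```python
import itertools
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def build_urls(base_url, names, suffixes):
    """Return every base_url/name+suffix combination (hypothetical helper)."""
    base = base_url.rstrip('/') + '/'
    return [base + n + s for n, s in itertools.product(names, suffixes)]


def probe(url, timeout=5):
    """Return True if the URL answers without an HTTP or connection error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except OSError:
        # HTTPError, URLError and socket timeouts are all OSError subclasses
        return False


def scan(urls, check=probe, workers=8):
    """Run check() over urls in a thread pool; return the hits in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(check, urls))
    return [u for u, ok in zip(urls, results) if ok]
```

Calling `scan(build_urls('http://taobao.com', file_name, file_hou))` would replace the nested loop. The `check` parameter is there only so the request function can be swapped out, for example to try the pool without touching the network.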
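Notes:
- In Python 3, the HTTPError and URLError exceptions that used to belong to urllib2 now live in urllib.error
- To save the results to a text file, open it with encoding='utf-8'; otherwise open() falls back to the system default encoding, which is GBK on Chinese-language Windows, and PyCharm uses that system default as well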
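
The two notes above can be sketched as follows. This is only an illustration: `describe` and `save_results` are hypothetical helper names, not functions from urllib. One detail worth knowing is that `HTTPError` is a subclass of `URLError`, so it has to be checked (or caught) first.

```python
import urllib.error


def describe(exc):
    """Turn a urllib.error exception into a short message (hypothetical helper)."""
    if isinstance(exc, urllib.error.HTTPError):  # subclass, so test it first
        return 'HTTP %d' % exc.code
    if isinstance(exc, urllib.error.URLError):
        return 'unreachable: %s' % exc.reason
    raise exc


def save_results(path, urls):
    """Append one URL per line, always encoded as UTF-8."""
    with open(path, 'a', encoding='utf-8') as out:
        for url in urls:
            out.write(url + '\n')
```

Passing `encoding='utf-8'` explicitly means the output file looks the same no matter which system default the script happens to run under.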