1. Install pillow and pytesseract
pip install pillow
pip install pytesseract
2. Recognize the CAPTCHA
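import re

import pytesseract
from PIL import Image


def get_verifycode(self):
    """Recognize the CAPTCHA."""
    # 1. Locate the CAPTCHA element and read its position and size
    verifycode_element = self.verifycode_image_element  # the CAPTCHA element
    location = verifycode_element.location  # x, y coordinates of the CAPTCHA
    size = verifycode_element.size  # width and height of the CAPTCHA
    zuobiao = (
        int(location['x']),
        int(location['y']),
        int(location['x'] + size['width']),
        int(location['y'] + size['height']),
    )  # crop box: (left, upper, right, lower)

    # 2. Take a screenshot, crop it to the CAPTCHA area, and save it again
    image_name = self.save_screenshot()  # take the screenshot
    img = Image.open(image_name).crop(zuobiao)  # open the screenshot and crop it
    img = img.convert('RGB')
    img.save(image_name)

    # 3. Read the cropped image back and run OCR on it
    code = pytesseract.image_to_string(Image.open(image_name))

    # Use a regular expression to drop spaces and other special characters.
    # pattern = re.compile(r'[a-zA-Z0-9]')
    pattern = re.compile(r'[0-9]')  # this system's CAPTCHAs are digits only, so keep digits only
    b = ''
    for i in code.strip():
        if pattern.search(i) is not None:
            b += i
    return b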
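The two pure-Python steps in the function above (building the crop box and filtering the OCR output) can be pulled out and checked without a browser or Tesseract. The following is only a sketch: crop_box and keep_digits are hypothetical helper names, and the scale parameter is an assumption added for high-DPI displays, where screenshot pixels may not match the element coordinates one to one.

```python
import re


def crop_box(location, size, scale=1.0):
    """Build a (left, upper, right, lower) box for Image.crop().

    location and size are dicts shaped like Selenium's element.location
    and element.size. scale is an assumed device-pixel-ratio factor;
    leave it at 1.0 when the screenshot matches page coordinates.
    """
    left = int(location['x'] * scale)
    upper = int(location['y'] * scale)
    right = int((location['x'] + size['width']) * scale)
    lower = int((location['y'] + size['height']) * scale)
    return (left, upper, right, lower)


def keep_digits(text):
    """Drop every character that is not an ASCII digit."""
    return re.sub(r'[^0-9]', '', text)
```

For example, crop_box({'x': 10, 'y': 20}, {'width': 100, 'height': 40}) gives (10, 20, 110, 60), and keep_digits(' 12 a3\n4 ') gives '1234'.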
3. If the pytesseract module fails with the error "tesseract is not installed or it's not in your PATH", fix it as follows:
1) Download tesseract-ocr from: https://github.com/tesseract-ocr/tesseract/wiki
2) Install tesseract-ocr: double-click the .exe installer and note the installation path
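3) Edit the pytesseract.py file under the Python installation and set tesseract_cmd to r'F:\Program Files (x86)\Tesseract-OCR\tesseract.exe' (use the path noted during installation)
File location: <Python install path>\Lib\site-packages\pytesseract\pytesseract.py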
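As an alternative to editing the installed library file, pytesseract also lets the executable path be set from your own script through pytesseract.pytesseract.tesseract_cmd, which survives reinstalling or upgrading the package. A minimal sketch, assuming the install path from step 3); find_tesseract is a hypothetical helper, not part of pytesseract.

```python
import os
import shutil


def find_tesseract(candidates):
    """Return a usable tesseract path, or None if nothing is found.

    Checks PATH first, then each candidate path in order.
    """
    found = shutil.which("tesseract")
    if found:
        return found
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


# Assumed install location, taken from step 3); adjust to your machine.
cmd = find_tesseract([r"F:\Program Files (x86)\Tesseract-OCR\tesseract.exe"])

if cmd is not None:
    try:
        import pytesseract
        # Point pytesseract at the executable for this process only.
        pytesseract.pytesseract.tesseract_cmd = cmd
    except ImportError:
        pass  # pytesseract is not installed; see step 1
```

The assignment has to run before the first call to pytesseract.image_to_string.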