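I went through a lot of articles online, but none of them managed to strip out the script tags, probably because their regular expressions were flawed. I'm not good with regex, so in the end I just used BeautifulSoup.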
from bs4 import BeautifulSoup

# html is the HTML source fetched earlier
soup = BeautifulSoup(html, "lxml")
# Remove every <script> and <style> element from the parse tree
for tag in soup.find_all(["script", "style"]):
    tag.decompose()
# Only the visible text is left
print(soup.get_text())
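If installing bs4 and lxml is not an option, roughly the same result can be had with the standard library's `html.parser`. This is a different technique from the BeautifulSoup snippet, not a drop-in equivalent, and it is only a minimal sketch: the names `TextExtractor`, `strip_scripts_and_styles`, and `sample` are made up for illustration. The parser treats the contents of `<script>` and `<style>` as raw data, so markup-looking strings inside a script do not confuse it the way they can confuse a regex.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping anything inside <script> or <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._chunks = []      # non-empty text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside script/style and it is not blank
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def strip_scripts_and_styles(html):
    """Return the visible text of an HTML string, one fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Hypothetical sample page; the script deliberately contains tag-like text
sample = (
    "<html><head><style>p {color: red}</style>"
    "<script>var a = '<p>x</p>';</script></head>"
    "<body><p>Hello</p><p>World</p></body></html>"
)
print(strip_scripts_and_styles(sample))
```

For badly broken real-world pages, BeautifulSoup with lxml is likely to be more forgiving than this sketch.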