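Regular expressions: concept and purpose
Concept
A regular expression is a pattern that describes a set of strings and is used to find matching text within a string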
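The re.findall() method
re.findall(pattern, string, flags=0) (important)
Purpose: scans the whole of string and returns a list of all non-overlapping matches of pattern
Parameters:
- pattern: the regular expression
- string: the string to search in
- flags: optional flags that change how matching behaves
Example: re.findall(r"\d", "chuan1zhi2") returns ['1', '2']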
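The flags parameter described above can be illustrated with a small sketch. The sample text and variable names here are made up for illustration; re.IGNORECASE and re.DOTALL are standard flags of the re module, and several flags can be combined with the | operator.

```python
import re

# Hypothetical sample text: a capital "C" and a newline in the middle
text = "Chuan1\nzhi2"

# Without flags, matching is case-sensitive, so "chuan" is not found
no_flag = re.findall(r"chuan", text)
print(no_flag)       # []

# re.IGNORECASE makes the match case-insensitive
ignore_case = re.findall(r"chuan", text, re.IGNORECASE)
print(ignore_case)   # ['Chuan']

# Flags are combined with |; re.DOTALL lets "." match the newline
both = re.findall(r"chuan\d.zhi", text, re.IGNORECASE | re.DOTALL)
print(both)          # ['Chuan1\nzhi']
```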
Example
import re
# findall returns a list of every match
rs = re.findall(r'\d+', 'chuan13zhi24')
print(rs)
# Effect of the flags argument: with re.DOTALL (alias re.S), "." also matches a newline
rs1 = re.findall('a.bc', 'a\nbc', re.DOTALL)
rs2 = re.findall('a.bc', 'a\nbc', re.S)
print(rs1)
print(rs2)
# Using a group with findall
rs3 = re.findall('a.+bc', 'a\nbc', re.DOTALL)
print(rs3)
rs4 = re.findall('a(.+)bc', 'a\nbc', re.DOTALL)  # with a group, only the captured text is returned
print(rs4)
Output
['13', '24']
['a\nbc']
['a\nbc']
['a\nbc']
['\n']
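The last result shows that a group changes what findall returns. A minimal sketch of how the return value depends on the number of groups follows; the patterns and variable names are chosen only for illustration.

```python
import re

text = "chuan13zhi24"

# No group: each item is the whole match
whole = re.findall(r"[a-z]+\d+", text)
print(whole)          # ['chuan13', 'zhi24']

# One group: each item is only the text captured by the group
one_group = re.findall(r"[a-z]+(\d+)", text)
print(one_group)      # ['13', '24']

# Two or more groups: each item is a tuple with one entry per group
two_groups = re.findall(r"([a-z]+)(\d+)", text)
print(two_groups)     # [('chuan', '13'), ('zhi', '24')]

# A non-capturing group (?:...) groups without changing the result
non_capturing = re.findall(r"(?:[a-z]+)\d+", text)
print(non_capturing)  # ['chuan13', 'zhi24']
```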
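Using raw strings (the r prefix) in regular expressions
Writing the pattern as a raw string (r"...") stops Python from interpreting backslash escapes before the regex engine sees them
To match one literal backslash in the target text, the regex needs \\. As a raw string this is written r"\\", so the pattern has the same number of backslashes as the target written as an ordinary string literal; without the r prefix it would have to be written "\\\\"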
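Example
import re
# Pattern and text both contain a real newline character, so they match
rs0 = re.findall("a\nb", "a\nb")
print(rs0)
# The pattern is the regex a\nb (a, newline, b) but the text holds a literal backslash: no match
rs1 = re.findall("a\\nb", "a\\nb")
print(rs1)
# Four backslashes in an ordinary string give the regex \\, which matches one literal backslash
rs2 = re.findall("a\\\\nb", "a\\nb")
print(rs2)
# The same pattern written as a raw string
rs3 = re.findall(r"a\\nb", "a\\nb")
print(rs3)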
Output
['a\nb']
[]
['a\\nb']
['a\\nb']
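These results follow from how Python itself reads string literals. The sketch below compares ordinary and raw strings directly and then applies a raw-string pattern to a Windows-style path; the path value is a made-up example.

```python
import re

# "\n" is one character (a newline); r"\n" is two characters (backslash, n)
lengths = (len("\n"), len(r"\n"))
print(lengths)       # (1, 2)

# A raw string with two backslashes equals an ordinary string with four
same = r"\\" == "\\\\"
print(same)          # True

# Hypothetical path; the actual text is C:\data\new with two literal backslashes
path = "C:\\data\\new"

# r"\\" is the regex for one literal backslash; (\w+) captures the name after it
parts = re.findall(r"\\(\w+)", path)
print(parts)         # ['data', 'new']
```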
Example: extracting the JSON string of the latest epidemic data
Code
# 1. Import the required modules
import requests
from bs4 import BeautifulSoup
import re
# 2. Send the request and fetch the content of the epidemic home page
response = requests.get('https://ncov.dxy.cn/ncovh5/view/pneumonia')
home_page = response.content.decode()
# print(home_page)  # quick check
# 3. Use BeautifulSoup to extract the epidemic data
soup = BeautifulSoup(home_page, 'lxml')
script = soup.find(id='getListByCountryTypeService2true')
countries_text = script.string
# 4. Extract the JSON string
json_str = re.findall(r'(\[.*\])', countries_text)
print(json_str)
Result
['[{"id":10409664,"createTime":1629769854000,"modifyTime":1629769854000,"tags":"","countryType":2,"continents":"北美洲","provinceId":"8","provinceName":"美国","provinceShortName":"","cityName":"","currentConfirmedCount":6734506,"confirmedCount":37932709,"confirmedCountRank":1,"suspectedCount":0,"curedCount":30568819,"deadCount":629384,"deadCountRank":1,"deadRate":"1.65","deadRateRank":96,"comment":"","sort":0,"operator":"chengxinzhe1","locationId":971002,"countryShortCode":"USA","countryFullName":"United States of America","statisticsData":"https://file1.dxycdn.com/2020/0315/553/3402160512808052518-135.json","incrVo":{"currentConfirmedIncr":124655,"confirmedIncr":221550,"curedIncr":96015,"deadIncr":880},"showRank":true,"yesterdayConfirmedCount":2147383647,"yesterdayLocalConfirmedCount":2147383647,......
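Because findall returns a list, the JSON text itself is the first element of that list, and it can then be parsed with json.loads. The following is only a sketch under stated assumptions: sample_text is a short made-up stand-in for script.string (the real page content is far longer and its exact wrapper may differ), and the country names and counts are invented.

```python
import json
import re

# Hypothetical stand-in for script.string; values are invented for illustration
sample_text = (
    'try { window.getListByCountryTypeService2true = '
    '[{"provinceName": "CountryA", "confirmedCount": 100}, '
    '{"provinceName": "CountryB", "confirmedCount": 50}]}catch(e){}'
)

# The greedy .* runs from the first "[" to the last "]" in the text
matches = re.findall(r"\[.*\]", sample_text)
json_str = matches[0]

# Parse the JSON array into a list of dictionaries
countries = json.loads(json_str)
for country in countries:
    print(country["provinceName"], country["confirmedCount"])
```

The greedy match works here only because the array is the sole bracketed structure in the text; if other square brackets appeared after it, the captured string would no longer be valid JSON.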