Python Regular Expressions (Part 2)

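import re

'''
The sub(repl, string[, count=0]) and subn(repl, string[, count=0]) methods of a compiled regular expression object perform string replacement. subn() additionally reports how many substitutions were made.
'''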
example = '''Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than nested.
Sparse is better than dense.
Readability counts.
'''
pattern = re.compile(r'\bb\w*\b', re.I)    # compiled pattern: words starting with b or B
print(pattern.sub('*', example))           # replace every matching word with *
# * is * than ugly.
# Explicit is * than implicit.
# Simple is * than complex.
# Complex is * than nested.
# Sparse is * than dense.
# Readability counts.
print(pattern.sub('*', example, 1))        # replace only the first match
# * is better than ugly.
# Explicit is better than implicit.
# Simple is better than complex.
# Complex is better than nested.
# Sparse is better than dense.
# Readability counts.
pattern = re.compile(r'\bb\w*\b')          # case-sensitive: words starting with lowercase b
print(pattern.sub('*', example, 1))        # replace only the first match
# Beautiful is * than ugly.
# Explicit is better than implicit.
# Simple is better than complex.
# Complex is better than nested.
# Sparse is better than dense.
# Readability counts.
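'''
The text above mentions subn() but only demonstrates sub(). A minimal sketch of subn() follows; the shortened sample text and the variable names are assumptions made for illustration.
'''

```python
import re

# Shortened sample text (an assumption for this sketch).
text = 'Beautiful is better than ugly.\nExplicit is better than implicit.\n'
pattern = re.compile(r'\bb\w*\b', re.I)    # words starting with b or B

# subn() returns a (new_string, number_of_substitutions) tuple.
new_text, count = pattern.subn('*', text)
print(new_text)
print(count)            # 3

# With count=1, at most one replacement is made.
limited_text, limited_count = pattern.subn('*', text, 1)
print(limited_count)    # 1
```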
'''
The split(string[, maxsplit=0]) method of a compiled regular expression object splits a string at every match of the pattern.
'''
example = r'one,two,three.four/five\six?seven[eight]nine|ten'
pattern = re.compile(r'[,./\\?[\]\|]')    # a character class listing several possible separators
print(pattern.split(example))
# ['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten']
example = r'one1two2three3four4five5six6seven7eight8nine9ten'
pattern = re.compile(r'\d+')    # one or more digits act as the separator
print(pattern.split(example))
# ['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten']
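'''
The maxsplit parameter in the signature above is not exercised by the examples. A short sketch of its effect, using a shortened sample string chosen for illustration:
'''

```python
import re

example = 'one1two2three3four'
pattern = re.compile(r'\d+')

# maxsplit=0 (the default) means "no limit on the number of splits".
all_parts = pattern.split(example)
print(all_parts)      # ['one', 'two', 'three', 'four']

# maxsplit=2 performs at most two splits; the remainder is left intact.
first_parts = pattern.split(example, maxsplit=2)
print(first_parts)    # ['one', 'two', 'three3four']
```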
example = r'one two three four,five.six.seven,eight,nine9ten'
pattern = re.compile(r'[\s,.\d]+')    # allow runs of repeated separators
print(pattern.split(example))
# ['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten']
'''
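Match objects:
When the match() or search() function of the re module, or the same-named method of a compiled pattern object, succeeds, it returns a match object. The main methods of a match object are group() (returns the content of one or more matched subpatterns), groups() (returns a tuple containing the content of all matched subpatterns), groupdict() (returns a dictionary of all matched named subpatterns), start() (returns the start index of the specified subpattern), end() (returns the index just past the end of the specified subpattern), and span() (returns a (start, end) tuple for the specified subpattern). The code below uses several different approaches to delete specified content from a string:
'''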
email = 'tony@tiremove_thisger.net'
m = re.search('remove_this', email)           # search() returns a match object
print(email[:m.start()] + email[m.end():])    # slice around the matched span
# tony@tiger.net
print(re.sub('remove_this', '', email))       # use the re module's sub() function directly
# tony@tiger.net
print(email.replace('remove_this', ''))       # the plain string replace() method also works
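# tony@tiger.net
m = re.match(r"(\w+) (\w+)", "Isaac Newton,physicist")
print(m.group(0))       # the whole match
# Isaac Newton
print(m.group(1))       # the first subpattern
# Isaac
print(m.group(2))       # the second subpattern
# Newton
print(m.group(1, 2))    # several subpatterns at once, as a tuple
# ('Isaac', 'Newton')
'''
The code below demonstrates the extended subpattern syntax.
'''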
m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
print(m.group('first_name'))    # access a named subpattern
# Malcolm
print(m.group('last_name'))
# Reynolds
m=re.match(r'(\d+)\.(\d+)','24.1632')
print(m.groups())    # return all matched subpatterns (group 0 is not included)
# ('24', '1632')
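'''
The position methods described earlier (start(), end(), span()) accept a group number just like group() does. A minimal sketch on the same '24.1632' input; the variable name text is introduced here for illustration.
'''

```python
import re

text = '24.1632'
m = re.match(r'(\d+)\.(\d+)', text)

print(m.span())      # (0, 7)  span of the whole match
print(m.span(1))     # (0, 2)  span of the first subpattern
print(m.start(2))    # 3       start index of the second subpattern
print(m.end(2))      # 7       index just past the second subpattern

# Because end() is exclusive, start()/end() plug directly into a slice.
print(text[m.start(2):m.end(2)])    # 1632
```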
m = re.match(r'(?P<first_name>\w+) (?P<last_name>\w+)', 'Malcolm Reynolds')
print(m.groupdict())    # return the named subpatterns as a dictionary
# {'first_name': 'Malcolm', 'last_name': 'Reynolds'}
exampleString = '''There should be one-and preferably only one-obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than right now.
'''
pattern = re.compile(r'(?<=\w\s)never(?=\s\w)')    # find 'never' that is neither at the start nor at the end of a sentence
matchResult = pattern.search(exampleString)
print(matchResult.span())
# (168, 173)
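pattern = re.compile(r'(?<=\w\s)never')    # find 'never' preceded by a word and whitespace; the first hit sits at the end of a sentence
matchResult = pattern.search(exampleString)
print(matchResult.span())
# (152, 157)
pattern = re.compile(r'(?:is\s)better(\sthan)')    # find 'better than' preceded by 'is'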
matchResult=pattern.search(exampleString)
print(matchResult.span())
# (137, 151)
print(matchResult.group(0))
# is better than
print(matchResult.group(1))    # the captured group includes its leading whitespace
#  than
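'''
The examples above mix non-capturing groups (?:...) with lookbehind assertions (?<=...). The sketch below contrasts the two on a single sentence; the variable names consumed and asserted are assumptions made for illustration.
'''

```python
import re

text = 'Now is better than never.'

# (?:...) groups without capturing, but its text is still part of the match.
consumed = re.search(r'(?:is\s)better', text)
print(consumed.group(0))    # is better
print(consumed.span())      # (4, 13)
print(consumed.groups())    # ()  no capturing groups were defined

# (?<=...) only asserts what precedes the match; that text is not included.
asserted = re.search(r'(?<=is\s)better', text)
print(asserted.group(0))    # better
print(asserted.span())      # (7, 13)
```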
pattern = re.compile(r'(?i)\bn\w+\b')    # find all words starting with n or N; the inline flag must come first
index = 0
while True:
    matchResult = pattern.search(exampleString, index)
    if not matchResult:
        break
    print(matchResult.group(0), ':', matchResult.span(0))
    index = matchResult.end(0)    # resume searching after the previous match
# not : (88, 91)
# Now : (133, 136)
# never : (152, 157)
# never : (168, 173)
# now : (201, 204)
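'''
The manual search loop above can also be sketched with finditer(), which yields one match object per non-overlapping match. The shortened sample text and the name found are assumptions made for illustration.
'''

```python
import re

# Shortened sample text (an assumption for this sketch).
text = 'Now is better than never.\nAlthough never is often better than right now.\n'
pattern = re.compile(r'\bn\w+\b', re.I)    # words starting with n or N

# finditer() tracks the search position itself, so no index bookkeeping is needed.
for m in pattern.finditer(text):
    print(m.group(0), ':', m.span(0))

found = [m.group(0) for m in pattern.finditer(text)]
print(found)    # ['Now', 'never', 'never', 'now']
```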
pattern = re.compile(r'(?<!not\s)be\b')    # find the word 'be' that is not preceded by 'not'
index = 0
while True:
    matchResult = pattern.search(exampleString, index)
    if not matchResult:
        break
    print(matchResult.group(0), ':', matchResult.span(0))
    index = matchResult.end(0)
# be : (13, 15)
print(exampleString[13:20])    # verify that the reported position is correct
# be one-
pattern = re.compile(r'(\b\w*)(?P<f>\w+)(?P=f)\w*\b')    # match words containing consecutive identical letters
index = 0
while True:
    matchResult = pattern.search(exampleString, index)
    if not matchResult:
        break
    print(matchResult.group(0), ':', matchResult.group(2))    # group 2 is the repeated part
    index = matchResult.end(0) + 1
# unless : s
# better : t
# better : t
s = 'aaa bb c d e fff'
print(s)
# aaa bb c d e fff
p = re.compile(r'(\b\w*(?P<f>\w+)(?P=f)\w*\b)')
print(p.findall(s))    # with several groups, findall() returns one tuple per match
# [('aaa', 'a'), ('bb', 'b'), ('fff', 'f')]
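'''
findall() returns tuples here because the pattern contains more than one group. When only the matched words are needed, one option is to iterate with finditer() and read group(0). This is a sketch, and the names words and repeated are assumptions made for illustration.
'''

```python
import re

s = 'aaa bb c d e fff'
# Same idea as above without the outer capturing group; (?P=f) refers back to group f.
p = re.compile(r'\b\w*(?P<f>\w)(?P=f)\w*\b')

words = [m.group(0) for m in p.finditer(s)]       # whole matched words
repeated = [m.group('f') for m in p.finditer(s)]  # the repeated letter in each word
print(words)       # ['aaa', 'bb', 'fff']
print(repeated)    # ['a', 'b', 'f']
```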