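Regular Expressions

1 Regular Expressions
1.1 Symbols
1.1.1 Line Anchors
^ start of a line (the start of the string unless MULTILINE is set)
$ end of a line (the end of the string unless MULTILINE is set)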
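
A minimal sketch of how the two anchors behave with Python's re module. The sample string is made up for illustration.

```python
import re

text = "alibaba alimama"

# ^ anchors the pattern at the start of the string
starts = re.search(r"^ali", text) is not None
# $ anchors the pattern at the end of the string
ends = re.search(r"mama$", text) is not None
# "baba" occurs in the text but not at the end, so the anchored search fails
not_at_end = re.search(r"baba$", text) is None

print(starts, ends, not_at_end)
```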

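1.1.2 Metacharacters
. matches any character except a newline
\w matches a word character: letters, digits, and underscore; for str patterns this includes Unicode letters such as Chinese characters, and it equals [a-zA-Z0-9_] only with the ASCII flag
\W matches any character that is not a word character
\s matches a single whitespace character, including space, Tab, and newline (Enter)
\S matches any single non-whitespace character
\b matches the empty string at the start or end of a word; it is a zero-width position and does not consume the space, punctuation, or newline next to the word
\d matches a digit; it equals [0-9] with the ASCII flag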

The special sequences consist of "\\" and a character from the list
below.  If the ordinary character is not on the list, then the
resulting RE will match the second character.
    \number  Matches the contents of the group of the same number.
    \A       Matches only at the start of the string.
    \Z       Matches only at the end of the string.
    \b       Matches the empty string, but only at the start or end of a word.
    \B       Matches the empty string, but not at the start or end of a word.
    \d       Matches any decimal digit; equivalent to the set [0-9] in
             bytes patterns or string patterns with the ASCII flag.
             In string patterns without the ASCII flag, it will match the whole
             range of Unicode digits.
    \D       Matches any non-digit character; equivalent to [^\d].
    \s       Matches any whitespace character; equivalent to [ \t\n\r\f\v] in
             bytes patterns or string patterns with the ASCII flag.
             In string patterns without the ASCII flag, it will match the whole
             range of Unicode whitespace characters.
    \S       Matches any non-whitespace character; equivalent to [^\s].
    \w       Matches any alphanumeric character; equivalent to [a-zA-Z0-9_]
             in bytes patterns or string patterns with the ASCII flag.
             In string patterns without the ASCII flag, it will match the
             range of Unicode alphanumeric characters (letters plus digits
             plus underscore).
             With LOCALE, it will match the set [0-9_] plus characters defined
             as letters for the current locale.
    \W       Matches the complement of \w.
    \\       Matches a literal backslash.
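
The difference between Unicode and ASCII matching described above can be checked with a short sketch. The sample text and variable names are assumptions chosen for illustration.

```python
import re

text = "价格 price_1 = 42"

# For str patterns, \w matches Unicode word characters, Chinese included
unicode_words = re.findall(r"\w+", text)
# With re.ASCII, \w is limited to [a-zA-Z0-9_]
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)
# \b is zero-width: it marks a position, so only the standalone number matches
whole_numbers = re.findall(r"\b\d+\b", text)

print(unicode_words)
print(ascii_words)
print(whole_numbers)
```

The digit in price_1 is not returned by the last pattern because there is no word boundary between the underscore and the digit.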

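1.1.3 Quantifiers
? matches the preceding character zero or one time
+ matches the preceding character one or more times
* matches the preceding character zero or more times
{n} matches the preceding character exactly n times
{n,} matches the preceding character at least n times
{n,m} matches the preceding character at least n times and at most m times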
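
As a rough illustration of the quantifiers listed above, the sketch below applies each one to the letter "o" in a few made-up sample words. The helper matching() is hypothetical and exists only for this example.

```python
import re

samples = ["colr", "color", "coloor", "colooor"]

def matching(pattern):
    # fullmatch requires the whole sample to match the pattern
    return [s for s in samples if re.fullmatch(pattern, s)]

zero_or_one = matching(r"colo?r")       # o zero or one time
one_or_more = matching(r"colo+r")       # o one or more times
zero_or_more = matching(r"colo*r")      # o zero or more times
exactly_two = matching(r"colo{2}r")     # o exactly 2 times
at_least_two = matching(r"colo{2,}r")   # o at least 2 times
one_to_two = matching(r"colo{1,2}r")    # o between 1 and 2 times

print(zero_or_one, one_or_more, zero_or_more)
print(exactly_two, at_least_two, one_to_two)
```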

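1.1.4 Exclusion
^ outside square brackets, the start of a line
[^...] as the first character inside square brackets, ^ excludes the listed characters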
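
The two roles of ^ can be compared side by side. This is a small sketch with an invented sample string.

```python
import re

text = "a1-b2"

# Inside brackets a leading ^ negates the set: every character that is not a digit
non_digits = re.findall(r"[^0-9]", text)
# Outside brackets ^ is an anchor: a lowercase letter only at the very start
leading_letter = re.findall(r"^[a-z]", text)

print(non_digits)
print(leading_letter)
```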

1.1.5 Alternation
| the vertical bar means "or"
1.1.6 Escape Character
\ escapes the next character, for example \. matches a literal "."
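
A short sketch of alternation and escaping, using made-up file names. It shows why the dot usually needs a backslash when a literal "." is meant.

```python
import re

text = "cat.jpg dog.png catXjpg"

# | matches either alternative
animals = re.findall(r"cat|dog", text)
# An unescaped . matches any character, so "catXjpg" is accepted too
loose = re.findall(r"cat.jpg", text)
# \. matches only a literal dot
strict = re.findall(r"cat\.jpg", text)

print(animals)
print(loose)
print(strict)
```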

1.1.7 Grouping
() parentheses form a group
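
Groups can be read back from the match object by number. The sketch below assumes a hypothetical "YYYY-MM" sample and also shows a quantifier applied to a whole group.

```python
import re

m = re.search(r"(\d{4})-(\d{2})", "released 2024-05")

whole = m.group()    # the entire match
year = m.group(1)    # text captured by the first pair of parentheses
month = m.group(2)   # text captured by the second pair
pair = m.groups()    # all captured groups as a tuple

# A group lets a quantifier apply to several characters at once
repeated = re.fullmatch(r"(ab)+", "ababab") is not None

print(whole, year, month, pair, repeated)
```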
1.2 Methods
import re loads the regular expression module
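1.2.1 Flags
A or ASCII: \w, \W, \b, \B, \d, \D, \s, \S match ASCII characters only (Python 3.x)
I or IGNORECASE: matching ignores letter case
M or MULTILINE: ^ and $ also match at the start and end of each line
S or DOTALL: . matches every character, including a newline
X or VERBOSE: unescaped whitespace and comments in the pattern string are ignored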

Some of the functions in this module take flags as optional parameters:
    A  ASCII       For string patterns, make \w, \W, \b, \B, \d, \D
                   match the corresponding ASCII character categories
                   (rather than the whole Unicode categories, which is the
                   default).
                   For bytes patterns, this flag is the only available
                   behaviour and needn't be specified.
    I  IGNORECASE  Perform case-insensitive matching.
    L  LOCALE      Make \w, \W, \b, \B, dependent on the current locale.
    M  MULTILINE   "^" matches the beginning of lines (after a newline)
                   as well as the string.
                   "$" matches the end of lines (before a newline) as well
                   as the end of the string.
    S  DOTALL      "." matches any character at all, including the newline.
    X  VERBOSE     Ignore whitespace and comments for nicer looking RE's.
    U  UNICODE     For compatibility only. Ignored for string patterns (it
                   is the default), and forbidden for bytes patterns.
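
The effect of the most common flags can be seen in a small sketch. The sample text and the phone-like pattern are invented for illustration; flags are combined with the | operator.

```python
import re

text = "Ali\nali baba"

# IGNORECASE: both "Ali" and "ali" match
case_insensitive = re.findall(r"ali", text, flags=re.I)
# Without MULTILINE, ^ matches only at the start of the string
single_line = re.findall(r"^ali", text, flags=re.I)
# With MULTILINE, ^ also matches right after each newline
multi_line = re.findall(r"^ali", text, flags=re.I | re.M)
# Without DOTALL, . does not match the newline between the two words
no_dotall = re.search(r"Ali.ali", text)
with_dotall = re.search(r"Ali.ali", text, flags=re.S)

# VERBOSE: whitespace and comments inside the pattern are ignored
verbose_pattern = r"""
    \d{3}   # three digits (hypothetical format)
    -       # a literal hyphen
    \d{4}   # four digits
"""
number = re.search(verbose_pattern, "call 555-1234", flags=re.X).group()

print(case_insensitive, single_line, multi_line)
print(no_dotall, with_dotall is not None, number)
```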

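1.2.2 match()
re.match(pattern, string, flags=0) matches only at the start of the string; it returns a Match object on success and None on failure.

import re

# Signature: re.match(pattern, string, flags=0)

pattern = r'al\w+'

str_ = 'alibaba alimama ali'

match = re.match(pattern, str_, re.I)

print(match)
print(match.start())   # index where the match starts
print(match.end())     # index just past the end of the match
print(match.span())    # (start, end) as a tuple
print(match.string)    # the string that was searched
print(match.group())   # the matched text

Output:

<re.Match object; span=(0, 7), match='alibaba'>
0
7
(0, 7)
alibaba alimama ali
alibaba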
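
Because re.match() returns None on failure, calling .group() on the result without a check raises AttributeError. A minimal sketch of a guarded call follows; the helper name first_al_word is hypothetical.

```python
import re

def first_al_word(text):
    # re.match only looks at the start of text
    m = re.match(r"al\w+", text, flags=re.I)
    # Guard against None before touching the match object
    return m.group() if m else None

hit = first_al_word("Alibaba Group")
miss = first_al_word("taobao alibaba")

print(hit, miss)
```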

1.2.3 search()
re.search(pattern, string, flags=0) scans the whole string and returns a Match object for the first match, or None if nothing matches.

import re

# Signature: re.search(pattern, string, flags=0)

pattern = r'al\w+'

str_ = 'alibaba alimama ali'

match = re.search(pattern, str_, re.I)
print(match)
print(match.start())
print(match.end())
print(match.span())
print(match.string)
print(match.group())

Output:

<re.Match object; span=(0, 7), match='alibaba'>
0
7
(0, 7)
alibaba alimama ali
alibaba
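
In the example above match() and search() return the same result because the first match sits at the start of the string. The sketch below uses a made-up string where the match is further in, to show where the two differ.

```python
import re

text = "taobao alibaba"
pattern = r"al\w+"

# match() fails: the string does not start with "al"
at_start = re.match(pattern, text)
# search() scans forward and finds the first occurrence
anywhere = re.search(pattern, text)

print(at_start)
print(anywhere.span(), anywhere.group())
```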

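1.2.4 findall()
re.findall(pattern, string, flags=0) returns a list of all matches, or an empty list if nothing matches.

import re

# Signature: re.findall(pattern, string, flags=0)

str_ = 'alibaba alimama ali apple as jason'

# No group: the list holds the full text of each match
pattern = r'a'
match = re.findall(pattern, str_, re.I)
print(match)

pattern = r'ali'
match = re.findall(pattern, str_, re.I)
print(match)

# One group: the list holds the text captured by that group
pattern = r'(a)'
match = re.findall(pattern, str_, re.I)
print(match)

# Several groups: the list holds one tuple per match, with '' for groups that did not take part
pattern = r'(a)|(ap)'
match = re.findall(pattern, str_, re.I)
print(match)

pattern = r'(ali)|(ma)|(as)'
match = re.findall(pattern, str_, re.I)
print(match)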
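
The shape of the list returned by findall() depends on how many groups the pattern has. The sketch below checks this on a shorter, made-up string so the results are easy to read.

```python
import re

text = "ali apple as"

# No group: full matches
full = re.findall(r"a\w", text)
# One group: only the captured part of each match
captured = re.findall(r"a(\w)", text)
# Two groups: one tuple per match, '' for the group that did not participate
tuples = re.findall(r"(ali)|(as)", text)

print(full)
print(captured)
print(tuples)
```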

1.2.5 Substitution
re.sub(pattern, repl, string, count=0, flags=0) returns the string after replacement.
count=0 (the default) replaces every match; a count larger than the number of matches also replaces all of them without raising an error.
re.I makes the matching case-insensitive.

import re

# Signature: re.sub(pattern, repl, string, count=0, flags=0)
# count and flags are passed by keyword; passing them positionally is deprecated since Python 3.13

pattern = r'ali'

str_ = 'alibaba alimama ALL Ali apple'

result = re.sub(pattern, 'E', str_, count=2)
print(result)

result = re.sub(pattern, 'E', str_, count=1)
print(result)

result = re.sub(pattern, 'E', str_, count=9)
print(result)

pattern = r'a'
result = re.sub(pattern, 'E', str_, count=7)
print(result)
print('=============')
pattern = r'ali'
result = re.sub(pattern, 'E', str_, count=2, flags=re.I)
print(result)

result = re.sub(pattern, 'E', str_, count=3, flags=re.I)
print(result)

result = re.sub(pattern, 'E', str_, count=9, flags=re.I)
print(result)

pattern = r'a'
result = re.sub(pattern, 'E', str_, count=7, flags=re.I)
print(result)

Output:

Ebaba Emama ALL Ali apple
Ebaba alimama ALL Ali apple
Ebaba Emama ALL Ali apple
ElibEbE ElimEmE ALL Ali Epple
=============
Ebaba Emama ALL Ali apple
Ebaba Emama ALL E apple
Ebaba Emama ALL E apple
ElibEbE ElimEmE ELL Ali apple
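
A compact sketch of the count and flags behaviour described above, on a shorter made-up string. Both arguments are passed by keyword here, which avoids mixing up their positions.

```python
import re

text = "alibaba alimama Ali"

# count defaults to 0, which replaces every match
all_replaced = re.sub(r"ali", "E", text)
# count=1 replaces only the first match
first_only = re.sub(r"ali", "E", text, count=1)
# A count larger than the number of matches replaces all of them, without error
large_count = re.sub(r"ali", "E", text, count=99)
# re.I also replaces the capitalised "Ali"
ignore_case = re.sub(r"ali", "E", text, flags=re.I)

print(all_replaced)
print(first_only)
print(large_count)
print(ignore_case)
```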

1.2.6 Splitting
split(pattern, string, maxsplit=0, flags=0)
Split the source string by the occurrences of the pattern,
returning a list containing the resulting substrings. If
capturing parentheses are used in pattern, then the text of all
groups in the pattern are also returned as part of the resulting
list. If maxsplit is nonzero, at most maxsplit splits occur,
and the remainder of the string is returned as the final element
of the list.

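re.split(pattern, string) splits at every match.
re.split(pattern, string, maxsplit=0, flags=re.I) splits at every match, ignoring case.
maxsplit=0 (the default) means there is no limit on the number of splits.
The third positional argument is maxsplit, so re.split(pattern, string, re.I) is read as maxsplit=2 (re.I equals 2) with no flags at all. Pass flags by keyword, together with maxsplit when a limit is needed.

import re

# Signature: re.split(pattern, string, maxsplit=0, flags=0)

str_ = 'alibaba alimama ALL Ali apple'

pattern = r'a'

# Split at every lowercase "a"
result = re.split(pattern, str_)
print(result)
# Split at every "a" or "A"; flags must be given by keyword
result = re.split(pattern, str_, flags=re.I)
print(result)

# At most 7 splits
result = re.split(pattern, str_, maxsplit=7)
print(result)
result = re.split(pattern, str_, maxsplit=7, flags=re.I)
print(result)

# maxsplit=0 means no limit
result = re.split(pattern, str_, maxsplit=0)
print(result)
result = re.split(pattern, str_, maxsplit=0, flags=re.I)
print(result)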
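
The results of split() are easier to follow on a shorter string. The sketch below uses a made-up sample and also shows the capturing-group behaviour mentioned in the function description: the text matched by the group is returned as part of the list.

```python
import re

text = "alibaba ALL"

# Split at every lowercase "a"; a match at index 0 yields a leading ''
plain = re.split(r"a", text)
# flags by keyword: "A" is a separator as well
ignore_case = re.split(r"a", text, flags=re.I)
# At most two splits; the remainder stays in the last element
limited = re.split(r"a", text, maxsplit=2)
# A capturing group keeps the separators in the result
with_group = re.split(r"(a)", text)
# re.I has the integer value 2, which is why it is misread as maxsplit=2
flag_value = int(re.I)

print(plain)
print(ignore_case)
print(limited)
print(with_group)
print(flag_value)
```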

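1.3 Examples
Any single Chinese character [\u4e00-\u9fa5]
One or more Chinese characters [\u4e00-\u9fa5]+
Not a letter [^a-zA-Z]
ID card number (15 digits, or 17 digits followed by a digit, X, or x) ^(\d{15}|\d{17}[\dXx])$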
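
The patterns above can be tried out with a short sketch. The ID pattern used here is a consolidated form anchored at both ends, which is an assumption of this example, and the sample numbers are invented. It checks format only, not whether a number is actually valid.

```python
import re

han = r"[\u4e00-\u9fa5]"
id_pattern = r"^(\d{15}|\d{17}[\dXx])$"   # assumed consolidated form

# Runs of Chinese characters in mixed text
han_runs = re.findall(han + "+", "正则 regex 表达式")
# Characters that are not ASCII letters
non_letters = re.findall(r"[^a-zA-Z]", "a1 b")

# Format checks on made-up numbers
ok_18 = re.search(id_pattern, "11010519491231002X") is not None
ok_15 = re.search(id_pattern, "110105491231002") is not None
too_short = re.search(id_pattern, "1234") is not None
with_prefix = re.search(id_pattern, "ab110105491231002") is not None

print(han_runs, non_letters)
print(ok_18, ok_15, too_short, with_prefix)
```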
