C++ Regular Expressions

2021-09-10 02:01:58

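Today is my second day back at school, and with nothing much to do I decided to learn something new. Regular expressions were the first thing that came to mind.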

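1. Overview

Regular expressions come up all the time at work. Unlike higher-level languages such as Java and C#, the C standard library has no regular expression support, and C++ only gained it (the <regex> header) with C++11; before that an extra regex library had to be added. Because they are used so often, though, a Linux system with the development libraries installed gives you three options: C regex (POSIX regex.h), C++ regex (std::regex), and Boost.Regex.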

2. Download

If you do not have the Boost development libraries, download and install them from http://sourceforge.net/projects/boost/files/boost/1.55.0/

3. Analysis

(1) C regex

First, compile the regular expression

int regcomp(regex_t *preg, const char *regex, int cflags);

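regcomp() compiles the regular expression into an internal form so that the matching done later is more efficient.

preg: the regex_t structure that receives the compiled regular expression

regex: pointer to the regular expression string

cflags: compile flags

There are four compile flags, which can be combined with bitwise OR:

REG_EXTENDED: use the more powerful extended regular expression syntax (without it the pattern is treated as a basic regular expression)

REG_ICASE: ignore case

REG_NOSUB: do not store the positions of the match

REG_NEWLINE: treat newlines specially, so that '$' can match at the end of each line and '^' at the beginning of each line. Without it newlines get no special treatment and the whole text is handled as a single string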
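To make the flags a bit more concrete, here is a minimal sketch that compiles a pattern with caller-supplied flags and only answers yes or no. The helper name contains_match is made up for this post and is not part of regex.h; it also treats a pattern that fails to compile as "no match", which is a simplification.

```cpp
#include <regex.h>
#include <cassert>

// Hypothetical helper (not part of regex.h): compiles `pattern` with the
// given cflags and reports whether `text` contains a match.
bool contains_match(const char *pattern, const char *text, int cflags)
{
    assert(pattern != nullptr && text != nullptr);
    regex_t reg;
    // REG_NOSUB: only a yes/no answer is needed, so no offsets are stored.
    if (regcomp(&reg, pattern, cflags | REG_NOSUB) != 0)
        return false; // simplification: a bad pattern counts as "no match"
    int rc = regexec(&reg, text, 0, nullptr, 0);
    regfree(&reg);
    return rc == 0;
}
```

For example, contains_match("WUNAOZAI", "::wunaozaiskdjf", REG_EXTENDED) is false, but adding REG_ICASE makes it true. Likewise "^wunao" does not match "admin\nwunao zai" unless REG_NEWLINE is set, because without it '^' only matches at the very start of the string.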

Next, run the match

int regexec(const regex_t *preg, const char *string, size_t nmatch, regmatch_t pmatch[], int eflags);

preg: pointer to the compiled regular expression

string: the target string

nmatch: the length of the pmatch array

pmatch: array of structures that receives the positions of the matched text

eflags: match flags

There are two match flags:

REG_NOTBOL: The match-beginning-of-line operator always fails to match (but see the compilation flag REG_NEWLINE above). This flag may be used when different portions of a string are passed to regexec and the beginning of the string should not be interpreted as the beginning of the line.

REG_NOTEOL: The match-end-of-line operator always fails to match (but see the compilation flag REG_NEWLINE above).
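The "different portions of a string" case is easier to see in code. The sketch below searches a string starting from some byte offset and sets REG_NOTBOL whenever that offset is not the real start. The helper name search_from is hypothetical, and bounds checking is limited to an assert.

```cpp
#include <regex.h>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: searches `text` starting at byte `offset`.
bool search_from(const char *pattern, const char *text, std::size_t offset)
{
    assert(pattern != nullptr && text != nullptr);
    assert(offset <= std::strlen(text));
    regex_t reg;
    if (regcomp(&reg, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false; // simplification: a bad pattern counts as "no match"
    // A portion that starts mid-string is not a real line start, so ask
    // regexec not to let '^' match at its first character.
    int eflags = (offset > 0) ? REG_NOTBOL : 0;
    int rc = regexec(&reg, text + offset, 0, nullptr, eflags);
    regfree(&reg);
    return rc == 0;
}
```

With the text "wunao zai", search_from("^wunao", text, 0) succeeds, while search_from("^zai", text, 6) fails even though the portion passed to regexec begins with "zai", because REG_NOTBOL stops '^' from matching there.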

typedef struct {
    regoff_t rm_so;
    regoff_t rm_eo;
} regmatch_t;

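Here rm_so is the start offset of the matching substring within string, and rm_eo is the offset of the first character after it, so the match is rm_eo - rm_so bytes long. When regexec returns successfully, pmatch[0].rm_so to pmatch[0].rm_eo delimit the whole match; pmatch[i] for i >= 1 holds the offsets of the i-th parenthesized subexpression.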
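As a rough sketch of how these offsets turn into strings, the function below collects the whole match and every parenthesized group into a vector. The name find_groups is made up for this post; it relies on the re_nsub field that regcomp fills in with the number of subexpressions, and it returns an empty vector both when nothing matches and when the pattern does not compile.

```cpp
#include <regex.h>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: element 0 is the whole match, element i is group i.
std::vector<std::string> find_groups(const char *pattern, const char *text)
{
    assert(pattern != nullptr && text != nullptr);
    std::vector<std::string> out;
    regex_t reg;
    if (regcomp(&reg, pattern, REG_EXTENDED) != 0)
        return out;
    // One slot for the whole match plus one per parenthesized subexpression.
    std::vector<regmatch_t> pm(reg.re_nsub + 1);
    if (regexec(&reg, text, pm.size(), pm.data(), 0) == 0)
    {
        for (const regmatch_t &m : pm)
        {
            if (m.rm_so == -1)
                out.push_back(""); // this group did not take part in the match
            else
                out.emplace_back(text + m.rm_so,
                                 static_cast<std::size_t>(m.rm_eo - m.rm_so));
        }
    }
    regfree(&reg);
    return out;
}
```

Calling find_groups("([a-z]+)([0-9]+)", "wunao zai::llllsjd9843*&(") gives three entries: "llllsjd9843", "llllsjd" and "9843".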

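Finally, free the memory

void regfree(regex_t *preg);

Once you are done with a compiled regular expression, or before reusing the same regex_t to compile a different one, be sure to call this function to release it.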
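In C++ it is easy to forget the regfree call on an early return. One possible way around that, sketched here under the assumption that exceptions are acceptable, is a small RAII wrapper. The class name PosixRegex is invented for this post and is not a standard or Boost type.

```cpp
#include <regex.h>
#include <cassert>
#include <stdexcept>

// Hypothetical RAII wrapper: compiles in the constructor and calls regfree
// in the destructor, so the compiled pattern is always released.
class PosixRegex
{
public:
    explicit PosixRegex(const char *pattern, int cflags = REG_EXTENDED)
    {
        assert(pattern != nullptr);
        if (regcomp(&reg_, pattern, cflags) != 0)
            throw std::invalid_argument("regcomp failed");
    }
    ~PosixRegex() { regfree(&reg_); }

    // Copying would lead to a double regfree, so it is disabled.
    PosixRegex(const PosixRegex &) = delete;
    PosixRegex &operator=(const PosixRegex &) = delete;

    // True if `text` contains a match; no offsets are requested.
    bool search(const char *text) const
    {
        return regexec(&reg_, text, 0, nullptr, 0) == 0;
    }

private:
    regex_t reg_;
};
```

A PosixRegex re("[[:digit:]]+") can then be used as re.search("llllsjd9843"), and the pattern is freed automatically when re goes out of scope.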

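In addition, error handling
size_t regerror(int errcode, const regex_t *preg, char *errbuf, size_t errbuf_size);
When regcomp or regexec fails, this function can be called to obtain a string describing the error.

errcode: the error code returned by regcomp or regexec.
preg: the regular expression that was compiled with regcomp; this may be NULL.
errbuf: the buffer that receives the error message.
errbuf_size: the size of the buffer. If the error message is longer than this, regerror truncates it, but it still returns the size needed for the complete message. The required size can therefore be obtained first by calling regerror with a NULL buffer and a size of 0.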
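The two-step call can be sketched like this: ask regerror for the required size first, then fetch the message into a buffer of exactly that size. The helper name compile_error is hypothetical; it returns an empty string when the pattern compiles cleanly.

```cpp
#include <regex.h>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns "" if `pattern` compiles, otherwise the
// error message produced by regerror.
std::string compile_error(const char *pattern)
{
    assert(pattern != nullptr);
    regex_t reg;
    int rc = regcomp(&reg, pattern, REG_EXTENDED);
    if (rc == 0)
    {
        regfree(&reg);
        return "";
    }
    // First call: size 0 means "tell me how big the buffer must be".
    // The returned size includes the terminating null byte.
    std::size_t len = regerror(rc, &reg, nullptr, 0);
    if (len == 0)
        return "unknown error";
    // Second call: fetch the full, untruncated message.
    std::vector<char> buf(len);
    regerror(rc, &reg, buf.data(), buf.size());
    return std::string(buf.data());
}
```

For instance, compile_error("[0-9") returns a non-empty message about the unbalanced bracket; the exact wording depends on the C library.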

Example

#include <regex.h>
#include <stdio.h>

int main(void)
{
    const char *pattern = "wunaozai";
    /* const char *pattern = "[0-9]{4}";   or "[[:digit:]]+" */
    const size_t nmatch = 10;
    regmatch_t pm[10];
    regex_t reg;
    char errbuf[128];
    const char *buf = "admin,cnblogs.com::wunaozaiskdjf::wunao zai::llllsjd9843*&(";

    /* Compile the pattern; report the error and stop if that fails. */
    int z = regcomp(&reg, pattern, REG_EXTENDED);
    if (z != 0)
    {
        regerror(z, &reg, errbuf, sizeof(errbuf));
        fprintf(stderr, "regcomp failed: %s\n", errbuf);
        return 1;
    }

    /* buf is a complete string, so no eflags (such as REG_NOTBOL) are needed. */
    z = regexec(&reg, buf, nmatch, pm, 0);
    if (z == 0)
    {
        /* pm[0] holds the offsets of the whole match. */
        printf("Matched text: %.*s\n",
               (int)(pm[0].rm_eo - pm[0].rm_so), buf + pm[0].rm_so);
    }
    else
        printf("No match!\n");

    regfree(&reg);
    return 0;
}

(2) C++ regex

...

(3) Boost.regex

#include <boost/regex.hpp>
#include <iostream>
#include <string>

using namespace std;

int main()
{
    boost::regex pattern("[[:digit:]]+"); // the regular expression
    string buf = "admin,cnblogs.com:ww:wunaozaiskdjf::wunao zai::llllsjd9843*&("; // the text to search
    boost::smatch mat; // receives the match results
    bool valid = boost::regex_search(buf, mat, pattern); // search for the first match
    if (!valid)
        cout << "No match!" << endl;
    else
    {
        // mat[0] is the whole match; mat[1] onward are capture groups
        // (this pattern has none, so mat.size() is 1)
        cout << "Match results (" << mat.size() << "):" << endl;
        for (size_t i = 0; i < mat.size(); i++)
        {
            cout << mat[i].str() << endl;
        }
    }
    return 0;
}

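Note that the Boost.Regex library has to be linked when compiling.

The complete command line is shown here; the library goes after the source file so the linker can resolve the symbols the source uses:
g++ -std=c++11 main.cpp -o a.out -lboost_regex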

4. References

http://www.coder4.com/archives/3796

http://www.cnblogs.com/pmars/archive/2012/10/24/2736831.html

http://blog.sina.com.cn/s/blog_48d5933f0100o8np.html
