Youku TV Series Crawler Implementation, Part 1: Downloading and Parsing Video Site Pages (4), Supplement: Java Regular Expressions and Matcher.group(int group)

The Matcher class in Java's regular expression package provides the following group-related methods:
- int groupCount() 
- String group(int group) 
- int start(int group) 
- int end(int group) 
- String group(String name) 
- int start(String name) 
- int end(String name)
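
The demos in this post only exercise the int-based methods. As a minimal sketch of the name-based overloads, the example below assumes a pattern that declares named capturing groups with the (?<name>...) syntax; the class name NamedGroupDemo, the method describeFirstMatch and the group names "first" and "verb" are made up for illustration. start(String name) and end(String name) require Java 8 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupDemo {

    // Describes the two named groups of the first match as "text@[start,end)",
    // or returns null when the pattern does not match at all.
    public static String describeFirstMatch(String text) {
        // (?<first>...) and (?<verb>...) are named capturing groups.
        // They are still numbered: "first" is group 1, "verb" is group 2.
        Pattern pattern = Pattern.compile("(?<first>John) (?<verb>\\w+)");
        Matcher matcher = pattern.matcher(text);
        if (!matcher.find()) {
            return null;
        }
        return matcher.group("first")
                + "@[" + matcher.start("first") + "," + matcher.end("first") + ") "
                + matcher.group("verb")
                + "@[" + matcher.start("verb") + "," + matcher.end("verb") + ")";
    }

    public static void main(String[] args) {
        // prints: John@[0,4) writes@[5,11)
        System.out.println(describeFirstMatch("John writes about this"));
    }
}
```

Named groups read better than bare numbers once a pattern has more than two or three groups, and the numbered access (group(1), group(2)) keeps working on the same pattern.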

The concept of a group

Let's start with a piece of code to understand what a group is in a regular expression.

Demo1:

String text = "John writes about this, and John writes about that," + " and John writes about everything. ";
String patternString1 = "(John)";
Pattern pattern = Pattern.compile(patternString1);
Matcher matcher = pattern.matcher(text);
System.out.println("groupCount is -->" + matcher.groupCount());
while (matcher.find()) {
    System.out.println("found: " + matcher.group(1));
}

Output:

groupCount is -->1
found: John
found: John
found: John

Demo2:

String text = "John writes about this, and John writes about that," + " and John writes about everything. ";
String patternString1 = "John";
Pattern pattern = Pattern.compile(patternString1);
Matcher matcher = pattern.matcher(text);
System.out.println("groupCount is -->" + matcher.groupCount());
while (matcher.find()) {
    System.out.println("found: " + matcher.group(1));
}

Output:

groupCount is -->0
Exception in thread "main" java.lang.IndexOutOfBoundsException: No group 1

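The only difference between the two examples is the value of patternString1: one regular expression is wrapped in parentheses and the other is not. So, as a first approximation:

The content matched by a subexpression enclosed in '()' is a group.

Now let's look at one more example.
Demo3: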

String text = "John writes about this, and John writes about that," + " and John writes about everything. ";
String patternString1 = "(?:John)";
Pattern pattern = Pattern.compile(patternString1);
Matcher matcher = pattern.matcher(text);
System.out.println("groupCount is -->" + matcher.groupCount());
while (matcher.find()) {
    System.out.println("found: " + matcher.group(1));
}

Output:

groupCount is -->0
Exception in thread "main" java.lang.IndexOutOfBoundsException: No group 1

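Demo3 shows that a subexpression of the form (?:pattern), a non-capturing group, does not count as a group: it gets no number and is not included in groupCount().

So the concept of a group can be summarized as follows:
1. The content matched by a subexpression enclosed in '()' is a group.
2. A subexpression of the form (?:pattern) does not count as a group.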
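
The two rules above can be checked directly with groupCount(), and the same call can guard against the IndexOutOfBoundsException seen in Demo2 and Demo3. The following is only a sketch: the class GroupCountDemo and its helper methods are hypothetical names introduced for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCountDemo {

    // Number of capturing groups declared by the pattern.
    // Group 0 (the whole match) is never included in this count.
    public static int countGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Returns group(1) of the first match, falling back to the whole match
    // when the pattern declares no capturing group. Returns null if nothing matches.
    public static String firstGroupOrWholeMatch(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        if (!matcher.find()) {
            return null;
        }
        return matcher.groupCount() >= 1 ? matcher.group(1) : matcher.group();
    }

    public static void main(String[] args) {
        System.out.println(countGroups("(John)"));                    // 1
        System.out.println(countGroups("John"));                      // 0
        System.out.println(countGroups("(?:John)"));                  // 0
        System.out.println(countGroups("(?:John) (writes) (about)")); // 2
        // No capturing group here, so the whole match "John" is returned
        // instead of throwing "No group 1".
        System.out.println(firstGroupOrWholeMatch("(?:John)", "John writes"));
    }
}
```

Note that groupCount() depends only on the pattern, not on the input, which is why it can be called before find().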

Group index (group number)

Again, let's start with a demo.
Demo4:

String text = "John writes about this, and John Doe writes about that,"
        + " and John Wayne writes about everything.";
String patternString1 = "(John) (.+?) ";
Pattern pattern = Pattern.compile(patternString1);
Matcher matcher = pattern.matcher(text);
matcher.find(); // find the next match; it may start anywhere in the text
int start = matcher.start(); // index of the first character of the current match
int end = matcher.end(); // offset just past the last character of the current match
System.out.println("found group: group(0) is '" + matcher.group(0) + "'");
System.out.println("found group: group(1) is '" + matcher.group(1) + "',group(2) is '" + matcher.group(2) + "'");

Output:

found group: group(0) is 'John writes '
found group: group(1) is 'John',group(2) is 'writes'

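As the output shows, when a regular expression contains several groups, that is, several subexpressions of the form '(pattern)', the group numbers start at 1, while group(0) stands for the entire matched string.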
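
Demo4 has no nested parentheses, so it does not show how numbers are assigned when groups contain other groups. In Java, capturing groups are numbered by the position of their opening parenthesis, counted from left to right. The sketch below illustrates this; the class GroupNumberDemo, the method allGroups and the nested pattern are assumptions made for this example and are not part of the crawler code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumberDemo {

    // Returns group(0)..group(groupCount()) of the first match,
    // or an empty array when the pattern does not match.
    public static String[] allGroups(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        if (!matcher.find()) {
            return new String[0];
        }
        String[] groups = new String[matcher.groupCount() + 1];
        for (int i = 0; i <= matcher.groupCount(); i++) {
            groups[i] = matcher.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Opening parentheses from left to right:
        //   1: ((John) (\w+))   2: (John)   3: (\w+)
        String[] groups = allGroups("((John) (\\w+))", "and John Doe writes about that");
        for (int i = 0; i < groups.length; i++) {
            System.out.println("group(" + i + ") is '" + groups[i] + "'");
        }
        // prints:
        // group(0) is 'John Doe'
        // group(1) is 'John Doe'
        // group(2) is 'John'
        // group(3) is 'Doe'
    }
}
```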

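With the above, the use of group(int group) should be fully clear. The index-based companion methods follow the same group numbering:

1.int start(int group) returns the index in the original target string of the first character matched by the given group.

2.int end(int group) returns the offset in the original target string just past the last character matched by the given group, that is, the index of the last matched character plus one.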
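
One way to keep these two definitions straight is that text.substring(start(g), end(g)) gives back exactly group(g). The sketch below checks that relationship using the text and pattern from Demo4; the class GroupOffsetsDemo and its helper methods are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupOffsetsDemo {

    // Returns {start(group), end(group)} for the first match, or null if there is no match.
    public static int[] span(String regex, String text, int group) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        if (!matcher.find()) {
            return null;
        }
        return new int[] { matcher.start(group), matcher.end(group) };
    }

    // Checks, for every group of the first match, that
    // text.substring(start(g), end(g)) equals group(g).
    public static boolean offsetsMatchGroups(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        if (!matcher.find()) {
            return false;
        }
        for (int g = 0; g <= matcher.groupCount(); g++) {
            String captured = matcher.group(g);
            if (captured == null) {
                // The group took no part in the match: start and end are both -1.
                if (matcher.start(g) != -1 || matcher.end(g) != -1) {
                    return false;
                }
                continue;
            }
            if (!text.substring(matcher.start(g), matcher.end(g)).equals(captured)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String text = "John writes about this, and John Doe writes about that,"
                + " and John Wayne writes about everything.";
        String regex = "(John) (.+?) ";
        for (int g = 0; g <= 2; g++) {
            int[] s = span(regex, text, g);
            System.out.println("group " + g + ": [" + s[0] + "," + s[1] + ")");
        }
        // prints:
        // group 0: [0,12)
        // group 1: [0,4)
        // group 2: [5,11)
        System.out.println(offsetsMatchGroups(regex, text)); // true
    }
}
```

Because end is exclusive, end(g) - start(g) is the length of the captured text, and the pair can be passed straight to substring without adjusting by one.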
