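Regular expressions are a powerful tool for processing strings.
Uses: string matching (character matching), string searching, and string replacement.
Examples: checking whether an IP address is well-formed, pulling email addresses out of a web page (as spam harvesters do), pulling links out of a web page, and so on.
Classes involved: java.lang.String, java.util.regex.Pattern, java.util.regex.Matcher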
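Example 1: A Pattern is a compiled pattern; a Matcher applies that pattern to one particular input and holds the result of the match.
A typical invocation sequence is:
Pattern p = Pattern.compile("a*b");
Matcher m = p.matcher("aaaaab");
boolean b = m.matches();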
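import java.util.regex.*;

public class Test {
    public static void main(String[] args) {
        // "." matches any single character, so "..." matches any three characters
        System.out.println("abc".matches("..."));
        // replace every digit with "-"
        System.out.println("a3435f".replaceAll("\\d", "-"));
        // compile the pattern first, then match an input against it
        Pattern p = Pattern.compile("[a-z]{3}");
        Matcher m = p.matcher("fgh");
        System.out.println(m.matches());
        // matches() must cover the whole input, and "fgha" has four letters
        System.out.println("fgha".matches("[a-z]{3}"));
    }
}
Output: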
true
a----f
true
false
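String.matches compiles its pattern on every call. When one pattern is applied to many inputs, compiling it once and reusing the Pattern is the usual approach. The following is a minimal sketch of that idea; the class name PatternReuseDemo and the method countMatching are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class PatternReuseDemo {
    // Compiled once; a Pattern is immutable and can be shared freely.
    private static final Pattern THREE_LOWER = Pattern.compile("[a-z]{3}");

    // Counts how many inputs are matched in full by the precompiled pattern.
    static int countMatching(List<String> inputs) {
        int count = 0;
        for (String s : inputs) {
            // A new Matcher is created per input; the Pattern itself is reused.
            if (THREE_LOWER.matcher(s).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // "fgh" and "abc" match; "fgha" is too long and "ab1" contains a digit
        System.out.println(countMatching(Arrays.asList("fgh", "fgha", "abc", "ab1")));
    }
}
```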
Example 2: quantifiers
X? | X, once or not at all |
X* | X, zero or more times |
X+ | X, one or more times |
X{n} | X, exactly n times |
X{n,} | X, at least n times |
X{n,m} | X, at least n but not more than m times |
import java.util.regex.*;

public class Test {
    public static void main(String[] args) {
        // ? is {0,1}, * is {0,}, + is {1,}
        System.out.println("a".matches("."));       // true
        System.out.println("aa".matches("aa"));     // true
        System.out.println("aaaa".matches("a*"));   // true
        System.out.println("aaaa".matches("a+"));   // true
        System.out.println("aaaa".matches("a?"));   // false
        System.out.println("".matches("a*"));       // true
        System.out.println("".matches("a?"));       // true
        System.out.println("a".matches("a?"));      // true
        System.out.println("2455668678".matches("\\d{3,100}")); // true
        // false: the last segment is not numeric
        System.out.println("192.168.0.aaa".matches("\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}"));
        System.out.println("192".matches("[0-2][0-9][0-9]"));   // true
    }
}
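The dotted-quad pattern above only checks the shape of the input, so it would also accept 999.1.1.1. One possible way to finish the IP address check is to let the regular expression verify the shape and then check each number's range in plain Java. The sketch below does that with capturing groups (covered in Example 10). The class name Ipv4Check and the method isIpv4 are illustrative, and accepting leading zeros such as 01.2.3.4 is an assumption of this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Ipv4Check {
    // Four runs of 1 to 3 digits separated by literal dots; each run is a group.
    private static final Pattern DOTTED_QUAD =
            Pattern.compile("(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})");

    static boolean isIpv4(String s) {
        Matcher m = DOTTED_QUAD.matcher(s);
        if (!m.matches()) {
            return false; // wrong shape
        }
        for (int g = 1; g <= 4; g++) {
            // at most three digits, so parseInt cannot overflow
            if (Integer.parseInt(m.group(g)) > 255) {
                return false; // right shape, number out of range
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isIpv4("192.168.0.1"));
        System.out.println(isIpv4("192.168.0.aaa"));
        System.out.println(isIpv4("999.1.1.1"));
    }
}
```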
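Example 3: [] matches any one of the characters listed inside it; [^] matches any one character other than those listed.
[abc] | a, b, or c (simple class) |
[^abc] | Any character except a, b, or c (negation) |
[a-zA-Z] | a through z or A through Z, inclusive (range) |
[a-d[m-p]] | a through d, or m through p: [a-dm-p] (union) |
[a-z&&[def]] | d, e, or f (intersection) |
[a-z&&[^bc]] | a through z, except for b and c: [ad-z] (subtraction) |
[a-z&&[^m-p]] | a through z, and not m through p: [a-lq-z] (subtraction) |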
import java.util.regex.*;

public class Test {
    public static void main(String[] args) {
        System.out.println("a".matches("[abc]"));        // true
        System.out.println("a".matches("[^abc]"));       // false: anything except a, b, c
        System.out.println("A".matches("[a-zA-Z]"));     // true
        System.out.println("A".matches("[a-z]|[A-Z]"));  // true: same set, written as an alternation
        System.out.println("A".matches("[a-z[A-Z]]"));   // true: same set, written as a union
        System.out.println("R".matches("[A-Z&&[RFG]]")); // true: intersection
    }
}
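As a small illustration of the subtraction and union forms in the table, the sketch below tests for a lowercase consonant and for a hexadecimal digit. The class and method names are hypothetical and chosen only for this example.

```java
class CharClassDemo {
    // a to z minus the vowels: one lowercase consonant
    static boolean isLowerConsonant(String s) {
        return s.matches("[a-z&&[^aeiou]]");
    }

    // union of three ranges, equivalent to [0-9a-fA-F]
    static boolean isHexDigit(String s) {
        return s.matches("[0-9[a-f][A-F]]");
    }

    public static void main(String[] args) {
        System.out.println(isLowerConsonant("b")); // a consonant
        System.out.println(isLowerConsonant("a")); // a vowel, removed by the subtraction
        System.out.println(isHexDigit("F"));
        System.out.println(isHexDigit("g"));
    }
}
```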
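Example 4: predefined character classes
\d | A digit: [0-9] |
\D | A non-digit: [^0-9] |
\s | A whitespace character: [ \t\n\x0B\f\r] |
\S | A non-whitespace character: [^\s] |
\w | A word character: [a-zA-Z_0-9] |
\W | A non-word character: [^\w] |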
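import java.util.regex.*;

public class Test {
    public static void main(String[] args) {
        System.out.println(" \n\r\t".matches("\\s{4}")); // true: four whitespace characters
        System.out.println(" ".matches("\\S"));          // false
        System.out.println("a_8".matches("\\w{3}"));     // true
        System.out.println("abc888&^%".matches("[a-z]{1,3}\\d+[&^#%]+")); // true
        System.out.println("\\".matches("\\\\"));        // true: one literal backslash
    }
}
Note: inside a regular expression, matching one literal \ requires \\. When the regular expression is written as a Java string literal, every \ in the expression in turn needs two \ in the string. Matching a single backslash therefore takes four backslashes in the source code: "\\\\".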
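When the text to match is a plain string that happens to contain regex metacharacters, escaping each character by hand is error-prone. Pattern.quote from the standard library wraps a string so that it is treated literally. The sketch below shows the difference; the class name QuoteDemo and its helper methods are illustrative only.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Treats 'literal' as plain text rather than as a regular expression.
    static boolean matchesLiterally(String input, String literal) {
        return input.matches(Pattern.quote(literal));
    }

    // Splits on a literal dot; an unquoted "." would mean "any character".
    static String[] splitOnDot(String input) {
        return input.split(Pattern.quote("."));
    }

    public static void main(String[] args) {
        // Here + is a quantifier, so the pattern does not describe the text "1+1=2".
        System.out.println("1+1=2".matches("1+1=2"));
        // Here + is an ordinary character.
        System.out.println(matchesLiterally("1+1=2", "1+1=2"));
        System.out.println(splitOnDot("a.b.c").length);
    }
}
```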
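Example 5: POSIX character classes (rarely used)
\p{Lower} | A lowercase alphabetic character: [a-z] |
\p{Upper} | An uppercase alphabetic character: [A-Z] |
\p{ASCII} | All ASCII: [\x00-\x7F] |
\p{Alpha} | An alphabetic character: [\p{Lower}\p{Upper}] |
\p{Digit} | A decimal digit: [0-9] |
\p{Alnum} | An alphanumeric character: [\p{Alpha}\p{Digit}] |
\p{Punct} | Punctuation: one of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ |
\p{Graph} | A visible character: [\p{Alnum}\p{Punct}] |
\p{Print} | A printable character: [\p{Graph}\x20] |
\p{Blank} | A space or a tab: [ \t] |
\p{Cntrl} | A control character: [\x00-\x1F\x7F] |
\p{XDigit} | A hexadecimal digit: [0-9a-fA-F] |
\p{Space} | A whitespace character: [ \t\n\x0B\f\r] |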
import java.util.regex.*;

public class Test {
    public static void main(String[] args) {
        System.out.println("a".matches("\\p{Lower}")); // true
    }
}
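Example 6: boundary matchers
^ | The beginning of a line |
$ | The end of a line |
\b | A word boundary |
\B | A non-word boundary |
\A | The beginning of the input |
\G | The end of the previous match |
\Z | The end of the input but for the final terminator, if any |
\z | The end of the input |
Note: inside [] the character ^ means negation; outside [] it means the beginning of a line.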
import java.util.regex.*;

public class Test {
    public static void main(String[] args) {
        System.out.println("hello sir".matches("^h.*"));               // true
        System.out.println("hello sir".matches(".*ir$"));              // true
        System.out.println("hello sir".matches("^h[a-z]{1,3}o\\b.*")); // true
        // false: there is no word boundary between "hello" and "sir"
        System.out.println("hellosir".matches("^h[a-z]{1,3}o\\b.*"));
        // true: a blank line (whitespace other than \n, then the newline)
        System.out.println(" \n".matches("^[\\s&&[^\\n]]*\\n$"));
    }
}
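Exercise 1: true or false?
true true true false
. | Any character (may or may not match line terminators) |
\d | A digit: [0-9] |
\D | A non-digit: [^0-9] |
\s | A whitespace character: [ \t\n\x0B\f\r] |
\S | A non-whitespace character: [^\s] |
\w | A word character: [a-zA-Z_0-9] |
\W | A non-word character: [^\w] |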
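The boundary matchers in Example 6 describe ^ and $ as the beginning and end of a line. By default, though, Java anchors them to the whole input; they only work line by line when the pattern is compiled with Pattern.MULTILINE. The sketch below counts lines that begin with the word hello, with and without that flag. The class name LineAnchorDemo, the helper count, and the sample text are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineAnchorDemo {
    // Counts the non-overlapping matches of p in text.
    static int count(Pattern p, String text) {
        Matcher m = p.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "hello sir\nhellosir\nhello\nsay hello";
        // Default mode: ^ matches only at the very start of the input.
        System.out.println(count(Pattern.compile("^hello\\b"), text));
        // MULTILINE: ^ also matches right after each line terminator.
        System.out.println(count(Pattern.compile("^hello\\b", Pattern.MULTILINE), text));
    }
}
```

With this sample text the first count is 1 and the second is 2: the second line fails the \b test, and the fourth line does not start with hello.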
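Example 7: matches, find, and lookingAt
matches() tries to match the entire string; find() looks for the next matching substring. The two affect each other because they share the matcher's state: each one consumes the part of the input it has already examined, so call reset() between them to start again from the beginning.
find() does not have to match from the start; it succeeds as long as a match exists somewhere in the remaining input.
lookingAt() always starts matching at the beginning of the input, but unlike matches() it does not need to consume all of it.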
import java.util.regex.*;

public class Test {
    public static void main(String[] args) {
        String s = "123-34545-234-00";
        Pattern p = Pattern.compile("\\d{3,5}");
        Matcher m = p.matcher(s);
        System.out.println(m.matches());   // false: the whole string is not 3 to 5 digits
        m.reset();                         // start over from the beginning of the input
        System.out.println(m.find());      // true  (123)
        System.out.println(m.find());      // true  (34545)
        System.out.println(m.find());      // true  (234)
        System.out.println(m.find());      // false (00 is too short)
        // lookingAt() starts at the beginning every time, so it is always true here
        System.out.println(m.lookingAt());
        System.out.println(m.lookingAt());
        System.out.println(m.lookingAt());
        System.out.println(m.lookingAt());
    }
}
import java.util.regex.*;

public class Test {
    public static void main(String[] args) {
        String s = "123-34545-234-00";
        Pattern p = Pattern.compile("\\d{3,5}");
        Matcher m = p.matcher(s);
        System.out.println(m.matches());   // false
        // m.reset();   // without reset(), find() resumes after what matches() consumed
        System.out.println(m.find());      // true
        System.out.println(m.find());      // true
        System.out.println(m.find());      // false
        System.out.println(m.find());      // false
        System.out.println(m.lookingAt()); // true every time
        System.out.println(m.lookingAt());
        System.out.println(m.lookingAt());
        System.out.println(m.lookingAt());
    }
}
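Because the three methods share one matcher's state, a simple way to keep them from interfering is to give each question its own Matcher. The sketch below takes that approach for the same pattern and input as above. The class name MatchModesDemo and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchModesDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d{3,5}");

    // matches(): the whole input must fit the pattern.
    static boolean wholeInput(String s) {
        return DIGITS.matcher(s).matches();
    }

    // lookingAt(): the input must start with a match.
    static boolean prefix(String s) {
        return DIGITS.matcher(s).lookingAt();
    }

    // find(): collect every match, scanning left to right.
    static List<String> findAll(String s) {
        List<String> found = new ArrayList<>();
        Matcher m = DIGITS.matcher(s);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String s = "123-34545-234-00";
        System.out.println(wholeInput(s));
        System.out.println(prefix(s));
        System.out.println(findAll(s));
    }
}
```

Creating a fresh Matcher is cheap compared with compiling the Pattern, so this costs little and removes the need to remember reset().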
Example 8: start and end
import java.util.regex.*;

public class Test {
    public static void main(String[] args) {
        String s = "123-34545-234-00";
        Pattern p = Pattern.compile("\\d{3,5}");
        Matcher m = p.matcher(s);
        System.out.println(m.matches()); // false
        m.reset();
        // start() is the index of the first matched character,
        // end() is the index just past the last matched character
        System.out.println(m.find());
        System.out.println(m.start() + "-" + m.end());
        System.out.println(m.find());
        System.out.println(m.start() + "-" + m.end());
        System.out.println(m.find());
        System.out.println(m.start() + "-" + m.end());
        System.out.println(m.find());    // false
        System.out.println(m.lookingAt());
        System.out.println(m.lookingAt());
        System.out.println(m.lookingAt());
        System.out.println(m.lookingAt());
    }
}
Output:
false
true
0-3
true
4-9
true
10-13
false
true
true
true
true
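start() and end() are only defined after a successful match; calling them after a failed find() throws IllegalStateException, which is why the example above does not print a span after the fourth find(). The sketch below collects the spans in a loop so that they are read only when find() has succeeded. The class name SpanDemo and the method spans are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpanDemo {
    // Returns each match as "start-end", where end is exclusive.
    static List<String> spans(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            // Inside the loop the last find() succeeded, so start() and end() are valid.
            // input.substring(m.start(), m.end()) is the same text as m.group().
            result.add(m.start() + "-" + m.end());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(spans("\\d{3,5}", "123-34545-234-00"));
    }
}
```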
Example 9: replacement
(1)
import java.util.regex.*;

public class Test {
    public static void main(String[] args) {
        // case-sensitive: only the lowercase "java" is found
        Pattern p = Pattern.compile("java");
        Matcher m = p.matcher("java Java JAva java IloveJAVA YOUhatejavajava end");
        while (m.find()) {
            System.out.println(m.group()); // the text of the current match
        }
    }
}
Output:
java
java
java
java
(2)
import java.util.regex.*;

public class Test {
    public static void main(String[] args) {
        // CASE_INSENSITIVE: every spelling of "java" is found
        Pattern p = Pattern.compile("java", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher("java Java JAva java IloveJAVA YOUhatejavajava end");
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}
Output:
java
Java
JAva
java
JAVA
java
java
(3)
import java.util.regex.*;

public class Test {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("java", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher("java Java JAva java IloveJAVA YOUhatejavajava end");
        // replace every match with the uppercase form
        System.out.println(m.replaceAll("JAVA"));
    }
}
Output:
JAVA JAVA JAVA JAVA IloveJAVA YOUhateJAVAJAVA end
(4)
import java.util.regex.*;

public class Test {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("java", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher("java Java JAva java IloveJAVA YOUhatejavajava end");
        StringBuffer buf = new StringBuffer();
        int i = 0;
        while (m.find()) {
            i++;
            // appendReplacement copies the text before the match, then the replacement
            if (i % 2 == 0) {
                m.appendReplacement(buf, "java"); // even matches become lowercase
            } else {
                m.appendReplacement(buf, "JAVA"); // odd matches become uppercase
            }
        }
        m.appendTail(buf); // copy whatever follows the last match
        System.out.println(buf);
    }
}
Output:
JAVA java JAVA java IloveJAVA YOUhatejavaJAVA end
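One detail that the replacement examples do not run into: in the replacement string of replaceAll and appendReplacement, $ followed by a number refers to a captured group and \ is an escape character. To insert such text verbatim, the standard library offers Matcher.quoteReplacement. The sketch below shows both uses; the class name ReplacementDemo and its methods are made up for illustration, and groups themselves are covered in Example 10.

```java
import java.util.regex.Matcher;

class ReplacementDemo {
    // Inserts 'literal' exactly as written, even if it contains $ or \.
    static String replaceLiterally(String input, String regex, String literal) {
        return input.replaceAll(regex, Matcher.quoteReplacement(literal));
    }

    // $1 and $2 in the replacement stand for the first and second group.
    static String swapPairs(String input) {
        return input.replaceAll("(\\w+)=(\\w+)", "$2=$1");
    }

    public static void main(String[] args) {
        System.out.println(replaceLiterally("cost: X", "X", "$5"));
        System.out.println(swapPairs("a=1, b=2"));
    }
}
```

Without quoteReplacement, the replacement "$5" in the first call would be read as a reference to group 5, which this pattern does not have.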
Example 10: groups. A group's number is found by counting left parentheses from left to right.
import java.util.regex.*;

public class Test {
    public static void main(String[] args) {
        // group 1 is the digits, group 2 is the two letters
        Pattern p = Pattern.compile("(\\d{3,5})([a-z]{2})");
        String s = "123aa-34556bb-456cc-00";
        Matcher m = p.matcher(s);
        while (m.find()) {
            System.out.println(m.group(1));
        }
    }
}
Output:
123
34556
456
With group() in place of group(1), the output is the whole match:
123aa
34556bb
456cc
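To make the left-parenthesis rule concrete, the sketch below wraps the same pattern in one more pair of parentheses, so the outer group becomes group 1 and the inner ones become groups 2 and 3. It also shows named groups, written (?<name>...), which Java has supported since version 7 and which avoid counting altogether. The class name GroupDemo, the method names, and the group names digits and letters are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Left parentheses, left to right: 1 = whole pair, 2 = digits, 3 = letters.
    private static final Pattern NESTED =
            Pattern.compile("((\\d{3,5})([a-z]{2}))");

    private static final Pattern NAMED =
            Pattern.compile("(?<digits>\\d{3,5})(?<letters>[a-z]{2})");

    // Returns groups 0 to 3 of the first match, or an empty array if none.
    static String[] numberedGroups(String s) {
        Matcher m = NESTED.matcher(s);
        if (!m.find()) {
            return new String[0];
        }
        return new String[] { m.group(0), m.group(1), m.group(2), m.group(3) };
    }

    // Returns the letters of the first match, or null if there is no match.
    static String firstLetters(String s) {
        Matcher m = NAMED.matcher(s);
        return m.find() ? m.group("letters") : null;
    }

    public static void main(String[] args) {
        for (String g : numberedGroups("123aa-34556bb")) {
            System.out.println(g);
        }
        System.out.println(firstLetters("123aa-34556bb"));
    }
}
```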
Exercise 1: extract the email addresses from a web page
import java.util.regex.*;
import java.io.*;

public class Test {
    // compiled once and reused for every line
    private static final Pattern EMAIL =
            Pattern.compile("[\\w[.-]]+@[\\w[.-]]+\\.[\\w]+");

    public static void main(String[] args) {
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader("abc.htm"))) {
            String s;
            while ((s = br.readLine()) != null) {
                parse(s);
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // print every email-like substring found in one line of the page
    private static void parse(String s) {
        Matcher m = EMAIL.matcher(s);
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}
Writing the results to a file:
import java.util.regex.*;
import java.io.*;

public class Test {
    // compiled once and reused for every line
    private static final Pattern EMAIL =
            Pattern.compile("[\\w[.-]]+@[\\w[.-]]+\\.[\\w]+");

    public static void main(String[] args) {
        // both streams are closed automatically, and closing the writer flushes it
        try (BufferedReader br = new BufferedReader(new FileReader("abc.htm"));
             BufferedWriter bw = new BufferedWriter(new FileWriter("email.txt"))) {
            String s;
            while ((s = br.readLine()) != null) {
                parse(s, bw);
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // write every email-like substring of one line to the output, one per line
    private static void parse(String s, BufferedWriter bw) throws IOException {
        Matcher m = EMAIL.matcher(s);
        while (m.find()) {
            bw.write(m.group());
            bw.newLine();
        }
    }
}
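The extraction step can also be tried without any files by separating it from the I/O. The sketch below applies the same email pattern to a string and returns the matches as a list. The class name EmailExtractor, the method extract, and the sample addresses are hypothetical; the pattern is a rough filter for scraping, not a full validator of email syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailExtractor {
    // Same pattern as in the exercise above.
    private static final Pattern EMAIL =
            Pattern.compile("[\\w[.-]]+@[\\w[.-]]+\\.[\\w]+");

    // Returns every email-like substring of 'text', in order of appearance.
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "mail bob@example.com or "
                + "<a href=\"mailto:alice.w@mail.example.org\">Alice</a>";
        for (String address : extract(html)) {
            System.out.println(address);
        }
    }
}
```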
Exercise 2: count lines of code
import java.util.regex.*;
import java.io.*;

public class CodeCounter {
    static long normalLines = 0;
    static long commentLines = 0;
    static long whiteLines = 0;

    public static void main(String[] args) {
        File f = new File("E:/javacode/20140426");
        File[] codeFiles = f.listFiles();
        if (codeFiles == null) {
            // listFiles() returns null when the path is not a readable directory
            System.out.println("not a directory: " + f);
            return;
        }
        for (File child : codeFiles) {
            // only look at files whose names end in .java
            if (child.getName().matches(".*\\.java$")) {
                parse(child);
            }
        }
        System.out.println("normalLines: " + normalLines);
        System.out.println("commentLines: " + commentLines);
        System.out.println("whiteLines: " + whiteLines);
    }

    private static void parse(File f) {
        boolean comment = false; // true while inside a multi-line /* ... */ comment
        // try-with-resources closes the reader in every case
        try (BufferedReader br = new BufferedReader(new FileReader(f))) {
            String line;
            while ((line = br.readLine()) != null) {
                line = line.trim();
                if (line.matches("^[\\s&&[^\\n]]*$")) {
                    whiteLines++;                 // blank line
                } else if (line.startsWith("/*") && line.endsWith("*/")) {
                    commentLines++;               // block comment on a single line
                } else if (line.startsWith("/*") && !line.endsWith("*/")) {
                    commentLines++;               // a block comment opens here
                    comment = true;
                } else if (comment) {
                    commentLines++;               // inside a block comment
                    if (line.endsWith("*/")) {
                        comment = false;
                    }
                } else if (line.startsWith("//")) {
                    commentLines++;               // line comment
                } else {
                    normalLines++;
                }
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
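The classification logic is easier to check when it works on a list of lines instead of a directory on disk. The sketch below is one possible variant, not a copy of the program above: it tests the inside-a-comment state first, so blank lines inside a block comment are counted as comment lines, which is an assumption of this sketch. The class name LineClassifier and the method count are illustrative.

```java
import java.util.Arrays;
import java.util.List;

class LineClassifier {
    // Returns { normal, comment, blank } counts for the given source lines.
    static long[] count(List<String> lines) {
        long normal = 0, comment = 0, blank = 0;
        boolean inBlock = false; // true while inside a multi-line /* ... */ comment
        for (String raw : lines) {
            String line = raw.trim();
            if (inBlock) {
                comment++;
                if (line.endsWith("*/")) {
                    inBlock = false;
                }
            } else if (line.isEmpty()) {
                blank++;
            } else if (line.startsWith("/*")) {
                comment++;
                if (!line.endsWith("*/")) {
                    inBlock = true; // the comment continues on later lines
                }
            } else if (line.startsWith("//")) {
                comment++;
            } else {
                normal++;
            }
        }
        return new long[] { normal, comment, blank };
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList(
                "int a = 1;", "", "// note", "/* start", "", "end */",
                "/* one line */", "  ", "return;");
        System.out.println(Arrays.toString(count(source)));
    }
}
```

Like the original, this only recognizes comments that start at the beginning of a line; code followed by a trailing comment is counted as a normal line.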