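A regular expression is a text pattern used to search text. Put differently, it is code that describes a rule for matching text.
Java provides a powerful regular expression API in the java.util.regex package.
Regular expression API: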
------------ Pattern ------------
- Pattern.matches()
- Pattern.compile()
- pattern.matcher()
- Pattern.split()
- Pattern.pattern()
------------ Matcher ------------
- Matcher.matches()
- Matcher.lookingAt()
- Matcher.find()
- Matcher.start()
- Matcher.end()
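- Matcher.reset() Resets the Matcher's internal matching state. As find() works through the text, the Matcher keeps track of how far it has searched. After reset(), the search starts again from the beginning of the text.
- Matcher.reset(CharSequence) Resets the Matcher and takes a new character sequence as its input, replacing the text the Matcher was originally created with.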
- Matcher.group()
- Matcher.replaceAll()
- Matcher.replaceFirst()
- Matcher.appendReplacement()
- Matcher.appendTail()
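The two reset() variants in the list above have no example of their own, so here is a minimal sketch of how they behave. The class name ResetDemo and the helper countRemaining are hypothetical names chosen for illustration; only the Pattern and Matcher calls come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResetDemo {
    // Counts the matches that find() still reports from the matcher's current position.
    static int countRemaining(Matcher matcher) {
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Matcher matcher = Pattern.compile("is").matcher("This is it");

        matcher.find(); // consumes the first match (the "is" inside "This")
        System.out.println("after one find(): " + countRemaining(matcher));

        matcher.reset(); // search starts again from the beginning of the same text
        System.out.println("after reset(): " + countRemaining(matcher));

        matcher.reset("is is is"); // same pattern, new input text
        System.out.println("after reset(CharSequence): " + countRemaining(matcher));
    }
}
```

The counts printed should be 1, 2 and 3: one match is left after the first find(), both matches are visible again after reset(), and the new text contains three.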
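Pattern API:
The class java.util.regex.Pattern (Pattern for short) is the main entry point of the Java regular expression API. Whenever you need a regular expression, start with Pattern.
1.Pattern.matches()
The most direct way to check whether a regular expression matches a piece of text is to call the static method Pattern.matches().
Pattern.matches() suits a one-off check of a pattern against a text. It matches the regular expression against the entire text and returns true only if the whole text matches.
    // Imports used by the examples in this article
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    private static void test1() {
        String text = "This is the text to be searched for occurrences of the pattern.";
        // ".*" on both sides lets "is" appear anywhere in the text
        String pattern = ".*is.*";
        boolean matches = Pattern.matches(pattern, text);
        System.out.println("matches: " + matches);
    }
Output:
    matches: true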
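Because Pattern.matches() tests the entire text, a pattern that only describes a fragment of the text does not match. The sketch below illustrates this; the class name WholeTextMatchDemo and the helper matchesWhole are hypothetical and exist only for this example.

```java
import java.util.regex.Pattern;

class WholeTextMatchDemo {
    // Thin wrapper so the whole-text behaviour is easy to try with different inputs.
    static boolean matchesWhole(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        String text = "This is the text";

        // "is" occurs inside the text, but it does not describe the whole text.
        System.out.println(matchesWhole("is", text));
        // Wrapping the fragment in ".*" lets the pattern cover the whole text.
        System.out.println(matchesWhole(".*is.*", text));
    }
}
```

The first call should print false and the second true.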
2.Pattern.compile(), Pattern.matcher()
To match a regular expression against text more than once, create a Pattern object with Pattern.compile(), then obtain a Matcher from it with matcher().
    private static void test2() {
        String text = "This is the text to be searched for occurrences of the http:// pattern.";
        String patternString = ".*http://.*";
        // Compile the Pattern once; CASE_INSENSITIVE ignores letter case
        Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);
        boolean matches = matcher.matches();
        System.out.println("matches: " + matches);
    }
Output:
    matches: true
3.Pattern.split()
The split() method of the Pattern class uses the regular expression as a separator and splits the text into an array of String.
    private static void test3() {
        String text = "A sep Text sep With sep Many sep Separators";
        String patternString = "sep";
        Pattern pattern = Pattern.compile(patternString);
        // "sep" is case sensitive, so the "Sep" in "Separators" is not a separator
        String[] strs = pattern.split(text);
        System.out.println("strs.length: " + strs.length);
        for (String str : strs) {
            System.out.println("str: " + str);
        }
    }
Output (each piece also keeps the spaces that surrounded the separator):
    strs.length: 5
    str: A
    str: Text
    str: With
    str: Many
    str: Separators
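Since the separator is a regular expression, it can absorb the whitespace around a delimiter, and the two-argument overload split(CharSequence, int) caps the number of pieces. The following is a small sketch under those assumptions; the class SplitDemo, its method names, and the sample input are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitDemo {
    // A comma with optional whitespace on either side.
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    // Splits on every separator.
    static String[] splitAll(String text) {
        return COMMA.split(text);
    }

    // Splits at most once: the result has at most two elements.
    static String[] splitFirst(String text) {
        return COMMA.split(text, 2);
    }

    public static void main(String[] args) {
        String text = "a, b ,c,  d";
        System.out.println(Arrays.toString(splitAll(text)));
        System.out.println(Arrays.toString(splitFirst(text)));
    }
}
```

With a limit of 2 the pattern is applied only once, so everything after the first separator stays in the last element untouched.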
4.Pattern.pattern()
The pattern() method of the Pattern class returns the regular expression string from which the Pattern object was compiled.
    private static void test4() {
        String patternString = "sep";
        Pattern pattern = Pattern.compile(patternString);
        // pattern() gives back the original regular expression string
        String patternString2 = pattern.pattern();
        System.out.println("patternString2: " + patternString2);
    }
Output:
    patternString2: sep
Matcher API:
The java.util.regex.Matcher class is used to find multiple occurrences of a regular expression in a text.
A Matcher can also be used to match the same regular expression against several texts.
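As a rough sketch of matching the same regular expression against several texts, the example below compiles the pattern once and points a single Matcher at each text in turn with reset(CharSequence). The class MultiTextDemo, the method countTextsWithMatch, and the sample strings are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultiTextDemo {
    // Returns how many of the given texts contain at least one match of the pattern.
    static int countTextsWithMatch(Pattern pattern, String... texts) {
        Matcher matcher = pattern.matcher(""); // one Matcher, reused for every text
        int hits = 0;
        for (String text : texts) {
            matcher.reset(text); // switch the Matcher to the next input
            if (matcher.find()) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("\\d+");
        System.out.println(countTextsWithMatch(digits, "abc", "a1", "22"));
    }
}
```

Here two of the three texts contain digits, so the call should print 2. Creating a fresh Matcher per text with pattern.matcher(text) would work just as well; reusing one is only a way to avoid extra objects.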
5.Matcher.matches()
The matches() method of the Matcher class matches the regular expression against the text as a whole.
matches() cannot be used to find multiple occurrences of a regular expression. For that, use find(), start() and end().
    private static void test5() {
        String text = "This is the text to be searched for occurrences of the http:// pattern.";
        String patternString = ".*http://.*";
        // Obtain the Pattern object
        Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
        // Create a Matcher through Pattern.matcher(); it matches the pattern against the text
        Matcher matcher = pattern.matcher(text);
        // matches() succeeds only if the whole text matches
        boolean matches = matcher.matches();
        System.out.println("matches: " + matches);
    }
Output:
    matches: true
6.Matcher.lookingAt()
lookingAt() is similar to matches(). The main difference is that lookingAt() only requires the regular expression to match at the beginning of the text, while matches() requires it to match the whole text.
    private static void test6() {
        String text = "This is the text to be searched for occurrences of the http:// pattern.";
        String patternString = "This is the";
        Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);
        // The text starts with the pattern, but the pattern does not cover all of it
        System.out.println("lookingAt = " + matcher.lookingAt());
        System.out.println("matches = " + matcher.matches());
    }
Output:
    lookingAt = true
    matches = false
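To put lookingAt() and matches() next to find(), the sketch below runs all three on the same input. The class AnchoringDemo and the helper describe are hypothetical; a fresh Matcher is created for each call so that no call is affected by the state left by another.

```java
import java.util.regex.Pattern;

class AnchoringDemo {
    // Reports the result of lookingAt(), matches() and find() for one regex/text pair.
    static String describe(String regex, String text) {
        Pattern pattern = Pattern.compile(regex);
        boolean lookingAt = pattern.matcher(text).lookingAt(); // must match at the start
        boolean matches = pattern.matcher(text).matches();     // must match the whole text
        boolean find = pattern.matcher(text).find();           // may match anywhere
        return "lookingAt=" + lookingAt + " matches=" + matches + " find=" + find;
    }

    public static void main(String[] args) {
        String text = "This is the text";
        System.out.println(describe("This", text)); // at the start only
        System.out.println(describe("text", text)); // at the end only
        System.out.println(describe(".*", text));   // covers everything
    }
}
```

For "This" only matches() should be false, for "text" only find() should be true, and ".*" should satisfy all three.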
7.Matcher.find(), Matcher.start(), Matcher.end()
The find() method searches the text for occurrences of the regular expression.
If the text contains several matches, the first call to find() locates the first one, and each later call locates the next. start() and end() return the start and end positions of the current match within the whole text. More precisely, end() returns the index just after the last character of the match, so the return values of start() and end() can be passed directly to String.substring().
    private static void test7() {
        String text = "This is the text which is to be searched for occurrences of the word 'is'.";
        String patternString = "is";
        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        // Each successful find() moves on to the next occurrence
        while (matcher.find()) {
            count++;
            System.out.println("found: " + count + " start: " + matcher.start() + " end: " + matcher.end());
        }
    }
Output:
    found: 1 start: 2 end: 4
    found: 2 start: 5 end: 7
    found: 3 start: 23 end: 25
    found: 4 start: 70 end: 72
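The remark about String.substring() can be sketched as follows: because end() is exclusive, text.substring(start, end) yields exactly the matched text. The class FindAllDemo, the method findAll, and the sample input are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllDemo {
    // Collects every match by cutting it out of the text with start() and end().
    static List<String> findAll(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        List<String> found = new ArrayList<>();
        while (matcher.find()) {
            // end() is one past the last matched character, as substring() expects
            found.add(text.substring(matcher.start(), matcher.end()));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("\\d+", "a1b22c333"));
    }
}
```

For this input the list should contain "1", "22" and "333". In practice matcher.group() returns the same substring; the manual cut is shown only to make the index convention concrete.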
8.Matcher.group()
Groups are written with parentheses in a regular expression. For example, (John) matches John. The parentheses are not part of the text to match; they define a group. Once the regular expression has matched, the part of the text matched by the group can be accessed.
Use group(int groupNo) to access a group. A regular expression can contain several groups, each marked by a pair of parentheses. To get the text matched by a particular group, pass its number to group(int groupNo). group(0) stands for the match of the whole regular expression; groups marked by parentheses are numbered starting from 1. With nested groups, the numbers are assigned in the order of the opening parentheses.
    private static void test8() {
        String text = "John writes about this, and John writes about that, and John writes about everything. ";
        // Group 1: "John <word>", group 2: "John", group 3: the word after "John"
        String patternString1 = "((John) (.+?)) ";
        Pattern pattern = Pattern.compile(patternString1);
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            System.out.println("found: " + matcher.group(1) + " " + matcher.group(2) + " " + matcher.group(3));
            // group() is the same as group(0): the whole match, including the trailing space
            System.out.println("matcher.group(): " + matcher.group());
            System.out.println("matcher.group(0): " + matcher.group(0));
            System.out.println("matcher.group(2): " + matcher.group(2));
            System.out.println("matcher.group(3): " + matcher.group(3));
        }
    }
Output (the same five lines are printed for each of the three matches):
    found: John writes John writes
    matcher.group(): John writes
    matcher.group(0): John writes
    matcher.group(2): John
    matcher.group(3): writes
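The numbering rule for nested groups is easier to see when every group is listed by index. The sketch below does that for a made-up date-like pattern; the class GroupNumberingDemo, the helper groups, and the sample string are assumptions for illustration, while groupCount() and group(int) are the standard Matcher methods.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    // Returns group(0)..group(groupCount()) for a whole-text match, or an empty array.
    static String[] groups(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        if (!matcher.matches()) {
            return new String[0];
        }
        String[] result = new String[matcher.groupCount() + 1];
        for (int i = 0; i <= matcher.groupCount(); i++) {
            result[i] = matcher.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // Opening parentheses from left to right: 1 = year-month, 2 = year, 3 = month, 4 = day
        String regex = "((\\d{4})-(\\d{2}))-(\\d{2})";
        System.out.println(Arrays.toString(groups(regex, "2024-05-17")));
    }
}
```

The outer group opens first and therefore gets number 1, even though it closes after groups 2 and 3. Note that groupCount() does not include group 0.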
9.Matcher.replaceAll(), Matcher.replaceFirst()
replaceAll() replaces every match of the regular expression.
replaceFirst() replaces only the first match.
    private static void test9() {
        String text = "John writes about this, and John Doe writes about that, and John Wayne writes about everything.";
        String patternString1 = "((John) (.+?)) ";
        Pattern pattern = Pattern.compile(patternString1);
        Matcher matcher = pattern.matcher(text);
        // Both methods return a new string; the original text is not modified
        String replaceAll = matcher.replaceAll("Joe Blocks ");
        System.out.println("replaceAll: " + replaceAll);
        String replaceFirst = matcher.replaceFirst("Joe Blocks ");
        System.out.println("replaceFirst: " + replaceFirst);
    }
Output:
    replaceAll: Joe Blocks about this, and Joe Blocks writes about that, and Joe Blocks writes about everything.
    replaceFirst: Joe Blocks about this, and John Doe writes about that, and John Wayne writes about everything.
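The replacement string passed to replaceAll() and replaceFirst() may refer to captured groups with $1, $2 and so on, which ties replacement back to the groups from the previous section. When the replacement should be taken literally, Matcher.quoteReplacement() escapes the special characters. The sketch below shows both; the class ReplacementDemo, its method names, and the sample inputs are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplacementDemo {
    private static final Pattern TWO_WORDS = Pattern.compile("(\\w+) (\\w+)");

    // Swaps each pair of words: "$2, $1" refers to the second and first group.
    static String swapNames(String text) {
        return TWO_WORDS.matcher(text).replaceAll("$2, $1");
    }

    // Replaces the placeholder PRICE with a literal string that may contain '$'.
    static String insertLiteral(String text, String literal) {
        return Pattern.compile("PRICE")
                .matcher(text)
                .replaceAll(Matcher.quoteReplacement(literal));
    }

    public static void main(String[] args) {
        System.out.println(swapNames("John Doe; Jane Roe"));
        System.out.println(insertLiteral("cost: PRICE", "$5"));
    }
}
```

The first call should give "Doe, John; Roe, Jane" and the second "cost: $5". Without quoteReplacement(), "$5" would be read as a reference to group 5, which this pattern does not have.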
10.Matcher.appendReplacement(), Matcher.appendTail()
appendReplacement() and appendTail() replace matched phrases in the input text while appending the resulting text to a StringBuffer.
When find() has found a match, you can call appendReplacement(). It copies the input text from the end of the previous match up to the current match into the StringBuffer, then appends the replacement in place of the matched text. appendReplacement() keeps track of how much of the input has been copied, so you can keep calling find() until there are no more matches. After the last match, the remainder of the input text, from the end of the last match to the end of the text, has not yet been copied. Calling appendTail() copies that remainder into the StringBuffer.
    private static void test10() {
        String text = "John writes about this, and John Doe writes about that, and John Wayne writes about everything.";
        String patternString1 = "((John) (.+?)) ";
        Pattern pattern = Pattern.compile(patternString1);
        Matcher matcher = pattern.matcher(text);
        StringBuffer stringBuffer = new StringBuffer();
        while (matcher.find()) {
            // Copies the text before this match, then the replacement
            matcher.appendReplacement(stringBuffer, "Joe Blocks ");
            System.out.println("stringBuffer: " + stringBuffer);
        }
        // Copies whatever follows the last match
        matcher.appendTail(stringBuffer);
        System.out.println("------appendTail------");
        System.out.println("stringBuffer: " + stringBuffer);
    }
Output:
    stringBuffer: Joe Blocks
    stringBuffer: Joe Blocks about this, and Joe Blocks
    stringBuffer: Joe Blocks about this, and Joe Blocks writes about that, and Joe Blocks
    ------appendTail------
    stringBuffer: Joe Blocks about this, and Joe Blocks writes about that, and Joe Blocks writes about everything.
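One situation where the find() / appendReplacement() / appendTail() loop is useful is when the replacement has to be computed from each match, which a fixed replacement string cannot express. The sketch below capitalizes the first letter of every word as an example; the class AppendReplacementDemo and the method capitalizeWords are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendReplacementDemo {
    // A lowercase letter at the start of a word.
    private static final Pattern WORD_START = Pattern.compile("\\b[a-z]");

    static String capitalizeWords(String text) {
        Matcher matcher = WORD_START.matcher(text);
        StringBuffer buffer = new StringBuffer();
        while (matcher.find()) {
            // Compute the replacement from the current match
            String upper = matcher.group().toUpperCase();
            // quoteReplacement() keeps the computed text from being read as a group reference
            matcher.appendReplacement(buffer, Matcher.quoteReplacement(upper));
        }
        matcher.appendTail(buffer); // copy the text after the last match
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(capitalizeWords("john writes about this"));
    }
}
```

For this input the result should be "John Writes About This". Forgetting appendTail() would drop everything after the last replaced letter.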