Multiple regex matches
I know I have written this code from scratch several times, so I'm blogging it here for posterity (and so that people smarter than me can suggest better solutions).
The problem: I want to write a regex with a capturing group, run it over a bunch of text, and get back the captured group from every match. The best way I can figure out how to do this is:
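import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static List<String> getMatches(Pattern pattern, String text) {
    List<String> matches = new ArrayList<String>();
    Matcher m = pattern.matcher(text);
    // find() resumes after the previous match, so no manual index is needed
    // (find(int) resets the matcher and can loop forever on an empty match)
    while (m.find()) {
        matches.add(m.group(1)); // group 1 is the first capture group
    }
    return matches;
}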
This code assumes that the pattern has exactly one capture group. It could easily be extended to handle multiple capture groups.
For example, say I have some text (a file of phone numbers) and I want to match each phone number and return just the area codes. The text might be:
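One way that extension might look, as a rough sketch rather than a definitive version: collect every capture group of every match into its own inner list. The class name MultiGroupRegexUtil and the method name getAllGroups are made up for illustration; only Matcher.find(), Matcher.groupCount(), and Matcher.group(int) come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class MultiGroupRegexUtil {
    // Returns one inner list per match, holding groups 1..groupCount() in order.
    // A group that did not participate in a match is stored as null.
    static List<List<String>> getAllGroups(Pattern pattern, String text) {
        List<List<String>> matches = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            List<String> groups = new ArrayList<>();
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
            matches.add(groups);
        }
        return matches;
    }
}
```

Group 0 (the whole match) is deliberately skipped, since groupCount() does not include it; start the inner loop at 0 if you want the full match as well.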
Albert Pujols 111-456-7890
Darth Vader 222-123-4567
Carrot Top 333-123-4444
And the answer should be 111, 222, and 333. So, you’d do something like this with my method:
Pattern p = Pattern.compile("([0-9]{3})-[0-9]{3}-[0-9]{4}");
String text = "Albert Pujols\t111-456-7890\nDarth Vader\t222-123-4567\nCarrot Top\t333-123-4444\n";
List<String> matches = RegexUtil.getMatches(p, text);
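To try the example end to end, here is a small self-contained sketch. The class name AreaCodeDemo is an assumption made for the sake of a runnable file, and it carries its own copy of the helper method (using the no-argument find()) instead of relying on a separate RegexUtil class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name is only for illustration.
class AreaCodeDemo {
    // Collects capture group 1 from every match of the pattern in the text.
    static List<String> getMatches(Pattern pattern, String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            matches.add(m.group(1));
        }
        return matches;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("([0-9]{3})-[0-9]{3}-[0-9]{4}");
        String text = "Albert Pujols\t111-456-7890\n"
                + "Darth Vader\t222-123-4567\n"
                + "Carrot Top\t333-123-4444\n";
        System.out.println(getMatches(p, text)); // prints [111, 222, 333]
    }
}
```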
This works great for me. Is there a better way to do this? Is there some magical option on the regex pattern itself to avoid doing the loop?
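On the "magical option" question: as far as I know there is no Pattern flag that hands back every match at once. On Java 9 or later, though, Matcher.results() returns a stream of MatchResult objects, which hides the loop. The sketch below assumes Java 9+ and uses a made-up class name, StreamRegexUtil; it is one possible alternative, not necessarily a better one.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class, named here only for illustration.
class StreamRegexUtil {
    // Requires Java 9 or later, where Matcher.results() was added.
    static List<String> getMatches(Pattern pattern, String text) {
        return pattern.matcher(text)
                .results()              // Stream<MatchResult>, one element per match
                .map(r -> r.group(1))   // keep only the first capture group
                .collect(Collectors.toList());
    }
}
```

The loop still happens inside the stream, so this is mostly a readability trade: shorter code in exchange for a newer minimum Java version.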
About this entry
You’re currently reading “Multiple regex matches,” an entry on Pure Danger Tech
- Published:
- Oct 13 2006 / 3:46 pm
- Category:
- programming, Java, regex