
Multiple regex matches

I know I have written this code from scratch several times, so I will blog it here for posterity (and so that people smarter than me can suggest better solutions).

The problem is that I want to write a regex with a capturing group, then run it on a bunch of text and get back all the captured groups from all the matches. The best way I can figure out how to do this is:


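import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexUtil {
    // Collects capture group 1 from every match of pattern in text.
    public static List<String> getMatches(Pattern pattern, String text) {
        List<String> matches = new ArrayList<String>();
        Matcher m = pattern.matcher(text);
        // find() resumes where the previous match ended, so no index bookkeeping is needed.
        while (m.find()) {
            matches.add(m.group(1));
        }
        return matches;
    }
}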

This code assumes the pattern has at least one capture group, and it returns only the first group of each match. It could be extended to handle multiple capture groups pretty easily.
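Here is a rough sketch of what that multi-group extension might look like. The class name MultiGroupRegexUtil and the method name getAllGroups are made up for illustration; the only library calls involved are Matcher.find(), Matcher.groupCount(), and Matcher.group(int).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
class MultiGroupRegexUtil {
    // Returns one inner list per match, holding groups 1..groupCount() in order.
    // A group that did not participate in a match shows up as null.
    static List<List<String>> getAllGroups(Pattern pattern, String text) {
        List<List<String>> matches = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            List<String> groups = new ArrayList<>();
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
            matches.add(groups);
        }
        return matches;
    }
}
```

With the pattern (\w+)=(\w+) and the text "a=1 b=2", this should give back [[a, 1], [b, 2]]: one inner list per match, one entry per group.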

For example, say I have some text (a file of phone numbers) and I want to match each phone number and return just the area codes. The text might be:


Albert Pujols 111-456-7890
Darth Vader 222-123-4567
Carrot Top 333-123-4444

And the answer should be 111, 222, and 333. So, you’d do something like this with my method:


Pattern p = Pattern.compile("([0-9]{3})-[0-9]{3}-[0-9]{4}");
String text = "Albert Pujols\t111-456-7890\nDarth Vader\t222-123-4567\nCarrot Top\t333-123-4444\n";
List<String> matches = RegexUtil.getMatches(p, text);
// matches is now [111, 222, 333]

This works great for me. Is there a better way to do this? Is there some magical option on the regex pattern itself to avoid doing the loop?
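One possible answer, as a sketch rather than anything definitive: I don't know of a Pattern flag that does this, but on Java 9 or later Matcher.results() returns a stream of MatchResult objects, which at least hides the loop. The class name StreamRegexUtil below is made up for illustration, and the whole thing assumes a Java 9+ runtime.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical variant of the helper above; assumes Java 9 or later.
class StreamRegexUtil {
    // Collects capture group 1 from every match, without an explicit loop.
    static List<String> getMatches(Pattern pattern, String text) {
        return pattern.matcher(text)
                .results()                 // Stream<MatchResult>, added in Java 9
                .map(r -> r.group(1))      // keep only the first capture group
                .collect(Collectors.toList());
    }
}
```

Called with the phone number pattern and text from the example, this should return the same 111, 222, and 333. The loop is still there, of course; it has just moved inside the library.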

