Splitting Strings in Java Using Regular Expressions Instead of StringTokenizer

In this quick tutorial, I will show how you can split String values in Java using regular expressions instead of the StringTokenizer class. Assume you have a string of brand names (names that used to be only fruits!) separated by commas:

String brands = "Orange,Apple,Blackberry";

and you want to split that string so that each brand is an element of a String array. Doing this with the StringTokenizer class would probably look something like this:

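import java.util.StringTokenizer;

String brands = "Orange,Apple,Blackberry";
// Every character in the second argument is treated as a delimiter
StringTokenizer tokenizer = new StringTokenizer(brands, ",");
// Size the result array from the number of tokens found
String[] res = new String[tokenizer.countTokens()];

int i = 0;
while (tokenizer.hasMoreTokens()) {
    res[i] = tokenizer.nextToken();
    i++;
}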

Although it works in this case, StringTokenizer is not always flexible or easy to use. It treats every character in the delimiter string as a separate single-character delimiter, so you cannot tell it to look for a whole word as a delimiter. It also skips over consecutive delimiters, so it cannot easily report the zero-length (empty) token that two consecutive delimiters would indicate. For example, assume we changed the input string to:

String brands = "Orange,Apple,,Blackberry";
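To make both limitations concrete, here is a small sketch that wraps the loop above in a helper and runs it on this input and on a word delimiter. The class TokenizerPitfalls and its tokenize method are hypothetical names chosen for illustration, not part of any library.

```java
import java.util.Arrays;
import java.util.StringTokenizer;

public class TokenizerPitfalls {
    // Hypothetical helper: collects every token StringTokenizer produces.
    static String[] tokenize(String input, String delims) {
        StringTokenizer tokenizer = new StringTokenizer(input, delims);
        String[] res = new String[tokenizer.countTokens()];
        int i = 0;
        while (tokenizer.hasMoreTokens()) {
            res[i++] = tokenizer.nextToken();
        }
        return res;
    }

    public static void main(String[] args) {
        // The empty field between ",," is silently skipped: only 3 tokens come back.
        System.out.println(Arrays.toString(tokenize("Orange,Apple,,Blackberry", ",")));
        // prints [Orange, Apple, Blackberry]

        // " and " is treated as the character set {' ', 'a', 'n', 'd'}, not as a word,
        // so the 'a' and 'n' inside "Orange" also act as delimiters.
        System.out.println(Arrays.toString(tokenize("Orange and Apple", " and ")));
        // prints [Or, ge, Apple]
    }
}
```

The second call shows why a word cannot serve as a delimiter: the delimiter argument is a set of characters, not a string to match.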

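If “,,” is meant to indicate a blank field, StringTokenizer becomes very difficult to use, because by default it silently skips the empty field. For these and other reasons, StringTokenizer is a legacy class: the Java documentation states that it is retained for compatibility reasons but that its use is discouraged in new code, and it recommends the split() method of String or the java.util.regex package instead. Regular expressions (also called regex) are special text strings that describe search patterns. The most frequently used regular expression constructs are summarized in the following table: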

[Table: frequently used regular expressions. Source: Introduction to Java Programming]
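A few of the most common constructs can be tried directly with String.matches(), which returns true only when the entire string matches the pattern. The selection below is illustrative rather than exhaustive, and the class name RegexBasics is arbitrary. Remember that backslashes must be doubled inside Java string literals, so the regex \d is written "\\d".

```java
public class RegexBasics {
    public static void main(String[] args) {
        // \d matches one digit; + means "one or more" of the preceding element.
        System.out.println("2024".matches("\\d+"));          // true
        // [a-z] is a character class: any single lowercase letter.
        System.out.println("apple".matches("[a-z]+"));       // true
        System.out.println("Apple".matches("[a-z]+"));       // false ('A' is uppercase)
        // [^,] is a negated class: any character except a comma.
        System.out.println("Orange".matches("[^,]+"));       // true
        // \s matches whitespace; * means "zero or more".
        System.out.println("a  b".matches("a\\s*b"));        // true
        // | is alternation: either alternative may match.
        System.out.println("Apple".matches("Orange|Apple")); // true
        // . matches any single character; {3} means exactly three times.
        System.out.println("abc".matches(".{3}"));           // true
    }
}
```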

There are various ways to split strings using regular expressions, but the most straightforward is the split() method of the String class. The previous example can be rewritten as follows:

String brands = "Orange,Apple,Blackberry";
// split() takes a regular expression; "," matches each comma
String[] res = brands.trim().split(",");

The same result is achieved with fewer lines of code. Moreover, because split() takes a regular expression, you can use a whole word as a delimiter without any problems, and consecutive delimiters produce zero-length tokens. For example, if the input string is changed to:

String brands = "Orange,Apple,,Blackberry";

the third element of the array (index 2) will be an empty string, and “Blackberry” will be at index 3.
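Here is a minimal runnable sketch of the split() behaviour described above (SplitExamples is an arbitrary class name). It also shows two details worth knowing: split(regex) discards trailing empty strings unless you pass a negative limit, and because the argument is a regular expression, metacharacters such as | must be escaped, for example with Pattern.quote().

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitExamples {
    public static void main(String[] args) {
        // Consecutive commas produce an empty string ("") at index 2.
        String[] res = "Orange,Apple,,Blackberry".split(",");
        System.out.println(res.length);         // 4
        System.out.println(res[2].isEmpty());   // true
        System.out.println(res[3]);             // Blackberry

        // A whole word can serve as the delimiter.
        System.out.println(Arrays.toString("Orange and Apple and Blackberry".split(" and ")));
        // prints [Orange, Apple, Blackberry]

        // Caveat: split(regex) drops trailing empty strings...
        System.out.println("Orange,Apple,,".split(",").length);     // 2
        // ...while a negative limit keeps them.
        System.out.println("Orange,Apple,,".split(",", -1).length); // 4

        // '|' is a regex metacharacter, so quote it to split on a literal pipe.
        System.out.println(Arrays.toString("Orange|Apple".split(Pattern.quote("|"))));
        // prints [Orange, Apple]
    }
}
```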

Regular expressions, especially when combined with the split() method and the Pattern and Scanner classes, provide a very powerful and flexible alternative to StringTokenizer. The downside is that you have to learn and memorise regular expressions in order to use split(), Pattern and Scanner efficiently. However, regular expressions are well worth learning, and they are used in many languages besides Java. For more information, see the Java tutorial on regular expressions as well as this website.
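As a sketch of the Pattern and Scanner alternatives mentioned above (the class and method names are illustrative), the example below compiles a reusable Pattern that also absorbs spaces around each comma, and uses a Scanner with a comma delimiter to read tokens one at a time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class PatternAndScanner {
    // Compiled once and reused: optional whitespace, a comma, optional whitespace.
    private static final Pattern COMMA_WITH_SPACES = Pattern.compile("\\s*,\\s*");

    static String[] splitWithPattern(String input) {
        return COMMA_WITH_SPACES.split(input);
    }

    // Scanner reads tokens one at a time, using a regex as the delimiter.
    static List<String> splitWithScanner(String input) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(input).useDelimiter(",")) {
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitWithPattern("Orange , Apple,Blackberry")));
        // prints [Orange, Apple, Blackberry]
        System.out.println(splitWithScanner("Orange,Apple,Blackberry"));
        // prints [Orange, Apple, Blackberry]
    }
}
```

Compiling a Pattern yourself mainly pays off when the same delimiter is applied to many strings; for a one-off split, String.split() is simpler. Scanner is handy when tokens arrive from a stream or file rather than a single in-memory string.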

I hope this was useful. Your comments and questions are always appreciated.
