Splitting Strings in Java Using Regular Expressions Instead of StringTokenizer

In this quick tutorial I am going to illustrate how you can split String values in Java using regular expressions instead of the StringTokenizer class. Assume you have a string of brand names (that used to be only fruits!) separated by commas:

String brands = "Orange,Apple,Blackberry";

and you want to split that string so that each brand is an element of a String array. Doing this with the StringTokenizer class would probably look something like this:

import java.util.StringTokenizer;

String brands = "Orange,Apple,Blackberry";
StringTokenizer tokenizer = new StringTokenizer(brands, ",");
String[] res = new String[tokenizer.countTokens()];

int i = 0;
while (tokenizer.hasMoreTokens()) {
     res[i++] = tokenizer.nextToken(); // store the token, then advance the index
}

Despite being effective in this case, StringTokenizer is not always flexible or easy to use. For example, StringTokenizer treats every character in its delimiter string as a separate one-character delimiter and groups consecutive delimiters together as one; you cannot tell it to look for a particular word as a delimiter. Also, StringTokenizer cannot easily handle the possibility that two consecutive delimiters indicate a zero-length (empty) token. For example, assume we changed the input string to:

String brands = "Orange,Apple,,Blackberry";

If the “,,” is used to indicate a blank field, StringTokenizer becomes very difficult to use. For these reasons, and many others, StringTokenizer is a legacy class whose use is discouraged in new code. Instead, you should use regular expressions. Regular expressions, also called regexes, are special text strings that describe search patterns. The most frequently used regular expression constructs are summarised in a table in the book Introduction to Java Programming.
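To get a feel for some of those constructs, here is a small sketch (the class RegexBasics and its helper method are illustrative names, not from the original post) that tests a few common patterns with String.matches(), which returns true only when the entire input matches the pattern. Note the last line: [a-z] covers lowercase letters only, so a class such as [^a-z] also treats capital letters as "other" characters.

```java
// Checks a few common regular-expression constructs with String.matches(),
// which returns true only when the WHOLE input matches the pattern.
class RegexBasics {
    static boolean matches(String input, String regex) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(matches("b", "[abc]"));            // true: one of a, b or c
        System.out.println(matches("d", "[^abc]"));           // true: any character except a, b, c
        System.out.println(matches("Apple", "[A-Z][a-z]+"));  // true: a capital, then one or more lowercase
        System.out.println(matches("2009", "\\d{4}"));        // true: exactly four digits
        System.out.println(matches("a b", "\\w\\s\\w"));      // true: word char, whitespace, word char
        System.out.println(matches("color", "colou?r"));      // true: the "u" is optional
        System.out.println(matches("Blackberry", "B.*y"));    // true: "." is any char, "*" is zero or more
        System.out.println(matches("Apple", "Orange|Apple")); // true: alternation
        System.out.println(matches("Orange", "[a-z]+"));      // false: 'O' is not in [a-z]
    }
}
```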

There are a variety of ways to split strings using regular expressions. However, the most straightforward way is to use the split() method of the String class. The previous example can be rewritten as:

String brands = "Orange,Apple,Blackberry";
String[] res = brands.trim().split("[^a-zA-Z]"); // split on any character that is not a letter

The same result is achieved with fewer lines of code. Moreover, regular expressions let you use whole words as delimiters without any problems, and they correctly produce zero-length tokens. If the input string is changed to:

String brands = "Orange,Apple,,Blackberry";

The third element of the array (index 2) will be an empty string, and “Blackberry” will be at index 3.
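The difference is easy to verify. The sketch below (class and method names are illustrative) runs StringTokenizer and split() on the input with an empty field, uses a whole word as a delimiter, and shows one split() detail worth knowing: trailing empty strings are dropped unless you pass a negative limit.

```java
import java.util.Arrays;
import java.util.StringTokenizer;

// Compares StringTokenizer and String.split() on input containing an empty field.
class SplitDemo {
    // StringTokenizer skips over consecutive delimiters, so the empty field is lost.
    static String[] withTokenizer(String input) {
        StringTokenizer tokenizer = new StringTokenizer(input, ",");
        String[] result = new String[tokenizer.countTokens()];
        int i = 0;
        while (tokenizer.hasMoreTokens()) {
            result[i++] = tokenizer.nextToken();
        }
        return result;
    }

    // split() keeps the empty string found between two consecutive commas.
    static String[] withSplit(String input) {
        return input.split(",");
    }

    public static void main(String[] args) {
        String brands = "Orange,Apple,,Blackberry";
        System.out.println(Arrays.toString(withTokenizer(brands))); // [Orange, Apple, Blackberry]
        System.out.println(Arrays.toString(withSplit(brands)));     // [Orange, Apple, , Blackberry]

        // A whole word (with optional surrounding spaces) can serve as the delimiter.
        String words = "Orange and Apple and Blackberry";
        System.out.println(Arrays.toString(words.split("\\s*and\\s*"))); // [Orange, Apple, Blackberry]

        // split(regex) drops TRAILING empty strings; a negative limit keeps them.
        System.out.println("Orange,Apple,,".split(",").length);     // 2
        System.out.println("Orange,Apple,,".split(",", -1).length); // 4
    }
}
```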

Regular expressions, especially when combined with the split() method and the Pattern and Scanner classes, provide a very powerful and flexible alternative to StringTokenizer. The downside is that you have to learn and memorise regular expressions in order to use split(), Pattern, and Scanner efficiently. However, regular expressions are well worth learning, and they are used in many languages besides Java. For more information, you can check the Java tutorial on regular expressions as well as this website.
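As a rough illustration of the Pattern and Scanner alternatives, this sketch (class and method names are illustrative) splits the same string both ways. Pattern.compile() is handy when the same delimiter is applied many times, and Scanner hands back one token at a time; with a single-character delimiter such as ",", Scanner can also return empty tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

// Two other regex-based ways to split the same string.
class RegexSplitAlternatives {
    // Compile once and reuse: useful when the same delimiter is applied many times.
    private static final Pattern COMMA = Pattern.compile(",");

    static String[] withPattern(String input) {
        return COMMA.split(input);
    }

    // Scanner reads tokens one at a time, using a regex as the delimiter.
    static List<String> withScanner(String input) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(input).useDelimiter(",")) {
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String brands = "Orange,Apple,,Blackberry";
        System.out.println(Arrays.toString(withPattern(brands))); // [Orange, Apple, , Blackberry]
        System.out.println(withScanner(brands));
    }
}
```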

Hope this was useful; your comments and questions are always appreciated.


Programming Type Systems

If you are a programmer who has worked with multiple programming languages, you have probably noticed that some languages, such as Java and C++, handle data types in very similar ways, while others, such as Python and Matlab, handle them completely differently. This is because the two groups of languages use different type systems.

A type system is defined as a tractable syntactic framework for classifying phrases according to the kinds of values they compute. It associates a type with each computed value and, by examining the flow of these values, attempts to prove that no type errors can occur. A type system generally seeks to guarantee that operations expecting a certain kind of value are not used with values for which that operation makes no sense.

Ok, in simpler words, a type system is the way a programming language classifies values and expressions into types, manipulates those types, and defines how they interact. In even simpler terms, it determines how data types are assigned to variables and how they are handled. There are four typing categories a language can adopt, forming two pairs: static versus dynamic typing, and strong versus weak typing. ANY programming language you know or have worked with, except two, belongs to at least two of the following categories:

Statically-typed languages: languages in which data types are fixed at compile time; in other words, type checking (verification) is done when the code is compiled. Languages that use static typing include C++, Java, C#, and F#. Many of them, such as Java, enforce this by requiring the programmer to declare variables with their data types before use (F#, by contrast, can usually infer types). An example is the declaration of a floating-point variable in Java:

float f = 1.0f;

Static typing allows data type errors to be caught earlier in the development cycle. Besides verifying data types, static type checkers verify that the checked conditions hold for all possible executions of the program, which eliminates the need to repeat type checks every time the program is executed. Program execution may also be made more efficient by omitting runtime type checks. However, static typing can sometimes reduce code flexibility.

Dynamically-typed languages: languages in which most type checking is performed at run time instead of at compile time. Languages that use dynamic typing include JavaScript, PHP, Python, and Tcl. In dynamic typing, values have types but variables do not; that is, a variable can refer to a value of any type. An example of this in Python:

x = 1
print(x)
x = "Hello, world!"
print(x)

This code produces no errors and prints “1” and “Hello, world!”. By allowing programs to generate types and functionality based on run-time data, dynamic typing can be more flexible than static typing. However, dynamic typing may result in runtime type errors: at run time, a value may have an unexpected type, and an operation that makes no sense for that type is applied to it. Also, that operation may run long after the point where the wrongly typed value was passed in, which makes the bug difficult to locate.

One thing to notice is that a dynamically-typed language is not necessarily a dynamic language. The term dynamic language means something different, but more on that later. (In a separate post, maybe? ;))

Strongly-typed languages: languages in which data types are always enforced; a value of one type cannot be treated as another type unless it is explicitly converted. Strongly-typed languages, such as Java, Pascal, and Python, place severe restrictions on how values of different data types can be intermixed in operations, preventing the compiling or running of source code that uses data in what is considered to be an invalid way, e.g. dividing an integer by a string. To achieve this, strongly-typed languages apply some or all of the following constraints:

  • The compiler must ensure that operations are applied only to operand types that are valid for them.
  • An error must occur as soon as a type mismatch happens at run time or, as a special case with even stronger constraints, type mismatches must never happen at run time.
  • Implicit type conversions (conversions inserted by the compiler on the programmer’s behalf) are not allowed.
  • The type of a given data object does not change over that object’s lifetime.
  • Type conversions are allowed only when an explicit notation, often called a cast, is used to indicate the intent to convert one type to another.
  • No type conversion of any kind is allowed: values of one type cannot be converted to another type, explicitly or implicitly.

For example, an attempt to add an integer and a string in Python:

x = 1
y = "Hello, world!"
print(x + y)

will produce the error:

TypeError: unsupported operand type(s) for +: 'int' and 'str'
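For comparison, here is a small Java sketch (class and method names are illustrative) of the explicit-conversion constraint from the list above: a double must be cast before it can be stored in an int, and a String must be parsed before it can take part in integer arithmetic. The rejected lines are left as comments because they would not compile.

```java
// Illustrates conversions that Java performs only when asked to explicitly.
class ExplicitConversions {
    static int truncate(double d) {
        // int i = d;     // would not compile: possible lossy conversion from double to int
        return (int) d;   // explicit cast: the fractional part is discarded
    }

    static int addParsed(int x, String s) {
        // int sum = x + s;             // would not compile: x + s is a String, not an int
        return x + Integer.parseInt(s); // explicit conversion from String to int
    }

    public static void main(String[] args) {
        System.out.println(truncate(3.99));     // 3
        System.out.println(addParsed(1, "41")); // 42
    }
}
```

Java does not apply every constraint in the list: it still performs implicit widening conversions such as int to double, and its + operator converts an int to text when the other operand is a String, so 1 + "Hello, world!" evaluates to "1Hello, world!" instead of raising an error as Python does.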

Weakly-typed languages: languages in which types may be ignored. Weakly-typed languages support implicit type conversion, ad-hoc polymorphism (overloading), or both. For example, adding an integer and a string in Matlab:

x = 1;
y = 'Hello, world!';
z = x + y

will not produce an error. Instead, Matlab implicitly converts each character of the string to its numeric character code and adds 1 to each of them, producing:

z =    73   102   109   109   112    45    33   120   112   115   109   101    34
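To see where those numbers come from, here is a Java sketch (names are illustrative) that performs the same element-wise addition. Java, too, implicitly widens a char to its integer character code in arithmetic, so 'H' + 1 is 73.

```java
import java.util.Arrays;

// Reproduces the Matlab result: each char is promoted to its numeric code before 1 is added.
class CharArithmetic {
    static int[] addToEachChar(int x, String s) {
        int[] result = new int[s.length()];
        for (int i = 0; i < s.length(); i++) {
            result[i] = x + s.charAt(i); // char is implicitly widened to int here
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(addToEachChar(1, "Hello, world!")));
        // [73, 102, 109, 109, 112, 45, 33, 120, 112, 115, 109, 101, 34]
    }
}
```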

One last thing I want to talk about is type safety. Type safety can be defined as the use of a type system to prevent certain erroneous or undesirable program behaviour. This can be achieved statically, by catching potential errors at compile time; dynamically, by associating type information with values at run time and consulting it as needed to detect imminent errors; or by a combination of both. A programming language is called “type-safe” if it does not allow operations or conversions that lead to erroneous conditions. In the previous Python example, for instance, Python refused to add an int to a str and raised a TypeError instead of producing a meaningless result.
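Java is a convenient example of combining both approaches. In this sketch (names are illustrative), generics let the compiler reject a wrongly typed list element statically, while arrays carry their element type at run time, so an invalid store that gets past the compiler is stopped by an ArrayStoreException.

```java
import java.util.ArrayList;
import java.util.List;

// Java combines static and dynamic type checks.
class TypeSafetyDemo {
    // Static check: the compiler rejects adding an Integer to a List<String>.
    static List<String> names() {
        List<String> names = new ArrayList<>();
        names.add("Orange");
        // names.add(42); // would not compile
        return names;
    }

    // Dynamic check: arrays remember their element type at run time.
    static boolean arrayStoreRejected() {
        Object[] objects = new String[1]; // legal: String[] is a subtype of Object[]
        try {
            objects[0] = Integer.valueOf(42); // compiles, but the JVM checks the store
            return false;
        } catch (ArrayStoreException e) {
            return true; // the run-time check caught the wrongly typed value
        }
    }

    public static void main(String[] args) {
        System.out.println(names());              // [Orange]
        System.out.println(arrayStoreRejected()); // true
    }
}
```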

Remember when I said at the beginning of this post that all programming languages, except two, adopt at least two of the four typing schemes? Those two languages are assembly language and Forth, which have been described as untyped. There is no type checking; it is up to the programmer to ensure that data given to functions is of the appropriate type, and any type conversion required is explicit.


NetBeans Refactoring – Part 2

Greetings, my fellow programmers! Now that NetBeans 6.8 has been released, I decided that this would be the best time to post part two of my NetBeans refactoring tutorial. I highly recommend reading part one before reading this post. Read it here.

The following Java class will be used to illustrate the refactoring techniques in this post:

public class FruitStand {
 int quantity;
 double pricePerUnit;
 double totalPrice = (pricePerUnit + 12.5) * quantity;
 String[] merchandise = new String[100];

 public void insertIntoDB(String[] names) {
  for (int i = 0; i < names.length; i++)
   names[i] = names[i].toUpperCase();

  // Rest of method...
 }

 public class Apples {
  int quantity = 500;
  double pricePerUnit = 5.99;
 }
}


Extract Interface:

Extract Interface refactoring allows you to select public non-static methods of a class and declare them in a new, separate interface, which the class then implements. This can make your code more readable and maintainable. Suppose you want to extract the insertIntoDB method into an interface:

Note that the class FruitStand now implements NewInterface, and that NewInterface has appeared in the package:
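In code, the result looks roughly like the sketch below. NewInterface is the name used in this post; the body of insertIntoDB is abbreviated, and the exact code NetBeans generates may differ.

```java
import java.util.Arrays;

// Sketch after Extract Interface: the method is declared in an interface,
// and FruitStand implements it.
interface NewInterface {
    void insertIntoDB(String[] names);
}

class FruitStand implements NewInterface {
    int quantity;
    double pricePerUnit;

    @Override
    public void insertIntoDB(String[] names) {
        for (int i = 0; i < names.length; i++) {
            names[i] = names[i].toUpperCase();
        }
        // Rest of method... (database code omitted)
    }

    public static void main(String[] args) {
        String[] names = {"Orange", "Apple"};
        NewInterface stand = new FruitStand(); // callers can now depend on the interface
        stand.insertIntoDB(names);
        System.out.println(Arrays.toString(names)); // [ORANGE, APPLE]
    }
}
```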

Extract Superclass:

Extract Superclass refactoring works much like Extract Interface refactoring. However, Extract Superclass moves the selected methods into a new superclass and makes the refactored class (the one from which the methods were pulled) extend it. Repeating the same process we did for Extract Interface:
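A rough sketch of the outcome, where NewClass is a hypothetical name for the generated superclass (you choose the real name in the refactoring dialog):

```java
// Sketch after Extract Superclass. "NewClass" is a hypothetical name for the
// generated superclass; the refactored class FruitStand now extends it.
class NewClass {
    public void insertIntoDB(String[] names) {
        for (int i = 0; i < names.length; i++) {
            names[i] = names[i].toUpperCase();
        }
        // Rest of method...
    }
}

class FruitStand extends NewClass {
    int quantity;
    double pricePerUnit;
    // insertIntoDB is now inherited from NewClass

    public static void main(String[] args) {
        String[] names = {"Orange"};
        new FruitStand().insertIntoDB(names); // calls the inherited method
        System.out.println(names[0]);         // ORANGE
    }
}
```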

Move Inner to Outer Level:

Move Inner to Outer Level refactoring converts an inner class into a separate top-level class declared in its own file. It also gives you the option to declare a field for the current outer class. Here, I am going to move the inner class Apples to a separate top-level class:

You can specify a new name for the class that is being moved and optionally, you can select to declare a field for the current outer class and enter a name for that field:

Note that the inner class Apples has disappeared from class FruitStand, and that a new class Apples has appeared in the package:
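Because Apples was a (non-static) inner class, it could implicitly reach its enclosing FruitStand instance; as a top-level class it needs an explicit reference, which is what the optional field provides. The sketch below assumes you named that field outer (a hypothetical name) and sets it through a constructor; the exact code NetBeans generates may differ.

```java
// Sketch after Move Inner to Outer Level, assuming the optional field for the
// former outer class was requested and named "outer" (hypothetical name).
class FruitStand {
    int quantity;
    double pricePerUnit;
}

class Apples {
    FruitStand outer; // explicit reference replacing the implicit enclosing instance
    int quantity = 500;
    double pricePerUnit = 5.99;

    Apples(FruitStand outer) {
        this.outer = outer;
    }

    public static void main(String[] args) {
        Apples apples = new Apples(new FruitStand());
        System.out.println(apples.quantity); // 500
    }
}
```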

Introduce Constant:

Introduce Constant refactoring allows you to replace a literal value used in your code with a named constant. Assume that the owner of the fruit stand decided to add £12.50 to the price of each unit she sells (her business is not paying well!). To declare the value 12.5 as a constant, first highlight it, then choose Introduce Constant:

You can also choose the name and the access modifier for the newly created constant:

And voila! The constant is declared and used in the statement where 12.5 was used:
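A sketch of the refactored class, assuming the constant was named PRICE_ADJUSTMENT (a hypothetical name). One deliberate change from the post's class: the total is computed in a method, because the original field initializer runs when the object is created, while quantity and pricePerUnit still hold their default value of 0, so totalPrice would always be 0.0.

```java
// Sketch after Introduce Constant. PRICE_ADJUSTMENT is a hypothetical name;
// the name and access modifier are chosen in the refactoring dialog.
class FruitStand {
    private static final double PRICE_ADJUSTMENT = 12.5;

    int quantity;
    double pricePerUnit;

    // Computed on demand so it uses the current field values.
    double totalPrice() {
        return (pricePerUnit + PRICE_ADJUSTMENT) * quantity;
    }

    public static void main(String[] args) {
        FruitStand stand = new FruitStand();
        stand.quantity = 2;
        stand.pricePerUnit = 5.5;
        System.out.println(stand.totalPrice()); // 36.0
    }
}
```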

Introduce Method:

Sometimes you notice that certain sections of your code contain similar blocks of code. Introduce Method refactoring extracts such a code fragment into a separate method that can be called from anywhere. This makes the code more readable and easier to maintain. Assume that the fruits in the inventory are inserted into a database, but before doing so, their names must be written in uppercase letters (it is stupid, I know!). The for loop inside the insertIntoDB method does that. You find out later that you need to do this operation again somewhere else in the code. Instead of writing it again, you can simply extract it into a separate method. To do this, highlight the code you want to convert to a method and choose Introduce Method:

As in Introduce Constant, you get to choose the method name and its access modifier:

And there you go! The method is declared and called where the piece of code was previously written:
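A sketch of the result, where toUpperCaseNames is a hypothetical name typed into the Introduce Method dialog:

```java
import java.util.Arrays;

// Sketch after Introduce Method: the upper-casing loop now lives in its own method.
// "toUpperCaseNames" is a hypothetical name chosen in the refactoring dialog.
class FruitStand {
    public void insertIntoDB(String[] names) {
        toUpperCaseNames(names);
        // Rest of method...
    }

    // Can be reused anywhere else that needs the same conversion.
    private void toUpperCaseNames(String[] names) {
        for (int i = 0; i < names.length; i++) {
            names[i] = names[i].toUpperCase();
        }
    }

    public static void main(String[] args) {
        String[] names = {"Orange", "Apple", "Blackberry"};
        new FruitStand().insertIntoDB(names);
        System.out.println(Arrays.toString(names)); // [ORANGE, APPLE, BLACKBERRY]
    }
}
```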

Note that your code selections must meet the following criteria:

  • Selections cannot have more than one output parameter.
  • Selections cannot contain a break or continue statement if the corresponding target is not part of the selection.
  • Selections cannot contain a return statement that is not the last statement of the selection. The selected code is not allowed to return conditionally.

Encapsulate Fields:

When developing object-oriented code, it is standard practice to give fields (variables) private access and to use public methods to change or retrieve their values: mutators (setters) and accessors (getters), respectively. Encapsulate Fields refactoring automates this process; it generates getter and setter methods for the desired fields and lets you change the access modifiers of those fields and of the accessor methods. To illustrate this, I will encapsulate the quantity and pricePerUnit fields:

You can choose which fields to encapsulate and change the visibility modifiers of the fields and accessors:

After clicking Refactor, you will notice that accessors and mutators for the chosen fields have been added to your code:
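The generated methods follow the usual getX/setX naming convention. A sketch of the result, assuming private fields and public accessors were chosen in the dialog (the exact output depends on the options you pick):

```java
// Sketch of FruitStand after Encapsulate Fields on quantity and pricePerUnit.
class FruitStand {
    private int quantity;
    private double pricePerUnit;

    public int getQuantity() {
        return quantity;
    }

    public void setQuantity(int quantity) {
        this.quantity = quantity;
    }

    public double getPricePerUnit() {
        return pricePerUnit;
    }

    public void setPricePerUnit(double pricePerUnit) {
        this.pricePerUnit = pricePerUnit;
    }
}
```

Other classes now go through setQuantity() and getQuantity() instead of touching the field directly, so validation could later be added in a single place.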

And that is it. I hope you enjoyed this two-part tutorial. If you have any questions or comments, do not hesitate to contact me or leave a comment here. See you later with another Significant Insignificance tutorial. Cheers!


Things That Not All Programmers Know #1: Cyclic Inheritance

It has been a month since I launched Significant Insignificance last October. Before anything else, I would like to thank everyone who cared enough to visit my blog and read what may seem to others nothing but insignificant thoughts. I hope you enjoyed reading it as much as I enjoyed writing it, and I promise more of the same.

Today, I decided to add a new series to my blog: “Things That Not All Programmers Know”. To make sure that I always write technical posts, on the first day of every month I am going to post something (a trick, a tip, a best practice) related to programming that I believe not many young programmers know. This way, everyone, including me, gets to learn something new.

I am going to start this series with something relatively easy: Cyclic Inheritance. First, the definition:

If a class or interface inherits from itself, either directly (i.e., it appears as its own superclass or in its own list of superinterfaces) or indirectly (i.e., one of its superclasses or superinterfaces inherits from it), a cyclic inheritance error is reported at compile time.

Ok, this needs an example. Consider code in which class Person is declared to extend Employee, and class Employee is declared to extend Person.

When you try to compile it, you get the error: “cyclic inheritance involving Person”.

This is cyclic inheritance; each class inherits from the other: Employee is a subclass of Person, and Person is a subclass of Employee. This relationship is not possible.

Cyclic inheritance can be solved by determining the proper relationship, that is, by finding out which class is the superclass and which is the subclass. In the previous example, Person is the superclass, so it should not try to inherit from Employee, and Employee is the subclass, which will inherit from Person. The fix is to remove “extends Employee” from the Person class declaration:
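Below is a minimal sketch of the corrected hierarchy. The name and salary fields are hypothetical additions so the example does something; the cyclic version appears only as a comment because it does not compile.

```java
// The cyclic version does not compile (javac: "cyclic inheritance involving Person"):
//   class Person extends Employee { }
//   class Employee extends Person { }

// Corrected hierarchy: Person is the superclass and Employee is the subclass.
class Person {
    String name; // hypothetical field

    Person(String name) {
        this.name = name;
    }
}

class Employee extends Person {
    double salary; // hypothetical field

    Employee(String name, double salary) {
        super(name);
        this.salary = salary;
    }

    public static void main(String[] args) {
        Person p = new Employee("Alice", 1000.0); // an Employee is a Person
        System.out.println(p.name);               // Alice
    }
}
```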

Hope this was useful. If you know any unique programming trick (Java or otherwise) that you believe not many people know, do not hesitate to contact me. And again, to everyone who followed my blog throughout the previous month, thanks a lot for your dedication. This blog would not be alive if it weren’t for your support. Thank you.

NetBeans Refactoring – Part 1

This week I decided to give you, my dear readers, a little rest from my blabbering and go technical a bit. I am going to talk about a very powerful NetBeans feature that is also a personal favorite of mine: Refactoring.

If you do not know NetBeans, it is a free, cross-platform, open-source Integrated Development Environment (IDE) for software developers and a product of Sun Microsystems. An IDE is a software application that provides extensive facilities to programmers for software development, such as a source code editor, a compiler, an interpreter, build automation tools, and a debugger. For more information about NetBeans’ features, visit the official website.

Refactoring means restructuring your source code without changing its behaviour, and NetBeans makes it easy. Imagine moving a class between packages and having to edit the package statements at the top of each file manually, or wanting to delete a variable and not knowing whether it is referenced somewhere else in your application. Performing these kinds of operations manually can be time-consuming and error-prone. However, with the advanced refactoring capabilities available in NetBeans, you can make such changes very easily. NetBeans provides many refactoring options on its Refactor menu. I am going to illustrate half of them today and the other half in a following post.

EDIT: Part 2 is now available. Click here to read it.

For this post I have created two classes, ImportingClass and MoveClass. View them here and here.



Rename:

Rename refactoring allows you to change the name of a class, method, or variable and automatically updates everything that refers to it; for a class, that includes its constructors, internal usages, and references to it from other classes. You can also rename a package, which will automatically rename all instances of the package name in your code, including comments. Here, I am going to rename the variable value to AddedValue in class ImportingClass:

The variable name is automatically changed in class MoveClass as well:


Move:

Moving a class from one package to another may seem like an easy task: you just copy and paste the contents of the source file into the new directory, edit the package statement at the top of the file, and you are good to go. However, if other classes import or reference that class, the developer must also search through and modify those files. Move refactoring takes care of those updates for you. Here is how you can move MoveClass from the refactoringpackage2 package to refactoringpackage1:


Copy:

Copy refactoring allows you to copy the contents of a class to another package, automatically changing the package statement at the top of the source file.

Safe Delete:

Sometimes, when you are reviewing previously written code, you decide to remove a class member variable that you think is unused, only to find out that it does appear elsewhere in your code, and then your code no longer compiles. With Safe Delete refactoring, you can identify each usage of a class, method, or field in your code before deleting it. I am going to illustrate it by trying to delete the AddedValue variable from class ImportingClass:

However, the variable is referenced in class MoveClass, so deleting it would cause an error. NetBeans alerts you to that, and can even show you where the member you want to delete is referenced if you click the “Show Usages…” button:

Change Method Parameters:

Change Method Parameters allows you to safely change a method’s signature (its access modifier and parameters) and notifies you if the change is going to affect your source files. Here, I am going to try to delete the parameter x of the display method of class ImportingClass:

Parameter x is used within the method body, and so a warning is displayed:

For the following examples, I created a class called Truck, which inherits from a class called Vehicle. View here.

Pull Up:

When working with classes and superclasses, Pull Up refactoring is very useful. It allows you to move fields and methods from a subclass up into its superclass.

Push Down:

Push Down is the opposite of Pull Up refactoring. It pushes a superclass member down into a subclass.
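As a rough before-and-after picture (the members shown are hypothetical, because the post’s Vehicle and Truck classes are only linked), this is what the classes might look like after pulling a wheels field and a getWheels() method up from Truck into Vehicle. Push Down would move them back into Truck.

```java
// Sketch after Pull Up: wheels and getWheels() (hypothetical members) now live in Vehicle.
class Vehicle {
    int wheels;

    // Pulled up from Truck: every Vehicle subclass now inherits it.
    int getWheels() {
        return wheels;
    }
}

class Truck extends Vehicle {
    double cargoCapacity; // stays in Truck because it is truck-specific

    Truck(int wheels, double cargoCapacity) {
        this.wheels = wheels; // inherited field
        this.cargoCapacity = cargoCapacity;
    }

    public static void main(String[] args) {
        Vehicle v = new Truck(6, 10.0);
        System.out.println(v.getWheels()); // 6
    }
}
```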

And that’s it for this time. Note that these are only half of the capabilities of NetBeans refactoring; I am going to go through the other half in a following post soon. If you have any questions or comments, do not hesitate to contact me.
