Programming Type Systems

If you are a programmer who has worked with multiple programming languages, you must have noticed that while some languages, such as Java and C++, use almost similar methods to define their data types, while others, such as Python and Matlab, use completely different methods. This is because those two groups of languages use different type systems.

A type system is defined as tractable syntactic framework for classifying phrases according to the kinds of values they compute. It associates types with each computed value, and, by examining the flow of these values, attempts to prove that no type errors can occur. A type system generally seeks to guarantee that operations expecting a certain kind of value are not used with values for which that operation makes no sense.

Ok, in simpler words, a type system is a way for programming languages to classify values and expressions into types, how it can manipulate those types, and how they interact. Or, in even simpler terms, how data types are assigned to variables, and how they are handled. There are four type systems that programming languages can adopt. ANY programming language you know or have worked with, except two languages, belongs to at least two of the following categories:

Statistically-typed languages: languages in which data types are fixed at compile time, in other words, type checking (verification) is done when the code is compiled. Languages that use static typing include C++, Java, C#, and F#. These languages enforce this by requiring the programmer to explicitly declare all variables with their data types before use. An example of this is the floating point variables declaration in Java:

float f = 1.0f;

Static typing allows data type errors to be caught earlier in the development cycle. Besides verifying data types, static type checkers verify that the checked conditions hold for all possible executions of the program, which eliminates the need to repeat type checks every time the program is executed. Program execution may also be made more efficient by omitting runtime type checks. However, static typing can sometimes reduce code flexibility.

Dynamically-typed languages: languages in which the majority of its type checking is performed at run-time instead of checking at compile-time. Languages that use dynamic typing include JavaScript, PHP, Python, and Tcl. In dynamic typing, the values have types, not the variables; that is, a variable can refer to a value of any type. An example of this in Python:

x = 1
print x
x = "Hello, world!"
print x

This code would produce no errors and print “1” and “Hello, world!” By allowing programs to generate types and functionality based on run-time data, dynamic typing can be more flexible than static typing. However, dynamic typing may result in runtime type errors; at runtime, a value may have an unexpected type, and an operation nonsensical for that type is applied. Also, this operation could occur long after the place where the wrong type of data passed into a place it should not have, which makes the bug difficult to locate.

One thing to notice is that a dynamically-typed language is not necessarily a dynamic language. The term dynamic language means something different, but more on that later. (In a separate post, maybe? ;))

Strongly-typed languages: languages in which data types are always enforced. A data type cannot be treated like another unless it is explicitly converted. Strongly typed languages, such as C, Java, Pascal, and Python specify severe restrictions on how operations involving values having different data types can be intermixed, preventing the compiling or running of source code which uses data in what is considered to be an invalid way, e.g. the division of an inter over a string. To ensure they achieve their purpose, strongly-typed languages apply some or all of the following constraints:

  • The compiler must ensure that operations occur only on operand types that are valid for the operation.
  • An error must occurs as soon as a type-matching failure happens at runtime, or, as a special case of that with even stronger constraints, type-matching failures must never happen at runtime
  • Omitting implicit type conversions- conversions that are inserted by the compiler on the programmer’s behalf.
  • The type of a given data object does not vary over that object’s lifetime.
  • Type conversions are allowed only when an explicit notation, often called a cast, is used to indicate the desire of converting one type to another.
  • Disallowing any kind of type conversion. Values of one type cannot be converted to another type, explicitly or implicitly.

For example, an attempt to add an integer and a string in Python:

x = 1
y = "Hello, world!"
print x + y

will produce the error:

TypeError: unsupported operand type(s) for +: ‘int’ and ‘str’

Weakly-typed languages: languages in which types may be ignore. Weakly-typed languages support either implicit type conversion, ad-hoc polymorphism (overloading) or both. For example, adding an integer and a string in Matlab:

x = 1;
y = 'Hello, world!';
z = x + y

will not produce any error, actually it will produce the result:

z =    73   102   109   109   112    45    33   120   112   115   109   101    34

One last thing I want to talk about is type safety. Type safety can be defined as the use of a type system to prevent certain erroneous or undesirable program behaviour. This can be achieved statically, by catching potential errors at compile time, or dynamically, by associating type information with values at run time and consulting them as needed to detect imminent errors, or using combination of both. A programming language is called “type-safe” if it does not allow operations or conversions that lead to erroneous conditions, such as the previous Python example.

Remember when I said at the beginning of this post that all programming languages, adopt at least two of the four typing schemes, except two languages? Those two languages are the Assembly Language and Forth. Those two languages have been said to be untyped. There is no type checking. It is up to the programmer to ensure that data given to functions is of the appropriate type. Any type conversion required is explicit.

Share