Wednesday, September 13, 2006

Data Typing Notes

The term "data type" describes both a structure and a set of operations. Consider the integer data type. Transforming operations such as add, subtract and multiply yield a new integer value. Comparison operations like greater_than, less_than and is_equal_to yield a boolean result.


Analysts rely on two kinds of data types during model development. Core data types and composite data types. Core data types are built in to the modeling language. The analyst can create a composite data type using one or more core data types as building blocks.

What core data types should * UML support? Is the list of core data types bounded by some principle or can you just keep adding them? How are composite data types actually built? How much structure is allowed in a composite data type - is there such a thing as too much? How are composite data types handled by model compilers? How are they handled by an interpreter? These are just a few of the questions that must be answered to construct a useful * UML Data Type metamodel subsystem.

Core Data Types

Since a core data type is built into the modeling language it must be supported by any model compiler. Both the data type structure and all operations should be fully implemented. Keep in mind that a set of models might be translated to Ruby or Java on one day and then directly to Z80 assembler on another. To support all model compilers, we are safer to keep the range of core data types as limited as possible.

On the other hand, model developers like to have a wide variety of data types. In fact, real world applications rarely call for a core data type as seemingly ubiquitous as real. Consider an air traffic application. The compass heading data type is not real. It is in fact [0..360] degrees with a precision of, say, .01. The math operations aren't the same either for compass heading. 180.15 + 270.00 = 90.15. Similarly we need distinct data types for altitude, speed and so forth. True, each of these user data types can be tightly or loosely based on the real core data type, but each is different.

So we have a balancing act. To model application requirements independent of any implementation, we want to support lots of user (application level) data types. To support a wide range of model compilers and target languages, we must limit the set of core data types. And right there we have the answer to our problem.

Any user data type must be derivable from core data type building blocks. The model compilers will be concerned only with the core data types. But the modeler will have everything he or she needs to specify whatever application level data types the project demands. Before moving on to how the user builds composite data types, lets first consider the core set.

Consider these proposed core data types:

  • boolean
  • integer
  • real
  • enumerated
  • string

The first two types, boolean and integer, are supported in every programming language down to assembly level (as far as I know). So they are safe bet. At the level of C and upward, the real (float) data type is supported. Some embedded processors lack floating point capability. But floating point math is quite common in real world applications. In the worst case, a model compiler will have to convert all read core types into integer types during compilation.

Now we move on to enumerated. This is easily implemented using integers so we easily demand this capability of any model compiler.

What about the string data type? C doesn't support strings directly - why should * UML? The data structure is easy enough to accommodate, but there are a number of string operations that must be supported. But strings are extremely common in real world applications. It's hard to imagine avoiding them. That said, certain embedded targets have resource limits that preclude the liberal use of string data types. A model compiler might need to convert string values into a concatenated or compressed integer form to acheive any necessary economy. Since I'm an analyst I have to say "enough for me, let's shift the burden onto the model compiler!" We need strings!

But that's it - we're drawing the line at string. You want to build a model compiler? These are the only types you need to handle. I now assert that the modeler has enough building blocks to construct any required user data type.

Composite Data Types

To build a composite data type we create a Type Specification using one or both of the following layout patterns:

  • Hierarchy
  • Matrix

Hierarchy Layout

Let's say that we want to create the application level Position data type. Hang on - let me bring up Google Earth so I can make sure I get this right... Okay, the Golden Gate Bridge is here:

37 48' 59.41" N
122 28' 54.05" W

We create a Layer called "Latitude" with four data Cells. The cells are ordered sequentially as:

Layer : Latitude
degrees: int[0..90]
minutes: int[0..60]
seconds: real[0..60] prec 2
pole: enum( N | S )

Each cell refers to a core data type. Each core data type has its own qualifying characteristics.

Similarly we create another Layer called "Longitude":

Layer: Longitude
degrees: int[0..180]
minutes: int[0..60]
seconds: real[0..60] prec 2
meridian: enum( W | E )

It is clear we could have created a lower layer to contain minutes and seconds since they are identical, but let's not worry about that now. We create a top layer for Position and we have a complete Type Specification.

Layer: Position
Latitude
Longitude

It seems that composite data types don't require qualifying characteristics. Is this true? Hmmm.

Matrix Layout

We can also arrange core data type cells in a matrix pattern. To specify a 10x10x2 3D matrix of integers between 1 and 100 where any value is at (x, y, z) we could do this:


Matrix: int[1..100]
x: 10
y: 10
z: 2

Ah, but what if we wanted to create a 1x5 matrix of Position values for some reason? Then we combine both the Matrix and Hierarchy layout patterns.


Matrix: Position

p: 5

Composite Data Structure Summary

The Hierarchy and Matrix layout mechanisms make it possible to construct composite data types. But it's not enough to define a data type. We need to somehow define the operations specific to that data type.


But I need to get a sandwich right now. So stay tuned for the next post where we will ponder transformation and comparision operators on composite data types and such things.


- Leon












No comments: