06.06.2012

6+6=12 or not?

I noticed that today we have a funny date: 6.6.-12.

And with some awkward transitions I get to counting.

Sometimes when you calculate stuff in your code, the numbers really don’t add up. Why? Because of the way computers store numbers with decimals: as floating-point numbers. Some data types are even called “float”. To understand this we need a bit of basic math and some really bizarre thinking.

In mathematics we can sort numbers into different groups, like even numbers, real numbers, integers, fractions, and complex numbers. In coding the groups are a bit different: integers, floats, long integers, and doubles (double-precision floating-point numbers).

Here are some examples of those:

 Even           2, 4, 10
 Real           −5, 4/3, √2
 Fraction       2/2, 7/43, -9/2
 Complex        1-i, i, 9+7i
 Integer        -5, 0, 18
 Float          8.9, 0.0, -111.9475473
 Long integer   8 589 934 592
 Double         123.1234567890123456

If you have declared your number to be an integer (int), you can’t use any decimals. If you try to calculate 3 / 2, you won’t get 1.5 as your result. In languages like C and Java the computer simply throws the decimal part away, so you get 1. You haven’t given the computer a way to store a decimal part, so it just can’t keep one. So be careful when using the int type! (Note: this might not seem like a problem in a language like Python, where you don’t have to declare any data types. In reality the data types are still there; the language works them out from the values while the program runs. Unfortunately the result is sometimes not what you expected: in Python 2, 3 / 2 gives 1, while in Python 3 it gives 1.5 and you write 3 // 2 for integer division. It can be a pain to see where things went wrong.)
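Here is a small sketch of the difference, assuming Python 3 (the variable names are just for illustration). It shows that "throwing the decimals away" and "rounding down" are not the same thing once negative numbers are involved.

```python
# Integer versus floating-point division in Python 3.
true_quotient = 3 / 2      # "/" always produces a float: 1.5
floor_quotient = 3 // 2    # "//" rounds down (toward negative infinity): 1
truncated = int(-3 / 2)    # int() cuts the decimals off (toward zero): -1
floored = -3 // 2          # rounding down gives a different answer: -2

print(true_quotient, floor_quotient, truncated, floored)
# prints: 1.5 1 -1 -2
```

C and Java cut toward zero like int() does here, so the same expression can give a different result depending on the language.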

That probably was still quite easy to understand, so let’s gain some levels! With floats you can use decimals. But not totally freely. Float is also a data type, and the computer reserves a fixed amount of memory for each float (usually 32 bits). So a float can hold only about 7 significant decimal digits, counted across both sides of the decimal point, AND an exponent of limited size, which says where the decimal point goes and so how big or small the number can be. If you go over the precision limit, the computer just rounds the extra digits away. If you go over the exponent limit, the result overflows (typically to infinity). That’s why your nice calculation can contain rounding errors. That’s why we have long integers (you can have bigger integer numbers) and doubles (usually 64 bits, which gives about 15 to 16 significant digits and a much bigger exponent range). These rounding errors are simple numerical errors and are the reason why almost anything calculated with decimals is just an approximation, not totally accurate.
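To see these limits in action, here is a minimal sketch in Python. Python's own float is a 64-bit double, so the helper to_float32 (a name made up for this example) squeezes a value through a 32-bit float using the standard struct module.

```python
import struct

def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Only about 7 significant digits survive in a 32-bit float.
tenth = to_float32(0.1)
print(tenth)        # 0.10000000149011612

# 16777217 needs 25 bits of precision; a 32-bit float has only 24.
big = to_float32(16777217.0)
print(big)          # 16777216.0

# Going past the exponent range of a double overflows to infinity.
overflowed = 1e308 * 10
print(overflowed)   # inf
```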

But what is going on when none of the things described above explains the miscalculation in your code?

Computers aren’t, well, humans. We see the number 7 as exactly the number 7 with the value exactly 7. A computer might not. Small whole numbers like 7 are stored exactly, but if the data type is anything other than an integer, a calculation that should give 7 might actually give 6.999999.

And some numbers with decimals can’t even be represented exactly in a computer, because the numbers are really stored in binary. Binary uses a different radix (base 2) than our normal decimal system (base 10). For example, 0.1 is a short number in decimal, but in binary it goes on forever, just like 1/3 = 0.333... does in decimal. It is like idioms in human languages. Some idioms are the same or similar in two languages, but most of the time an idiom only works in its own language. If you have studied Chinese or know it otherwise, you know that there really aren’t tenses. You can say “we are going to swim tomorrow”, but “we are going to swim” is harder to translate totally accurately, as the language doesn’t have the same features (tenses in this case) as English has. So those two languages aren’t totally compatible. Just like binary and our decimal system aren’t either.
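The classic demonstration is adding 0.1 and 0.2. The sketch below uses Python's standard decimal and math modules. Comparing with a tolerance through math.isclose is one common workaround, not the only one.

```python
from decimal import Decimal
import math

total = 0.1 + 0.2
print(total)           # 0.30000000000000004
print(total == 0.3)    # False

# Decimal(0.1) prints the value that is really stored for 0.1.
stored_tenth = Decimal(0.1)
print(stored_tenth)

# Compare floats with a tolerance instead of ==.
close_enough = math.isclose(total, 0.3)
print(close_enough)    # True
```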

One even harder thing to understand is numbers that wrap around. Kind of. Numbers are bit sequences: to a computer the number 7 is “00000111” when it is stored in 8 bits. If a signed number grows too big, its first bit turns from 0 to 1. In the computer’s eyes the number then flips from positive to negative (or the other way round), because the first bit indicates whether the number is negative. The computer tries to give you your big number without noticing that the first bit was reserved for this +/- mark. With 8 bits in the usual two’s complement encoding, 127 is “01111111”, and adding 1 gives “10000000”, which reads as -128.
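Python's own integers never wrap around, so the sketch below simulates a signed 8-bit number by hand, assuming two's complement. The helper wrap_int8 is invented for this example.

```python
def wrap_int8(n):
    """Interpret the lowest 8 bits of n as a signed two's complement number."""
    n &= 0xFF                        # keep only the lowest 8 bits
    return n - 256 if n >= 128 else n

seven_bits = format(7, "08b")
print(seven_bits)                    # 00000111

too_big = wrap_int8(127 + 1)
print(too_big)                       # -128

too_small = wrap_int8(-128 - 1)
print(too_small)                     # 127
```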

Notice that depending on the processor, the computer can store the bytes of a number in either order: most significant byte first (big-endian) or least significant byte first (little-endian). Mostly you don’t have to care about this, but when coding for microcontrollers and working close to the processor it is a valuable thing to know.
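A quick way to look at both byte orders is sketched below with Python's built-in int.to_bytes. The value 0x12345678 is an arbitrary example, and sys.byteorder reports which order the machine running the code uses.

```python
import sys

number = 0x12345678

big = number.to_bytes(4, "big")        # most significant byte first
little = number.to_bytes(4, "little")  # least significant byte first

print(big.hex())      # 12345678
print(little.hex())   # 78563412
print(sys.byteorder)  # "little" or "big", depending on your machine
```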

Trivia: Kahan summation is an algorithm to minimize the numerical errors when adding numbers together. Look it up, if interested :)
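For the curious, here is a minimal sketch of Kahan summation in Python next to a plain loop. The function names are my own. The plain loop is written out by hand because newer Python versions already compensate inside the built-in sum().

```python
def naive_sum(values):
    """Add the values one by one, rounding after every step."""
    total = 0.0
    for v in values:
        total += v
    return total

def kahan_sum(values):
    """Kahan summation: carry the rounding error along and feed it back in."""
    total = 0.0
    compensation = 0.0                   # low-order part lost so far
    for v in values:
        y = v - compensation             # correct the next value first
        t = total + y                    # big + small: low digits of y get lost
        compensation = (t - total) - y   # recover what was just lost
        total = t
    return total

values = [0.1] * 1_000_000
print(naive_sum(values))   # drifts visibly away from 100000
print(kahan_sum(values))   # stays very close to 100000
```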

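PS. With floating-point numbers there is no such thing as associativity. Meaning: (a+b)+c may not be the same thing as a+(b+c), even though a+b is still the same as b+a. This is again because of the way computers represent numbers in reality: every intermediate result gets rounded.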
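A small check of how the order of additions behaves in practice, sketched in Python with ordinary 64-bit floats. Swapping the two operands of one addition gives the same result, while changing the grouping of three numbers can change the result.

```python
a, b, c = 0.1, 0.2, 0.3

swapped_equal = (a + b) == (b + a)
print(swapped_equal)        # True

left_first = (a + b) + c
right_first = a + (b + c)
print(left_first)           # 0.6000000000000001
print(right_first)          # 0.6
print(left_first == right_first)   # False
```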