Monday, June 16, 2008

"long double" doesn't seem to work on PPC

Part of my numerical experiments require that I add up lots of different numbers. These numbers can take on a huge range of values. Some of the numbers can be millions of times larger than others. The software I've written uses what is known as a floating point number to represent these values. The term "floating point" comes from the fact that the decimal point is not fixed at a certain place-value in these numbers. The decimal point could come after the ones' place, the tens' place, the thousands' place, or even the one-ten-thousandths' place. This allows us to represent specific values by multiplying, as in the following examples:

1234 equals 1.234 x 1000 (the decimal comes after the thousands' place)

.05678 equals 5.678 x 1/100 (the decimal comes after the one-hundredths' place)

The crux is that the computer affords only a fixed number of digits after the decimal place. This introduces what is known as roundoff error. If, for example, I only have four digits after the decimal place, then the following addition,

1234 + .05678 = 1234.05678,

will be problematic because the only representation I'm allowed for the answer is 

1.2341 x 1000,

which is not the same as what is computed above. Fortunately computers provide more than four digits after the decimal place. Unfortunately it isn't a lot more, and even more unfortunately it isn't the same from one type of computer to the next. 

Last semester an undergraduate and I came up with a relatively fast algorithm for adding up lots of numbers while minimizing roundoff error. Even so, we still need to measure the error. Instead of looking up the quite possibly incorrect answers posted on the web, I found a cute little algorithm for determining the roundoff error more directly. You perform the following sequence of computations:

1 + 1/2
1 + 1/4
1 + 1/8
1 + 1/16
.
.
.

Now, none of these calculations should ever give an answer of 1. However, because of the roundoff error, eventually you are adding something to 1 that is so small that it is below the roundoff error for the number 1, and you actually get an answer of 1. This allows you to place an estimate on the roundoff error. The result depends on the type of floating point number you use.

The types of floating point numbers are:
  • float: About seven digits after the decimal point
  • double: Roughly doubles the precision, and gives about fifteen digits after the decimal place
  • long double: Step right up! Take your chances!

I wrote a program to implement the above test for the "long double" and found that it gave garbage answers. I got home and moved the program from my iBook (a computer based on the PowerPC chip) to my Mini (a computer based on an Intel chip) and the problems vanished. I then ran the software on the AMD nodes in the cluster and everything worked. Then I ran the software on the PowerPC blades in the cluster and saw the exact same problems I saw on my iBook.

So, Yay! I've recognized the issue, but, Boo! I hate hardware! It's a sure sign of society's unraveling that a poor schmuck such as myself should have to wander through a maze of different computer architectures in an effort to do something as basic as adding up a bunch of numbers accurately.