This file: http://ftp.aset.psu.edu/pub/ger/fortran/hdk/KahnSum.txt From: "James Giles" Newsgroups: comp.lang.fortran Subject: Re: Kahan's trick (was: Algorithm suggestion?) Date: Wed, 03 Apr 2002 01:55:34 GMT I notice that in most of the discussion of Kahan's summation algorithm people have only been experimenting and not looking at the problem analytically. Further, even in their experiments, they choose problems that wouldn't cause any rounding error even if only the naive summation algorithm were used with purely single precision calculations. If numbers to be summed are of about the same order of magnitude and require only a small number of bits to represent exactly, you can sum quite a few of them with no trickery at all. For example, if you sum randomly chosen integers between 0 (zero) and 255 into a single precision accumlator variable, you can add at least 2**16 such values before the rounding rule of your processor is even applied! This is because the numbers themselves differ by at most one part in 2**8, and single precision carries 24 bits. Real sums tend to involve numbers that are not single-bit quantities and/or are not all of the same magnitude or sign. Consider the problem of adding a sequence of numbers consisting of 2**23, 2**18, -2**8, and 1. In some order you want to add 1024 of each of these. Here there will almost certainly be a difference between the naive algorithm using single precision and Kahan's algorithm. Note: caution is needed in some environments that usually promote intermediate results to extended precision. You will see no advantage if you just compare Kahan's algorithm to something that's *actually* using 64-bit significands. You would have to add at least 2**40 items chosen from the above set before any rounding takes place! So, the following program is used to test this: Program test implicit none integer, parameter :: size=2**10 integer :: i real :: ss = 0.0, ks = 0.0 real(kind(0.0d0)) :: ds = 0.0 real :: t, y, c = 0.0 !kahan temporaries real, parameter :: x = 2**23 + 2**18 - 2**8 + 1 ! real, parameter :: x = 1 do i = 1, size ! kahan sum y = x - c t = ks + y c = (t-ks)-y ks = t ! single sum ss = ss + x ! double sum ds = ds + x end do Write(*,"(' double precision sum is: ', g16.9)") ds Write(*,"(' single precision sum is: ', g16.9, ' in error by', f9.4, & & '%')")ss, 100*abs(ds-ss)/ds Write(*,"(' Kahan sum is: ', g16.9, ' in error by', f9.4, & & '%')")ks, 100*abs(ds-ks)/ds stop End Program In CVF 6.6, you must compile with /fltconsistency or the single precision operations will use double-extended temporaries. When this is run, the result is: double precision sum is: 0.885810893E+10 single precision sum is: 0.885824307E+10 in error by 0.0015% Kahan sum is: 0.885810893E+10 in error by 0.0000% Notice that the single precision sum is off in the fifth decimal. It's actually 131 units off in the last place. That's kind of a long way to drift in only 1024 steps. Especially since the correct final answer is exactly representable in single precision (it's 1024 time X). If you change the value of SIZE to 2**20 instead, the result is: double precision sum is: 0.907070354E+13 single precision sum is: 0.892936703E+13 in error by 1.6% Kahan sum is: 0.907070354E+13 in error by 0.0% So, in just over a million elements, the single precision sum is off in the second digit. The result from Kahan's algorithm is still exact (though some the partial sums on the way were not even exactly representable and Kahan's algorithm adapted to that). Just for fun, comment out the definition of X and select the value of 1 (one) instead. Then set SIZE to 2**26. The result is: double precision sum is: 67108864.0 single precision sum is: 16777216.0 in error by 75.0% Kahan sum is: 67108864.0 in error by 0.0% The reason for this is that 1(one) is so small compared to 16777216 that you can add it as many times as you want and it doesn't change the answer. The Kahan algorithm keeps accumulating them though until they amount to enough for some effect. There are conditions under which Kahan's algorithm fails (but it's always at least as good as just plain single precision). Double precision is always at least as good as Kahan's algorithm, and often it's much better. So, if double is always so good, why ever use Kahan's algorithm? Well, there are cases where double is more expensive. The new vector instructions on many processors use single. If you want to use them, you won't have double. Is the vector algorithm with Kahan's modification faster than a scalar version with double? I haven't compared. It would be an interesting thing to try (provided the compilers become available that can correctly unroll and vectorize Kahan's algorithm). And, there are cases when even double precision is not adequate, but quad is *really* more expensive. Here, Kahan's method applied to double may be just what you need. -- J. Giles