This file: http://ftp.aset.psu.edu/pub/ger/fortran/hdk/KahnSum.txt
From: "James Giles"
Newsgroups: comp.lang.fortran
Subject: Re: Kahan's trick (was: Algorithm suggestion?)
Date: Wed, 03 Apr 2002 01:55:34 GMT
I notice that in most of the discussion of Kahan's summation
algorithm people have only been experimenting and not looking
at the problem analytically. Further, even in their experiments,
they chose problems that wouldn't cause any rounding error
even if the naive summation algorithm were used with
purely single precision calculations.
If the numbers to be summed are of about the same order of magnitude
and require only a small number of bits to represent exactly, you can
sum quite a few of them with no trickery at all. For example, if you
sum randomly chosen integers between 0 (zero) and 255 into a single
precision accumulator variable, you can add at least 2**16 such values
before the rounding rule of your processor is even applied! This is
because each of the numbers needs at most 8 bits, single precision
carries a 24-bit significand, and so every partial sum stays exactly
representable.
Real sums tend to involve numbers that are not single-bit quantities
and/or are not all of the same magnitude or sign. Consider the problem
of adding a sequence of numbers consisting of 2**23, 2**18, -2**8, and 1.
In some order you want to add 1024 of each of these. Here there will
almost certainly be a difference between the naive algorithm using
single precision and Kahan's algorithm. Note: caution is needed in
some environments that usually promote intermediate results to extended
precision. You will see no advantage if you just compare Kahan's
algorithm to something that's *actually* using 64-bit significands. You
would have to add at least 2**40 items chosen from the above set before
any rounding takes place!
So, the following program is used to test this:
Program test
   implicit none
   integer, parameter :: size = 2**10
   integer :: i
   real :: ss = 0.0, ks = 0.0
   real(kind(0.0d0)) :: ds = 0.0
   real :: t, y, c = 0.0                ! Kahan temporaries
   real, parameter :: x = 2**23 + 2**18 - 2**8 + 1
   ! real, parameter :: x = 1
   do i = 1, size
      ! Kahan sum: y is the addend corrected by the running
      ! compensation c; c recovers the low-order bits lost in t
      y = x - c
      t = ks + y
      c = (t - ks) - y
      ks = t
      ! naive single precision sum
      ss = ss + x
      ! double precision sum
      ds = ds + x
   end do
   Write(*,"(' double precision sum is: ', g16.9)") ds
   Write(*,"(' single precision sum is: ', g16.9, ' in error by', f9.4, &
          & '%')") ss, 100*abs(ds-ss)/ds
   Write(*,"(' Kahan sum is: ', g16.9, ' in error by', f9.4, &
          & '%')") ks, 100*abs(ds-ks)/ds
   stop
End Program
In CVF 6.6, you must compile with /fltconsistency or the single precision
operations will use double-extended temporaries. When this is run, the
result is:
double precision sum is: 0.885810893E+10
single precision sum is: 0.885824307E+10 in error by 0.0015%
Kahan sum is: 0.885810893E+10 in error by 0.0000%
Notice that the single precision sum is off in the fifth decimal digit.
It's actually 131 units off in the last place. That's kind of a long way
to drift in only 1024 steps, especially since the correct final answer is
exactly representable in single precision (it's 1024 times X). If you
change the value of SIZE to 2**20 instead, the result is:
double precision sum is: 0.907070354E+13
single precision sum is: 0.892936703E+13 in error by 1.6%
Kahan sum is: 0.907070354E+13 in error by 0.0%
So, at just over a million elements, the single precision sum is off in the
second digit. The result from Kahan's algorithm is still exact (though
some of the partial sums along the way were not even exactly representable,
and Kahan's algorithm adapted to that).
Just for fun, comment out the definition of X and select the value
of 1 (one) instead. Then set SIZE to 2**26. The result is:
double precision sum is: 67108864.0
single precision sum is: 16777216.0 in error by 75.0%
Kahan sum is: 67108864.0 in error by 0.0%
The reason for this is that 1 (one) is so small compared to 16777216
that you can add it as many times as you want and the single precision
answer never changes. The Kahan algorithm, though, keeps accumulating
the lost parts until they amount to enough to have an effect. There are
conditions under which Kahan's algorithm fails (but it's always at least
as good as plain single precision). Double precision is always at least
as good as Kahan's algorithm, and often it's much better.
So, if double is always so good, why ever use Kahan's algorithm?
Well, there are cases where double is more expensive. The new
vector instructions on many processors use single. If you want to
use them, you won't have double. Is the vector algorithm with Kahan's
modification faster than a scalar version with double? I haven't
compared. It would be an interesting thing to try (provided compilers
become available that can correctly unroll and vectorize Kahan's
algorithm). And there are cases where even double precision
is not adequate, but quad is *really* more expensive. Here, Kahan's
method applied to double may be just what you need.
--
J. Giles