Floating-point arithmetic incorrect results

GoogleCodeExporter opened this issue 9 years ago

Google Code Exporter commented 9 years ago

Floating-point computations do not appear to produce the correct results. The 
results are similar to, but differ significantly from, what is expected in a 
couple of program output listings from 1970. I suspect a rounding or 
normalization problem.

Original issue reported on code.google.com by paul.kimpel@digm.com on 11 Jun 2013 at 2:12

Google Code Exporter · Answer 1 · Sat Mar 21 2015 22:39:14 GMT+0800 (China Standard Time)

Can you show examples of an expression or three with the expected and actual 
results?

Original comment by hans.pu...@gmail.com on 15 Sep 2013 at 11:35

Google Code Exporter · Answer 2 · Sat Mar 21 2015 22:39:14 GMT+0800 (China Standard Time)

The differences do not show up in something as simple as an expression. 
I have a few XALGOL programs from my university days that attempt to 
compute characteristics of two-phase, laminar/laminar flow in a round 
pipe (I was a Chemical Engineering student). These computations involve 
orthonormalization of vectors, and are quite processor-intensive. Alas, 
the results between the original B5500 output and that of the emulator 
tend to agree, at best, to one significant digit.

I have one of the programs and its output scanned and posted on my web site:

  * http://www.digm.com/PickUp/B5500-Scans/YUSPAR-Compile.pdf
  * http://www.digm.com/PickUp/B5500-Scans/YUSPAR-Output.pdf

I have a "card deck," compiler listing, and output for the emulator also 
posted to that web site:

  * http://www.digm.com/PickUp/B5500-Scans/YUSPAR-RETRO.zip

Note that in the emulator version, I have restricted the program to only 
the first pass through the main loop (MU12=1.0 at sequence 
00410000-00420000). As it is, that truncated run takes about ten minutes 
in the emulator. A full run will probably take 2-3 hours.

For straightforward calculations, all of the tests I have tried in the 
emulator produce results that are bit-identical to the modern Unisys MCP 
systems, which use the same floating-point format. What I suspect is 
that there is a subtle rounding and/or normalization problem that 
affects the last octade of the mantissa, and that it takes repetitive 
calculations like the orthonormalization algorithm for the error to 
accumulate and the problem to become apparent. Another possibility is 
that there may be a stack adjustment problem in the emulator that is 
causing it to operate on the wrong words in the stack.

I will try to get the other listings scanned and posted this week for 
use as a comparison.

I was a very inexperienced programmer when I wrote these programs, and 
new to Algol as well. These were working studies that I just happened to 
keep over the years, and are not well documented. I doubt that the 
programs properly do what they were intended to do, but that is not the 
point. If the emulator is doing arithmetic properly, its results should 
be identical to those from the real B5500, so the goal is to find out 
where the calculations are different and to fix that in the emulator.

Original comment by paul.kimpel@digm.com on 29 Sep 2013 at 3:21

Google Code Exporter · Answer 3 · Sat Mar 21 2015 22:39:14 GMT+0800 (China Standard Time)

It appears that the known floating-point problems in the emulator are 
now resolved. I found numerous minor normalization, scaling, and 
rounding problems. After fixing those, I reran the YUSPAR test mentioned 
in the previous posting for this issue. The emulator results agree with 
those in the 1970 listing referenced in the prior post *TO THE DIGIT*. 
That is 12-place agreement between the emulator and a real B5500.
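For scale, assuming the B5500 single-precision mantissa of 13 octal digits (39 bits), the format carries just under 12 decimal digits, so 12-place agreement amounts to agreement at essentially the full precision of the word:

```latex
\text{mantissa} = 13 \text{ octades} \times 3 \text{ bits} = 39 \text{ bits},
\qquad
39 \log_{10} 2 \approx 39 \times 0.30103 \approx 11.74 \text{ decimal digits}
```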

These changes will be released in version 0.14, hopefully by 2013-10-07.

After some false starts over the past months, I attacked this problem by 
writing a small XALGOL program (B55MATH) to apply 
add/subtract/multiply/divide to a variety of floating-point bit 
patterns. I then converted that program to Algol for the modern MCP 
architecture (known as "E-mode"). After running both versions, I 
compared the output and looked for differences. That revealed cases of 
interest, which I analyzed in detail using the SyllableDebugger and 
Firebug for Mozilla Firefox, which in turn pointed out where the 
problems were.
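The comparison step can be sketched as a small harness that classifies each pair of result words the way this thread does: bit-identical, algebraically equal, or different. This is a minimal sketch, not B55MATH itself. It assumes the B5500 single-precision layout (counting bit 0 as least significant: bit 46 mantissa sign, bit 45 exponent sign, bits 44-39 exponent magnitude as a power of 8, bits 38-0 integer mantissa), and the dictionary-of-results input format and the names `b5500_value`, `classify`, and `compare_runs` are hypothetical.

```python
from fractions import Fraction

MANTISSA_MASK = (1 << 39) - 1  # low 39 bits: 13-octade integer mantissa


def b5500_value(word: int) -> Fraction:
    """Exact value of a 48-bit word, ASSUMING the B5500 layout:
    bit 46 = mantissa sign, bit 45 = exponent sign, bits 44-39 = exponent
    magnitude (power of 8), bits 38-0 = integer mantissa. Flag bit 47 ignored."""
    mantissa_negative = (word >> 46) & 1
    exponent_negative = (word >> 45) & 1
    exponent = (word >> 39) & 0o77
    if exponent_negative:
        exponent = -exponent
    value = Fraction(word & MANTISSA_MASK) * Fraction(8) ** exponent
    return -value if mantissa_negative else value


def classify(emulator: int, emode: int) -> str:
    """Label one pair of result words with the three outcomes used in the thread."""
    if emulator == emode:
        return "bit-identical"
    if b5500_value(emulator) == b5500_value(emode):
        return "algebraically equal"
    return "different"


def compare_runs(emulator: dict, emode: dict) -> dict:
    """Classify every test case present in both runs.

    Keys are hypothetical (operand, operator, operand) tuples; values are the
    result words parsed from each program's output."""
    return {case: classify(emulator[case], emode[case])
            for case in emulator.keys() & emode.keys()}


if __name__ == "__main__":
    # The two cases quoted later in this thread.
    emulator = {
        (0o0370000000000004, "-", 0o0370000000000003): 0o0370000000000001,
        (0o1370000000000001, "/", 0o0154000000000000): 0o0000000000000000,
    }
    emode = {
        (0o0370000000000004, "-", 0o0370000000000003): 0o0231000000000000,
        (0o1370000000000001, "/", 0o0154000000000000): 0o1770000002000000,
    }
    for case, verdict in sorted(compare_runs(emulator, emode).items()):
        print(f"@{case[0]:016o} {case[1]} @{case[2]:016o}: {verdict}")
```

Comparing decoded values as exact fractions, rather than as host floating-point numbers, keeps the "algebraically equal" test free of any rounding by the comparing program itself.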

As it turned out, all of the problems were in singlePrecisionAdd(). 
Multiply and floating-divide were remarkably clean; no changes were made 
to those. Most of the differences were off-by-one rounding errors. It's 
amazing how quickly those errors accumulate in a program that does 
complex calculations.
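How a one-directional error in the last octade builds up can be illustrated with a toy model. This is an assumption for illustration, not the emulator's singlePrecisionAdd(): values are held as a 13-octade mantissa times a power of 8, one stored constant is added repeatedly, and every partial sum is either rounded or truncated. Truncation stands in here for a biased rounding bug. Because it always errs low, its error can only grow from one addition to the next. The names `to_13_octades`, `running_sum`, and `error_after` are hypothetical.

```python
import math
from fractions import Fraction

LOW = 8 ** 12   # smallest normalized 13-octade mantissa
HIGH = 8 ** 13  # one past the largest 13-octade mantissa


def to_13_octades(x: Fraction, mode: str) -> Fraction:
    """TOY MODEL: return positive x as m * 8**e with LOW <= m < HIGH.

    mode == "round" rounds half up; mode == "truncate" drops the excess
    octades, always erring low by up to one unit in the last octade.
    Exponent range limits are ignored in this sketch."""
    e = 0
    while x / Fraction(8) ** e >= HIGH:
        e += 1
    while x / Fraction(8) ** e < LOW:
        e -= 1
    scaled = x / Fraction(8) ** e
    m = math.floor(scaled) if mode == "truncate" else math.floor(scaled + Fraction(1, 2))
    return m * Fraction(8) ** e


def running_sum(term: Fraction, n: int, mode: str) -> Fraction:
    """Add the stored (correctly rounded) term n times, rounding every
    partial sum with the given mode."""
    t = to_13_octades(term, "round")
    s = Fraction(0)
    for _ in range(n):
        s = to_13_octades(s + t, mode)
    return s


def error_after(n: int, mode: str, term: Fraction = Fraction(1, 10)) -> Fraction:
    """Exact n * t minus the computed sum, where t is the stored addend."""
    t = to_13_octades(term, "round")
    return n * t - running_sum(term, n, mode)


if __name__ == "__main__":
    for n in (100, 2000):
        for mode in ("round", "truncate"):
            print(n, mode, float(error_after(n, mode)))
```

The error in this model is measured against the exact sum of the stored addend, so it isolates what the adder itself loses; representation error in 1/10 is a separate effect.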

The results between the emulator and modern E-mode are not always 
bit-identical, but in most cases the results are algebraically equal. 
For example, @0370000000000004 - @0370000000000003 yields 
@0370000000000001 in the emulator but @0231000000000000 in E-mode. Both 
values are equal, differing only in the extent to which they are 
normalized. E-mode is much more clever and thorough about how it 
normalizes and handles edge cases near the minimum and maximum values, 
as you would expect after almost 50 years of further development. Based 
on the information I have on how the B5500 actually worked,
the emulator's results appear to be correct.
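Normalizing here means shifting the mantissa left one octade at a time, decrementing the exponent for each shift, until the high-order octade is nonzero. A minimal sketch, assuming the B5500 single-precision layout (counting bit 0 as least significant: bit 46 mantissa sign, bit 45 exponent sign, bits 44-39 exponent, bits 38-0 integer mantissa), shows that fully normalizing the emulator's @0370000000000001 yields exactly the E-mode word @0231000000000000, and that both decode to 8^31. The helpers `fields`, `pack`, `value`, and `normalize` are hypothetical, not the emulator's code.

```python
from fractions import Fraction

MANTISSA_MASK = (1 << 39) - 1  # bits 38-0: 13-octade integer mantissa
MIN_EXPONENT = -63             # ASSUMED: 6-bit exponent magnitude plus a sign bit


def fields(word: int) -> tuple[int, int, int]:
    """Split a word into (mantissa sign bit, signed exponent, mantissa)."""
    exponent = (word >> 39) & 0o77
    if (word >> 45) & 1:
        exponent = -exponent
    return (word >> 46) & 1, exponent, word & MANTISSA_MASK


def pack(sign: int, exponent: int, mantissa: int) -> int:
    """Inverse of fields(); the flag bit is left at zero."""
    exp_sign = 1 if exponent < 0 else 0
    return (sign << 46) | (exp_sign << 45) | (abs(exponent) << 39) | mantissa


def value(word: int) -> Fraction:
    """Exact value: mantissa * 8**exponent, negated if the sign bit is set."""
    sign, exponent, mantissa = fields(word)
    v = mantissa * Fraction(8) ** exponent
    return -v if sign else v


def normalize(word: int) -> int:
    """Shift left one octade at a time until the high-order octade is nonzero,
    decrementing the exponent per shift; stops at MIN_EXPONENT, leaving such
    a value unnormalized."""
    sign, exponent, mantissa = fields(word)
    if mantissa == 0:
        return word  # zero has no normalized form; leave it alone
    while mantissa < 8 ** 12 and exponent > MIN_EXPONENT:
        mantissa *= 8
        exponent -= 1
    return pack(sign, exponent, mantissa)


if __name__ == "__main__":
    emulator = 0o0370000000000001  # emulator result quoted above
    emode = 0o0231000000000000     # E-mode result quoted above
    print(f"@{normalize(emulator):016o}", value(emulator) == value(emode))
```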

There are a few cases where the results are algebraically different 
(usually by 1 in the mantissa), but all but one of these can be 
explained by limitations in the way the B5500 normalized or rounded 
results. For example, @1370000000000001 / @0154000000000000 yields 
@0000000000000000 in the emulator but @1770000002000000 in E-mode. 
Actually, this probably generates Exponent Underflow in the B5500, but 
the MCP traps that and changes the result to zero. E-mode handles this 
very small number (@177 is the smallest possible exponent, -63) more 
robustly. In the case I can't currently explain, I have no idea what the 
correct answer should be.
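The division example can be checked with exact arithmetic. Under the same assumed word layout, the sketch below shows that E-mode's @1770000002000000 is the exact quotient (2 × 8^-57), and that writing that quotient with a normalized 13-octade mantissa would need an exponent of -69, below the -63 limit. That is consistent with the Exponent Underflow suspected above for the B5500, though it does not prove what the hardware did. The helper names are hypothetical.

```python
from fractions import Fraction

LOW = 8 ** 12        # smallest normalized 13-octade mantissa
MIN_EXPONENT = -63   # ASSUMED most negative exponent a word can hold


def value(word: int) -> Fraction:
    """Exact value of a word, ASSUMING bit 46 = mantissa sign, bit 45 =
    exponent sign, bits 44-39 = exponent, bits 38-0 = integer mantissa."""
    exponent = (word >> 39) & 0o77
    if (word >> 45) & 1:
        exponent = -exponent
    v = (word & ((1 << 39) - 1)) * Fraction(8) ** exponent
    return -v if (word >> 46) & 1 else v


def normalized_exponent(x: Fraction) -> int:
    """Exponent e with |x| = m * 8**e and LOW <= m < 8**13 (x must be nonzero)."""
    x = abs(x)
    e = 0
    while x / Fraction(8) ** e >= 8 ** 13:
        e += 1
    while x / Fraction(8) ** e < LOW:
        e -= 1
    return e


if __name__ == "__main__":
    quotient = value(0o1370000000000001) / value(0o0154000000000000)
    e = normalized_exponent(quotient)
    print("exact quotient == E-mode word:", quotient == value(0o1770000002000000))
    print("normalized exponent:", e, "below limit:", e < MIN_EXPONENT)
```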

The YUSPAR-RETRO.zip file on my web site referenced in the prior post 
has been updated with today's tests. In addition to results for the 
YUSPAR program, there are source listings and results for both the B5500 
and E-mode versions of B55MATH, and a later version of YUSPAR named 
YUSPARA. The only original results we have for this latter program are 
from the Burroughs B6500 in 1971. That system used the same 
floating-point format, but had an entirely different hardware 
implementation of it, so some differences are to be expected. A note on 
the listing indicates that it matches B5500 results, and in general 
those results do match the current results from the emulator to the six 
digits the program reports. A scan of the original B6500 listing can be 
found at:

  * http://www.digm.com/PickUp/B5500-Scans/YUSPARA-65.pdf

In addition, I have run Fausto's FORTRAN tests under the new version of 
the emulator. All appear to work fine. The UNDERWOOD benchmark now runs 
to completion -- before this it faulted with an Exponent Overflow. The 
results compare favorably with the output of that program from the 
modern MCP FORTRAN-77 compiler, although oddly, some of the eigenvector 
signs are reversed -- the magnitudes agree, though. That bears further 
investigation.
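When investigating the reversed eigenvector signs, it may help to separate two kinds of reversal. Any nonzero multiple of an eigenvector is also an eigenvector, so a vector whose components have all flipped sign is an equally valid answer, while a vector in which only some components flipped is not. Comparing magnitudes alone cannot tell these apart. The thread does not say which kind UNDERWOOD showed; the helper below (`agree_up_to_sign`, hypothetical and not part of the emulator or its tests) is a minimal sketch of a check for the first kind.

```python
import math
from typing import Sequence


def agree_up_to_sign(u: Sequence[float], v: Sequence[float],
                     rel_tol: float = 1e-6, abs_tol: float = 1e-12) -> bool:
    """HYPOTHETICAL helper: True if v matches u or -u component by component.

    A magnitude-only comparison is weaker: it would also accept vectors whose
    components disagree in sign individually, which is not a rescaling."""
    if len(u) != len(v):
        return False

    def close(a: Sequence[float], b: Sequence[float]) -> bool:
        return all(math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
                   for x, y in zip(a, b))

    return close(u, v) or close(u, [-y for y in v])


if __name__ == "__main__":
    # Whole-vector flip: still an eigenvector.
    print(agree_up_to_sign([0.6, -0.8], [-0.6, 0.8]))
    # Single-component flip: same magnitudes, but not a rescaling.
    print(agree_up_to_sign([0.6, 0.8], [0.6, -0.8]))
```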

Original comment by paul.kimpel@digm.com on 6 Oct 2013 at 3:58

Google Code Exporter · Answer 4 · Sat Mar 21 2015 22:39:15 GMT+0800 (China Standard Time)

I misspoke about the changes being only in singlePrecisionAdd(). There was one 
small change in singlePrecisionMultiply() that cleared the X register to 
prevent rounding of the result of an integer multiply.

Original comment by paul.kimpel@digm.com on 6 Oct 2013 at 1:33

Google Code Exporter · Answer 5 · Sat Mar 21 2015 22:39:15 GMT+0800 (China Standard Time)

Fausto,

Thanks for testing this program again. I am pleased to know that you get the 
same single-precision results that I do.

We still have not implemented the double-precision operators in the emulator, 
however. Actually, DP add/subtract is written, but it has never been tested. DP 
multiply and divide are currently being simulated by the single-precision 
operators. I suspect that there is something wrong with the way I have 
"dummied" the DP operators. Finishing the DP operators is on my list of things 
to do, but is not presently at a very high priority.

-------- Original Message --------
Subject: Re: Issue 14 in retro-b5500: Floating-point arithmetic incorrect results
From: Fausto Saporito <fausto.saporito@gmail.com>
To: Paul Kimpel <paul.kimpel@digm.com>
Cc: retro-b5500@googlecode.com, Hans Pufal <hans@pufal.net>, Nigel Williams <nw@retrocomputingtasmania.com>
Date: 10/7/2013 3:04 AM
> Hello Paul,
> I updated to the latest version (0.14), but I have a strange behaviour of
> emulator about underwood program execution.
> I'm using the source files you sent several weeks ago in a zip file (along
> the output also for E-mode run).
> So, underwood-sp.job runs fine and the output is ok (without the expon
> overflow error reported in the old output).
> But underwood-dp and underwood-dp2 seem to run forever without producing any
> output... also on the console I suspect the emulator is idle... I have 92.1% P1
> slack and -1.4 P1 delay.
> Is it normal ?
> thanks,
> Fausto

Original comment by paul.kimpel@digm.com on 7 Oct 2013 at 1:48

Changed state: Started

Google Code Exporter · Answer 6 · Sat Mar 21 2015 22:39:15 GMT+0800 (China Standard Time)

This problem appears to be completely resolved now. Closed.

Original comment by paul.kimpel@digm.com on 18 Nov 2013 at 2:07

Changed state: Verified