The Excel 65,535=100,000 Bug

I've been getting a lot of requests from people to talk about the recent Excel bug. For those of you who haven't heard about this, in Excel 2007, floating point calculations that should produce a result very, very close to 65,535 are displaying that result as 100,000 (and ones very close to 65,536 display as 100,001).
It's only in the display though - the underlying number is actually represented correctly, so if you subtract 2 from 65,536, you'll get the correct answer of 65,534 - not 99,998.
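You can watch the underlying value with a couple of lines of C. (This is just a sketch of the arithmetic involved - any IEEE-754 double arithmetic reproduces these bits; it's obviously not Excel's code.)

#include <stdio.h>

int main(void)
{
    double d = 850 * 77.1;     /* the now-famous expression that triggers the bug */
    printf("%.17g\n", d);      /* 65534.999999999993 - close to, but not exactly, 65535 */
    printf("%.17g\n", d - 2);  /* 65532.999999999993 - the arithmetic itself is fine */
    return 0;
}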

I can't give you the specifics - because without seeing the Excel code, I can't tell exactly what they got wrong. But I've got some pretty good suspicions, so I'll do my best to explain the background that leads to the problem, and what I think is probably going on. (Another excellent explanation of this is in the Wolfram blog post that I mentioned in my fast arithmetic fractals post this weekend.)

When you're working with numbers in a program like Excel, you're using something called floating point numbers. Floating point is annoying - seriously annoying. It's an approximation of the real numbers using finite precision, in a way that allows arithmetic operations to be done reasonably quickly. (It's still a whole lot slower than working with integers, but it's usually pretty good.)

Without going into the gory details, what floating point means is that the number is represented by a fixed-precision number in a strange form of scientific notation. So, for example, in 4-digit base-10 floating point, the number 600 is actually represented as 0.6000×10^3; 31.4159 is represented as 0.3142×10^2. Notice what happened in that second case - we lost two digits, because the 4-digit float representation didn't have enough precision to represent all of the digits.
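You can play with this kind of significant-digit rounding on any machine: C's %.4g format rounds to 4 significant digits, which is exactly the toy representation described above. (A quick illustrative sketch:)

#include <stdio.h>

int main(void)
{
    printf("%.4g\n", 600.0);    /* prints 600: it fits in 4 digits */
    printf("%.4g\n", 31.4159);  /* prints 31.42: the last two digits are lost */
    return 0;
}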

To make matters worse, floating point doesn't use base 10. It's binary: so numbers are really represented by a base-2 fraction and a base-2 exponent. So, for example, 1/4 is really represented as
0.1×2^-1 (that's the binary fraction 0.1, i.e. one half, times 2^-1).
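If you want to peek at the actual base-2 pieces of a double, the standard C function frexp does exactly this split into mantissa and exponent. The sketch below also shows why a nice decimal number like 0.1 can't be stored exactly in binary:

#include <stdio.h>
#include <math.h>

int main(void)
{
    int e;
    double m = frexp(0.25, &e);      /* 0.25 = 0.5 x 2^-1, i.e. binary 0.1 x 2^-1 */
    printf("%g x 2^%d\n", m, e);
    m = frexp(0.1, &e);              /* 0.1 has no finite base-2 representation... */
    printf("%.17g x 2^%d\n", m, e);  /* ...so the stored mantissa is only close to 0.8 */
    return 0;
}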

There are a couple of unfortunate side-effects of this:

  1. The one which most people have seen in some form is that almost every floating point computation accumulates errors: because of the finite precision, every step performs a round-off, which can add a bit of error. The more steps you perform, the larger the accumulated effect of the roundoff errors can become. The typical example of this is that on many calculators, if you compute 2 - (sqrt(2)^2), you'll get something odd like 0.00000000012. (A small C sketch just after this list demonstrates both this effect and the next one.)
  2. The order in which you perform a computation can have a huge impact on the precision of the result. For example, in base-10 4-digit floating point with 1-digit exponents, take (0.1E-9)×(0.3E0)×(0.4E8). If you multiply the first pair of numbers first, 0.1E-9×0.3E0 should give 0.3E-10, but the exponent -10 doesn't fit in a single digit, so the product underflows to zero, and the whole expression evaluates to 0E0×0.4E8=0. If you do it in the other order, it evaluates to 0.1E-9×0.12E8=0.12E-2.
  3. In order to display a value to a user, you need to convert it from base-2 scientific notation to a standard base-10 representation. There are a bunch of problems here: doing the conversion is potentially quite slow, involving multiple divisions (with the attendant round-off errors); and you want to present the number to the user correctly: while the base-10 conversion might result in 1.999999999999999999999999999999999999, you probably want to output that as 2.
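Here's the sketch promised above - a few lines of C showing both the accumulated-roundoff effect from item 1 and the order-of-evaluation effect from item 2. It uses ordinary doubles, where the underflow threshold is around 1e-308, instead of the toy 1-digit exponents:

#include <stdio.h>
#include <math.h>

int main(void)
{
    /* Item 1: algebraically this is exactly 0, but round-off leaves debris. */
    printf("%g\n", 2.0 - sqrt(2.0) * sqrt(2.0));   /* about -4.44089e-16 */

    /* Item 2: the same three factors, grouped two different ways. The first
       grouping underflows to 0 (1e-600 is far below the smallest double);
       the second grouping never leaves the representable range. */
    double tiny = 1e-300, big = 1e300;
    printf("%g\n", (tiny * tiny) * big);   /* 0 */
    printf("%g\n", tiny * (tiny * big));   /* about 1e-300 */
    return 0;
}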

That last problem - converting for display - is the one that bit Excel. Converting to print format is very slow - so people devote enormous amounts of effort to finding algorithms that make it faster. Every corner that can reasonably be cut in order to save a bit of time is cut. At the same time, you want to get the number out correctly - so along with doing your clever optimization, you're always keeping your eyes open for the cases where the round-off errors should be corrected - like rounding 65,534.999999 to 65,535.
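To make that correction concrete, here's one very common form of it - shown purely as an illustrative sketch, not as Excel's actual code: format with a few fewer significant digits than the double actually carries, so that near-miss values snap to the number the user expects.

#include <stdio.h>

int main(void)
{
    double d = 850 * 77.1;   /* stored internally as 65534.999999999993 */
    printf("%.17g\n", d);    /* all the digits: the ugly truth */
    printf("%.15g\n", d);    /* at 15 significant digits this rounds up to 65535 */
    return 0;
}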

The problem with that is that floating point representation
is very complicated, and there are an incredible number of different numbers that can be represented.
It's very easy to write code which seems correct, and which works perfectly on nearly every
floating point value, but which contains an error which will manifest on a dozen bit patterns.
You can test that code on billions of different values, and still miss the crucial couple that reveal
the problem.

It looks like the Excel bug is one of those cases. My suspicion is that there's an unfortunate interaction between the code that tries to prevent generating numbers like 1.99999999999999999 instead of 2, and the code that does the optimized conversion. What it looks like is that the rounding-error correction is probably overdoing its job: it's performing the roundoff, and then handing its result back to the output code in what looks like already-converted base-10 form.

In general, I like ragging on Microsoft as much as (if not more than) your average Mac user. But I can't say that I really blame them too much for this one. Floating point arithmetic and conversion is a nightmare - it's enormously complex, and the demand for speed is extraordinary. Slowing things down a tiny bit - taking an extra microsecond per conversion - can have a huge impact on the performance of the system. That kind of code is under constant pressure to squeeze out every last drop of performance. And errors like this are so easy to miss, while catching them
in testing is almost impossible. You can only reliably catch this kind of problem by doing a detailed analysis of the logic of the code, and all you need to do is miss one out of hundreds of different corner cases, and you're hosed. It's just so hard to get right that the only surprise is that they've made so few mistakes like this.


Joel over at http://www.joelonsoftware.com/ worked on Excel about 13 years ago and also has an explanation (basically paralleling yours). Interesting to note that in the Oracle database software they store numbers in base 100, so they don't have binary-to-decimal conversion problems (up to 38 digits of precision). I think in their case they were more concerned with the space the numbers take up on disk than with the performance of mathematical operations. They do have an option for columns to store a number in IEEE format for reduced precision, but potentially faster math operations.

I would expect the code that converts back to decimal for display to be part of the compiler, and not anything that the Excel programmers would touch. If so, it isn't Excel's source code that has the bug, but the compiler's--which might put this bug into other products.

Am I thinking about this the wrong way? Do application programmers really work at such a low level that they're messing with floating point display code?

The math behind it seems to work in every case except adding 1.

If A1 contains "=(850*77.1) + 1" the displayed value will be 100,001

Although adding to that will then put you on the right path again.

In the case of Oracle, I'm quite certain they used BCD (Binary Coded Decimal - two decimal digits per byte) representations because they don't want to piss off the accountants who are their main marketplace. Accountants don't care for the large dynamic range that floating point gives you. They want to be *certain* they don't drop pennies.

Dr. Drang:

You'd be wrong in thinking that. :-)

For basic IO, you're working under the expectation that the execution time is completely bounded by IO - that is, you're going to spend so long in processor cycles waiting on the IO handler that you don't need to worry about computation cycles: making floating point IO faster won't accomplish anything except making your processor spend a few more cycles idling waiting on IO. So basic IO for floating point numbers in compilers is not highly optimized.

Second, most console IO doesn't spend a lot of time worrying about things like best-possible-roundoffs. The usual expectation is that if you need to worry about that, you'll do it yourself, in a way which is best suited towards your particular application.

Finally, there's so much effort that goes into these highly optimized routines that most companies don't just want to give it away to the competition. They're not going to toss their super-optimized, hand-crafted, whiz-bang numeric code into the package with the compiler. If they share it at all, it's a high-price add-on.

Andrew:

The Excel bug manifests on two small clusters of values: one very close to 65,535 (which displays as 100,000), and one very close to 65,536 (which displays as 100,001). So when you add 1 to something in the first error group, it pushes it into the second error group. Add one again, and you're outside the range of either error.

I think in my first CS class we were told to test boundary values. Even though there are billions of numbers that could be tested, the ones it seems like you should definitely be testing are powers of 2 -- and of those you should at least be testing 2^8, 2^16, 2^24, 2^32, 2^64, etc. Well, easy to armchair quarterback the testing regime.

'[Working with floating point numbers is] still a whole lot slower than working with integers, but it's usually pretty good.'

I disagree with this one. A properly implemented hardware multiplier for FP might actually be a tiny bit faster than an integer multiplier for the same number of bits, for the simple reason that the significand, which is the part that actually needs to be multiplied, has fewer bits. An FP adder will be pretty close in performance to an integer adder of the same number of bits. Same should be true of a divider as well.

On a different matter, considering how Excel is primarily used for businessy stuff, perhaps it might have been better to use base-10 floating points. Of course that would've been terribly slow 10-15 years ago, but we've got pretty fast computers now.

One thing of potential interest is that 65536 is 2^16. I am not in any way a computer programmer, but I am wondering if the behavior may have something to do with the negative bit or some other type of issue that arises when a register overflows.

IEEE 754, the standard for most hardware-based floating point, is undergoing revision and a decimal floating point is in the works. I know a number of software vendors who have announced or are about to announce support for this.

Flaky, 10 to 15 years ago, there was a computer OS called PC/GEOS (aka GeoWorks) whose display engine was 20 years ahead of its time. Its coordinate space was measured exclusively in points (1/72"), and fractions of a point were possible. All graphics operations were vector operations -- ALL of them. As a result, by applying a matrix transform, you could get any kind of skewing or rotation you desired, for any graphics object at all.

On an 80286. In real time. Using _rational_ number representation.

It is my opinion that floating point offers only limited value in most domains (it is of greatest value to science applications). I think that if we really were interested in optimizing hardware resources for what most people used them for, we'd have gone instead with hardware support for rational arithmetic. Where extended amounts of precision or range were needed, hardware acceleration for bignums would work.

It seems to me that floating point numbers are more trouble than they're worth, with or without an FPU.

By Samuel A. Falvo II (not verified) on 02 Oct 2007 #permalink

Falvo: Rationals are probably underused, but I'd be rather wary of using them in simulations. Some sort of rounding would be inevitable to prevent the numbers from growing in size without bound, even if they were always stored in lowest terms. Floating point performance and memory usage are more predictable.
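To make that growth concrete, here's a rough C sketch of what iteration does to exact rationals. The map x -> x(1-x) on a fraction p/q squares the denominator at every step; this sketch doesn't even reduce to lowest terms (reduction slows the blow-up for typical maps, but doesn't stop it):

#include <stdio.h>

int main(void)
{
    unsigned long long p = 1, q = 3;   /* x0 = 1/3 */
    for (int i = 0; ; i++) {
        printf("x%d = %llu/%llu\n", i, p, q);
        if (q > 0xFFFFFFFFULL) {       /* q*q would no longer fit in 64 bits */
            printf("exact arithmetic overflows 64 bits after %d steps\n", i);
            break;
        }
        p = p * (q - p);               /* x(1-x) = p(q-p) / q^2 */
        q = q * q;
    }
    return 0;
}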

I've still got some GeoWorks disks around here. Damn, that was a nice little system. The problem was that practically no one could write software for it. Development required a Sun workstation (then something like $10,000). I think it could give you a tty-type interface for your modem, but didn't really support BBSing. But it was a lovely GUI, and insanely fast. As Samuel Falvo says above, it ran very nicely on an 80286, and IIRC, even on an 8086.

The problem compounds with some operations.

ROUND(850*77.1,1)*2 yields 200,000.

Per rationals and fixed point numbers---they're great if everything you're doing is within a few orders of magnitude, like you would have with display graphics (nothing really matters terribly much below about 0.1pt or above 1,000pts or so when you're outputting to something screen- or paper-sized). Scientific simulations routinely produce things over a much wider range, especially with repeated operations. Even if the inputs and outputs are close in size, intermediate values can get huge or tiny.

We ran into a more fundamental problem with number representations in a Dynamical Systems class once. The doubling map f(x) = { 2x, 0<=x<1/2; 2x-1, 1/2<=x<=1 } (its graph looks like a sawtooth pattern with two teeth), when iterated, has points with orbits of every period, but when you simulate it on a computer, everything eventually just goes to zero and stays there.

It turns out that the set of numbers which are finite sums of powers of two - the dyadic rationals, which are exactly the numbers that ordinary binary floating or fixed point formats can store - is exactly the set of numbers whose orbits become eventually fixed at zero (i.e. under iteration eventually obtain the value zero, which is a fixed point). So a computer using ordinary floating or fixed point numbers can only represent values with rather atypical behavior.

By Matthew L. (not verified) on 02 Oct 2007 #permalink
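Matthew's collapse is easy to reproduce in a few lines of C. Every double in [0,1) is a dyadic rational p/2^k, and each pass of the map shifts its bits left by one, so any representable starting point must hit the fixed point 0 within k steps (a sketch, assuming IEEE doubles):

#include <stdio.h>

int main(void)
{
    double x = 0.3141592653589793;  /* an arbitrary starting point in [0,1) */
    int n = 0;
    while (x != 0.0 && n < 2000) {
        x = (x < 0.5) ? 2 * x : 2 * x - 1;  /* f(x) = 2x mod 1, the doubling map */
        n++;
    }
    printf("reached 0 after %d iterations\n", n);  /* roughly 54 for a typical double */
    return 0;
}

Note that both branches are exact in binary floating point (doubling never rounds, and the subtraction is exact), so this loop really does iterate f itself, restricted to the dyadic rationals the machine can store.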

While I have no reason to doubt you here Mark, something smells mightily more fishy than MS failing to test some freakishly, almost chaotically unknowable bit pattern that doesn't display correctly. I mean c'mon, the numbers in question are exactly 64K and 64K - 1. Something's hinky.

I'm surprised Excel doesn't use the GPU for calculation acceleration. NVidia/AMD GPUs make AltiVec seem slow.

GPU,

Excel probably doesn't use GPU hardware because GPGPU is only available in current-generation GeForce and the new Tesla devices (I don't know what GPGPU platforms AMD/ATI has available); it's probably not worth implementing for the markets they appeal to yet.

Your example #2 about floating point multiplication is wrong. It would be closer if it were an addition/subtraction example, but it's unnecessary to align exponents before multiplying. .1e-9 and .3e0 can safely be multiplied to .3e-10 with no difficulty as long as there are two digits of mantissa and exponents down to -10 available. Now if you subtracted .1e-9 from .3e0, then alignment would be required and you could get a difference in result based on order.

Multiplication can be done without prior alignment because scientific notation is multiplicative, and, as long as the inputs are normalized, you can multiply the mantissas and add the exponents safely (assuming you have more than 2 or 3 bits in the mantissa and you don't overflow the exponent). Then you can normalize the result. There's still room for some loss of precision, but not so drastically as you describe.

You can use the following C program to convince yourself:

#include <stdio.h>

int main(void)
{
    double a = .1e-200;  /* tiny, but well above the double underflow limit */
    double b = .3e0;
    double c = a * b;    /* multiplication: no alignment needed, no trouble */
    double d = b - a;    /* subtraction: a is absorbed completely, so d == b */
    double e = .1e-15;
    double f = b - e;    /* subtraction: only a rounded-off trace of e survives */
    printf("%26.16g\n%26.16g\n%26.16g\n%26.16g\n%26.16g\n%26.16g\n",
           a, b, c, d, e, f);
    return 0;
}

billb:

In that example, I said I was using single digit exponents.

Doh!

Sorry, Mark. It was one of those days that involved explaining exactly this sort of stuff to people who should know it, and I hadn't eaten at all. Mea culpa!

Of course, I think the subtraction issue is much more interesting! :)

You can get away with a lot when multiplying. Although your example shows a clear case of when it fails because the exponent overflows negative, the problems with subtraction, I find, are more pervasive and insidious. With multiplication you have to be working very near the boundaries of what you can represent in absolute terms. I.e., your .1e-9 is the smallest number that your (1,1)-digit floating-point system can represent, so multiplying it by something that would diminish it further is equivalent to asking for zero (and should raise an exception automatically on architectures that support it). To get into trouble with subtraction, you only need to be working with numbers that differ by more than what the mantissa can represent, and no exception can or will be raised on most architectures.

It's very sneaky.

Thanks for pointing out my error.

Bill.
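For anyone who wants to see Bill's two failure modes happen, here's a quick C illustration (a sketch; nothing Excel-specific):

#include <stdio.h>

int main(void)
{
    /* Absorption: e is below half an ulp of b, so the subtraction silently
       does nothing at all - no flag, no warning. */
    double b = 0.3, e = 1e-17;
    printf("%d\n", b - e == b);    /* prints 1 */

    /* Catastrophic cancellation: the leading digits cancel, and the
       representation error of the inputs becomes the entire result. */
    double x = 1.0 + 1e-15;
    printf("%.17g\n", x - 1.0);    /* about 1.1102230246251565e-15, not 1e-15 */
    return 0;
}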

GPU,

I don't think it's feasible to speed up spreadsheet execution with GPU hardware.

First, current GPUs implement only single-precision floating point; this would be much less accurate than the double-precision that a typical spreadsheet would use.

Second, GPUs are blindingly fast only when they are running (effectively) highly vectorized code; that is, you need to be doing the same thing to many different values. While this may happen in a spreadsheet, it would require a lot of compiler-like effort to find such vectorizable operations.

Third, I would be surprised if floating-point computation were the bottleneck in a typical spreadsheet anyway. Unless the spreadsheet compiles formulas into machine code, I would expect the formula interpreter to be the slowest part.

Carl

By Carl Witty (not verified) on 03 Oct 2007 #permalink

Slowing things down a tiny bit - taking an extra microsecond per conversion - can have a huge impact on the performance of the system.

I don't see why this needs to be the case. We're talking about a formatting conversion for display purposes only. That means:

  1. for each number, we still need to do the work of painting that number on the screen
  2. we only need to update the base-10 formatted version of the number when the number's value actually changes, which typically doesn't happen very often

So why is performance such a critical issue here?

Neil: Wouldn't it be the case though that, at least in a spreadsheet program, when one number's value changes it could potentially result in the display value of every single other cell in the document changing as well? The time to figure out the display value could potentially stack up there.

For the reasons Neil mentions, it seems very unlikely to me that Excel does all that much weird stuff to optimize base-10 formatting. Especially as this bug is apparently new in the latest version of Excel; I can believe that MS might have done all sorts of weird overoptimization in old versions back in the Stone Age when the processor had to do its calculations in Roman numerals with a piece of rock and a chisel, but in 2007? (Then again, it seems pretty clear that this *is* a formatting problem, so I dunno. Perhaps the same formatting code is used in other contexts where its speed actually matters.)

And perhaps I'm misunderstanding, but when Mark refers to "the code that tries to prevent generating numbers like 1.999...9 instead of 2", is he suggesting that Excel changes the actual numbers stored in the cells to make formatting easier? That would be (1) insane and (2) inconsistent with the symptoms I've seen reported, which seem to indicate that this is *only* a formatting problem; if you put one of the 12 evil numbers into cell A1 and A1+2 into cell A2, then A2 will show up just fine.

(The 12 evil numbers are 65535-h and 65536-h where h is the sum of one or two of 2^(-a), 2^(-a-1), 2^(-a-2) for some a that I forget. 32 or 35 or something.)

The speed issue is real. Remember the way that things like Excel work: they generate the contents of every cell in the spreadsheet, in print form, each time they recalculate, and recalculation is supposed to be non-noticeable in the normal case. Since the conversion from binary to display involves multiple divisions, it's quite slow without optimization, and spreadsheets where thousands or tens of thousands of values need to be recalculated are not unknown.

The "fixing up" of numbers - making 1.99999999 print as "2" is just part of displaying the number. It doesn't actually change the storage value of the number - just the display representation. It's actually a very interesting problem, getting the right display of a number. There's a great paper by Guy Steele in, I think, SIGPLAN 1990 about an algorithm for it.

Finally, it's mostly a display bug - but not entirely. There are some functions in Excel that operate on the print form of a number, rather than the internal representation. The most widely cited one is round - which does a base-10 round of the number to a certain number of decimal places. That's done on the print form.
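The flavor of the problem that Steele's paper solves is easy to demonstrate: find the shortest decimal string that parses back to exactly the same double. Here's a brute-force C sketch (the published algorithms are far cleverer than this loop):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    double values[] = { 0.1, 1.0 / 3.0, 850 * 77.1 };
    for (int i = 0; i < 3; i++) {
        char buf[64];
        for (int prec = 1; prec <= 17; prec++) {
            snprintf(buf, sizeof buf, "%.*g", prec, values[i]);
            if (strtod(buf, NULL) == values[i]) {  /* shortest round-tripping string */
                printf("%2d digits: %s\n", prec, buf);
                break;
            }
        }
    }
    return 0;
}

0.1 needs just one significant digit, 1.0/3.0 needs sixteen, and 850*77.1 round-trips at sixteen digits as 65534.99999999999 - exactly the kind of value the display logic has to handle gracefully.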

You guys are missing the point. You've gone to great lengths to explain what could cause Excel to mishandle a simple multiplication. Whether it's a botched algorithm or a flawed display mechanism doesn't matter to customers. It should matter less to computer scientists or mathematicians like yourselves (I'm neither). Because I've been watching too many episodes of "CSI: Miami", I'll impersonate Horatio Caine for a moment and look at the evidence in this case.
Previous versions of Excel didn't display the behaviour in question, and neither do current competing programs (Corel Quattro Pro, OpenOffice Calc, Gnumeric, ...). Obviously, it's not about finding the Grand Theory of Unification here.
Running on a Pentium II, Excel 97 didn't suffer from blurred vision. Why should floating point numbers give fits to Excel 2007 running on an Intel Core 2 Quad? Therefore, the complexity of numerical analysis isn't a good alibi either.
Excel is a financial tool widely used at the office and at home. Customers don't expect a spreadsheet program to perform a vanishing act (a la David Copperfield) with their data. They literally trust Microsoft with their financial safety, which leads me to the next piece of evidence.
During an interview given to the ChannelWeb Network on July 20, 2006, Chris Liddell (Microsoft's Chief Financial Officer) said they would spend $900 million to market Windows Vista, Office 2007 and the Xbox 360. With $300 million, I think many people could have been hired just to make sure that the new features of Office 2007 actually work.
In light of the evidence, I believe that the testimony of computer scientists is not required in this case. Finally,
Your Honour, as the lead investigator, it is my conclusion that this disaster could have been avoided if the defendant (Microsoft) had simply brought to bear its vast financial and human resources.
Oh boy, talk about an impersonation!

Kevin,

You say we are missing the point, and I guess I'd have to agree that I really could not figure out what point you were trying to make. Are you saying that they didn't throw enough money at it?

And as I recall, there were previous versions of Excel that also had some pretty famous calculation bugs that only cropped up in very special cases.

A1 = 850*77,1
A1+1 = 100 001

#29:
A1: = 850*77.1
A2: =A1+1 --> 100 001

So? Now try A3: =A2+1
What do you get?

LOL Steve,

You reminded me (not that it is related) of a funny thing I saw in first-year calculus. We were using Newton's method to find the roots of a polynomial.

The professor showed us a strange case where, after three iterations, the process looped back to its starting point. I have forgotten the exact equation, but it was pretty cool.

I guess it is glancingly related, as it shows that something as fundamental as Newton's algorithm can have strange cases where weird things occur.

If Excel really calculates the print form for every cell whenever it's calculated, then there's a much easier and more effective optimization available than resorting to bit-diddling in the formatting code, namely calculating the print form lazily when it's required. Since the Excel team are demonstrably not idiots, I'm sure they are aware of this.

(Back of envelope #1: suppose you have a nice large screen showing 25 columns and 60 rows, and you haven't converted any of the values to displayable form yet. That's 1500 cells. And suppose you have a really naive formatting algorithm that has to do 30 FP divisions at 100 cycles each. (I think they're actually quicker than that, but no matter.) Then that's 4.5ms, and you only have to do that when the user triggers a recalc or changes the region of the screen to display. Back of envelope #2: surely, surely, actually displaying the characters once you've worked out what they are is going to be much more expensive than converting numbers to characters.)

Steele's paper is nice. There are some other related ones, mostly from the Scheme community, that IIRC improve on its efficiency and robustness in various ways.

Ben, Newton's algorithm is notoriously finicky when you start it somewhere that isn't near enough to the root you're looking for. If it ever hits a point where the derivative of the thing whose roots you're looking for is zero or very close to it, then all bets are off. (If it's actually 0 then you divide by zero; if it's just very small, your next iteration can be miles away.) This behaviour is closely related to some rather pretty fractals you can draw based on Newton's algorithm.

G:

First of all, Excel has the property that some functions work on the display form of the number, and some work on the internal form of the number. As a result, you need to either implement lazy evaluation; add some static analysis; or always generate display forms.

Second, one of the most notorious flaws among programmers is inappropriate optimization. People often focus on what seems to be a problem without necessarily doing a proper engineering analysis. If you do a basic profiling run on a spreadsheet, you'll see a large segment of time spent on calls to display routines; for many people, the immediate response to that is "We need to optimize the display routines", not "We need to add lazy evaluation to reduce the number of calls to the display routine".

Third: what would your reaction be to a spreadsheet program that generated calculation errors not when you asked it to recalculate, but when you dragged the scrollbar? Since some calculations rely on the print form, if you don't calculate those until you need them, then you can get into the classic lazy evaluation problem of delayed errors. Cell X depends on cell Y; Y divides by the display value of Z; Z = 0.000000000001, which is a roundoff error for 0; the display value of Z is 0, so when you scroll to where you can see X, suddenly you get a divide by zero error in cell Y.

Things like spreadsheets are really nasty things to program. They're simultaneously a rather sophisticated programming language, *and* a program used by naive users. Striking that balance is incredibly hard - you don't want to surprise the poor manager sitting in his office looking at a spreadsheet, but you also need to be able to program it to do very complicated stuff. So you do get unfortunate tradeoffs, like not doing certain kinds of optimizations on the user-written spreadsheet code, because that would create problems for your users.

Yes, I know that inappropriate optimization is a common flaw. I've done it myself from time to time. But -- perhaps I'm being terribly naive here -- I have the impression that the Excel guys are exceptionally smart, so I'd be a bit surprised if they made a really dumb decision here. Then again, that impression of smartness is somewhat dented by this bug, whatever its cause. I'll remark that lazy evaluation is so similar to the dataflow stuff a spreadsheet is doing anyway (the two are dual, so to speak -- the arrows are reversed) that it's just the sort of thing I'd expect people working on a spreadsheet to think of easily.

I don't understand your example of delayed error. The divide-by-0 error will show up in cell Y as soon as cell Y is visible. It seems to me that if you do the lazy-evaluation thing in the obvious way, users provably never see delayed errors. (It would be different if dividing by zero popped up an error dialog, of course, but for obvious reasons spreadsheets don't do that.)

Yes, the error will show up in cell Y as soon as it's visible. But you scrolled away from cell Y to look at cell X, then saved the spreadsheet without looking at Y again, and took it into a presentation. Then you got fired. Whoops.

By Stephen Wells (not verified) on 08 Oct 2007 #permalink

G:

The delayed evaluation error is when Y has a divide by zero error, but it occurs not when you scroll Y onto the screen, but when you scroll some other cell Z onto the screen, where Z is computed by an equation from Y. So you recalculate your spreadsheet, and nothing is wrong - there are no recalculation errors until you scroll Z onto the screen, at which point you find out about the error in Y.

Again, you need to remember that Excel is a weird hybrid beast - there are Visual Basic scripts triggered off of cell value changes, and exception handling scripts, etc. When you delay a computation, you delay the errors, and all of the things that those errors should trigger. So things can look good until you move the scrollbar, at which point all sorts
of hell happens.

I'm not saying that the Excel guys are stupid - far from it. What I am saying is that:

(1) Even without some of the delayed execution effects, lazy computation isn't necessarily an obvious choice for a spreadsheet.

(2) The real issues of building something like a spreadsheet cause a lot of difficult tradeoff decisions. What seems obvious from the outside might be a terrible choice from the inside, because of the structure and complexity of the code.

(3) Even if the structure of the code weren't an issue, even
if the lazy execution did occur to the developers, it can create complicated issues from the point of view of users/scripters, which might preclude its use even if it were the best way to optimize.

(4) Even the very best hotshot programmers don't always make the best decisions about optimization.

(5) Even when programmers make the best decisions, that doesn't mean that managers go along with it. Managers are far more likely to prefer "Optimize that routine that's slow according to the profile" to "re-organize the code to a different evaluation strategy that will be more efficient."

Yes, I understand what your delayed evaluation scenario *is*, I just don't understand why you think it works the way you say it does. The only way you find out about an error in Y is when Y is on the screen. (Because that's how errors get reported in Excel.) So if you find out about the error for the first time when you scroll Z onto the screen, it can only be because either (a) Y scrolled on along with Z, or (b) Y was already there. In case (a), this is exactly the expected behaviour: you see that Y is broken when Y comes into view. In case (b), Y was already on the screen; therefore Y had already been formatted; therefore the error had already shown up, contrary to our hypothesis that it first appeared when Z scrolled onto the screen.

I do understand that Excel is a weird hybrid beast, and does all sorts of odd stuff, and it's possible that there are delayed-evaluation scenarios involving scripting that *do* have the sort of problem you're talking about, and maybe *those* would suffice to make the Excel implementors decide to prefer crazy microoptimization of their display routines over lazy formatting.

But leaving scripting aside, the very definition of lazy formatting is that as soon as anything can affect what's actually visible to the user it gets done. So your original calculation-and-formatting-only example can't be right, and nor can Stephen Wells's version of it.

For the avoidance of doubt: I agree that this is surely some sort of bug in the conversion from numbers to characters. All I'm not buying is (1) the theory that there's some sort of preliminary "code that tries to prevent generating numbers like 1.99999999999999999 instead of 2" that's separate from the "code that does the optimized conversion", and (2) the theory that for the 2007 version the Excel team suddenly decided to write some weirdo hyperoptimized bit-level floating-point formatting routines. If the problem is broken formatting optimization, then I bet it's not Excel-specific.

For those interested in all the gritty details of how this bug came to be, how it works, and how the fix works, I posted a detailed analysis based on disassembling Excel at
http://www.lomont.org/Math/Papers/2007/Excel2007/Excel2007Bug.pdf

I show why precisely the 12 values format incorrectly, what changed from earlier versions, and how the hotfix corrects the bug.

For those who don't want to wade through my disassembly and analysis, the gist is an extraordinary combination of bit pattern bugs introduced when the Excel engineers rewrote the old 16-bit formatting code to a (presumably) faster 32-bit routine.

Grab it while it's hot!

Chris Lomont

Magnificent! And it looks like this code is in fact entirely specific to Excel, so I was dead wrong on #2 above. How naive of me.

I just tried to replicate this amusing bug but my version of Excel 2007 gave the correct answer, unfortunately.
I set A1 to 65531,2 and A2 to 3,8. A3 was A1+A2.
The answer was 65535. When I changed A2 to 4,8 the answer became 65536, as it should.
My Excel was installed only a few weeks ago; perhaps the bug has been corrected in the meantime.
Bad luck!

Kind regards,

Rob (Amsterdam).

Regarding some earlier comments about the superiority in certain respects of rational / fixed-point arithmetic, and the relative desirability of hardware implementations of rational vs. floating-point arithmetic:

The most common processor-constrained task performed by the vast majority of consumer computers is advanced simulation (in the form of games), something for which floating-point arithmetic (for all its many warts) has significant advantages.

By Douglas McClean (not verified) on 04 Aug 2008 #permalink