Rounding and Bias

Another alert reader sent me a link to a moderately interesting YouTube video.
The video itself is really a deliberate joke, but it does demonstrate a worthwhile point. It's about rounding.

The overwhelming majority of us were taught how to round decimals back in either elementary or middle school. (I don't even recall exactly when.) The rule that most of us were taught is:

  1. If the first digit after the rounding point is 0, 1, 2, 3, or 4, then round the previous digit down;
  2. If the first digit after the rounding point is 5, 6, 7, 8, or 9, then round the previous digit up.

Here's the problem: those rules are wrong.

The problem is that if the first digit after the rounding point is zero, you're not really rounding - that is, you're not doing anything that changes the value of the data point. But if the first digit after the rounding point is 5, then it's exactly halfway in between; it's not closer to either the rounded-up value or the rounded-down value. Always rounding 5 up creates a bias, because it takes the point in the middle and shifts it as if it were closer to the upward side.

To demonstrate, let's try an easy example. Suppose we've got the following set
of numbers: {0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5}. Let's compute the mean
of those numbers: 22.5/10 = 2.25.

Now, let's round them off: {0, 1, 1, 2, 2, 3, 3, 4, 4, 5}; and then compute the mean: 25/10 = 2.5.

With the standard rounding rule, we've biased the numbers upwards enough to create a significant error!
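The arithmetic above is easy to check by machine. Here's a minimal Python sketch, assuming the standard library's decimal module as a stand-in for the schoolbook rule; the names values and round_half_up are just illustrative. Note that ROUND_HALF_UP rounds ties away from zero, which is the same as rounding up for the non-negative numbers used here.

```python
from decimal import Decimal, ROUND_HALF_UP

# The example set from the text: 0, 0.5, 1, ..., 4.5 (all exact as Decimals).
values = [Decimal(k) / 2 for k in range(10)]

def round_half_up(x):
    """Schoolbook rule: round to the nearest integer, ties go up."""
    return x.quantize(Decimal("1"), rounding=ROUND_HALF_UP)

rounded = [round_half_up(v) for v in values]

mean_original = sum(values) / len(values)
mean_rounded = sum(rounded) / len(rounded)

print("original mean:", mean_original)  # 2.25
print("rounded mean: ", mean_rounded)   # 2.5
```

Every one of the five ties moves up by 0.5, and nothing moves down to compensate, which is where the extra 0.25 in the mean comes from.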

The correct way to round is to send 5s up half the time and down half the time, for example by choosing randomly. The standard rule, used in most scientific settings, gets the same effect deterministically: pick either odd or even as the "preferred" outcome, and always round 5s towards the preferred outcome. If we try that with our example, using preferred even, the rounding is {0, 0, 1, 2, 2, 2, 3, 4, 4, 4}. Taking the mean of that, we get 22/10 = 2.2 - which is much closer to the original mean of 2.25 than the 2.5 we got by always rounding 5s up. The practice of rounding up adds a systematic bias to the data. It's a very small systematic bias, but it's a real one.
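The preferred-even rule can be sketched the same way. This is a minimal Python illustration, assuming decimal's ROUND_HALF_EVEN mode as the implementation of "prefer even"; the variable names are illustrative only. As it happens, Python 3's built-in round() also sends ties to the even neighbor, so it is shown alongside for comparison.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Same example set as in the text: 0, 0.5, 1, ..., 4.5.
values = [Decimal(k) / 2 for k in range(10)]

# Ties go to whichever neighbor is even; everything else goes to the nearest integer.
rounded_even = [v.quantize(Decimal("1"), rounding=ROUND_HALF_EVEN) for v in values]
mean_even = sum(rounded_even) / len(rounded_even)

# Python 3's built-in round() uses the same tie-breaking rule. These halves are
# exactly representable as floats, so no representation error sneaks in.
builtin_rounded = [round(float(v)) for v in values]

print(rounded_even)     # 0, 0, 1, 2, 2, 2, 3, 4, 4, 4
print(mean_even)        # 2.2
print(builtin_rounded)  # [0, 0, 1, 2, 2, 2, 3, 4, 4, 4]
```

The result is not exactly 2.25 because this small set happens to contain three ties that round down (0.5, 2.5, 4.5) and only two that round up (1.5, 3.5); over a longer run the ups and downs balance out.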

Does it matter? Not usually. As the commentary to the video points out, over the space of a couple of years, that systematic error in rounding gas prices amounts to about a dime. For most things in our daily experience, the difference between random rounding and upward rounding for 5s is just not significant. But if you're doing statistical analysis of
large quantities of data, or you're doing computations that rely on a high degree of
precision, then it can introduce enough error to foul your results. If you're doing statistical analysis, it can do things like make an insignificant result appear to be statistically significant. If you're doing high precision computations for things like
navigation of a space probe through a gravitational slingshot, it can introduce enough error
to crash your probe.
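To get a feel for the size of the effect on a larger data set, here's a rough Python sketch. The data set is purely hypothetical: it assumes readings recorded to one decimal place and spread evenly from 0.0 to 99.9, which real data need not be. The helper mean_error is an illustrative name, not anything standard.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

# Hypothetical data: every value from 0.0 to 99.9 in steps of 0.1.
readings = [Decimal(k) / 10 for k in range(1000)]

def mean_error(rounding_mode):
    """Average of (rounded value - original value) over all readings."""
    errors = [r.quantize(Decimal("1"), rounding=rounding_mode) - r
              for r in readings]
    return sum(errors) / len(errors)

bias_half_up = mean_error(ROUND_HALF_UP)      # always round 5s up
bias_half_even = mean_error(ROUND_HALF_EVEN)  # round 5s to the even neighbor

print("mean error, 5s always up:", bias_half_up)
print("mean error, 5s to even:  ", bias_half_even)
```

Under these assumptions, one reading in ten is a tie, each tie is pushed up by 0.5, and the average error comes out to 0.05 per reading. With ties going to the even neighbor, the pushes up and down cancel and the average error is zero.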

Mark,

Your explanation only works if you're removing exactly one significant digit when rounding (ie. written as real numbers, the 0 or 5 that gets chopped off is followed by an infinite string of 0's). If you assume that you are very likely to encounter a non-zero digit somewhere beyond the digit that you're rounding off, then lopping off a 0 is indeed (almost) always rounding down, and also the "rounded up" value is (almost) always closer to the "true" value than if you just lopped off the 5 and rounded down.

What #1 said.

By William Wallace (not verified) on 01 Mar 2009 #permalink

#1 - nonsense. By what process in the world do we produce truncated numbers like you suggest? You have this odd idea that if I take some measurement, and get 2.5 as a value, then the true value is of the form 2.5xxxxx where I just don't know what the x's happen to be (i.e. the measured value is just a truncated version of the true value). But it's not. If our instruments are good, we think the value is near 2.5. Maybe a little above or a little below. There just ain't many ways to produce data where we know all the digits are true in the truncation sense.

-kevin

Re #3.

o $1.09 rounded to the nearest dollar.
o 24-bit sample rounded to 16-bits

By William Wallace (not verified) on 01 Mar 2009 #permalink

Comments #1-3 indicate it is time for a post on significant figures. Here is the quick version:

2.5 actually means a value between 2.45 and 2.55. If this is not what you want it to mean, you could perhaps write 2.50 or 2.5 +/- 0.2. If you want exactly two and a half, it is properly written 25 * 10^-1; note that the lack of any decimal point makes a number exact.

WRT gas: the pump at my station measures price to 4 sig figs and volume to 5. I think this means that it only rounds wrong one time in 200.

By AntaresTrader (not verified) on 01 Mar 2009 #permalink

"If you're doing high precision computations for things like navigation of a space probe through a gravitational slingshot, it can introduce enough error to crash your probe."
In that case maybe you shouldn't be rounding.

#6, it's the basic problem of computation. You can't store infinite-length numbers on a computer, except symbolically. The moment you can't store and calculate everything symbolically, you have to account for the need to truncate or round. And it's not only the numbers you would expect that need special handling. 0.1 is a classic example of a number that cannot be stored exactly in IEEE 754 binary floating point, because 0.1 is an infinitely repeating fraction in base 2. This is why you need to use proper numeric methods to guarantee N digits of accuracy, and this post about rounding is an example of how to reduce the error in the least significant digit.
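The point about 0.1 is easy to see for yourself. A minimal Python sketch, assuming the usual case where Python floats are IEEE 754 doubles; the variable names are illustrative:

```python
from decimal import Decimal
from fractions import Fraction

# Converting the float 0.1 to Decimal exposes the exact value that is stored,
# which is the nearest binary double to one tenth, not one tenth itself.
stored = Decimal(0.1)
print(stored)

# The stored value is not the fraction 1/10 ...
is_exact_tenth = Fraction(0.1) == Fraction(1, 10)
print(is_exact_tenth)  # False

# ... which is why the classic comparison fails.
sum_matches = (0.1 + 0.2 == 0.3)
print(sum_matches)     # False
```

The rounding here happens before any arithmetic is done: the literal 0.1 is already rounded to the nearest representable double when it is read in.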

Often you have more than one non-significant digit, ie, digits you want to round away. In those cases #1 is correct. It's not that you have 2.5xxxx where you don't know x, it's that you have 2.534 and you don't care about anything after the decimal point.

Also, I would say that taking 1.0000(etc) to 1 actually IS rounding, it's just rounding with a no-op, in the same way that dividing something by one is still dividing... But that's a definitions thing...

Re #4:

$1.09 rounded to the nearest dollar? $1
24-bit sample rounded to 16-bits? my argument applies.

How about this: $1.05 rounded to the nearest dollar. Mark is right. There is no single "nearest" dollar. They are both equally near. #1 tried to imply that $1.05 really stands for a true value of $1.05+delta, with delta>=0, and therefore the "nearest" dollar should more likely be $2. This is nonsense. It is almost always going to be $1.05+/-delta for any kind of real sampling or measurements.

Just to be pedantic, I assume that when you say 1.05 to the nearest dollar you mean 1.50. But the point is that 1.50, ok, is equal, but 1.5x where x>0 is NOT equal, no matter

Yeah I assume that's what #9 meant.

#8 - I think any rounding algorithm would have to lose accuracy in general.

By Paul Carpenter (not verified) on 02 Mar 2009 #permalink

Mark is right about rounding (for the record, so am I, although it turns out I might be wrong re. gas pump rounding, but no one really knows, because depending on who you ask the machines are either much less or much more accurate than I gave them credit for in the video).

It is kind of my lifelong dream to be deemed moderately interesting by people who like math (I went so far as to write a novel about such people), so I appreciate the link and the thoughtful commentary. -John

#12: I still maintain that you and Mark are ONLY correct about (in the case of rounding to the next integer) xxxx.5 EXACTLY. If you prefer evens, and you round 4.51 down to 4, you are doing it wrong.

I agree a post on significant digits is needed. If you measure 3.52, what you know is that what you are measuring is 3.5xxx..., where 0.0xxx... is close to 0.02. (how close depends on your tool, and should be specified.)

Well, sure, but 3.51 isn't 3.5. Obviously this is only relevant if the calculation being done ends either by 0'ing out or if the calculator in question rounds wrongly.

(Example: I was taught in third grade that 3.3345 rounded to the nearest penny would be 3.34, because you have to round up the 4 and then you round up the 3, which is totally ludicrous. But I have heard--although with no confirmation from the nice people at Exxon--that gas pumps regularly round this way.)

By John Green (not verified) on 02 Mar 2009 #permalink

re: 15, wait, so you're saying that 3.3344444444445 gets rounded to 3.34??? that's dumb. If you were taught that in 3rd grade your 3rd grade teacher should be fired. from a cannon.

Rounding isn't a recursive process. You pick a point, and round.
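The difference between rounding once and rounding digit by digit can be shown with the 3.3345 example from the comment above. This is a minimal Python sketch using the decimal module with round-half-up at every step; the names once and stepwise are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")
price = Decimal("3.3345")

# Round once, straight to the nearest cent: 0.0045 is less than half a cent.
once = price.quantize(CENT, rounding=ROUND_HALF_UP)

# Round one digit at a time: 3.3345 -> 3.335 -> 3.34.
to_thousandths = price.quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)
stepwise = to_thousandths.quantize(CENT, rounding=ROUND_HALF_UP)

print(once)      # 3.33
print(stepwise)  # 3.34
```

The stepwise version first manufactures a tie (3.335) that was never in the data and then rounds that tie up, so it lands a full cent away from the single-step answer.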

"If you're doing high precision computations for things like navigation of a space probe through a gravitational slingshot, it can introduce enough error to crash your probe."

Especially if readings are processed in recursive equations, where little errors can accumulate over time.

Rounding is a form of quantization. And quantization can be done in various ways (truncation, rounding, rounding toward 0, rounding toward infinity, etc.). And quantization error (quantized value - actual value) can be handled by adding noise (dither). And dither can have a Gaussian PDF, or other PDFs, e.g., triangular, depending upon the application.

Anyway, MarkCC is mostly correct, and even in the case when he is less than correct, I get his point.

By William Wallace (not verified) on 02 Mar 2009 #permalink

Thanks for all the great information, MarkCC. I always enjoy reading your blog.

I would also like to join those asking for a post about the concepts and methods regarding significant digits.

I have tried to read material about it from NIST and others in the past, but my understanding is still very low, and I would appreciate your treatment of this subject, if it's something that would interest you.

Thanks.

By RoaldFalcon (not verified) on 03 Mar 2009 #permalink