Wikipedia:Reference desk/Archives/Computing/2014 November 1

Computing desk
Welcome to the Wikipedia Computing Reference Desk Archives
The page you are currently viewing is a transcluded archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


November 1


Python 3: adding 1 + 1 wrong


>>> 27.08 + 2.93
30.009999999999998
>>> 27 + 0.08 + 0.93
28.009999999999998
>>> 0.08 + 0.93
1.01

This is not a division, so you shouldn't get rounding errors here. So why does Python 3 manage to calculate the last one correctly, but not the other two? — Preceding unsigned comment added by Senteni (talkcontribs) 18:10, 1 November 2014 (UTC)[reply]

Actually it is the result of a rounding error, due to conversion between decimal fractions and floating-point binary numbers. See Floating point#Representable numbers, conversion and rounding for an explanation. AndyTheGrump (talk) 18:20, 1 November 2014 (UTC)[reply]
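For illustration, the standard-library decimal module can show the exact value Python actually stored, since converting a float to a Decimal is exact (output truncated here; the full expansion runs to dozens of digits):

>>> from decimal import Decimal
>>> Decimal(27.08)
Decimal('27.079999999999998...')

The stored value falls a hair short of 27.08, and that shortfall is what surfaces in the sums above.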
And if I had a series of numbers, how would I be able to know whether the result has a rounding error somewhere or is precise? — Preceding unsigned comment added by Senteni (talkcontribs) 18:33, 1 November 2014 (UTC)[reply]
When displaying a floating-point number to the user, the programmer normally requests rounding to an appropriate number of digits; in this case, 2 digits after the decimal point. CS Miller (talk) 20:49, 1 November 2014 (UTC)[reply]
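For example, either of these built-ins (a minimal sketch) rounds the result of the original sum for display:

>>> round(27.08 + 2.93, 2)
30.01
>>> "{:.2f}".format(27.08 + 2.93)
'30.01'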
As described at IEEE floating point#Exception handling, you can theoretically use the inexact-result flag to detect this, but most programming languages don't have good support for it. I don't see any support in the Python standard library. I'm not sure it would be especially helpful anyway as you'd find that most such sums are inexact. -- BenRG (talk) 02:08, 2 November 2014 (UTC)[reply]
This page from the Python documentation also explains the problem. If you need precise decimal arithmetic (at the cost of speed), you can use the standard decimal module. -- BenRG (talk) 02:08, 2 November 2014 (UTC)[reply]
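For instance, constructing Decimals from strings (rather than from floats, which have already been rounded to binary) keeps the arithmetic exact:

>>> from decimal import Decimal
>>> Decimal('27.08') + Decimal('2.93')
Decimal('30.01')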
Also, math.fsum may produce more accurate results for sums of three or more terms. For example, math.fsum([27, 0.08, 0.93]) prints as 28.01 (though it's still not exactly 28.01, just the closest binary approximation to it). -- BenRG (talk) 09:06, 2 November 2014 (UTC)[reply]
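That is, reusing the sum from the original question:

>>> import math
>>> math.fsum([27, 0.08, 0.93])
28.01
>>> 27 + 0.08 + 0.93
28.009999999999998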
As everyone knows, 1 ÷ 3 cannot be represented exactly in decimal; it's 0.333333... . So if you add 1/3 + 1/3 + 1/3 without rounding, you won't be too surprised to get something like 0.999999.
Now, virtually all modern computers use binary (base 2), not decimal. This includes floating-point fractions. And the key thing to know is that in binary, 1 ÷ 10 = 0.1 cannot be represented exactly. Neither can 0.01. So you never actually had those numbers 27.08 and 2.93. In binary, you had 11011.00010100011110101110 and 10.1110111000010100011110, where in both cases the last 20 bits repeat over and over forever. If I convert those back to decimal, I get something like 27.0799999 and 2.9299998. I can append more copies of the 20 repeating bits, and this has the effect of adding more 9's to the decimal representation, but with a finite number of bits, I'm never going to get all the way up to 27.08000 or 2.93000.
So, once we've worked through this, it shouldn't be too surprising that 27.08 + 2.93 comes out as 30.009999 (or 11110.000000101000111101011 in binary).
Steve Summit (talk) 19:57, 2 November 2014 (UTC)[reply]
Footnotes:
Double-precision floating point gives you 53 significant bits, so we could use about 2½ cycles of those 20 repeating bits, but no more.
Yes, it's true that 0.99999999... equals 1. But you need an infinite number of 9's, and here we're dealing with finite representations.
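(You can see those truncated binary values from Python, too: 17 significant digits are enough to distinguish any two doubles, and the built-in format function will show them:)

>>> format(27.08, '.17g')
'27.079999999999998'
>>> format(27.08 + 2.93, '.17g')
'30.009999999999998'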
The real problem here is that a decent output routine should round the answer to one digit less than the precision of the calculation. Let's use C, where we have a little more control over the situation.
On my Linux box, I write this:
 #include <stdio.h>
 int main(void)
 {
     double x = 27.08 + 2.93;
     printf("%f + %f = %f\n", 27.08, 2.93, x);
     return 0;
 }
...and I get: "27.080000 + 2.930000 = 30.010000" - which is what you probably expected. But that's only because the printf function rounds to 6 digits by default. If I change the print format to tell it to give me 15 digits, just like Python gave you:
 printf ( "%.15f + %.15f = %.15f\n", 27.08, 2.93, x ) ;
...I get "27.079999999999998 + 2.930000000000000 = 30.009999999999998".
So now I'm getting the same answer as Python gives - but look at the first two numbers! I told it to print 27.08 but it actually printed 27.079999999999998 - so you can get floating-point errors without doing any math at all! The problem isn't (just) in the arithmetic. The problem is that you can't store 27.08 as a binary number with a finite number of bits. Given that converting the decimal number into binary introduced error in the first place, you can see that the addition itself was 100% accurate in this case.
However, that was more luck than anything else. If I change the sum by adding a million to the first number:
 1000027.08 + 2.93
I get "1000027.079999999958090 + 2.930000000000000 = 1000030.010000000009313". This time, it's clear that the addition process did introduce some error. That's because the number of significant digits that the machine can hold is the same no matter the size of the number (well, within reason). The size of the number determines the precision of the result.
I'm a little horrified that Python gives you ALL of the digits rather than rounding off the last digit by default...but then I'm not really a fan of Python.
SteveBaker (talk) 16:00, 3 November 2014 (UTC)[reply]
I mentioned this thread to a friend, and he pointed out that we're not really doing "not a division" or "not any math at all". When we write "27.08" we're basically asking for "27 + 8/100" and hey, lookit that, an addition and a division! —Steve Summit (talk) 16:54, 3 November 2014 (UTC)[reply]
Well, theoretically - but at least in a compiled language, that ASCII string "27.08" can be converted to binary with higher precision than the underlying machine hardware. So there is no reason why a constant can't be stored to the fullest precision of the underlying hardware, with no roundoff in the conversion process itself. Even arithmetic between constants can (in principle) be done at compile time at higher precision than the underlying hardware provides at runtime. You can also write code to parse an ASCII string into a floating-point number without using division. So it's tough to speculate on what happens with compiled constants...except to say that they can't be any more precise than the underlying binary format of the number. SteveBaker (talk) 20:59, 3 November 2014 (UTC)[reply]
As I understand it, the approach in current versions of Python is to choose the shortest representation which, when converted from a string back into a float, results in the same binary number [1]. You can check this with the float.hex() function: float.hex(27.08 + 2.93) == '0x1.e028f5c28f5c2p+4'; float.hex(30.009999999999998) == '0x1.e028f5c28f5c2p+4'; float.hex(30.01) == '0x1.e028f5c28f5c3p+4'. The last one has a different binary representation, so Python outputs a different string: internally, it's a different number. Python has no way of knowing where a number came from, so it has no way of knowing how you wanted it rounded. By default it keeps all the precision - handy, as converting to and from strings is a standard way of serializing data in Python. If you do want it rounded, there are ways of explicitly specifying how: e.g. "{:.2f}".format(27.08 + 2.93) for two digits past the decimal point [2]. Besides, you're going to have to learn about floating-point issues eventually; otherwise you'll wonder why, in "a = 27.08 + 2.93; b = 30.01; c = a - b", c != 0 even though "a" and "b" print out as the same number once rounded for display.

(In older versions of Python, like Python 2, the behavior was slightly different. Python has two general functions for converting an object to a string: str() and repr(). The first is intended for display purposes, whereas the second is for representing internal state. In Python 2, repr() gave the precise representation for floats, whereas str() rounded to 12 decimal digits (out of the ~17 available). So numbers would print differently depending on whether they were printed directly at the command prompt (which used repr), by the print statement (which used str), or as an element inside another object (which used whatever that object chose). The Python developers thought this was sub-optimal, so when they implemented the shortest-string representation, they made str() and repr() behave identically. [3] and links therein.) -- 160.129.138.186 (talk) 16:02, 4 November 2014 (UTC)[reply]
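To make that last point concrete (a and b below differ by exactly one unit in the last place, which near 30 is 2**-48):

>>> a = 27.08 + 2.93
>>> b = 30.01
>>> "{:.2f}".format(a) == "{:.2f}".format(b)
True
>>> a == b
False
>>> a - b
-3.552713678800501e-15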
To be fair, one "could" store 27.08 exactly on a computer, for instance (one of many, probably infinite, possible options), one could use a character array and store '2' in position 0, '7' in position 1, '.' in position 2, '0' in position 3, and '8' in position 4. Doing arithmetic on a number stored in such a manner would certainly be less efficient than with a floating point number or integer (partly because floating point numbers have special hardware to do arithmetic on them very fast; if my putative character array data structure was the industry standard it would presumably also have hardware acceleration, though it probably wouldn't be as fast as floating point is), but it could certainly be done. 75.140.88.172 (talk) 07:02, 6 November 2014 (UTC)[reply]