The new stored value might get lucky and convert back to the correct representation. It might not. It's this "might" that makes this so hard. I can run tests on numbers, see that I get back exactly what I put in, and conclude the value must be stored exactly, only to have my heart broken when I do math with it.
Sometimes a value just doesn't happen to convert back the way we hoped. Other math can land us at this same stored value while expecting exactly this representation, and there's no perfect way to round it: we simply don't know any more at this point than what was stored. When the round trip does work, it's because we got lucky; the information that was lost in conversion just happened not to matter once it was converted back. Now, you can throw a lot of math and diagrams at this problem trying to explain how floats are stored, but the thing to teach is how to know which numbers can't be stored perfectly.
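To see the heartbreak in action, here's a minimal sketch in Python (any language with IEEE-754 doubles behaves the same way):

```python
# Each literal is silently stored as the nearest binary double.
a = 0.1
b = 0.2

# The round trip looks perfect: printing gives back what we typed,
# because Python prints the shortest decimal that maps to the stored bits.
print(a, b)          # 0.1 0.2

# But math on the stored values exposes the information that was lost.
print(a + b)         # 0.30000000000000004
print(a + b == 0.3)  # False
```

The printout lies by omission: it shows the shortest decimal string that round-trips to the stored bits, not the value actually stored.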
Because the computer isn't going to tell you. It's the student's expectation of how floats will behave that you want to cultivate. They shouldn't expect floats to behave like calculators. They shouldn't expect them to be useless. They should expect that if you can't get to a number by adding and dividing by 2, the results are going to be fuzzy. That's fine for scientific measurements that just want to be close.
It's not good for accounting, which cares as much about pennies as dollars. A base-10 currency won't always store precisely. (Though if you can find a currency that only ever divides by 2, floats will keep your accountant happy.)
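To make the pennies point concrete, here's one sketch of a way out, using Python's decimal module to keep base-10 cents exact:

```python
from decimal import Decimal

# Ten dimes as binary floats: the error lands exactly where
# the accountant looks -- on the pennies.
print(sum([0.10] * 10) == 1.00)                        # False

# Ten dimes as base-10 decimals: exact every time.
print(sum([Decimal("0.10")] * 10) == Decimal("1.00"))  # True
```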
Show them a ruler with inches on it. Floats can store inch fractions perfectly, which should debunk the myth that floats always introduce rounding errors.
Entering these into a computer and doing math on them is a good exercise because it forces the student to do the conversion by hand. These conversions all happen without discrepancies. You can even challenge the students to try to catch the computer doing inch-fraction math wrong.
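Here's a sketch of that challenge in Python; every check passes, because ruler fractions are built from halves:

```python
from fractions import Fraction

# Every tick on a ruler has a power-of-2 denominator, so each one
# converts to a binary float with no error at all.
for denom in (2, 4, 8, 16, 32, 64):
    for num in range(1, denom):
        assert float(Fraction(num, denom)) == Fraction(num, denom)

# Inch-fraction math is exact too: 3/8" + 5/16" really is 11/16".
print(3/8 + 5/16 == 11/16)  # True -- the computer can't be caught out
```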
I used to teach this with drafting rulers. You can get one that has decimal inches. Put the decimal-inch ruler next to the fraction-inch ruler and show them how some numbers match up while others don't. The decimal-inch ruler represents what you see and what you type; the fraction-inch ruler represents what is stored. You can even get students to reproduce the same kinds of errors by having them take measurements from the decimal ruler, mark them on the fraction ruler, and try to convert back.
They'll quickly notice that some convert back easily while others don't (at least as long as you don't let them simply remember how the story started). And floats are finite, so not all real numbers can be represented exactly. For example, if you take the smallest positive number that can be represented exactly and divide it by 2, you can't represent the result exactly, by definition.
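You can watch that definition play out in Python; 5e-324 is the smallest positive IEEE-754 double:

```python
# The smallest positive double is 2**-1074, about 5e-324.
tiny = 5e-324
print(tiny > 0)         # True
print(tiny / 2)         # 0.0 -- half of it cannot be represented,
print(tiny / 2 == 0.0)  # True   so the result underflows to zero
```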
Explain that floating-point numbers are represented as a sum of powers of 2, then challenge students to write 1/10 as such a sum. They will not be able to; the sketch after this paragraph shows why. As a student myself, I had a course on how processing and encoding work, with a detailed part on operations in different bases, on how floats and doubles differ from one another by the number of bytes allocated, and on how the computer's encodings of floating-point numbers can be worked through by hand to get a grasp of them. But that wasn't until my third year, after a first introduction to operations in different bases during my second year.
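A sketch of that exercise in Python, doing the binary long division on the exact fraction 1/10 so the float approximation can't contaminate the result:

```python
from fractions import Fraction

# Expand 1/10 one binary digit at a time. The 0011 pattern repeats
# forever, so no finite sum of powers of 2 ever lands on it; we cut
# it off at 20 bits, just as the hardware cuts it off at 52.
x, bits = Fraction(1, 10), []
for _ in range(20):
    x *= 2
    bits.append("1" if x >= 1 else "0")
    x -= int(x)
print("1/10 = 0." + "".join(bits) + "... (base 2)")
# 1/10 = 0.00011001100110011001... (base 2)
```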
Only a basic understanding of numbers in different bases, and the knowledge that floats are coded on a fixed number of bits, is necessary to understand why this may be a problem. You could use the example of scientific calculations, where precision is important, to make students grasp the stakes.
But if you want students to really understand it fully, I think practicing operations in different bases, and making them code and round floating-point numbers by hand, may be necessary to see how frequent approximations are and how trivial calculations can give an answer different from what's expected; it is just more time-consuming. Binary representation of floating-point values affects the precision and accuracy of floating-point calculations.
In C, for example, you need to include float.h, which defines FLT_EPSILON: the difference between 1 and the smallest float value greater than 1. Because this is a very small number, you should employ a user-defined tolerance for calculations involving very large numbers.
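In Python, the same tolerance idea looks like this (the 1e-9 tolerance is an arbitrary illustrative choice, not a magic constant):

```python
import math

a = 0.1 + 0.2

# Exact equality fails because of representation error...
print(a == 0.3)                            # False

# ...so compare within a tolerance instead. A relative tolerance
# scales with the operands, which is what you want for huge numbers.
print(math.isclose(a, 0.3, rel_tol=1e-9))  # True
print(abs(a - 0.3) <= 1e-9 * abs(0.3))     # the same test, spelled out
```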
Now take 9.2 and store it as a double. The bits split into three components: a sign, an exponent, and a mantissa. Because 9.2 is positive, the sign bit is 0. The exponent is stored in the middle component as 11 bits; in our case, 0b10000000010. In decimal, that represents the value 1026. A quirk of this component is that you must subtract a number equal to 2^(# of bits - 1) - 1 to get the true exponent; in our case, that means subtracting 0b1111111111 (the decimal number 1023) to get the true exponent, 0b00000000011 (the decimal number 3). The mantissa is stored in the third component as 52 bits. However, there's a quirk to this component as well. To understand this quirk, consider a number in scientific notation, like this: 6.0221413 × 10^23.
The mantissa would be the 6.0221413. Recall that the mantissa in scientific notation always begins with a single non-zero digit.
The same holds true for binary, except that binary only has two digits: 0 and 1. So the binary mantissa always starts with 1! When a float is stored, the 1 at the front of the binary mantissa is omitted to save space; we have to place it back at the front of our third element to get the true mantissa.
This involves more than just a simple addition, because the bits stored in our third component actually represent the fractional part of the mantissa, to the right of the radix point. When dealing with decimal numbers, we "move the decimal point" by multiplying or dividing by powers of 10. In binary, we can do the same thing by multiplying or dividing by powers of 2.
Since our third element has 52 bits, we divide it by 2^52 to move it 52 places to the right. In decimal notation, that's the same as dividing 675539944105574 by 4503599627370496 to get 0.149999999999999911182158029987476766109466552734375. Now that we've transformed the third component into a fractional number, adding 1 gives the true mantissa, 1.149999999999999911182158029987476766109466552734375. Multiplying by 2^3, the true exponent, reveals the final representation of the number we started with: 9.2 is stored as 9.199999999999999289457264239899814128875732421875. Contrast that with 9.5. Already you can see its stored mantissa is only 4 digits followed by a whole lot of zeroes: 0b0011 and then 48 zeros. But let's go through the paces: that component is 844424930131968, which divided by 2^52 is exactly 0.1875; adding 1 gives 1.1875, and 1.1875 × 2^3 = 9.5, with no error at all, because 9.5 is a sum of powers of 2 (8 + 1 + 1/2).
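You can reproduce the whole walkthrough mechanically. This sketch pulls the three components out of the raw bits with Python's struct module and reassembles the value exactly (normal numbers only; zeros, subnormals, and infinities would need extra cases):

```python
import struct
from fractions import Fraction

def decompose(x):
    # Reinterpret the 8 bytes of a double as one 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign     = bits >> 63              # 1 bit
    exponent = (bits >> 52) & 0x7FF    # 11 bits, biased by 1023
    mantissa = bits & ((1 << 52) - 1)  # 52 bits: fractional mantissa
    # Reassemble: (-1)^sign * (1 + mantissa/2^52) * 2^(exponent-1023),
    # in exact rational arithmetic so nothing gets re-rounded.
    return (-1) ** sign * (1 + Fraction(mantissa, 2**52)) * Fraction(2) ** (exponent - 1023)

print(decompose(9.2))  # 2589569785738035/281474976710656 -- not 46/5
print(decompose(9.5))  # 19/2 -- exact
```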
This isn't a full answer (mhlester already covered a lot of good ground I won't duplicate), but I would like to stress how much the representation of a number depends on the base you are working in. Consider the fraction 2/3. In base 10 we typically write it out as 0.666..., 0.666, or 0.667. The finite approximations have an error on the order of 0.001, far larger than the gap between 9.2 and its stored double; in fact, the second representation isn't even rounded correctly! Nevertheless, we don't have a problem with 0.667 as a stand-in for 2/3, so we shouldn't really have a problem with how 9.2 is approximated in most programs either (yes, in some programs it matters). Now write 2/3 in base 3: it is exactly 0.2. In other words, we have an exact, finite representation for the same number by switching bases!
The take-away is that even though you can convert any number to any base, all rational numbers have exact finite representations in some bases but not in others.
It might surprise you that even a perfectly simple number like 1/2, which has an exact representation in base 10 (0.5) and base 2 (0.1), requires a repeating representation in base 3 (0.111...).
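A short Python sketch makes base-switching tangible; digits() is a made-up helper for this post, not a library function:

```python
from fractions import Fraction

def digits(frac, base, n=12):
    """First n fractional digits of frac written in the given base."""
    out = []
    for _ in range(n):
        if frac == 0:
            break
        frac *= base       # shift one digit left in this base
        d = int(frac)      # peel off the digit...
        out.append(str(d))
        frac -= d          # ...and keep the remainder
    return "0." + "".join(out) + ("..." if frac else "")

print(digits(Fraction(2, 3), 10))  # 0.666666666666... -- repeats in base 10
print(digits(Fraction(2, 3), 3))   # 0.2               -- exact in base 3
print(digits(Fraction(1, 2), 3))   # 0.111111111111... -- 1/2 repeats in base 3
print(digits(Fraction(1, 10), 2))  # 0.000110011001... -- why 0.1 bites floats
```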
So why are floats inaccurate? Because often they are approximating rationals that cannot be represented finitely in base 2 (the digits repeat), and in general they are approximating real (possibly irrational) numbers which may not be representable in finitely many digits in any base. It is impossible to represent irrational numbers (e.g. pi, e, sqrt(2)) as a ratio of two integers, and that actually is why they are called irrational.
No amount of bit storage in the world would be enough to hold even one of them; only symbolic arithmetic is able to preserve their precision. If you limit your math needs to rational numbers only, though, the problem of precision becomes manageable: all your arithmetic would have to be done on fractions, just like in high-school math (e.g. 1/2 + 1/6 = 2/3).
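Python ships exactly this machinery in its fractions module; a quick sketch of the trade-off:

```python
from fractions import Fraction

# High-school rules, exact results: a/b + c/d = (ad + bc) / bd.
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))  # True
print(Fraction(1, 2) + Fraction(1, 6))                       # 2/3

# The price: numerators and denominators grow without bound,
# so every operation gets slower as the exact answer gets hairier.
print(sum(Fraction(1, n) for n in range(1, 20)))
```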
But of course you would still run into the same kind of trouble as soon as pi, sqrt, log, sin, etc. are involved, since their results are generally irrational. For hardware-accelerated arithmetic, only a limited set of rational numbers can be represented, and every non-representable number is approximated; some numbers that look perfectly simple in decimal, like 0.1, are among them. There are infinitely many real numbers (so many that you can't enumerate them), and there are infinitely many rational numbers (it is possible to enumerate them). The floating-point representation is a finite one (like anything in a computer), so unavoidably many, many, many numbers are impossible to represent.
In particular, 64 bits only allow you to distinguish among 18,446,744,073,709,551,616 different values, which is nothing compared to infinity. With the standard convention, 9.2 is not one of them. Those that can be represented are of the form m × 2^e, where m and e are integers.
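Python will even hand you that m and 2^e directly, via float.as_integer_ratio(); a nice classroom check:

```python
# Every double is exactly m * 2**e; recover the stored fraction for 9.2.
num, den = (9.2).as_integer_ratio()
print(num, "/", den)         # the denominator is a power of 2...
print(den & (den - 1) == 0)  # True
print(num / den == 9.2)      # True: this fraction IS the stored value

# ...which is exactly why 92/10 (reduced denominator 5, not a power
# of 2) cannot be stored without rounding.
```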
You might come up with a different numeration system, 10-based for instance, in which 9.2 would have an exact representation; but then other numbers, say 1/3, would lose theirs. Also note that double-precision floating-point numbers are extremely accurate: they can represent any number in a very wide range with as many as 15 exact significant digits.
For daily life computations, 4 or 5 digits are more than enough. You will never really need those 15, unless you want to count every millisecond of your lifetime.
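If you're curious, the back-of-the-envelope check (assuming a roughly 80-year lifetime, nothing more) comes out like this:

```python
# Milliseconds in roughly 80 years:
ms = 80 * 365.25 * 24 * 60 * 60 * 1000
print(f"{ms:,.0f}")  # 2,524,608,000,000 -- about 13 digits,
                     # still comfortably inside a double's ~15
```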