Python rounds float values by converting them to string and then back

github.com

268 points by bishala 6 years ago · 151 comments

coldtea 6 years ago

Seems to be one of the best ways to go about it.

From the comment in protobuf source (which does the same thing as Python), mentioned in the Twitter thread:

(...) An arguably better strategy would be to use the algorithm described in "How to Print Floating-Point Numbers Accurately" by Steele & White, e.g. as implemented by David M. Gay's dtoa(). It turns out, however, that the following implementation is about as fast as DMG's code. Furthermore, DMG's code locks mutexes, which means it will not scale well on multi-core machines. DMG's code is slightly more accurate (in that it will never use more digits than necessary), but this is probably irrelevant for most users.

Rob Pike and Ken Thompson also have an implementation of dtoa() in third_party/fmt/fltfmt.cc. Their implementation is similar to this one in that it makes guesses and then uses strtod() to check them. (...)

https://github.com/protocolbuffers/protobuf/blob/ed4321d1cb3...

  • SideQuark 6 years ago

    >Seems to be one of the best ways to go about it.

    The C/C++ standards do not require formatting to round correctly or even be portable. I recently had an issue where a developer used this method to round floats for display, and there were differences on PC and on Mac. It literally rounded something like 18.25 to 18.2 on one platform and 18.3 on the other. This led to all sorts of other bugs as some parts of the program used text to transmit data, which ended up in weird states.
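
    For what it's worth, 18.25 is exactly representable in binary, so the difference is purely in how each platform's formatter breaks the tie; a C printf("%.1f", 18.25) is allowed to produce either answer. As a quick check, CPython (which ships its own correctly-rounded conversion and breaks ties to even) prints:

        >>> "%.1f" % 18.25
        '18.2'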

    The culprit was this terrible method. If you want anything approaching consistency or predictability, do not use formatting to round floating point numbers. Pick a numerically stable method, which will be much faster if done correctly.

    Coincidentally, C/C++ do not require any of their formatting and parsing routines to round-trip floating point values correctly (except the newly added hex formatted floats which are a direct binary representation, and some newly added function allowing an obscure trick I do not recall at the moment... )

    • eesmith 6 years ago

      > The C/C++ standards do not require formatting to round correctly or even be portable.

      The linked-to method uses PyOS_snprintf(). Its documentation at https://docs.python.org/3/c-api/conversion.html says:

      """PyOS_snprintf() and PyOS_vsnprintf() wrap the Standard C library functions snprintf() and vsnprintf(). Their purpose is to guarantee consistent behavior in corner cases, which the Standard C functions do not."""

      • ChrisLomont 6 years ago

        And those functions in C/C++ are not fully specified.

        The Python wrapper also does not specify rounding behavior, so you should not use these functions for rounding, or you will have the same problem. Nowhere on that page does it say that rounding will be consistent across platforms.

        Simply do it with floats. There are perfectly good, numerically stable, fast rounding methods, that avoid all this nonsense.

      • ImNotTheNSA 6 years ago

        The response was to the fact that the comment called the format-string method one of “the best ways to go about it”.

        It’s obvious that PyOS_snprintf is not a standard library function.

        • eesmith 6 years ago

          Sure, PyOS_snprintf isn't a standard library function, but it's a thin wrapper to snprintf, which is. Python/mysnprintf.c is the location of the:

             snprintf() wrappers.  If the platform has vsnprintf, we use it, else we
             emulate it in a half-hearted way.  Even if the platform has it, we wrap
             it because platforms differ in what vsnprintf does in case the buffer
             is too small:
          
          It mentions one corner case: what happens when the buffer is too small. Not rounding issues.

          The "the best ways to go about it" comment links to the protobuf code, which also uses snprintf.

          The (what I think is the) relevant C99 spec at http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf says:

          > For e, E, f, F, g, and G conversions, if the number of significant decimal digits is at most DECIMAL_DIG, then the result should be correctly rounded. If the number of significant decimal digits is more than DECIMAL_DIG but the source value is exactly representable with DECIMAL_DIG digits, then the result should be an exact representation with trailing zeros. Otherwise, the source value is bounded by two adjacent decimal strings L<U, both having DECIMAL_DIG significant digits; the value of the resultant decimal string D should satisfy L ≤ D ≤ U, with the extra stipulation that the error should have a correct sign for the current rounding direction.

          So either 1) "The C/C++ standards do not require formatting to round correctly or even be portable.", in which case Python and protobuf are doing it wrong and somehow this issue was never detected, or 2) The C/C++ standards do require correct rounding, but the case described by ChrisLomont didn't quite meet the spec requirements to get precision and rounding modes to match across platforms. Or 3), I don't know what I'm talking about.

          • ChrisLomont 6 years ago

            "correctly rounded" is implementation defined is the problem. You cannot do it portably, and you cannot query it portably. As such, different platforms, compilers, etc do it differently. Thus using formatting for rounding is inconsistent.

            Here's [1] where you can query the current floating-point environment in C: "Specifics about this type depend on the library implementation".

            Here's [2] where you can set some rounding modes in C++: "Additional rounding modes may be supported by an implementation.". Note that the standard does not include bankers' rounding by default, which is used to make many scientific calculations more stable (it lowers error and drift in accumulated calculations). Many platforms use it by default, but it's not in the standard.

            You can chase down this rabbit hole. I (and several others) did during the issue on the last project, and got to the point where it was clear, and well known in numerics circles, that this is not a well-defined process in C/C++. If it were, printing and parsing would round-trip, which they did not before a recent C++ addition, and even now it is only guaranteed in a special case.

            [1] http://www.cplusplus.com/reference/cfenv/fenv_t/

            [2] https://en.cppreference.com/w/cpp/numeric/fenv/FE_round

        • mlthoughts2018 6 years ago

          They document their goal is correctness in edge cases that other standard C functions don’t guarantee. Seems obvious to say this is “one of the best” ways to go about it. It’s absolutely true given this stated and documented goal of correctness, which would be a very commonly needed property.

          Other good ways could trade-off edge case comprehensiveness for performance or whatever. That doesn’t make this way less good.

    • okl 6 years ago

      IEEE 754 defines 5 rounding modes. This one sounds like round-to-nearest, ties-to-even. Not all decimal values are representable as floats. Depending on whether you compile for x87 (80-bit internal representation) or SSE (64-bit) you might get slightly different results.

      • ChrisLomont 6 years ago

        I'm well aware of that, having written at length about floating-point tricks, numerical issues, etc.

        The issue here is that you don't know what a library that formats a float does, and if the function is not specified clearly (as in C/C++), you have no way of knowing what you will get.

        Thus I said to do it yourself, using proper numerics.

      • Dylan16807 6 years ago

        They already had the same float, though. That's not very likely if the rounding modes were different.

        For what it's worth, it looks like different standard libraries make different choices on whether float->string conversion cares about the current rounding mode.

    • mixmastamyk 6 years ago

      At some point in 3.x Python moved to "bankers' rounding", which is slightly less biased than the one we learn in school; perhaps C++ did the same. Might be a factor in the discrepancy.
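
      For reference, Python 3's built-in round() ties to even:

          >>> round(0.5), round(1.5), round(2.5)
          (0, 2, 2)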

  • dheera 6 years ago

    Yep, I'd be curious what any better alternative is.

    Consider that a float's internal representation is in base 2, and you're trying to round in base 10. Even if you didn't use a string, I'd assume you'd have to create an array of ints containing the base-10 digits in order to do the rounding, unless there are some weird math tricks that let you avoid processing all the base-2 digits. And an array of ints isn't all that computationally different from a string.

    • masklinn 6 years ago

      In fact, I don't know what cdecimal does now, but back when decimal.Decimal was pure Python it would store the "decimal number" as a string and manipulate that.
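
      You can still see that digit-oriented representation through the public API:

          >>> from decimal import Decimal
          >>> Decimal("12.345").as_tuple()
          DecimalTuple(sign=0, digits=(1, 2, 3, 4, 5), exponent=-3)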

fs111 6 years ago

Apple's libc used to shell out to perl in a function: https://github.com/Apple-FOSS-Mirror/Libc/blob/2ca2ae7464771...

  • stefan_ 6 years ago

    I thought this is what the Unix philosophy is supposed to be all about.

    (Realistically, calling wordexp should just abort the program. Now I actually want to make a hacked-up musl that aborts in all the various "libc functions no one should ever use" and see how far I get into an Ubuntu boot..)

    • f00zz 6 years ago

      Would be pretty awesome if Perl called wordexp(3) somewhere along this code path

    • xienze 6 years ago

      > I thought this is what the Unix philosophy is supposed to be all about.

      Perhaps from the perspective of an end user running things from a shell. Generally speaking though, shelling out from within a program is not ideal.

    • ericfrederich 6 years ago

      /* wordexp is also rife with security "challenges", unless you pass it WRDE_NOCMD it must support subshell expansion, and even if you don't beause it has to support so much of the standard shell (all the odd little variable expansion options for example) it is hard to do without a subshell). It is probbably just plan a Bad Idea to call in anything setuid, or executing remotely. */

  • tus88 6 years ago

    That's hilarious. You're not supposed to go the other direction libc!

  • f00zz 6 years ago

        /* XXX this is _not_ designed to be fast */

    • vectorEQ 6 years ago

      it is hard to do without a subshell). It is probbably just plan a Bad Idea to call in anything setuid, or executing remotely. */

      laughing so hard :')

  • benj111 6 years ago

    Which raises the question what libc functions perl calls...

    And imagine the debug errors:

    >perl error X

    "But I'm just calling libc ?!?"

  • gbacon 6 years ago

    Die ganzen Zahlen hat der liebe Gott gemacht, alles andere ist Menschenwerk. ("God made the integers; all else is the work of man.")

    https://en.wikipedia.org/wiki/Leopold_Kronecker

  • peterwwillis 6 years ago

    What if Perl uses libc ?????

    • saagarjha 6 years ago

      It does on my system:

        $ otool -L /usr/bin/perl
        /usr/bin/perl:
            /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1663.0.0)
            /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.0.0)
    • masklinn 6 years ago

      That's not an issue unless perl calls wordexp (as part of the stuff it does when called by wordexp).

    • ImNotTheNSA 6 years ago

      then it calls the Perl implementation... is there some nuance of perl that would cause a problem with that?

      • zerocrates 6 years ago

        They're jokingly positing a loop: libc shells to perl, perl implements the function via libc call, which shells to perl... and so on.

      • lathiat 6 years ago

        So if Perl uses libc, which shells out to perl, which uses libc... get it? :)

  • imglorp 6 years ago

    O_o

Noe2097 6 years ago

Well, the problem is precisely that rounding, as it is generally conceived, is expressed in base 10, since we generally conceive of numbers, including floating-point ones, in base 10. Yet at the lowest level, the representation of numbers is in base 2, including floating-point ones. It is imaginable, and would be more correct and efficient, to perform rounding (or flooring or ceiling, for that matter) in base 2, but it would be that much more difficult to comprehend when dealing with non-integers in code. Rounding in base 10 needs some form of conversion anyway; going via the string is one way that is, at least, readable (pun intended).

bhouston 6 years ago

In my experience there are few things slower than float-to-string and string-to-float conversion. And it seems so unnecessary.

I always implemented round-to-a-specific-digit based on the built-in roundss/roundsd functions, which are native x86-64 instructions (i.e. https://www.felixcloutier.com/x86/roundsd).

I do not understand why this would not be preferable to the string method.

    float round( float x, int digits, int base ) {
        float factor = pow( base, digits );
        return roundss( x * factor ) / factor;
    }

I guess this has the effect of not working for numbers near the edge of its range.

One could check this and fall back to the string method. Or alternatively use higher precision doubles internally:

    float round( float x, int digits, int base ) {
        double factor = pow( base, digits );
        return (float)( roundsd( x * factor ) / factor );
    }

But then what do you do if you want to round a double and maintain all precision? I think there is likely some way to do that by unpacking the double into a manual mantissa and exponent, each of which is a double, and doing the rounding manually; or maybe by using some type of float128 library (https://www.boost.org/doc/libs/1_63_0/libs/multiprecision/do...)...

But changing this implementation now could cause slight differences, and if someone was rounding and then hashing, this type of change could be horrible if not put behind some type of opt-in.

  • StephanTLavavej 6 years ago

    float to string is incredibly fast now - look at Ulf Adams’ Ryu and Ryu Printf algorithms, which I’ve used to implement C++17 <charconv> in Visual Studio 2019 (16.2 has everything implemented except general precision; the upcoming 16.4 release adds that).

    I don’t know of truly fast algorithms for string to float, although I improved upon our CRT’s performance by 40%.

    • ChrisLomont 6 years ago

      Formatting is much faster than before, but still terribly slow compared to simply rounding numbers using math and floor and ceiling appropriately.

      Ryu is more than 100x slower than something like

          rval = floor(100*val+0.5)/100.0

      (which is not quite right due to numerical issues, but close, and illustrates the idea).
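
      (A quick Python illustration of why it's not quite right: 2.675 is actually stored as 2.67499999..., so the scaled value lands just below the halfway point.)

          >>> from math import floor
          >>> floor(100 * 2.675 + 0.5) / 100
          2.67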

      Formatting, to get a rounded float, is terribly slow.

  • Vindicis 6 years ago

    I don't think converting is slow by itself, depending on what you need done. I have a function I wrote to convert strings of floats/doubles to a rounded string of however many digits you want (for storing financial data that was muxed with multiple streams), and converting 1.59688139452f to a string, then rounding that string to the 5th decimal place, took 8.759 seconds for 10 million iterations (~876 nanoseconds/iteration). Granted, it was written for my specific use case, so I don't need to handle the various edge cases/formats. But little is inherently slow if you take the time to write a solution for your own need.

  • messe 6 years ago

    Using native x86_64 instructions isn't portable.

    • jharger 6 years ago

      Why not have optimized versions that use native instructions when available, and then fall back to the portable version when they are not?

      • messe 6 years ago

        I'm unsure as to why they don't do that, I suspect it's because nobody using Python has found floating-point rounding to be a bottleneck yet.

        • DoctorOetker 6 years ago

          I have encountered situations where irregular rounding was annoyingly problematic to detect and correct. The LANL Earthquake dataset on Kaggle had a column with samples and a column with (incorrectly incrementing) sample times that were rounded. While creating a corrected column, I noticed quite a lot of irregularities in the Python rounding (or the underlying mechanism).

          I also consider simultaneously deterministic, fast, portable rounding or binary formats to be important for decentralized delegation of computation, say a TrueBit style platform:

          https://people.cs.uchicago.edu/~teutsch/papers/truebit.pdf

        • SideQuark 6 years ago

          It's because Python is slow everywhere, so it's hard to find bottlenecks like this. This method is easily 100x slower than it needs to be. If everything else were well written, this would stand out. Since lots of Python is written like this, none of it stands out.

          Death by 1000 papercuts....

        • joshuamorton 6 years ago

          More specifically, cpython generally biases toward simpler implementations. I'd be surprised if there's any custom x86 assembly anywhere in cpython.

      • slx26 6 years ago

        I suspect different behaviors might be an issue in some tricky cases.

bishala (OP) 6 years ago

Related thread on Twitter https://twitter.com/whitequark/status/1164395585056604160

shellac 6 years ago

OpenJDK BigDecimal::doubleValue() goes via a string in certain situations https://github.com/openjdk/jdk/blob/master/src/java.base/sha...

  • SloopJon 6 years ago

    I just ran into a similar booby trap the other day: whereas BigDecimal::BigDecimal(double) does the full decimal expansion, BigDecimal::valueOf(double) goes through Double::toString(double), which generally produces a lot fewer digits.

latchkey 6 years ago

https://0.30000000000000004.com/

  • moring 6 years ago

    Somewhat offtopic, but is there a reason so many explanations of this issue lump together the fundamental principle of how numbers are represented (integers vs. fractions vs. exact reals (technically impossible) vs. IEEE 754) and the base (decimal vs. binary)? Every time I read something like the explanation on that site, I wonder whether I would understand it if I didn't know it already.

    • mumblemumble 6 years ago

      The "why does this happen" on that page hits on it, at least tangentially. A lot of what's confusing about IEEE floats isn't the inability to represent all rationals in and of itself, it's more that the particular patterns of inaccuracy end up being different between the computer approximation and the approximations we'd make on paper, because of the different numeric bases.

      • ben509 6 years ago

        Related to that problem, a major source of confusion with IEEE floats is that languages go to great lengths[1] to present them as decimals and hide the underlying binary denominator. Even when you know they're binary internally, it throws you off.

        High level languages are also annoying in that they don't provide great support for working with them as binary denominated rationals, e.g. there's no round_base2 in python, and there's no hex float representation in printf.

        [1]: https://news.ycombinator.com/item?id=10915182

        • mark-r 6 years ago

          Python doesn't support hex float in printf, but it does give you an easy way to do the conversion: x.hex()
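
          For example, round-tripping through the exact hex form:

              >>> (0.1).hex()
              '0x1.999999999999ap-4'
              >>> float.fromhex('0x1.999999999999ap-4')
              0.1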

    • mark-r 6 years ago

      The difference between decimal and binary is essential to understanding the problem. Just as there's no elegant way to represent 1/3 in base 10, there's no elegant way to represent 1/10 in base 2.

  • ilovepeppapig 6 years ago

    The worst way to explain something is to begin with "It's actually pretty simple."

    • The_rationalist 6 years ago

      Not always! E.g. let's say someone begins explaining a topic to me and uses two words that I didn't know before, e.g. supervenience and cogency.

      Humans generally are afraid of new words (especially weird sounding ones) and often will assume that the subject is complex and might intimidate them.

      But unknown words can have extremely simple meanings, or be synonyms of already known words.

      By asserting "It's actually pretty simple", you give them confidence that there's no reason to be afraid of the topic or of the words.

      • Zancarius 6 years ago

        There's much truth in this.

        Whenever I've helped older people with technology they've never used before (a new tablet or similar), if I start off with any suggestion that it's less than simple, they'll almost certainly frame the problem scope in their mind as difficult and give up, because they're already exhibiting some animosity toward learning a new thing.

        If instead you phrase it as "this is really easy, let me show you how...", you short-circuit this process by framing their expectations differently, and that little bit of extra confidence ("this is easy") can help them through the learning process.

        I've found you can't simply show them, either. It's almost better to say "It's easy" and then go through the process, because it's absolutely necessary to establish expectations first. They're already afraid of it (it's new), so doing something to get their guard down can go a long way toward helping them explore on their own. I tried this experiment with my mother, and some weeks later she'd have a problem and discover the solution herself specifically because she was convinced it was easy to do. This can (and will) backfire if you're not careful about how you do it, but I've had far more success using this tool than other techniques individually (e.g. writing down instructions).

        This doesn't broadly apply to areas outside education and support (or even to all areas in education), but for simple things that people may express an irrational fear over, it works and it works well. A good teacher will use this technique successfully with their students, so if you're teaching someone, use it!

    • furyofantares 6 years ago

      I tend to agree.

      If someone has some anxiety about not understanding something, telling them it's actually pretty simple can just reinforce the framing they already have going in that maybe they're too dumb to get it.

      I've found it's usually better to acknowledge that it's a little difficult or otherwise totally normal not to already know / have grasped the thing in question.

      I think the intent is in the right place when saying "it's actually pretty simple" -- you want to provide optimism. The approach I like is along the lines of "this part is a little tricky, so let's break it down."

    • olooney 6 years ago

      I punched "It's actually pretty simple" into http://talktotransformer.com (which generates nonsense text from a seed using an OpenAI language model) and after a couple of tries it gave me this:

      > It's actually pretty simple. We'll be looking at something called the "D-Wave P-500", which is a version of the P500 chip for quantum computers.

      > It's basically a single bit computer, but with more than 500 qubits. Which means that our "real number" will have more numbers than the number of qubits that are available. That's really important.

      > Quantum computers are theoretically able to do more things than just solve equations. For example, the way that a quantum computer uses energy from an electron to solve a classical math problem, or the way that it can break a complex calculation into smaller bits of information that each can solve on its own, is very different from how computers currently work.

      > But I am not suggesting that a quantum computer can be used to solve more abstract problems. Because that would be crazy.

      > But to give an example of what it could do, imagine doing a number crunching function that was 10× faster than a classical chip, and that had some really useful, and practical things that would be interesting to try.

      Because Talk to Transformer is trained on real-world data, this supports the hypothesis that the phrase "It's actually pretty simple" is often followed by an unintelligible and highly technical explanation.

  • TekMol 6 years ago

        Computers can only natively store integers,
        so they need some way of representing decimal numbers.
    
    Really? How do computers "natively" store integers?

    • mumblemumble 6 years ago

      The intent of that sentence is a lot clearer if you also consider the subsequent text, which expands on the idea quite a bit:

      > Computers can only natively store integers, so they need some way of representing decimal numbers. This representation comes with some degree of inaccuracy. . . Why does this happen? It's actually pretty simple. When you have a base 10 system (like ours), it can only express fractions that use a prime factor of the base. . .

      Perhaps a more formally correct way to put it is that integers (and natural numbers) are the only numbers that a computer can manage in a way that behaves reasonably similarly to the corresponding mathematical set. Specifically, as long as you stick to their numeric range, computer ints behave like a group, just like real integers do. But IEEE floats break the definition of a field in every which way, so they're really not a great match for the rationals.

      That said, you could represent the rationals as a pair of integers, and that would be better behaved, and some programming languages do that. But I'm not aware of an ISA that supports it directly.
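
      Python's fractions module is one example of that pair-of-integers approach; it can even recover the exact rational value of a binary float:

          >>> from fractions import Fraction
          >>> Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10)
          True
          >>> Fraction(0.1)
          Fraction(3602879701896397, 36028797018963968)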

    • tyingq 6 years ago

      It seems to just be layman's shorthand for the fact that storing (not huge) integers as binary isn't lossy.

      • Dylan16807 6 years ago

        Storing integers as decimal (which computers can do easily) isn't lossy either.

        • viraptor 6 years ago

          I believe that's where "natively" comes in. If you store it in binary, you can operate on the values as numbers. If you do some decimal conversion, it's a blob of data. (Yes, there are BCD instructions but they're massively limited)

        • SmellyGeekBoy 6 years ago

          Decimal numbers are still "stored as binary" at the silicon level.

        • tyingq 6 years ago

          I meant as opposed to floats, precision loss, etc.

    • bregma 6 years ago

      Maybe you need to parse "decimal numbers" as "fractional numbers" rather than "expressed using a radix of 10".

      Perhaps it would be better phrased as "computers can only store whole numbers, so they need some way to represent other information, including integers, rational numbers and approximations to complex numbers."

  • aitchnyu 6 years ago

    Whoa. In FF, I just see a screenful of blank boxes; I scroll down and see content wrongly rendered. On Chrome it's a dated design with an uncomfortable text size, and the narrow column creates boxes where I have to use horizontal scrollbars. The Motherfucking Website revolution can't come soon enough!

zelly 6 years ago

This is what we are promised will make trucks drive themselves and usher in the 4th industrial revolution.

analog31 6 years ago

My quick impression is that the choice of a rounding algorithm is relative to the purpose that it serves. For instance, floor(x + 0.5) is good enough in many applications.

In some cases, rounding is performed for the primary purpose of displaying a number as a string, in which case it can't be any less complicated than the string conversion function itself.

  • raphlinus 6 years ago

    Fun fact: floor(x + 0.5) rounds 0.49999997 to 1.0 (this is for 32-bit floats; the same principle applies to 64). Most libraries have slower-than-ideal round conversion because of historical dross; modern chips have a very fast SIMD round instruction, but its behavior doesn't exactly match libc round. See https://github.com/rust-lang/rust/issues/55107 for a deeper discussion.

    • dwheeler 6 years ago

      I just tried this on Python3 on a 64-bit x86 system:

          import math
          x = 0.49999999999999994
          print(x-0.5)
          print(math.floor(x+0.5))
      
      I got these printouts:

          -5.551115123125783e-17
          1
      
      So yes, something less than 1/2, with 1/2 added to it, has a floor of 1 in floating point math.

      Yet another reminder that floating point calculations are approximations, and not exact.

      • e12e 6 years ago

        In Julia it's easy to see the difference between promoting to the full number tower and sticking to 64-bit floats (ed: note the difference in result, 0 vs 1):

          julia> using BenchmarkTools

          julia> @btime floor(0.49999999999999997+0.5)
            0.027 ns (0 allocations: 0 bytes)
          1.0

          julia> @btime floor(0.49999999999999997+BigFloat(0.5))
            280.594 ns (6 allocations: 336 bytes)
          0.0

          julia> @code_native floor(0.49999999999999997+BigFloat(0.5))
            .text
          ; ┌ @ floatfuncs.jl:152 within `floor'
            subq    $24, %rsp
            movq    %rsi, 16(%rsp)
            movq    (%rsi), %rax
          ; │┌ @ floatfuncs.jl:152 within `#floor#543'
            movq    %rax, (%rsp)
            movabsq    $jl_system_image_data, %rax
            movq    %rax, 8(%rsp)
            movabsq    $japi1_round_16005, %rax
            movabsq    $jl_system_image_data, %rdi
            movq    %rsp, %rsi
            movl    $2, %edx
            callq    *%rax
          ; │└
            addq    $24, %rsp
            retq
            nopw    %cs:(%rax,%rax)
          ; └
        
          julia> @code_native floor(0.49999999999999997+0.5)
            .text
          ; ┌ @ floatfuncs.jl:152 within `floor'
          ; │┌ @ floatfuncs.jl:152 within `#floor#543'
          ; ││┌ @ floatfuncs.jl:152 within `round'
            vroundsd    $9, %xmm0, %xmm0, %xmm0
          ; │└└
            retq
            nopw    (%rax,%rax)
          ; └
      • masklinn 6 years ago

        > Yet another reminder that floating point calculations are approximations, and not exact.

        floating-point calculations are absolutely exact. What's not is conversions between decimals and floating-point.

        • edflsafoiewq 6 years ago

          Their not being exact is why 0.49999997 + 0.5 = 1. There isn't enough precision in the range just below 1.0 to represent the number 0.99999997.

        • murkle 6 years ago

          > floating-point calculations are absolutely exact

          Are you sure about that? What about 1.00000000000001^2 (using eg 64 bit double)?

          • lifthrasiir 6 years ago

            Maybe the parent wanted to say that the result is deterministic and not too far off from the true value (in fact, as accurate as possible given the format). In other words, FP is exact but abides by a slightly different calculation rule involving rounding.

            • dwheeler 6 years ago

              Floating point is usually deterministic, but from a math point of view it is inexact (in most cases it only approximates the exact answer). In many cases that is fine, which is why it is used, but it is important to remember that.

    • analog31 6 years ago

      Good call. I usually only use this conversion when the input is approximate to begin with, e.g., feeding a floating point "signal" to a DAC, or computing the contents of a lookup table to be coded into an arduino. To put it another way, I can live with some uncertainty as to the precise threshold going from one output value to the next.

    • jacobolus 6 years ago

      If the number is positive you can substitute floor(x - 0.5) + 1

      Or just explicitly check for 0.5 - 1ulp as a special corner case.
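
      A quick check in Python with the double just below 0.5 that trips the naive version:

          >>> import math
          >>> x = 0.49999999999999994   # largest double less than 0.5
          >>> math.floor(x + 0.5)       # naive round-half-up
          1
          >>> math.floor(x - 0.5) + 1   # variant for positive x
          0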

jancsika 6 years ago

A bit on topic...

Is there a phrase for the ratio between the frequency of an apparent archetype of a bug/feature and the real-world occurrences of said bug/feature? If not then perhaps the "Fudderson-Hypeman ratio" in honor of its namesakes.

For example, I'm sure every C programmer on here has their favored way to quickly demo what bugs may come from C's null-delimited strings. But even though C programmers are quick to cite that deficiency, I'd bet there's a greater occurrence of C string bugs in the wild. Thus we get a relatively low Fudderson-Hypeman ratio.

On the other hand: "0.1 + 0.2 != 0.3"? I'm just thinking back through the mailing list and issue tracker for a realtime DSP environment that uses single-precision floats exclusively as the numeric data type. My first approximation is that there are significantly more didactic quotes of that example than reports of problems due to the class of bugs that archetype represents.

Does anyone have some real-world data to trump my rank speculation? (Keep in mind that simply replying with more didactic examples will raise the Fudderson-Hypeman ratio.)

  • pvg 6 years ago

    What's the 'class' of the second thing? Numerics/fp bugs of all stripes are super common. Just often less crashy or noticeable.

d--b 6 years ago

Note that there is a fallback version that doesn't use strings. This is definitely something that's been thought through.

ericfrederich 6 years ago

I was looking once at Python and Redis and how numbers get stored. I remember Python would in the end send Redis some strings. I dove pretty deep and found that Python floats when turned into a string and then back are exactly the same float.

I remember even writing a program that tested every possible floating point number (must have only been 32 bit). I think I used ctypes and interpreted every binary combination of 32 bits as a float, turned it into a string, then back and checked equality. A lot of them were NaN.

  • mark-r 6 years ago

    Checking equality on NaN doesn't work the way it does for other numbers. Your test might have had a fatal flaw.
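
    One way around that flaw (a sketch with a hypothetical helper: compare bit patterns rather than values, so NaNs whose payload is lost in the string round-trip show up as failures instead of silently comparing unequal):

        import struct

        def survives_roundtrip(bits: int) -> bool:
            # Reinterpret the 32-bit pattern as a float32 (held as a double).
            (f,) = struct.unpack("<f", struct.pack("<I", bits))
            # Convert to string and back.
            g = float(repr(f))
            # Re-narrow to float32 and compare raw bits; nan == nan is False.
            return struct.pack("<f", g) == struct.pack("<I", bits)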

  • dagw 6 years ago

    A lot of them were NaN.

    I seem to recall that ~0.5% of the IEEE 32 bit float space is NaN.

    • ekimekim 6 years ago

      A NaN is any value where the 8-bit (for 32-bit floats) exponent is all 1s, except for +/-inf. So a quick approximation is that 1/256 ~= 0.39% of the space is NaN.

      That means there are 24 bits (the sign and the 23-bit mantissa) that we can change while still being either a NaN or an inf. But two of those values are the infs, so we need to remove them. Divide by the entire range and we have (2^24 - 2) / 2^32 = 16777214/4294967296, or about 0.39%.

deckar01 6 years ago

`blob/master` isn't a suitable permalink. Use the first few letters of the commit hash so the line numbers and code are still relevant when this file inevitably gets modified.

ChrisSD 6 years ago

Maybe I'm missing something but what's wrong with rounding floats this way?

  • ben509 6 years ago

    Rounding a number is, in the common case, multiplying it by some base, truncating to an integer, and dividing by the base. You do have to handle extremely high exponents, but even the logic for that is not complex.

    Example of implementing it the sane way: https://github.com/numpy/numpy/blob/75ea05fc0af60c685e6c071d...

    Every step of this function is complex and expensive; in particular, printing a float as a decimal is very complex. And round is routinely used in tight loops.
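
    A sketch of that multiply/truncate/divide approach in Python (hypothetical helper; as the replies note, the scaling itself can introduce error in corner cases):

        import math

        def round_naive(x: float, places: int) -> float:
            factor = 10.0 ** places
            # Adding 0.5 then flooring rounds halves up (for positive x).
            return math.floor(x * factor + 0.5) / factor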

    • dagw 6 years ago

      The numpy approach sacrifices correctness for speed (you sometimes get unexpected results in corner cases, see below); the cpython way sacrifices speed for correctness.

        >>> round(56294995342131.5, 2)
        56294995342131.5
        >>> round(56294995342131.5, 3)
        56294995342131.5
      
        >>> np.round(56294995342131.5, 2)
        56294995342131.5
        >>> np.round(56294995342131.5, 3)
        56294995342131.51
    • phkahler 6 years ago

      The problem is the division after truncation. That division by a power of 10 can produce errors in binary.

    • Armisael16 6 years ago

      How does truncating a positive number ever round up?

      • NohatCoder 6 years ago

        Add 0.5 before rounding towards negative infinity, and you'll get standard rounding.

  • yjftsjthsd-h 6 years ago

    I don't know that it is "wrong", just unexpected. I suspect most people expect all math functions to be purely implemented in numerical terms, so finding string manipulation is surprising/interesting.

    • CogitoCogito 6 years ago

      > I don't know that it is "wrong", just unexpected. I suspect most people expect all math functions to be purely implemented in numerical terms, so finding string manipulation is surprising/interesting.

      You kind of got me thinking now. The decimal representation of a number is really a string representation (in the sense of a certain sequence of characters). Hence rounding to a certain decimal is essentially a string operation. You can of course do it by (say) dividing by 10^whatever or something else in some numerical fashion, but the more I think about it, the more natural it is to just think of the whole thing as a string.

      Or you could flip it around and consider that the string manipulation can also be described numerically, so whether you consider the operation a string operation or a numerical operation is sort of irrelevant. It's just a point of view.

      • Frost1x 6 years ago

        I think the best way to think about it is as a symbolic representation. We have processes for manipulating symbols to achieve the correct results. Purely numeric (binary-based) operations just happen to allow some quicker shortcuts, but sometimes lead to lost information.

      • kthejoker2 6 years ago

        This is one of the core ground/figure themes in Godel Escher Bach.

      • xamuel 6 years ago

        >The decimal representation of a number is really a string representation

        It is incorrect to speak of "the" decimal representation of a number, as many numbers have non-unique decimal representations, the most famous example being 1.000...=0.999...

        • bloomer 6 years ago

          The definition that makes it unique is the shortest representation, with trailing zeros not included; in your example that would be 1. Note that this definition comes up in binary floating point, where there are infinitely many decimal representations that will round back to a given binary64 float, but the decimal representation chosen is the shortest (and the closest to the binary64 float in the case of ties for length).

        • CogitoCogito 6 years ago

          > It is incorrect to speak of "the" decimal representation of a number, as many numbers have non-unique decimal representations, the most famous example being 1.000...=0.999...

          That's true and I'll concede that point, but it's not really relevant to what I said. That just means some numbers have different string representations that represent the same object. That doesn't really contradict anything in my post except the use of the definite article.

    • mark-r 6 years ago

      There are many corner cases involved with rounding, and the folks who did the string conversion had to put a lot of effort into handling all of them. It makes sense to piggyback on their efforts, even if it isn't the most efficient way 99% of the time.

    • kzrdude 6 years ago

      And then strings are text implemented in numerical terms

  • emsy 6 years ago

    Python already doesn't have the best performance. If you need to round a lot of floats in a loop you better bring some time.

    • pletnes 6 years ago

      Or use numpy, like the rest of us.

    • sqrt17 6 years ago

      round() is specifically about rounding to decimal places, and there are other, faster functions for the more common cases

  • Aardwolf 6 years ago

    They're using base 10, which is much slower here than a power-of-2 base would be. Plus there's also the memory management for the string.

    • viraptor 6 years ago

      The memory management will virtually never kick in. You'd need a number which expands to more than 100 characters for that to happen.

  • bottled_poe 6 years ago

    The two concerns I have are performance and correctness. I don’t know enough about the implementation of round(3) to know... perhaps someone else does?

    • dagw 6 years ago

      This approach is used specifically because of correctness. Doing things the 'obvious' way with round(3) or truncation introduces precision problems in corner cases.

  • f00zz 6 years ago

    That's what round(3) is for

    • masklinn 6 years ago

      round(3) can only round to an integer. Python's round works to an arbitrary decimal position.

      It would previously scale up, round (ceil/floor, really), then scale down. That turned out to induce severe precision issues: https://bugs.python.org/issue1869
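
      These days round() is correctly rounded; for example, an exact tie (0.125 is exactly representable) goes to even:

          >>> round(0.125, 2)
          0.12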

      • f00zz 6 years ago

        That's the first thing I'd try, multiply by 10^digits, round, divide by 10^digits. Thanks for the link.

dahart 6 years ago

Not entirely unlike how one of the better ways to deep-copy a JSON-compatible object in JavaScript is JSON.parse(JSON.stringify(obj))

kstenerud 6 years ago

This is where decimal floating point really shines. Since the exponential portion is base 10, it's trivially easy to round the mantissa.
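
For example, Python's decimal module stores decimal digits, so a tie like 2.675 really is a tie and rounds exactly (half-to-even by default), in contrast to the binary float case elsewhere in the thread:

    >>> from decimal import Decimal
    >>> Decimal("2.675").quantize(Decimal("0.01"))
    Decimal('2.68')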

The only silly part of IEEE 754-2008 is the fact that it specifies two representations (DPD, championed by IBM, and BID, championed by Intel) with no way to tell them apart.

science404 6 years ago

Misleading title is misleading...

CPython rounds float values by converting them to string and then back

Jenz 6 years ago

I dunno, how efficient is this?

  • coldtea 6 years ago

    So efficient that nobody really cared about or mentioned it all those decades, so there's that...

  • ThePadawan 6 years ago

    I agree - my takeaway is "float-to-string and string-to-float conversion is probably faster than I thought"

  • ben509 6 years ago

    Compared to base = 10^places, multiply, truncate, divide? Horribly inefficient.

    • dagw 6 years ago

      Sure, but their approach is on the other hand more correct. All numerical code reaches a point where you have to balance performance vs. correctness, and here cpython has chosen correctness over speed.

      • ben509 6 years ago

        They're creating a sequence of digits and then truncating. If you want to replicate that precisely, you could use an accumulator and a loop to do the same thing. At least then you could break early.

    • ben509 6 years ago

      It's slower than doing it in plain Python:

          %timeit round(12335423552.33, -6)
          %timeit int(12335423552.33 / 1_000_000.0) * 1_000_000.0
          500 ns ± 3.88 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
          219 ns ± 1.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
      • dagw 6 years ago

        Well obviously the approach round() takes is slower than the naive approach. The whole point of doing it the way round() does it is that it gives the correct answer in cases where the naive approach fails.

acoye 6 years ago

Another pragmatic aspect of Python as I see it.

seamyb88 6 years ago

Am I the only one grimacing at the lack of curlies around if/else scope? Just good practice!

  • bhd_movie 6 years ago

    In case you didn't know, Python doesn't use curly braces to block scope. They're pretty much only used for dictionaries and sets.
