Flaky: datatypes for real world quantities

dimaborzov.com

23 points by knowledgesale 13 years ago · 41 comments

blackhole 13 years ago

This is a terrible idea. This is a catastrophically bad idea. How do you compare numbers? How do you figure out that 2.2 is less than exp(65)? You have to represent these as numbers at some point in the calculation, and in order to do that, you're probably going to be using either floating point or fixed point, which means it's still trivially possible to construct an error case that suffers the exact same problems normal floating point numbers have. Observe:

    double x = 1;                      /* needs <math.h> for sqrt() */
    for (int i = 0; i < n; ++i)
        x = 0.01 + sqrt(x);

You can't simplify this, so it will simply loop until it hits the datatype boundary, and then get rounded into oblivion, because the underlying floating point representation will break in the exact same way. The only way to get rid of this would be to use an arbitrary number of bits for precision, in which case... just use an arbitrary precision library instead! This is EXACTLY WHY WE HAVE THEM.

Most equations that aren't trivial cannot be simplified. This datatype would only be useful for spoon-fed high school math problems. Furthermore, it costs so much CPU that you might as well just use an arbitrary precision library, which will probably be faster and just as effective at preventing edge cases.

https://en.wikipedia.org/wiki/Arbitrary-precision_arithmetic

  • ealloc 13 years ago

    At some point you _do_ have to use floating point (the whole point of numeric computations is to do things you can't do symbolically); however, this project might enable judicious use of symbolic manipulation in coordination with floating point. Holding off the computation until further into the program might have advantages: automatic reordering of computations to help preserve precision, automatic symbolic manipulation to cancel out large terms, and perhaps automatic calculation of the amount of precision needed (e.g. if calculating "a < b", generate the digits of a and b until they differ). The programmer could specify exactly when symbolic manipulation is desired vs. numeric.
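
    A rough sketch of what that last idea could look like, using mpmath as a stand-in (the article's library isn't shown, so the interface here is my own guess):

        # Compare two lazily-evaluated quantities, raising the working
        # precision until they can be told apart.
        from mpmath import mp, mpf

        def lt(f, g, max_dps=1000):
            dps = 15
            while dps <= max_dps:
                mp.dps = dps              # evaluate both sides at this precision
                a, b = f(), g()
                if a != b:
                    return a < b
                dps *= 2
            raise ValueError("values agree to max_dps digits; cannot decide")

        # Two values a double cannot tell apart, but 30 digits can:
        print(lt(lambda: mpf(10) ** 20 + 1, lambda: mpf(10) ** 20 + 2))   # True

        # (A real version would need rigorous error bounds -- e.g. interval
        # arithmetic -- since raw rounding error can fool this comparison.)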

    The quadratic equation case he shows is pretty good. Say you want to solve ax^2 + bx + c = 0. In C you might write "(-b + sqrt(b * b - 4 * a * c))/(2 * a)", which has all the precision problems he described. However, in his library that code would be converted to an AST, then symbolically simplified given the variable values, and _then_ evaluated numerically. The intermediate symbolic step would be able to automatically account for the edge cases, e.g. if c were 0, so that the numeric step wouldn't fail. In this case, the symbolic solver would recognize that (+, (sqrt, (-, (*, b, b), 0)), (-, b)) is exactly zero before the floating point computation occurs.
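
    sympy isn't his library, but it shows the kind of symbolic step he means: with c = 0 the problematic root collapses to exactly zero before any floating point is touched:

        # sympy stands in for the hypothetical AST step described above.
        import sympy as sp

        a, b = sp.symbols("a b", positive=True)
        c = 0
        plus_root = (-b + sp.sqrt(b * b - 4 * a * c)) / (2 * a)
        print(sp.simplify(plus_root))   # 0 -- recognized before any rounding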

    I often write scientific programs with the issues he describes, and I have also wished for something along these lines, but I kind of suspect it is impossible. I usually have to play in Mathematica / on paper a bit before finding an algorithm that takes care of the numeric issues (yes, even when using multiprecision), and while it would be nice for that to happen automatically, my impression is that the algorithms I find on paper are nontrivial for a computer to come up with. In some cases, however, the computer could have done it for me. But I have a feeling that in the cases where precision becomes an issue, even if this library got me 90% of the way, I would still want total manual control of how the calculations occur. But I still want him to try, in case it works!

    • blackhole 13 years ago

      Hey, if he wants to make a symbolic algebra system, that's great! But don't go advertising it as a replacement for floating point, because it isn't.

  • stiff 13 years ago

    Actually, it's a pretty interesting idea. He basically wants to keep track of the last n operations on every given number and take advantage of this history to try to minimize the floating point error. To compare 2.2 with exp(6.5) or anything else you just evaluate the expression, while taking advantage of possible reorderings or simplifications.

    Your critique really boils down to:

    Most equations that aren't trivial cannot be simplified.

    but this is not true; just take a look at what Wolfram Alpha or Mathematica can do.

    • darkmighty 13 years ago

      There are also numbers that can't _possibly_ be represented by this system using the elementary functions/operators, i.e. numbers that are "non-analytic".

      This may actually not be a problem in most instances, however. It really depends on the use case, but outside of scientific computing/engineering design we're not usually doing computations to find roots of degree >= 5 polynomials.

      OTOH, as pointed out by GP, it is easy to construct 'unsimplifiable' operations which after some time may grow too large.

      My conclusion is that this certainly cannot be implemented on practical systems without some care -- it's not as straightforward as it looks at first -- and that it is certainly not a better default as it stands right now.

    • blackhole 13 years ago

      If you have an equation in your program that can be effectively simplified, you should have simplified it before writing it in the first place. The thing about equations that can be simplified is that 99.9% of the time, they can be simplified at compile time.

      If you want to write a library to do it for you, great! But don't advertise it as a replacement for floating point number representations, because it isn't.

      • stiff 13 years ago

        Hey, why have compilers, if you can just write hand-optimized assembly yourself!

        Compilers aren't able to do similar optimizations. First of all, the flow of data that finally leads to some computation might be non-trivial, and I think most compilers deal well only with reasonably local (in space and in time) optimizations. Besides, compilers typically don't know much mathematics. If I have a function for taking a square root and a function for squaring, the compiler most likely won't rewrite sqr(sqrt(x)) to just x. It certainly won't rearrange a quadratic equation to minimize round-off.
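
        A quick Python check of the sqr(sqrt(x)) case makes the point concrete -- under strict floating-point semantics the rewrite isn't even value-preserving, so a compiler has no license to make it:

            # sqr(sqrt(x)) is not bit-identical to x in floating point.
            import math

            x = 2.0
            print(math.sqrt(x) ** 2)        # 2.0000000000000004
            print(math.sqrt(x) ** 2 == x)   # False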

        You are basically dismissing an idea on very little evidence, which is typical of HN lately. It certainly isn't completely hopeless, and trying it out will shed much more light on the problem than your know-it-all-in-advance comments, which aren't substantiated by any evidence (like the claim about not being able to simplify most expressions, and now the claim of 99.9%).

      • ealloc 13 years ago

        He wants more than compile-time simplification. He wants runtime simplification depending on the values encountered at runtime -- for example, in the quadratic equation case.

        And, he clearly isn't marketing it as a replacement for floating point. He says: "It is important to note, however, that the goal of the project is to make tools for symbolic calculations, not to create a viable alternative to the floating point." and he talks a lot about estimating numerical error and precision.

        • stiff 13 years ago

          He says this about SymPy...

          • ealloc 13 years ago

            Oops, you're right. Nevertheless, he clearly intends to extend rather than replace floating point, since one of the goals is to estimate the errors in the floating point calculations, and he rounds.

  • betterunix 13 years ago

    "You can't simplify this"

    I have learned to be careful about making such claims.

    http://mathworld.wolfram.com/NestedRadical.html

    • darkmighty 13 years ago

      Yeah, but simplifications like this are probably non-computable, even when possible, and hard to do in any case.

    • blackhole 13 years ago

      Wrong, that isn't what this is. This isn't a nested radical with infinitely many terms; it's a nested radical with n terms. Thus, it cannot be simplified.

eru 13 years ago

> Floating point datatype is the de-facto standard for real world quantities. Whether it is a banking account [...]

Is there any bank which uses floating point for accounting?

  • rwallace 13 years ago

    I encountered an insurance company that used floating point as their standard data type for monetary values. On the bright side, that was in the late nineties; hopefully they've switched to a proper decimal type since then.

  • reeses 13 years ago

    A huge number of ecommerce sites do (such as almost every site based on ATG/Oracle Commerce).

    Most banking systems predate standardized FP and use fixed decimal representations.

    • eru 13 years ago

      I can see them being used by the quants for fast pricing. But really, for accounting? Where do the missing cents go?

  • berkut 13 years ago

    I certainly hope not.

shoo 13 years ago

Related: there is a short series of exercises in SICP that explores the idea of building an interval arithmetic library, i.e. numerical values are represented by intervals [a, b] which encode their uncertainty/error:

http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-14.html...

Exercise 2.14 and onwards point out that two expressions that are algebraically equivalent for perfectly accurate values stop being equivalent once we introduce uncertainty. This is only for a toy example expression with 2 variables. Suppose we want to solve a linear system of equations in 100s of thousands of variables. Is it tractable to track the uncertainty for all of those variables at once? Will their uncertainties be dependent? ...
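
A toy Python version of the idea (mine, not the book's Scheme) shows both the mechanics and the Exercise 2.14 effect:

    # Each value carries [lo, hi] bounds, and every operation propagates them.
    class Interval:
        def __init__(self, lo, hi):
            self.lo, self.hi = lo, hi

        def __add__(self, other):
            return Interval(self.lo + other.lo, self.hi + other.hi)

        def __mul__(self, other):
            p = [self.lo * other.lo, self.lo * other.hi,
                 self.hi * other.lo, self.hi * other.hi]
            return Interval(min(p), max(p))

        def __truediv__(self, other):
            assert not (other.lo <= 0 <= other.hi), "divisor spans zero"
            return self * Interval(1 / other.hi, 1 / other.lo)

        def __repr__(self):
            return f"[{self.lo:.4f}, {self.hi:.4f}]"

    one = Interval(1, 1)
    r1, r2 = Interval(6.12, 7.48), Interval(4.465, 4.935)

    # Exercise 2.14's point: two algebraically equal formulas for parallel
    # resistors give different intervals once uncertainty is tracked.
    print((r1 * r2) / (r1 + r2))           # wider bounds
    print(one / (one / r1 + one / r2))     # tighter bounds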

  • ronaldx 13 years ago

    aye, this matches my intuition: having a lower bound and an upper bound (as rational numbers) on each value means you can:

    - go back and improve if necessary, or otherwise
    - express the uncertainty of a comparison

    thanks for the link :)

  • GhotiFish 13 years ago

    ... that is a really good idea. Why do I keep putting off reading that book?!

riffraff 13 years ago

Possibly of interest: the Frink programming language has arbitrary precision math and unit of measurement tracking, and has been around for more than a decade.

http://futureboy.us/frinkdocs/index.html

darkmighty 13 years ago

Interesting take on the floating point problem, but it seems as if the writer isn't well versed in the centuries-old solutions to this problem, namely iterative refinement for linear algebra and, more generally, iterated numerical methods for nonlinear systems -- and those are the places where precision matters, where you are trying to calculate a figure to a given accuracy.

Note, however, that solutions to many problems may in a sense be 'non-analytic': there may be no finite combination of elementary functions applied to a given rational number which yields the solution.

Also, iterative answers are usually the only viable way to reach solutions; they're usually much faster than the exact solution (or the floating-point-precision-limited solution), and you can always control how good your solution is.

Observation: so in a sense, what is used in practice may indeed be very close to the Kolmogorov complexity of the solutions -- the representation R = IterativeProblemSolve(Problem, ClosestSolution), where we publish the problem and the desired solution (assuming we are describing the problem efficiently)!

Noughmad 13 years ago

Floating point is not so bad. Yes, when you need fixed precision, such as in accounting, you should avoid it. But with numerical computations, if you run into problems because of float limitations, you're doing it wrong.

The first rule of numerics is not to compare variables of very different magnitude. His first example has coefficients that differ by almost 200 orders of magnitude. That example is totally outrageous, but still, what any reasonable person would do is introduce some scaling into the equation.

Yes, you have to think about scales, but you already do that. You never write programs with meters and seconds; you choose scaled, dimensionless quantities. And unless you choose the scales very badly, you won't get any problems from floats.
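
The classic small illustration of that kind of rescaling (my example, not the article's): the naive vector norm overflows even though the answer is perfectly representable.

    # Scale by the largest component before squaring; the math is the same,
    # but the intermediates stay in range.
    import math

    def norm_naive(a, b):
        return math.sqrt(a * a + b * b)

    def norm_scaled(a, b):
        s = max(abs(a), abs(b)) or 1.0       # scale factor; avoids 0/0
        return s * math.sqrt((a / s) ** 2 + (b / s) ** 2)

    print(norm_naive(3e200, 4e200))    # inf -- the squares overflow
    print(norm_scaled(3e200, 4e200))   # 5e+200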

weichi 13 years ago

Hard to comment without seeing more concrete examples of his approach. But it makes me wonder whether processors are now so fast that, for a large class of numerical problems, the default should be to give up calculational speed in return for eliminating some of the trickiness involved with floating point.

oofabz 13 years ago

How about continued fractions? Unlike floating point, they can represent any rational number in finite memory. Using generators, you can even represent many irrational numbers, like square roots and pi, in finite memory.
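
For the rational case, the expansion is just the Euclidean algorithm and it inverts exactly -- a small Python sketch (mine, far short of what HAKMEM covers):

    # Every rational has a finite continued-fraction expansion.
    from fractions import Fraction

    def to_cf(x: Fraction) -> list:
        terms = []
        while True:
            q = x.numerator // x.denominator   # integer part
            terms.append(q)
            x -= q
            if x == 0:
                return terms
            x = 1 / x

    def from_cf(terms: list) -> Fraction:
        x = Fraction(terms[-1])
        for t in reversed(terms[:-1]):
            x = t + 1 / x
        return x

    cf = to_cf(Fraction(355, 113))   # a well-known approximation of pi
    print(cf)                        # [3, 7, 16]
    print(from_cf(cf))               # 355/113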

Richard Schroeppel and Bill Gosper explored continued fractions in the '70s:

http://www.inwap.com/pdp10/hbaker/hakmem/cf.html

  • darkmighty 13 years ago

    What's special about representing rationals in finite memory? By definition they can be represented by two finite integers. There are many ways of describing irrational numbers in finite memory if you allow this kind of generalization, e.g. make a datatype to represent equations and treat numbers as solutions to those equations.

smilekzs 13 years ago

What we need is an open-source alternative to Mathematica's core language.

  • reeses 13 years ago

    Common Lisp has a robust and comprehensive numerics facility. Poking around the CLHS and reading about ratios, fixnums, bignums, etc., will lead you to the conclusion that you can get your numeric work done (albeit in many cases much more slowly than you would like).

    Add one of the pattern matching and term rewriting packages and you'll be very close to the foundation of the core of Mathematica. The rest is a SMOP, plus building up the library of facts that enables recognition of reducible terms.

  • ealloc 13 years ago

    There are a number of these. The one I have been using lately is sympy, a pure-Python symbolic math library.

    You may also be interested in the open source "sage mathematics", which is a frontend/interface to many different open-source symbolic solvers such as sympy, maxima, GAP, PARI, Singular, and octave.

    They are not as good as mathematica currently, but can already do quite a lot.

  • GhotiFish 13 years ago

    well, off the top of my head, there's GNU Octave.

jamesaguilar 13 years ago

Maybe it's just me, but calling something that is in current use on billions of computers around the globe obsolete is . . . somewhat of a stretch of the word "obsolete."

  • eksith 13 years ago

    In this case "obsolete" may be the relevant term, as in: automobile coach builders, meet the buggy whip makers. The two did co-exist in the early 1900s, so becoming "outmoded" didn't happen overnight. But it did eventually happen.

    The numbers, in the billions, of course weren't there back then, but my guess is that the proportions are comparable.

    Edit: After reading the article, I strongly disagree that this approach is "better" in any sense of the word. It's more along the lines of using whale oil for the automobile instead of gasoline. Gasoline isn't perfect, but it works. Until we have something better (i.e. electric), I'll stick with that.

GhotiFish 13 years ago

Floating point will never be obsolete: it is a log-scale datatype, and log-scale datatypes represent most natural values perfectly.

The only place where the shoe doesn't fit is where you need a minimum accuracy. In that case what you should be doing is using integers to represent your minimum quantifiable unit.

For example, you could represent currency in millicents to give you respectably accurate rounding. Not accurate enough? Microcents, then. Now you don't have enough range? Good, you're thinking about your requirements now; floating point DEFINITELY wouldn't have worked.
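
A toy sketch of the integer approach (mine, not production code): keep amounts as exact integer microcents and round to whole cents only when displaying.

    # Prices stay exact integers; rounding happens once, at display time.
    MICROCENTS_PER_DOLLAR = 100 * 10**6

    def parse_dollars(s: str) -> int:
        whole, _, frac = s.partition(".")
        frac = (frac + "00000000")[:8]               # 8 fractional digits
        return int(whole) * MICROCENTS_PER_DOLLAR + int(frac)

    def show_dollars(mc: int) -> str:
        cents = round(mc / 10**6)                    # one rounding, at the end
        return f"${cents // 100}.{cents % 100:02d}"

    price = parse_dollars("19.99")
    print(price)                    # 1999000000 microcents, exactly
    print(show_dollars(price * 3))  # $59.97, no float drift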

  • qznc 13 years ago

    In addition, you must prevent overflows.

    Microcents with a signed long long (64-bit) means a maximum value of 2^63 / (100 * 10^6) = 92,233,720,368. In words, "92 billion", which might not be enough.
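
    For reference, a quick check of that figure:

        # 2**63 microcents expressed in dollars: about 92 billion.
        print(2**63 // (100 * 10**6))   # 92233720368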

    • reeses 13 years ago

      In environments with exhaustive partitions and automatic 'promotion' (i.e., Common Lisp but probably not Ruby, etc.), your overflow would result in a correctly calculated bignum.

    • GhotiFish 13 years ago

      yup! Which is scary to think about, because if you had a microcents requirement for accuracy, your ceiling would have been at (2^53 / 2) * 10^-6 ≈ 4,503,599,627.

      4 billion. Eek.

  • reeses 13 years ago

    Integers, BCD, rationals, or other arbitrary precision representations are preferable as well. IBM keeps BCD alive and fast for COBOL and other 'legacy' systems.
