Towards a new SymPy
oscarbenjamin.github.io

Part II made the front page yesterday: https://news.ycombinator.com/item?id=37426080
A comment there makes what I think is a very good point about "the lack of consolidation of computer algebra efforts": https://news.ycombinator.com/item?id=37430437
I don't know what might drive or foster such consolidation. Maybe Category Theory? Bridging syntax?
How about Lean? [0] There's a whole library of mathematics written down in Lean called Mathlib, which spans most of the undergraduate maths curriculum up to some cutting-edge research-level maths. I've commented under the Part II post you linked to as well, describing how I think Mathlib could help the CAS ecosystem.
That's an entirely different thing though. Good luck getting Lean to help you do any symbolic computation whatsoever. You can use it to prove that a given manipulation is correct. You cannot use it to find a result (there may be a symbolic math library for Lean eventually, but currently there isn't one).
That's not my point. In the comment I made below the other post I talked about how Mathlib is just a piece of the puzzle, the way I see it [0]. I don't want to see Lean do symbolic computation - although I'm sure that would be great as well - what I want to see is to have Mathlib's library of mathematics being used in a way that any CAS can read definitions and theorems from it. Then a significant part of the CAS codebase won't have to be manually written, and developers can spend time on things that are not re-inventing some wheel.
Fair enough, that would be nice. However, it sounds like a very hard thing to build in practice. I'd expect, actually, that it's best done in Lean itself. So you're proposing a CAS written in Lean, which no one is currently working on.
In Coq, the tactics system supports writing solvers that can both find a result and give you the proof that it's correct. If Lean has a similar tactics system, then solvers could be written in that.
Lean does have a tactics system (very similar to Coq). However, there is no general tactic that will do, e.g., "sin^2 + cos^2 = 1" for you. (Writing a tactic specifically for that simplification is easy, of course, but that's not what a CAS is). (I doubt there is one in Coq that will do this, although I'm not sure). A symbolic algebra system is expected to do this simplification (and much more advanced ones) automatically. All in all, proof assistants aren't computer algebra systems. You could make computer algebra systems in those frameworks but I'm not aware of any.
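For concreteness, this is the sort of thing a CAS is expected to do unprompted. A minimal sketch using real SymPy API:

    import sympy as sp

    x = sp.Symbol('x')
    # A CAS finds this without being told which identity to apply:
    print(sp.simplify(sp.sin(x)**2 + sp.cos(x)**2))  # prints 1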
Typically, there are 'group', 'ring' and 'field' tactics that will perform simplification in the relevant algebraic structures. Going from that to simplifying sin^2 x + cos^2 x is simply a matter of writing more such tactics, operating on e.g. real fields or whatever is applicable in any given case. See the sketch below.
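For instance, Mathlib's `ring` tactic already closes goals that are commutative (semi)ring identities. A small Lean 4 sketch, assuming a project with Mathlib as a dependency:

    import Mathlib.Tactic.Ring

    -- `ring` normalizes both sides of a commutative (semi)ring identity:
    example (x y : ℤ) : (x + y) ^ 2 = x ^ 2 + 2 * x * y + y ^ 2 := by
      ring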
This is a lot less ad hoc than what CASs do, and it's also less error-prone.
Do any existing CAS systems have configurable axioms? For example: Conway's surreal infinities; not eliminating terms next to infinity too early; giving each instance of infinity its own unique identity; a configurable order of operations.
All of the axiomatic transformations applied by a CAS like SymPy should/must be in Lean Mathlib somewhere, right? If nothing else, a lookup_lean_mathlib_definition(expr, 'path/to/mathlib-v0.0.2') or find_similar(expr, AxiomDB) would be useful.
How do CAS differ from Production Rule Systems? https://en.wikipedia.org/wiki/Production_system_(computer_sc...
CAS > Simplification: https://en.wikipedia.org/wiki/Computer_algebra#Simplificatio...
Rewriting: https://en.wikipedia.org/wiki/Rewriting
Because the rulesets are expected to change, rules engines have functionality to compile the rules into a tree (or something better, such as a Rete network) for performance.
eBPF is not a rules engine, but it does optimize filter sets IIRC?
Julia's Symbolics.jl allows users to easily add custom rewrite rules. However, it's not a state-of-the-art CAS. I don't have that much experience with others. Mathematica doesn't allow it, I think.
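For what it's worth, SymPy does expose pattern-based rewriting through Wild symbols and expr.replace (real API), though it's lower-level than a user-facing rule system:

    import sympy as sp

    w = sp.Wild('w')
    x = sp.Symbol('x')
    expr = sp.sin(x)**2 + sp.sin(2*x)**2
    # Custom rule: rewrite every sin(.)**2 as 1 - cos(.)**2
    print(expr.replace(sp.sin(w)**2, 1 - sp.cos(w)**2))
    # result: 2 - cos(x)**2 - cos(2*x)**2 (up to term ordering)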
FWIU Wolfram's searching for a unified model with the Wolfram Physics Project, too; e.g. "The Physicalization of Metamathematics and Its Implications for the Foundations of Mathematics" (2022) https://www.wolframscience.com/metamathematics/ https://www.wolframphysics.org/bulletins/
Are fundamental constants other-valued in any Many Worlds interpretations, or are e, i, and π always e, i, and π with the same relations?
Countability and continua (in a Hilbert space of degree n, where n may or may not be constant, like the many forms of [quantum discord] entropy and the energy that represents them)
TIL the separable states problem is considered NP-hard, and many models specify independence of observation as necessary.
I don't see how that relates to the question of "does Mathematica allow users to define simplification rules?"
https://www.ma.imperial.ac.uk/~buzzard/xena/natural_number_g... :
> In this game, you get your own version of the natural numbers, called `mynat`, in an interactive theorem prover called Lean. Your version of the natural numbers satisfies something called the principle of mathematical induction, and a couple of other things too (Peano's axioms).
Such axioms are hard-coded in SymPy, SageMath, and Mathematica but not in Lean Mathlib (which is not optimized for performance)
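For reference, the game's starting point is roughly this kind of definition. A sketch in plain Lean 4; the game's actual names and setup differ:

    inductive MyNat where
      | zero : MyNat
      | succ : MyNat → MyNat

    -- Addition defined by recursion on the second argument,
    -- the Peano-style definition the game makes you work with:
    def MyNat.add : MyNat → MyNat → MyNat
      | a, .zero   => a
      | a, .succ b => .succ (MyNat.add a b)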
If there are an infinite number of unitary transformations (as rotations on a Bloch sphere for example), I find it unlikely that we've yet discovered all of the requisite operators and axioms and coded them into any human CAS that exists in present day.
Mathlib is not nearly as complete as advertised. It is very much a collection of research projects with little cohesion.
A CAS also covers only a tiny part of "an undergrad math degree" so that's beside the point here. Assuming it had appropriate tactics doing the job of a CAS using all theorems (which isn't happening any time soon), what math would be missing?
The unfortunate thing is that core CAS functionality (and performance) should be based on a SAT/SMT engine to discover the rewrite rules, and it goes without saying (but I'll say it anyway) that implementing that is "hard". Naively (i.e., I haven't really thought it through), I think sympy should be built on top of z3 or cvc5, or at least very tightly integrated (e.g., supported as a backend). And (again naively) I think until such time, sympy will remain a toy.
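To make that concrete, here's a toy sketch of the direction being suggested, using z3's actual Python bindings; the candidate rewrite itself is made up for illustration:

    from z3 import Real, Solver, Not, unsat

    x = Real('x')
    # A candidate rewrite a CAS might want to apply:
    lhs = (x + 1) * (x + 1)
    rhs = x * x + 2 * x + 1

    s = Solver()
    s.add(Not(lhs == rhs))     # search for a counterexample to the rewrite
    assert s.check() == unsat  # none exists, so lhs -> rhs is sound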
I will say though that symengine is a great project, and congrats to that guy for pulling it off under the constraints of a PhD.
Not an expert in SAT solvers, personally. What would the benefit be in using one?
Anaconda's absurdly slow dependency resolver (it takes up to 15 minutes for things Pip installs in seconds, with no way of disabling it outside of installing a third-party solver) is based on SAT, and it left a bad taste in my mouth (slow, clunky, etc.), but maybe it's just a poor implementation on their part.
> Anaconda's absurdly slow dependency resolver
On the one hand this is like comparing apples and oranges. On the other hand SAT is slow for dependency management not because the solver itself is slow (it might well be - I don't know which solver they packaged) but because it's an NP-complete problem.
Back to sympy: a computer algebra system (CAS) is primarily (IMHO) an algebra system, not a matrix manipulation library or a PDE solver or whatever kitchen-sink collection of things is in all of them. Algebra in this context means manipulating algebraic expressions, and that's term rewriting, and that's also NP-complete (well, at least in some form or fashion, e.g. e-graph extraction).
So in summary - there's no way out of using SAT/SMT here.
> slow, clunky, etc
Just to put a finer point on this - it's a very shallow thing to look at conda's or whoever's implementation and then paint over SAT/SMT with that same brush. The way that I usually describe z3 is that it is nuclear-weapons-grade industrial software. I mean jesus christ, its stated goal is solving NP-complete/hard problems, and it frequently succeeds at this goal on problems with millions of decision variables and clauses. It is absolutely the highest tech piece of tech out there (XYZ pytorch/tensorflow ai ml thing pales in comparison) and we are all extremely lucky that it is licensed permissively and developed completely in the open (and basically by one guy!). And it is being used in many many places for very serious engineering.
Yes, this is a big disadvantage. But have you tried Mamba, which aims to be a faster reimplementation of Conda? It works really well in most cases.
Mamba is great, and I use it when possible. It's probably the single best thing to happen to Anaconda in years. Nowadays it's installed automatically, but up until a month ago you had to deal with the slow solver to install it.
And fortunately it will become the default by Q4!
The point I was making is the old solver left an impression on me that SAT is slow and inefficient, that's all really :)
You can use mamba's much faster solver directly with conda.
    conda install -n base conda-libmamba-solver
    conda config --set solver libmamba
Fedora's dnf also uses SAT and it's way faster (with caching) than the handwritten resolver it replaced.
Miniconda has a better dependency resolver, and it can resolve in less than a minute in most cases.
Miniconda is faster (I prefer it, personally)...
...but Miniconda and Anaconda are the same package manager; the difference is that Anaconda comes batteries-included with a ton of common packages.
The reason Miniconda is faster isn't the solver, it's because it doesn't have 250 packages to go through the long checking process on.
If you have a large, complex environment in Miniconda, that environment will get slower and slower to resolve over time.
250 is not a big number... :(
They have the same resolver (Conda)
You just have fewer packages installed with Miniconda, so the SAT problem is much smaller.
You might enjoy ruler https://github.com/uwplse/ruler
It would be very interesting for SMT and CAS to converge a bit more. SMT in expressiveness and domains and CAS in rigor.
The modality of their usage is different. CAS tends to return some expressions of interest, which it is hard to get SMT to do. Either you get "unsat" or a particular model from an SMT solver, not a simplified expression (ok, z3 has a simplify command, which is pretty cool).
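For example, with the real z3 Python API:

    from z3 import Real, simplify

    x = Real('x')
    # z3 will normalize terms, but only with its built-in rules;
    # this is nowhere near full CAS-style simplification:
    print(simplify(x + x))  # 2*x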
SMT today is not obviously expressive enough to handle most of the domains and questions that come up in CAS systems.
Most SMT solvers do not intrinsically handle transcendental functions or any notions of calculus, abstract algebra, etc.
CAS systems are largely interested in problems of equational reasoning, whereas SMT's bread and butter is gluing together "trivialities" like linear inequalities and congruence closure with SAT search.
There's a workshop exploring that: http://www.sc-square.org/CSA/welcome.html . They're trying to bridge CAS and SMT.
I am not sure current SAT solvers are good at solving the things that a good CAS can do. They are fast at a lot of operations related to bits (xor, shift, and, or ...) but they perform way worse on things like solving a linear system over a finite field. (This is all from my personal experience, so I may be wrong.)
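For concreteness, here's what that operation looks like on the CAS side today (Matrix.inv_mod is real SymPy API; the numbers are arbitrary):

    import sympy as sp

    # Solve A*x = b over GF(7) by inverting A modulo 7:
    A = sp.Matrix([[1, 2], [3, 5]])
    b = sp.Matrix([1, 2])
    x = (A.inv_mod(7) * b) % 7
    assert (A * x - b) % 7 == sp.zeros(2, 1)  # check the solution mod 7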
> linear system in a finite field
I said it below, but I'll repeat it here: in my humble opinion, this is not what you want from a CAS. This is functionality better delegated to a BLAS (yes, even with the finite-field qualifier). And just because both CAS and BLAS have an 'A' in them does not mean they are the same thing.
Is there an existing CAS built on top of a SAT or SMT solver?
OSS? None that I'm aware of that uses a SAT/SMT solver for the term rewriting (like I'm suggesting). Closed source, my strong intuition is that both Mathematica and Magma work this way.
Sounds like a great project idea to make a toy demo of this direction you'd like to see. Maybe comparable to https://github.com/norvig/paip-lisp/blob/main/docs/chapter15... and https://github.com/norvig/paip-lisp/blob/main/docs/chapter8.... which are a few hundred lines of Lisp each, but do enough to be interesting.
Toy? SymPy has room for improvement but it has made a tremendous impact in research and industry.
"Toy" is solver jargon that sort of means there's an alternative that blows it out of the water.
Is Mathematica built like this?
Yes but I doubt they're using z3 or cvc5 or any other oss sat/smt solver.
Do you have any evidence for that? My impression is that Mathematica is built on a rewriting language along with thousands of built-in procedures (some of which are sat/smt). I don't think its core engine itself is smt.
> core CAS functionality (and performance) should be based on a SAT/SMT engine to discover the rewrite rules
Why is that? What are the alternatives?
I really liked the article and how it explained CAS vs Numerical solutions.
It also looks like SymPy/SymEngine is starting to catch up to Mathematica, which is pretty cool; it does the same kind of expansion of an expression into a tree of sub-expressions.
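For anyone curious, SymPy will show you that tree directly (real API):

    import sympy as sp

    x = sp.Symbol('x')
    # srepr prints the expression tree SymPy actually builds:
    print(sp.srepr(x**2 + 1))
    # Add(Pow(Symbol('x'), Integer(2)), Integer(1))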
Is there any comparison of their features anywhere? Last time I tried sympy a few years ago, it was quite a bit lacking compared to Mathematica.
I don't personally know, but I assume it'll take many years to catch up with Mathematica, which has had symbolic computing as its bread and butter, with a large number of developers adding to that codebase since, like, the '80s.
I love SymPy. It's so useful for doing calculations I don't want to do myself.
I’ve tried to appreciate SymPy but I always find myself running home to Mathematica. There’s simply no comparison. SymPy is like a match and Mathematica has the power of a sizeable thermonuclear warhead.
It's true. Unfortunately Mathematica simply can't be used in many domains. I would really like to integrate mathematica with a type checker for automatic theorem proving. I think it could greatly alleviate the clunkiness of dependent types.
Do you know of attempts to integrate SymPy in this way?
I tried it with symengine as a GHC (Haskell) type-checker plugin. But it was just too limited to be worthwhile.
What's preventing it from being used?
I can hardly accept a cool $3230 per year license. I didn't even attempt to go this route. I doubt they would even allow it tbh. The site isn't exactly clear on that.
I thought you meant "domain" like a field of work or study.
Yes, type checkers are a domain of study. Basically any domain where Mathematica needs to be integrated into the end result as-is.
Implying there's a technical reason it couldn't be used.
I wasn't implying there is a technical reason, I definitely didn't mean it that way. If Mathematica were open sourced tomorrow nothing would stand in the way of trying it out.
If a company simply blocks you from using their tech, that definitely means you can't use it. For example, I can't run my own entertainment system on a Tesla. There is no technical reason it's not possible; Tesla simply disallows it.
I get it I'm just surprised that's what you meant.
If you have fewer primitives and terminals than there are Unicode code points (about 1.1 million), then you could ditch OOP expression trees altogether and use simple strings in Polish notation, with a mapping from characters to operations (simple lambdas). That way you don't need a __dict__ on every node of every tree. However, you'd have to rewrite the stuff that expects the OOP trees to instead expect Polish-notation strings. This approach scales a lot further than classes because you reduce the memory cost of an algebraic expression down to the simplest string that represents it (and even smaller if you pack the bits into an ANS, i.e. an asymmetric numeral system, though that trades performance for lower memory).
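A minimal sketch of the idea; the operator table and evaluator here are hypothetical, not part of SymPy:

    import operator

    OPS = {'+': (2, operator.add), '*': (2, operator.mul)}  # char -> (arity, fn)

    def eval_prefix(expr, env):
        """Evaluate a Polish-notation string like '+x*yz' given variable bindings."""
        def go(i):
            ch = expr[i]
            if ch in OPS:
                arity, fn = OPS[ch]
                args, j = [], i + 1
                for _ in range(arity):
                    val, j = go(j)
                    args.append(val)
                return fn(*args), j
            return env[ch], i + 1  # any non-operator char is a variable
        val, _ = go(0)
        return val

    print(eval_prefix('+x*yz', {'x': 1, 'y': 2, 'z': 3}))  # 1 + 2*3 = 7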
At that point it's hard to justify not just writing it in C directly
I remember being a kid and fawning over the upgrade from a TI-86 (which could not do symbolic manipulation) to the TI-89 (which could).
As an adult and OSS enthusiast, I've often wondered if there is an OSS option that can at least match, and ideally exceed, the TI-89's capabilities. Is SymPy it? I've had a few reasonably good experiences with SymPy, but I don't know much about the theoretical underpinnings of CAS, or how SymPy compares to competing offerings.
Depends what you mean? Mathematica is quite impressive.
Many symbolic options exist in lisps. https://stackoverflow.com/questions/10355112/why-is-lisp-so-... is a good answer that goes over some of the reason for that.
Mathematica is certainly as powerful as a TI-89, but not OSS.
Right, though you can get a free license with a Raspberry Pi, I think? https://www.wolfram.com/raspberry-pi/
I'm sure there are restrictions, but likely not a concern for hobby interests?
Regarding the issue of representing symbolic expressions and controlling their evaluation or simplification: are there precedents for using e-graphs to memoize and reuse work, rather than simple trees with destructive updates?
The Herbie project uses e-graphs to explore different ways of rewriting floating-point expressions: https://herbie.uwplse.org/ One can also write custom rulesets in egglog (a new e-graph rewriting system / language / Datalog): https://egraphs-good.github.io/egglog/?example=herbie
The approach is not yet anywhere near being able to touch all the domains sympy can handle. Destructive term rewriting tends to be a bit more forgiving of unsoundness in the rules while still returning roughly meaningful results. E-graph rewriting (and other automated reasoning systems) tends to just return junk as soon as you aren't careful about your semantics. Associativity and commutativity are ubiquitous in CAS applications, and encoding these concepts in general-purpose terms is rather unsatisfying. The post above emphasizes specialty methods for polynomials, and it would be desirable to find a clean way to integrate those into e-graph techniques. Variable binding (which is treated in a rather mangled form in CAS systems) is seemingly important for treating summation, differentiation, and integration correctly. The status of doing variable binding efficiently and correctly in e-graphs is also unclear, imo.
SymPy is pretty nice but every time I use it for a real problem I end up hitting a wall and have to dig through the source or look at old issue reports for a workaround.
Most recently I wanted to simplify a complex expression with terms like ‘diag(v1) * v2’ into Hadamard products, and found I’d need to implement custom rules to get it to work.
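For concreteness, here's the identity in question checked on small explicit SymPy matrices (real API); as noted, getting SymPy to rewrite the abstract MatrixSymbol form requires custom rules:

    import sympy as sp

    a1, a2, b1, b2 = sp.symbols('a1 a2 b1 b2')
    v1 = sp.Matrix([a1, a2])
    v2 = sp.Matrix([b1, b2])
    # diag(v1) * v2 is exactly the elementwise (Hadamard) product of v1 and v2:
    assert sp.diag(a1, a2) * v2 == sp.Matrix([a1*b1, a2*b2])
    assert v1.multiply_elementwise(v2) == sp.Matrix([a1*b1, a2*b2])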
Python is great in DL, so any chance we'll see a combination of neural nets and computer algebra in a new sympy?
If the author is reading this: please add an rss feed to the blog! I'd love to follow along for updates
I think banking on SymEngineX ("SEX") for the Sympy 2.0 release would be interesting branding.
I'm also cheering for SymPy; I think its longevity now still predicts success in the future.
Excited to hear about the new LaTeX+SEX stack