Towards a new SymPy
oscarbenjamin.github.io

Part II made the front page yesterday: https://news.ycombinator.com/item?id=37426080
A comment there makes what I think is a very good point about "the lack of consolidation of computer algebra efforts": https://news.ycombinator.com/item?id=37430437
I don't know what might drive or foster such consolidation. Maybe Category Theory? Bridging syntax?
How about Lean? [0] There's a whole library of mathematics written down in Lean called Mathlib, which spans most of the undergraduate maths curriculum up to some cutting-edge research-level maths. I've commented under the Part II post you linked to as well, describing how I think Mathlib could help the CAS ecosystem.
That's an entirely different thing though. Good luck getting Lean to help you do any symbolic computation whatsoever. You can use it to prove that a given manipulation is correct. You cannot use it to find a result (there may be a symbolic math library for Lean eventually, but currently there isn't one).
That's not my point. In the comment I made below the other post I talked about how Mathlib is just a piece of the puzzle, the way I see it [0]. I don't want to see Lean do symbolic computation - although I'm sure that would be great as well - what I want to see is to have Mathlib's library of mathematics being used in a way that any CAS can read definitions and theorems from it. Then a significant part of the CAS codebase won't have to be manually written, and developers can spend time on things that are not re-inventing some wheel.
Fair enough, that would be nice. However, it sounds like a very hard thing to build in practice. I'd expect, actually, that it's best done in Lean itself. So you're proposing a CAS written in Lean, which no one is currently working on.
In Coq, the tactics system supports writing solvers that can both find a result and give you the proof that it's correct. If Lean has a similar tactics system, then solvers could be written in that.
Lean does have a tactics system (very similar to Coq). However, there is no general tactic that will do, e.g., "sin^2 + cos^2 = 1" for you. (Writing a tactic specifically for that simplification is easy, of course, but that's not what a CAS is). (I doubt there is one in Coq that will do this, although I'm not sure). A symbolic algebra system is expected to do this simplification (and much more advanced ones) automatically. All in all, proof assistants aren't computer algebra systems. You could make computer algebra systems in those frameworks but I'm not aware of any.
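For concreteness, this is the sort of thing a CAS is expected to do unprompted. A minimal sketch using real SymPy API:

    import sympy as sp

    x = sp.Symbol('x')
    # A CAS finds this without being told which identity to apply:
    print(sp.simplify(sp.sin(x)**2 + sp.cos(x)**2))  # prints 1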
Typically, there are 'group', 'ring' and 'field' tactics that will perform simplification in the relevant algebraic structures. Going from that to simplifying sin^2 x + cos^2 x is simply a matter of writing more such tactics, operating on e.g. real fields or whatever is applicable in any given case. See the sketch below.
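For instance, Mathlib's `ring` tactic already closes goals that are commutative (semi)ring identities. A small Lean 4 sketch, assuming a project with Mathlib as a dependency:

    import Mathlib.Tactic.Ring

    -- `ring` normalizes both sides of a commutative (semi)ring identity:
    example (x y : ℤ) : (x + y) ^ 2 = x ^ 2 + 2 * x * y + y ^ 2 := by
      ring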
This is a lot less ad hoc than what CASs do, and it's also less error-prone.
Do any existing CAS systems have configurable axioms? For example: Conway's surreal infinities; not eliminating terms next to infinity too early; giving each instance of infinity its own unique identity; a configurable order of operations.
All of the axiomatic transformations applied by a CAS like SymPy should/must be in Lean Mathlib somewhere, right? If nothing else, a lookup_lean_mathlib_definition(expr, 'path/to/mathlib-v0.0.2') or find_similar(expr, AxiomDB) would be useful.
How do CAS differ from Production Rule Systems? https://en.wikipedia.org/wiki/Production_system_(computer_sc...
CAS > Simplification: https://en.wikipedia.org/wiki/Computer_algebra#Simplificatio...
Rewriting: https://en.wikipedia.org/wiki/Rewriting
Because the rulesets are expected to change, rules engines have functionality to compile the rules into a tree (or something better, such as a Rete network) for performance.
eBPF is not a rules engine, but it does optimize filter sets IIRC?
Julia's Symbolics.jl allows users to easily add custom rewrite rules. However, it's not a state-of-the-art CAS. I don't have that much experience with others. Mathematica doesn't allow it, I think.
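For what it's worth, SymPy does expose pattern-based rewriting through Wild symbols and expr.replace (real API), though it's lower-level than a user-facing rule system:

    import sympy as sp

    w = sp.Wild('w')
    x = sp.Symbol('x')
    expr = sp.sin(x)**2 + sp.sin(2*x)**2
    # Custom rule: rewrite every sin(.)**2 as 1 - cos(.)**2
    print(expr.replace(sp.sin(w)**2, 1 - sp.cos(w)**2))
    # result: 2 - cos(x)**2 - cos(2*x)**2 (up to term ordering)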
FWIU Wolfram's searching for a unified model with the Wolfram Physics Project, too; e.g. "The Physicalization of Metamathematics and Its Implications for the Foundations of Mathematics" (2022) https://www.wolframscience.com/metamathematics/ https://www.wolframphysics.org/bulletins/
Are fundamental constants other-valued in any Many Worlds interpretations, or are e, i, and π always e, i, and π with the same relations?
Countability and continua (in a Hilbert space of degree n, where n may or may not be constant, like the many forms of [quantum discord] entropy and the energy that represents them)
TIL the separable states problem is considered NP-hard, and many models specify independence of observation as necessary.
I don't see how that relates to the question of "does Mathematica allow users to define simplification rules?"
https://www.ma.imperial.ac.uk/~buzzard/xena/natural_number_g... :
> In this game, you get your own version of the natural numbers, called `mynat`, in an interactive theorem prover called Lean. Your version of the natural numbers satisfies something called the principle of mathematical induction, and a couple of other things too (Peano's axioms).
Such axioms are hard-coded in SymPy, SageMath, and Mathematica but not in Lean Mathlib (which is not optimized for performance)
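For reference, the game's starting point is roughly this kind of definition. A sketch in plain Lean 4; the game's actual names and setup differ:

    inductive MyNat where
      | zero : MyNat
      | succ : MyNat → MyNat

    -- Addition defined by recursion on the second argument,
    -- the Peano-style definition the game makes you work with:
    def MyNat.add : MyNat → MyNat → MyNat
      | a, .zero   => a
      | a, .succ b => .succ (MyNat.add a b)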
If there are an infinite number of unitary transformations (as rotations on a Bloch sphere for example), I find it unlikely that we've yet discovered all of the requisite operators and axioms and coded them into any human CAS that exists in present day.
Mathlib is not nearly as complete as advertised. It is very much a collection of research projects with little cohesion.
A CAS also covers only a tiny part of "an undergrad math degree" so that's beside the point here. Assuming it had appropriate tactics doing the job of a CAS using all theorems (which isn't happening any time soon), what math would be missing?
The unfortunate thing is that core CAS functionality (and performance) should be based on a SAT/SMT engine to discover the rewrite rules, and it goes without saying (but I'll say it anyway) that implementing that is "hard". Naively (i.e., I haven't really thought it through), I think sympy should be built on top of z3 or cvc5, or at least very tightly integrated (e.g., supported as a backend). And (again naively) I think until such time, sympy will remain a toy.
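To make that concrete, here's a toy sketch of the direction being suggested, using z3's actual Python bindings; the candidate rewrite itself is made up for illustration:

    from z3 import Real, Solver, Not, unsat

    x = Real('x')
    # A candidate rewrite a CAS might want to apply:
    lhs = (x + 1) * (x + 1)
    rhs = x * x + 2 * x + 1

    s = Solver()
    s.add(Not(lhs == rhs))     # search for a counterexample to the rewrite
    assert s.check() == unsat  # none exists, so lhs -> rhs is sound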
I will say though that symengine is a great project, and congrats to that guy for pulling it off under the constraints of a PhD.
Not an expert in SAT solvers, personally. What would the benefit be in using one?
Anaconda's absurdly slow dependency resolver (it takes up to 15 minutes for things Pip installs in seconds, with no way of disabling it outside of installing a third-party solver) is based on SAT, and it left a bad taste in my mouth (slow, clunky, etc.), but maybe it's just a poor implementation on their part.
> Anaconda's absurdly slow dependency resolver
On the one hand this is like comparing apples and oranges. On the other hand SAT is slow for dependency management not because the solver itself is slow (it might well be - I don't know which solver they packaged) but because it's an NP-complete problem.
Back to sympy: a computer algebra system (CAS) is primarily (IMHO) an algebra system, not a matrix manipulation library or a PDE solver or whatever kitchen-sink collection of things is in all of them. Algebra in this context means manipulating algebraic expressions, and that's term rewriting, and that's also NP-complete (well, at least in some form or fashion, e.g. e-graph extraction).
So in summary - there's no way out of using SAT/SMT here.
> slow, clunky, etc
Just to put a finer point on this - it's a very shallow thing to look at conda's or whoever's implementation and then paint over SAT/SMT with that same brush. The way that I usually describe z3 is that it is nuclear-weapons-grade industrial software. I mean jesus christ, its stated goal is solving NP-complete/hard problems, and it frequently succeeds at this goal on problems with millions of decision variables and clauses. It is absolutely the highest tech piece of tech out there (XYZ pytorch/tensorflow ai ml thing pales in comparison) and we are all extremely lucky that it is licensed permissively and developed completely in the open (and basically by one guy!). And it is being used in many many places for very serious engineering.
Yes, this is a big disadvantage. But have you tried Mamba, which aims to be a faster reimplementation of Conda? It works really well in most cases.
Mamba is great, and I use it when possible. It's probably the single best thing to happen to Anaconda in years. Nowadays it's installed automatically, but up until a month ago you had to deal with the slow solver to install it.
And fortunately it will become the default by Q4!
The point I was making is the old solver left an impression on me that SAT is slow and inefficient, that's all really :)
You can use mamba's much faster solver directly with conda.
    conda install -n base conda-libmamba-solver
    conda config --set solver libmamba
Fedora's dnf also uses SAT and it's way faster (with caching) than the handwritten resolver it replaced.
Miniconda has a better dependency resolver, and it can resolve in less than a minute in most cases.
Miniconda is faster (I prefer it, personally)...
...but Miniconda and Anaconda are the same package manager; the difference is that Anaconda comes batteries-included with a ton of common packages.
The reason Miniconda is faster isn't the solver, it's because it doesn't have 250 packages to go through the long checking process on.
If you have a large, complex environment in Miniconda, that environment will get slower and slower to resolve over time.
250 is not a big number... :(
They have the same resolver (Conda)
You just have fewer packages installed with Miniconda, so the SAT problem is much smaller.
You might enjoy ruler https://github.com/uwplse/ruler
It would be very interesting for SMT and CAS to converge a bit more. SMT in expressiveness and domains and CAS in rigor.
The modality of their usage is different. CAS tends to return some expressions of interest, which it is hard to get SMT to do. Either you get "unsat" or a particular model from an SMT solver, not a simplified expression (ok, z3 has a simplify command, which is pretty cool).
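For example, with the real z3 Python API:

    from z3 import Real, simplify

    x = Real('x')
    # z3 will normalize terms, but only with its built-in rules;
    # this is nowhere near full CAS-style simplification:
    print(simplify(x + x))  # 2*x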
SMT today is not obviously expressive enough to handle most of the domains and questions that come up in CAS systems.
Most SMT solvers do not intrinsically handle transcendental functions or any notions of calculus, abstract algebra, etc.
CAS systems are largely interested in problems of equational reasoning, whereas SMT's bread and butter is gluing together "trivialities" like linear inequalities and congruence closure with SAT search.
There's a workshop exploring that: http://www.sc-square.org/CSA/welcome.html . They're trying to bridge CAS and SMT.
I am not sure current SAT solvers are good at solving the things that a good CAS can do. They are fast at a lot of operations related to bits (xor, shift, and, or ...) but they perform way worse on things like solving a linear system over a finite field. (This is all from my personal experience, so I may be wrong.)
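For concreteness, here's what that operation looks like on the CAS side today (Matrix.inv_mod is real SymPy API; the numbers are arbitrary):

    import sympy as sp

    # Solve A*x = b over GF(7) by inverting A modulo 7:
    A = sp.Matrix([[1, 2], [3, 5]])
    b = sp.Matrix([1, 2])
    x = (A.inv_mod(7) * b) % 7
    assert (A * x - b) % 7 == sp.zeros(2, 1)  # check the solution mod 7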
> linear system in a finite field
I said it below, but I'll repeat it here: in my humble opinion, this is not what you want from a CAS. This is functionality better delegated to a BLAS (yes, even with the finite-field qualifier). And just because both CAS and BLAS have an 'A' in them does not mean they are the same thing.
Is there an existing CAS built on top of a SAT or SMT solver?
OSS? None that I'm aware of that uses a SAT/SMT solver for the term rewriting (like I'm suggesting). Closed source, my strong intuition is that both Mathematica and Magma work this way.
Sounds like a great project idea to make a toy demo of this direction you'd like to see. Maybe comparable to https://github.com/norvig/paip-lisp/blob/main/docs/chapter15... and https://github.com/norvig/paip-lisp/blob/main/docs/chapter8.... which are a few hundred lines of Lisp each, but do enough to be interesting.
Toy? SymPy has room for improvement but it has made a tremendous impact in research and industry.
"Toy" is solver jargon that sort of means there's an alternative that blows it out of the water.
Is Mathematica built like this?
Yes but I doubt they're using z3 or cvc5 or any other oss sat/smt solver.
Do you have any evidence for that? My impression is that Mathematica is built on a rewriting language along with thousands of built-in procedures (some of which are sat/smt). I don't think its core engine itself is smt.
> core CAS functionality (and performance) should be based on a SAT/SMT engine to discover the rewrite rules
Why is that? What are the alternatives?
I really liked the article and how it explained CAS vs Numerical solutions.
It also looks like SymPy/SymEngine is starting to catch up to Mathematica, which is pretty cool; it does the same kind of expansion of an expression into a tree of sub-expressions.
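For anyone curious, SymPy will show you that tree directly (real API):

    import sympy as sp

    x = sp.Symbol('x')
    # srepr prints the expression tree SymPy actually builds:
    print(sp.srepr(x**2 + 1))
    # Add(Pow(Symbol('x'), Integer(2)), Integer(1))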
Is there any comparison of their features anywhere? Last time I tried sympy a few years ago, it was quite a bit lacking compared to Mathematica.
I don't personally know, but I assume it'll take many years to catch up with Mathematica, which has had symbolic computing as its bread and butter, with a large number of developers adding to that codebase since, like, the '80s.
I love SymPy. It's so useful for doing calculations I don't want to do myself.
I’ve tried to appreciate SymPy but I always find myself running home to Mathematica. There’s simply no comparison. SymPy is like a match and Mathematica has the power of a sizeable thermonuclear warhead.
It's true. Unfortunately Mathematica simply can't be used in many domains. I would really like to integrate mathematica with a type checker for automatic theorem proving. I think it could greatly alleviate the clunkiness of dependent types.
Do you know of attempts to integrate SymPy in this way?
I tried it with symengine as a GHC (Haskell) type-checker plugin. But it was just too limited to be worthwhile.
What's preventing it from being used?
I can hardly accept a cool $3230 per year license. I didn't even attempt to go this route. I doubt they would even allow it tbh. The site isn't exactly clear on that.
I thought you meant "domain" like a field of work or study.
Yes, type checkers are a domain of study. Basically any domain where Mathematica needs to be integrated into the end result as-is.
Implying there's a technical reason it couldn't be used.
I wasn't implying there is a technical reason, I definitely didn't mean it that way. If Mathematica were open sourced tomorrow nothing would stand in the way of trying it out.
If a company simply blocks you from using their tech, that definitely means you can't use it. For example, I can't run my own entertainment system on a Tesla. There is no technical reason it's not possible; Tesla simply disallows it.
I get it I'm just surprised that's what you meant.
If you have fewer primitives and terminals than there are Unicode code points (about 1.1 million), then you could ditch OOP expression trees altogether and use simple strings in Polish notation, with a mapping from characters to operations (simple lambdas). That way you don't need a __dict__ on every node of every tree. However, you'd have to rewrite the stuff that expects the OOP trees to instead expect Polish-notation strings. This approach scales a lot further than classes because you reduce the memory cost of an algebraic expression down to the simplest string that represents it (and even smaller if you pack the bits into an ANS, i.e. an asymmetric numeral system, though that trades performance for lower memory).
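A minimal sketch of the idea; the operator table and evaluator here are hypothetical, not part of SymPy:

    import operator

    OPS = {'+': (2, operator.add), '*': (2, operator.mul)}  # char -> (arity, fn)

    def eval_prefix(expr, env):
        """Evaluate a Polish-notation string like '+x*yz' given variable bindings."""
        def go(i):
            ch = expr[i]
            if ch in OPS:
                arity, fn = OPS[ch]
                args, j = [], i + 1
                for _ in range(arity):
                    val, j = go(j)
                    args.append(val)
                return fn(*args), j
            return env[ch], i + 1  # any non-operator char is a variable
        val, _ = go(0)
        return val

    print(eval_prefix('+x*yz', {'x': 1, 'y': 2, 'z': 3}))  # 1 + 2*3 = 7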
At that point it's hard to justify not just writing it in C directly
I remember being a kid and fawning over the upgrade from a TI-86 (which could not do symbolic manipulation) to the TI-89 (which could).
As an adult and OSS enthusiast, I've often wondered if there is an OSS option that can at least match, and ideally exceed, the TI-89's capabilities. Is SymPy it? I've had a few reasonably good experiences with SymPy, but I don't know much about the theoretical underpinnings of CAS, or how SymPy compares to competing offerings.
Depends what you mean? Mathematica is quite impressive.
Many symbolic options exist in lisps. https://stackoverflow.com/questions/10355112/why-is-lisp-so-... is a good answer that goes over some of the reason for that.
Mathematica is certainly as powerful as a TI-89, but not OSS.
Right, though you can get a free license with a Raspberry Pi, I think? https://www.wolfram.com/raspberry-pi/
I'm sure there are restrictions, but likely not a concern for hobby interests?
Regarding the issue of representing symbolic expressions and controlling their evaluation or simplification: are there precedents for using e-graphs to memoize and reuse work, rather than simple trees with destructive updates?
The Herbie project uses e-graphs to explore different ways of rewriting floating-point expressions: https://herbie.uwplse.org/ One can also write custom rulesets in egglog (a new e-graph rewriting system / language / Datalog): https://egraphs-good.github.io/egglog/?example=herbie
The approach is not yet anywhere near being able to touch all the domains sympy can handle. Destructive term rewriting tends to be a bit more forgiving of unsoundness in the rules while still returning roughly meaningful results. E-graph rewriting (and other automated reasoning systems) tends to just return junk as soon as you aren't careful about your semantics. Associativity and commutativity are ubiquitous in CAS applications, and encoding these concepts in general-purpose terms is rather unsatisfying. The post above emphasizes specialty methods for polynomials, and it would be desirable to find a clean way to integrate those into e-graph techniques. Variable binding (which is treated in a rather mangled form in CAS systems) is seemingly important for treating summation, differentiation, and integration correctly. The status of doing variable binding efficiently and correctly in e-graphs is also unclear, imo.
SymPy is pretty nice but every time I use it for a real problem I end up hitting a wall and have to dig through the source or look at old issue reports for a workaround.
Most recently I wanted to simplify a complex expression with terms like ‘diag(v1) * v2’ into Hadamard products, and found I’d need to implement custom rules to get it to work.
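For concreteness, here's the identity in question checked on small explicit SymPy matrices (real API); as noted, getting SymPy to rewrite the abstract MatrixSymbol form requires custom rules:

    import sympy as sp

    a1, a2, b1, b2 = sp.symbols('a1 a2 b1 b2')
    v1 = sp.Matrix([a1, a2])
    v2 = sp.Matrix([b1, b2])
    # diag(v1) * v2 is exactly the elementwise (Hadamard) product of v1 and v2:
    assert sp.diag(a1, a2) * v2 == sp.Matrix([a1*b1, a2*b2])
    assert v1.multiply_elementwise(v2) == sp.Matrix([a1*b1, a2*b2])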
Python is great in DL, so any chance we'll see a combination of neural nets and computer algebra in a new sympy?
If the author is reading this: please add an rss feed to the blog! I'd love to follow along for updates
I think banking on SymEngineX ("SEX") for the Sympy 2.0 release would be interesting branding.
I'm also cheering for SymPy; I think its longevity now still predicts success in the future.
Excited to hear about the new LaTeX+SEX stack