Scallop – A Language for Neurosymbolic Programming
scallop-lang.org
229 points by andsoitis a month ago
Wow, I'm currently reading the Scallop paper, so funny to see it posted here!
I really love the concept. This isn't just differentiable neurosymbolic declarative probabilistic programming; Scallop also gives you the flexibility of using any of 18 included provenance semirings, or a custom one, to e.g. track "proofs" of why a relational fact holds rather than just assigning it a probability. Sounds cool, but I'm still trying to figure out the practicality.
Also worth pointing out that a lot of serious engineering work seems to have gone into Scallop. It has an interpreter and a JIT compiler that emits Rust, which is then compiled and dynamically loaded as a Python module.
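To make the provenance point concrete, here is roughly what picking a semiring looks like through the scallopy Python bindings; I'm writing this from memory of the docs, so treat the exact names (e.g. the k argument for top-k proofs) as approximate:

    import scallopy

    # "topkproofs" tags each derived fact with its top-k proof trees instead of a
    # plain probability; swapping in "unit", "minmaxprob", "difftopkproofs", etc.
    # is just a different constructor argument.
    ctx = scallopy.ScallopContext(provenance="topkproofs", k=3)
    ctx.add_relation("edge", (int, int))
    ctx.add_facts("edge", [(0.9, (0, 1)), (0.8, (1, 2)), (0.2, (0, 2))])
    ctx.add_rule("path(a, c) = edge(a, c) or (edge(a, b) and path(b, c))")
    ctx.run()
    print(list(ctx.relation("path")))  # each path fact comes back with its tag (here: proofs)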
Because a Scallop program can be differentiable, it can be used anywhere in an end-to-end learning system; it doesn't have to take its input from a NN and produce your final outputs, as in all the examples they give (as far as I can see). For example, you could probably create a hybrid transformer that runs some Scallop code in an internal layer, reading from and writing to the residual stream. A simpler/more realistic example is computing features that are fed into a NN, e.g. an agent's policy function.
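For that "Scallop as a differentiable layer" idea, the MNIST-sum example from their paper looks roughly like this in scallopy. forward_function and input_mapping are from my recollection and may differ in the current release, and the CNN is stubbed out with random softmax outputs:

    import torch
    import scallopy

    # Differentiable provenance so gradients can flow through the logic program.
    ctx = scallopy.ScallopContext(provenance="difftopkproofs")
    ctx.add_relation("digit_1", int, input_mapping=list(range(10)))
    ctx.add_relation("digit_2", int, input_mapping=list(range(10)))
    ctx.add_rule("sum_2(a + b) = digit_1(a) and digit_2(b)")
    sum_2 = ctx.forward_function("sum_2", output_mapping=list(range(19)))

    # In a real pipeline these would be a CNN's softmax outputs over the 10 digits;
    # a loss on sum_2's output backpropagates through Scallop into the CNN.
    digit_probs_1 = torch.softmax(torch.randn(16, 10), dim=1)
    digit_probs_2 = torch.softmax(torch.randn(16, 10), dim=1)
    sum_probs = sum_2(digit_1=digit_probs_1, digit_2=digit_probs_2)  # shape (16, 19)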
The limitation of Scallop is that the programs themselves are human-coded, not learnt, although they can implement interpreters/evaluators (e.g. the example of evaluating expressions).
Papers are linked here https://www.scallop-lang.org/resources.html
https://www.cis.upenn.edu/~mhnaik/papers/neurips21.pdf
https://dl.acm.org/doi/10.1145/3591280
There is a 135 page book on Scallop https://www.cis.upenn.edu/~mhnaik/papers/fntpl24.pdf
I'm wondering if this is really a limitation though. If it can be learnt from training data, wouldn't that data already be part of the neural network's training data? I imagine we use Scallop to bridge the gap where we can't readily learn certain rules from the available data, or perhaps where we would prefer to enforce certain rules?
I'm pretty sure "differentiable" isn't necessary or sufficient to create valid and useful code.
On the one hand, there are problems which by accident or design are nondifferentiable. Cryptography, for example.
On the other hand, these problems are routinely analyzed and solved by differentiable algorithms running on neural-net substrates (e.g. you).
Ever since I learned about category theory and its relationship with symbolic reasoning I've suspected that AGI will come from elegantly combining symbolic reasoning and probabilistic reasoning. This is the first project I've seen that seems to be positioned that way. Very cool.
When LLMs code in order to reason, isn’t that a combination of probabilistic reasoning and symbolic reasoning?
Neural networks are actually somewhere in between. They don't directly operate on symbolic expressions or explicit logical rules. And while they rely on probabilistic aspects for training (and sometimes for inference), they rely more on continuous-valued transformations in extremely high dimensional spaces. But if your goal is human-like intelligence, they are a pretty good bet, because we know the human brain also doesn't perform symbolic reasoning at its core and these things only emerge as high-level behaviour from a sufficiently complex system. But it also makes neural networks (and us too) prone to failure modes that you would not see in strictly symbolic reasoning processes.
Yes, this seems to be what the symbolists always forget. We don't use symbolism like this; we just have very dense neural connections that emerge from scale and approximate it.
That's not really what symbolists actually argue. Read Fodor and Pylyshyn (1988) to better understand how symbolists view the relationship between symbolic representation and connectionist models. What you're saying is akin to saying there's no point in doing neuroscience or trying to understand neural networks because it's just particle physics deep down.
Connectionism and cognitive architecture: A critical analysis https://www.semanticscholar.org/paper/Connectionism-and-cogn...
Fodor and Pylyshyn's Critique of Connectionism and the Brain as Basis of the Mind https://arxiv.org/abs/2307.14736
So if you're inventing the airplane, how long should you stick with a flapping bird-wing design?
That is not a good analogy. Symbolism has given us lots of useful things, including SAT/SMT and theorem provers.
The point is that while biology has been a great source of inspiration toward technical advances it’s only a guide, it can’t guarantee there isn’t a better way.
http://forestdb.org is quite old and includes some toy examples that IMHO elegantly combine symbolic and probabilistic reasoning.
Related: Graph&Category-Theory-based Neuro-Symbolic AI
> The work uses graphs developed using methods inspired by category theory as a central mechanism to teach the model to understand symbolic relationships in science.
https://news.mit.edu/2024/graph-based-ai-model-maps-future-i...
I know a tiny bit about category theory but nothing about symbolic reasoning. Would anyone mind ELI5ing the connection between the two?
I'm not qualified to offer an accurate eli5, just a hand-wavy explanation...
Category theory can be leveraged to make faster theorem provers (making complex symbolic reasoning practical at larger scales).
Don't ask me how, hopefully someone who studies it will chime in and correct me / expand.
There is a long history of efforts to combine symbolic and connectionist approaches. This is hardly the first!
How does Scallop scale to large knowledge bases (KBs) for probabilistic reasoning? I'm currently working on a large KB with ~12M facts and trying to do probabilistic inference on it. So far I've been using [cplint](https://friguzzi.github.io/cplint/_build/html/index.html), which is based on SWI-Prolog. It works fine for toy examples; however, it doesn't finish running on the large KB, even after waiting for more than a week. Does anyone know of Probabilistic Logic Programming (PLP) libraries that are fast and scale to large KBs? Preferably in the Prolog ecosystem, but that's not a hard requirement.
I am surprised you have problems with 12M facts and can't process them in a week; that looks like a bug in the software you are using.
Thanks for the comment. Have you run cplint on a KB of similar size before and gotten it to finish in reasonable time?
I never used cplint, but I use other software (including some I built myself) to process KBs with many billions of facts.
If you like scallop, you are gonna love lobster:
Unfortunately it doesn't seem to be available yet. Scallop and Lobster are both from UPenn, and the Scallop website says "We are still in the process of open sourcing Scallop," so I assume it's a matter of time.
Thank you.
You seem to be more in the know than me :) Could you please sketch out a few bullets explaining the relationship between Scallop and Lobster, and what you think is going on?
I read the Lobster paper a little bit. Scallop does its reasoning on the CPU, whereas Lobster is an attempt to move that reasoning logic to the GPU. That way the entire neurosymbolic pipeline stays on the GPU and the whole thing runs much faster.
The problem with Scallop is that no one has ever shown a single program that wasn't easier to write without it. Their papers usually contain no examples either, and the ones that do are a heck of a lot more complicated than asking the model to do it directly. The programming-languages world lets them get away with this because they're excited to be relevant in the LLM universe. But I wouldn't accept any of this work if I were their reviewer; they need to write real programs and compare against baselines. Many of them. And show that their language provides any value.
Just look at the examples on their website. All 3 are lame and far easier without their language.
It's like publishing that you have a new high performance systems language and never including any benchmark. They would be rejected for that. Things just haven't caught up in the ML+PL world.
I think you misunderstand what a neuro-symbolic programming language (like Scallop) is for.
It's not about performance, but safety.
Making safe decisions becomes exponentially more important as ML / agents evolve, to avoid "performant" but ultimately inefficient/dangerous/wasteful inferences.
Then show me programs that meaningfully improve safety. And compare them to baseline options to demonstrate this. None of these examples improve safety beyond a trivial check on the output, which I can also do with a simple prompt.
I looked at Scallop a year ago and decided that it was not a replacement for Prolog - for me.
I may re-evaluate now, thinking of smoother LLM integration as well as differentiability.
Has anyone here used Scallop for a large application? I ask because in the 1980s I wrote a medium large application in Prolog and it was a nice developer experience.
Not Scallop related, but did you try Mercury? It is Prolog with types and flagging of deterministic functions; the Prolog we ported got a very large (... vague, I know) performance boost, and that is a lot of code. Porting is mostly gradual.
Love to see this! I'm a huge fan of neurosymbolic methods, but more advanced examples might be needed to help convince folks to adopt or try Scallop. The three on the page feel very toy. An example rooted in NLP, or one with an LLM front and center, might help.
Very pleasant branding though. Great work! :)
How does Scallop compare to PyReason (https://neurosymbolic.asu.edu/pyreason/)? Are they by and large the same, or tailored towards different use cases?
A bit over my head - but can't Prolog achieve similar results?
Anything can do anything else given enough time and power, but I think: no, not without shenanigans. This has primitives for interfacing to NNs, including foundation models, so you can ask it (for example) to label images of cats and dogs using CLIP, and then reason over the results.
So it's intended to combine nn reasoning and logical reasoning cleanly.
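Something like this, gluing CLIP's label scores into Scallop by hand (the probabilities and relation names are made up, and the scallopy calls are from memory; Scallop's own foundation-model primitives presumably make this tighter):

    import scallopy

    # Pretend CLIP already scored each image against the labels "cat" and "dog".
    clip_scores = {("img1", "cat"): 0.92, ("img1", "dog"): 0.05,
                   ("img2", "dog"): 0.88, ("img2", "cat"): 0.10}

    ctx = scallopy.ScallopContext(provenance="minmaxprob")
    ctx.add_relation("labeled", (str, str))
    ctx.add_facts("labeled", [(p, pair) for pair, p in clip_scores.items()])

    # Ordinary logic over the noisy labels: which image pairs show different animals?
    ctx.add_rule("different_animals(a, b) = labeled(a, x) and labeled(b, y) and x != y and a != b")
    ctx.run()
    print(list(ctx.relation("different_animals")))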
Scallop's examples have syntax for probabilistic programming, so probably not.
The SCC example is interesting; I wonder what behavior that generates. Reminds me of Lean; I have to suspect it may make the processor quite spicy, like Lean does. Also, I don't see a clear indication that this benefits from heterogeneous compute resources.
Oh, boy, it's written in Rust!
I wish this website explained what neurosymbolic means.
It's a combination of neural networks and symbolic reasoning. You can use a neurosymbolic approach by combining deep learning and logical reasoning:
A neural network (PyTorch) detects objects and actions in the image, recognizing "Jim" and "eating a burger" with a confidence score.
A symbolic reasoning system (Scallop) takes this detection along with past data (e.g., "Jim ate burgers 5 times last month") and applies logical rules like:
likes(X, Food) :- frequently_eats(X, Food).
frequently_eats(Jim, burgers) if Jim ate burgers > 3 times recently.
The system combines the image-based probability with past symbolic facts to infer: "Jim likely likes burgers" (e.g., 85% confidence). This allows for both visual perception and logical inference in decision-making.
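A runnable-ish version of that example through the scallopy bindings (the relation names, the 0.85 confidence, and the exact API are my best guess at how you'd write it):

    import scallopy

    ctx = scallopy.ScallopContext(provenance="minmaxprob")

    # Symbolic past data: we already established that Jim eats burgers frequently.
    ctx.add_relation("frequently_eats", (str, str))
    ctx.add_facts("frequently_eats", [(1.0, ("jim", "burgers"))])

    # Neural detection from the image, carrying the model's confidence.
    ctx.add_relation("detected_eating", (str, str))
    ctx.add_facts("detected_eating", [(0.85, ("jim", "burgers"))])

    # The rule from the comment, plus one for the fresh detection.
    ctx.add_rule("likes(p, f) = frequently_eats(p, f)")
    ctx.add_rule("likes(p, f) = detected_eating(p, f)")
    ctx.run()
    print(list(ctx.relation("likes")))  # "jim likes burgers", with a probability attached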
It can also be used to verify NN decisions. In autonomous driving, a NN can make “instinctive” decisions, and a GOFAI system can verify they work and don’t break civil or physical laws. You can have many parallel NNs giving recommendations, and let a symbolic system take the final decision.
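Toy sketch of that "parallel NNs recommend, symbolic layer decides" setup (relation names, confidences, and the legality facts are invented; scallopy details are from memory):

    import scallopy

    ctx = scallopy.ScallopContext(provenance="minmaxprob")

    # Recommendations from several planners, tagged with their confidence.
    ctx.add_relation("recommended", (str, str))  # (planner, action)
    ctx.add_facts("recommended", [(0.7, ("planner_a", "overtake")),
                                  (0.9, ("planner_b", "run_red_light")),
                                  (0.6, ("planner_a", "brake"))])

    # Hand-written rule book: which actions are allowed right now, and why.
    ctx.add_relation("allowed", (str, str))  # (action, reason)
    ctx.add_facts("allowed", [(1.0, ("overtake", "clear_lane")),
                              (1.0, ("brake", "always_allowed"))])

    ctx.add_rule("safe_action(a) = recommended(p, a) and allowed(a, r)")
    ctx.run()

    # Final decision: the most confident action that survived the symbolic filter.
    decision = max(ctx.relation("safe_action"), key=lambda fact: fact[0])
    print(decision)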
Is the reasoning strictly downstream of the image recognition? Or can prior knowledge impact how objects are recognized? E.g. I'm driving on the road at night so the two incoming lights are probably a car.
In your specific example, time of day and weather (foggy, sunny, overcast), along with images of cars of different colors, models, and makes, from different angles, will all be covered by the training data to begin with, so the neural net can do this on its own without needing specific symbolic processing a priori or downstream. Training data fed into neural nets is usually sanitized and transformed to some extent, but whether this sanitization/preprocessing requires symbolic programming depends on the use case. For example, with the car case, you preprocess car images to color them differently, hide random sections of them, crop them in different ways so only partial sections are showing, turn them upside down, introduce fake fog, darken, lighten, add people, signs, fire, etc., and use each of these images for training so that the neural net can recognize cars in different situations (even after accidents where they are upside down and on fire). Eventually the neural net will recognize a car in most circumstances without symbolic programming/intervention.
So when would you use symbolic programming? To generate quality data for the neural network. For example, maybe the neural net reports that it read the speed limit on a sign as 1000 km/h because of someone's shenanigans. A symbolic programming aid which knows the possible legal limits will flag this data as potentially corrupt and pass it back to the network as such, allowing the neural network to make more sensible decisions.
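Concretely, that sanity check might be as little as this (the 130 km/h ceiling, relation names, and scallopy details are all just illustrative assumptions):

    import scallopy

    ctx = scallopy.ScallopContext(provenance="minmaxprob")

    # Speed-limit readings from the vision model: (sign id, km/h), with confidence.
    ctx.add_relation("sign_reading", (str, int))
    ctx.add_facts("sign_reading", [(0.95, ("sign_42", 1000)),
                                   (0.60, ("sign_17", 100))])

    # Anything above the highest legal limit gets flagged as potentially corrupt data.
    ctx.add_rule("implausible(s, v) = sign_reading(s, v) and v > 130")
    ctx.add_rule("plausible(s, v) = sign_reading(s, v) and v <= 130")
    ctx.run()
    print(list(ctx.relation("implausible")))  # the 1000 km/h reading is flagged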
Is this really all that different from writing some functions, in any language, that use a neural net to make these predictions?
Why is this a language and not just some say, Java/Rust library?
It's interesting, but it doesn't seem like anything fundamentally new.
I love its logo (the color scheme, too).
By the way, I wish there were more real-life examples, both basic and advanced, to show what it may be especially useful for, and maybe even a comparison to other languages like Prolog. I expected the tutorial to have examples of what "neurosymbolic" means, because I am not entirely sure what it means in practice.
Previous discussion a few years ago: https://news.ycombinator.com/item?id=31060265
This is amazing. I've been looking forward to such a thing for a while now.
I honestly have no idea what any of this means.
It seems like schizo ramblings to me. But I'm sure there's some merit to it.
Let's say you want to program a system that uses a variety of neural nets for reasoning about a problem, and you also want to use more traditional programmatic reasoning - for example to score and rank results. I think this is the kind of language that could allow you to do that.
But - it's time to do the tutorials and try and see.
Icky name
The tutorial claims that fib(0) = 1, which is wrong.
https://en.wikipedia.org/wiki/Fibonacci_sequence
This one was easy to spot and would have been easy to get right. Makes me wonder…
From the first paragraph on Wiki:
> Many writers begin the sequence with 0 and 1, although some authors start it from 1 and 1[1][2] and some (as did Fibonacci) from 1 and 2.
From the first paragraph in the article you linked:
> Many writers begin the sequence with 0 and 1, although some authors start it from 1 and 1[1][2] and some (as did Fibonacci) from 1 and 2.
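Both starting points give the same sequence shifted by one index, so the tutorial's fib(0) = 1 is just the 1, 1, 2, ... convention. Quick plain-Python check (nothing Scallop-specific):

    def fib_from_0_1(n):  # 0, 1, 1, 2, 3, 5, ...
        a, b = 0, 1
        for _ in range(n):
            a, b = b, a + b
        return a

    def fib_from_1_1(n):  # 1, 1, 2, 3, 5, ... (the tutorial's convention)
        a, b = 1, 1
        for _ in range(n):
            a, b = b, a + b
        return a

    assert all(fib_from_1_1(n) == fib_from_0_1(n + 1) for n in range(10))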
If you encode a business on top of this, you get differentiable management. Metrics disappear behind obscure activation vectors/embeddings, cargo-culting is not possible anymore, and everything is traced back to measured economic efficiency.
I'm really confused. Is this metaprogramming in the sense that add_relation and add_rule use an LLM to make an educated guess about what to do based on what came before them? Or do they use some deterministic method or heuristic to evaluate those terms?