Show HN: System that rediscovers physics laws from raw data autonomously
protoscience.ai
ProtoScience is a deterministic pipeline that takes raw numerical data and autonomously discovers governing equations.
It does not use LLMs for discovery — only sparse regression, power-law fitting, and statistical validation.
Results so far:
- Kepler's Third Law (P² = a³ / M) from 3,519 NASA exoplanets: R² = 0.998
- Sun's ~27-day rotation period from solar wind plasma data: 93% accuracy
- Power law T ~ v^3.40 in solar wind (NOAA/NASA spacecraft data)
- 5/5 General Relativity predictions from simulated black hole observables: all R² = 1.000
- Chirp mass relationship from 219 LIGO gravitational wave events: R² = 0.998
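To give a feel for what "power-law fitting" means for results like these, here is a minimal sketch that recovers the 3/2 exponent of Kepler's Third Law by least squares in log-log space. It is not ProtoScience's code: the function name `fit_power_law` is hypothetical, the data is synthetic and noise-free (periods are computed from the semi-major axes, assuming a 1-solar-mass star), and only the standard library is used.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = c * x**k by ordinary least squares in log-log space.

    Returns (k, c, r2); r2 is computed on the log-transformed data.
    Hypothetical helper for illustration, not part of ProtoScience.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    k = sxy / sxx                      # slope in log space = exponent
    b = my - k * mx                    # intercept in log space = log(c)
    ss_res = sum((v - (k * u + b)) ** 2 for u, v in zip(lx, ly))
    ss_tot = sum((v - my) ** 2 for v in ly)
    r2 = 1.0 - ss_res / ss_tot
    return k, math.exp(b), r2

# Illustrative semi-major axes in AU; periods in years are generated
# synthetically from P = a**1.5 (i.e. P² = a³ with M = 1).
semi_major_axes = [0.39, 0.72, 1.0, 1.52, 5.2, 9.58]
periods = [a ** 1.5 for a in semi_major_axes]

k, c, r2 = fit_power_law(semi_major_axes, periods)
print(f"P ~ {c:.3f} * a^{k:.3f}, R^2 = {r2:.4f}")
```

With real catalog data the fit would be noisy and the stellar mass would have to enter as a second variable, so this only illustrates the single-variable case.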
It also detects when no meaningful law exists — Bitcoin daily prices returned R² = 0.00.
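A rough sketch of how a "no law found" verdict can fall out of the same statistics: fit a line to data with no structure and check that R² stays near zero. The post does not say how the Bitcoin test was set up, so everything here is an assumption: the data is seeded Gaussian noise standing in for daily returns, the candidate relation is "tomorrow's value depends linearly on today's", and `R2_THRESHOLD` is an arbitrary illustrative cutoff.

```python
import random

R2_THRESHOLD = 0.9  # hypothetical acceptance cutoff, chosen for illustration

def r_squared_linear(xs, ys):
    """R² of the least-squares line y = m*x + b (equals the squared correlation)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0
    return (sxy * sxy) / (sxx * syy)

# Seeded i.i.d. noise as a stand-in for structureless daily returns.
rng = random.Random(0)
returns = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# Candidate "law": each value is a linear function of the previous one.
r2 = r_squared_linear(returns[:-1], returns[1:])
law_found = r2 >= R2_THRESHOLD
print(f"R^2 = {r2:.4f}, law found: {law_found}")
```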
Pipeline:
raw data → feature extraction → candidate law generation → fitting → verification
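The stages above could be sketched roughly as follows. This is a toy under stated assumptions, not the project's implementation: the candidate set (linear, power law, exponential), the even/odd train/held-out split, and the names `CANDIDATES`, `fit_line`, `r_squared`, and `discover` are all hypothetical. Feature extraction is reduced to log transforms, each candidate is fitted as a straight line in its transformed space, and verification scores it by R² on points the fit never saw.

```python
import math

def fit_line(us, vs):
    """Least-squares line v = m*u + b; returns (m, b)."""
    n = len(us)
    mu = sum(us) / n
    mv = sum(vs) / n
    m = sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / sum((u - mu) ** 2 for u in us)
    return m, mv - m * mu

def r_squared(ys, preds):
    """Coefficient of determination of predictions against observed values."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Candidate law generation: name -> (x transform, y transform, inverse y transform).
CANDIDATES = {
    "linear": (lambda x: x, lambda y: y, lambda v: v),
    "power_law": (math.log, math.log, math.exp),
    "exponential": (lambda x: x, math.log, math.exp),
}

def discover(xs, ys):
    """Fit every candidate on even-indexed points, verify on odd-indexed ones."""
    pairs = list(zip(xs, ys))
    train, held_out = pairs[0::2], pairs[1::2]
    scores = {}
    for name, (fx, fy, inv) in CANDIDATES.items():
        # Fitting: linear regression in the candidate's transformed space.
        m, b = fit_line([fx(x) for x, _ in train], [fy(y) for _, y in train])
        # Verification: score predictions on the held-out points.
        preds = [inv(m * fx(x) + b) for x, _ in held_out]
        scores[name] = r_squared([y for _, y in held_out], preds)
    best = max(scores, key=scores.get)
    return best, scores

# Synthetic raw data following y = 2 * x**3 (positive values, so logs are defined).
xs = [float(i) for i in range(1, 21)]
ys = [2.0 * x ** 3 for x in xs]

best, scores = discover(xs, ys)
print(best, scores)
```

Scoring on held-out points rather than on the training points is one simple way to keep a flexible candidate from winning just by overfitting.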
An LLM (Claude) is only used at the end to interpret results in natural language — it is never involved in the discovery step.
All experiments are fully reproducible.
Code:
https://github.com/SaulVanCode/protoscience-nasa-experiments

The BH series is wild. Rediscovering GR relations from observables alone with R²=1.000. No physics priors, just raw data. That's the part that got me.

Exactly, that was the surprising part for me too. The system has no notion of "physics" at all; it's just searching for compressible structure in the data. The fact that GR relations emerge from observables suggests that a lot of what we call "laws" might just be the simplest compressions of measurement space. Still early, but curious how far this goes.