GitHub - YuminosukeSato/bsts-causalimpact: Bayesian CausalImpact for Python — R-compatible Gibbs sampler in Rust (PyO3), 30x faster, no TensorFlow

Bayesian structural time series for causal inference in Python. A faithful port of Google's CausalImpact R package. No TensorFlow required.

The Gibbs sampler is implemented in Rust (via PyO3), reproducing the same algorithm as R's bsts package while achieving 10-30x speedup.

When to Use (and When Not to)

This method is valid only when all of the following hold:

Control series are not contaminated by the intervention
The relationship between treated and control series is stable across the pre- and post-intervention periods
The pre-intervention period is sufficiently long (rule of thumb: at least 3x the post-intervention period)

If any of these assumptions are violated, the causal estimate will be unreliable. Consider a difference-in-differences or synthetic control approach instead.

Installation

Requires Python 3.10+. Binary wheels are intended for supported platforms, so Rust is only required when building from source.

pip install bsts-causalimpact

For development (builds Rust extension locally):

git clone https://github.com/YuminosukeSato/bsts-causalimpact.git
cd bsts-causalimpact

# Install with uv (recommended)
uv sync --all-extras

# Or install with pip (builds Rust extension via maturin)
pip install -e ".[dev]"

Quick Start

import pandas as pd
from causal_impact import CausalImpact

# Prepare your data: first column = response, remaining columns = covariates
data = pd.read_csv("your_data.csv", index_col="date", parse_dates=True)

# Define pre- and post-intervention periods
pre_period = ["2020-01-01", "2020-03-14"]
post_period = ["2020-03-15", "2020-04-14"]

# Run the analysis
ci = CausalImpact(data, pre_period, post_period)

# Print a summary table
print(ci.summary())

# Print a narrative report
print(ci.report())

# Plot the results
fig = ci.plot()
fig.savefig("causal_impact.png")

Example Output

Posterior inference {CausalImpact}

                         Average        Cumulative
Actual                   136.32          3953.19
Prediction (s.d.)        125.42 (0.66)   3637.07 (19.08)
95% CI                   [124.18, 126.71]  [3601.33, 3674.59]

Absolute effect (s.d.)   10.90 (0.66)    316.13 (19.08)
95% CI                   [9.61, 12.13]   [278.60, 351.86]

Relative effect (s.d.)   8.69% (0.57%) 8.69% (0.57%)
95% CI                   [7.58%, 9.77%] [7.58%, 9.77%]

Posterior tail-area probability p: 0.001
Posterior prob. of a causal effect: 99.90%

Comparison with Alternatives

	R CausalImpact	bsts-causalimpact (this)	tfp-causalimpact	tfcausalimpact	pycausalimpact
Maintainer	Google	OSS	Google	WillianFuks	dafiti (stale)
Algorithm	Gibbs (bsts/C++)	Gibbs (Rust)	TFP-based	VI default / HMC	MLE (statsmodels)
Dependencies	R, bsts	numpy, pandas, matplotlib	TF, TFP (3 GB+)	TF, TFP (3 GB+)	statsmodels
Spike-and-slab	Yes	Yes	Unknown	No	No
Horseshoe prior	No	Yes (`prior_type='horseshoe'`)	No	No	No
Seasonal component	Yes	Yes (`nseasons`, `season_duration`)	Unknown	Yes (TFP STS)	No
Dynamic regression	Yes	Yes (`dynamic_regression=True`)	Unknown	No	No
R numerical test	Reference	±1% CI-enforced + TOST/ROPE	Not published	Visual comparison (~8% diff)	Not tested
Speed (T=1000)	2.1 s	0.07 s (30x)	Seconds	Minutes (HMC: hours)	Sub-second
Python version	N/A (R)	3.10+	3.8+	3.7-3.11	3.6-3.8 (stale)
Last release	Active	Active	2023	2025-01	2020-05

Why this library exists

Existing Python ports have fundamental limitations:

pycausalimpact uses MLE (not MCMC), producing results that diverge substantially from R
tfcausalimpact uses variational inference by default (not Gibbs sampling), and requires TensorFlow (3 GB+)
tfp-causalimpact (Google's own Python port) does not publish numerical equivalence tests with R
None of the above implement spike-and-slab variable selection matching R's bsts

This library reproduces the core Gibbs-sampler workflow from R's bsts package in Rust, with CI-enforced numerical equivalence tests on every commit.

Numerical Equivalence with R

Verified against R CausalImpact 1.4.1 (bsts 0.9.10, R 4.5) across 5 scenarios. Enforced on every commit via CI.

Test Matrix

Scenario	point_effect	cum_effect	ci_lower	ci_upper	rel_effect	p_value
basic	±1%	±1%	±1%	±1%	±1%	alpha=0.05
covariates	±1%	±1%	±1%	±1%	±1%	alpha=0.05
strong_effect	±1%	±1%	±1%	±1%	±1%	alpha=0.05
no_effect	abs<0.5	abs<0.5	abs<0.5	abs<0.5	abs<0.5	alpha=0.05
seasonal	±1%	±1%	±1%	±1%	±1%	alpha=0.05

Three-Layer Equivalence Verification

No other Python CausalImpact implementation has statistical equivalence tests. This library provides three layers of verification, exceeding even Google's own Python port.

Layer	Method	What it proves	Reference
1. Deterministic	Seed-fixed ±1% threshold	Same seed, same result, every commit	Regression testing
2. FDA TOST	90% CI upper < delta (N=30 seeds)	Mean error is statistically below delta	Schuirmann (1987), FDA Guidance (2001)
3. Bayesian ROPE	95% HDI within [0, delta] (N=30 seeds)	Posterior of error is practically equivalent	Kruschke (2018) AMPPS

Layer 1 runs on every commit (CI-blocking). Layers 2-3 run with --runslow flag.

# Layer 1: deterministic (runs in CI)
uv run pytest tests/test_numerical_equivalence.py -v

# Layers 2+3: TOST + ROPE (30 seeds x 4 scenarios, ~80s)
uv run pytest tests/test_equivalence_tost_rope.py -v --runslow

CI Enforcement

Two-layer CI enforcement:

Fixture-based (ci.yml): Compares Python output against committed R reference data. Blocking on every PR/push.
Live R comparison (numerical-equivalence.yml): Installs R, regenerates fixtures from scratch, and compares. Blocking when R is available. Weekly auto-regeneration.

How to Reproduce

Install R 4.5+ and packages: install.packages(c("CausalImpact", "jsonlite"))
Generate R reference: Rscript scripts/generate_r_reference.R
Run equivalence tests: .venv/bin/pytest tests/test_numerical_equivalence.py -v

Equivalence Verification: Comparison with Other Implementations

	R CausalImpact	bsts-causalimpact (this)	tfp-causalimpact (Google)	tfcausalimpact	pycausalimpact (dafiti)
R reference fixtures	N/A (is reference)	5 scenarios, CI-enforced	None	None	None
Deterministic R test	N/A	±1% all metrics (seed=42)	None	README demo only (~8% diff)	None
FDA TOST (Schuirmann 1987)	N/A	30-seed, delta=1-2%	None	None	None
Bayesian ROPE (Kruschke 2018)	N/A	30-seed, 95% HDI in ROPE	None	None	None
Self-consistency tolerance	`tolerance=0.01`	N/A	`rtol=0.2` (20%)	`assert_almost_equal`	`assert_array_equal`
CI-blocking R check	N/A	Every commit + weekly live R	None	None	None

Evidence per implementation (all verified from source code, not documentation claims):

tfp-causalimpact (Google official Python port): 47 tests across 7 files. No R reference fixtures. Self-consistency checks use rtol=0.2 (±20%) and atol=0.01. No R output comparison exists in the test suite. README states "designed to produce results close to the R package" but this is not verified by any automated test. (source)
tfcausalimpact (WillianFuks): 64 tests across 7 files. tests/fixtures/comparison_data.csv exists but is only used in README demo, not in automated tests. The README demo itself shows ~8% difference in AbsEffect (R: -657 vs Python: -708.51). (source)
pycausalimpact (dafiti): 62 tests across 5 files. Uses MLE (not MCMC), fundamentally different algorithm. No R comparison of any kind. Repository archived. (source)
causalimpact (jamalsenouci): 54 tests across 4 files. No R comparison. Open issue #7 reports "significantly different results from R". (source)

What is matching R and what is not

R feature	Status	Detail
Local level model (Gibbs sampler)	Matching	Same algorithm as bsts: Kalman filter + simulation smoother
SdPrior(sample.size=32) for sigma2_level	Matching	InvGamma(16, 16 * sigma_guess^2)
Post-period Random Walk propagation	Matching	Forward simulation from last pre-period state
Data standardization (standardize.data=TRUE)	Matching	(y - mean) / sd using pre-period moments
prior.level.sd = 0.01	Matching	Same default, same semantics
Spike-and-slab variable selection	Matching	Coordinate-wise sampling with StudentSpikeSlabPrior defaults (`expected.r2=0.8`, `prior.df=50`, `prior.information.weight=0.01`, `diagonal.shrinkage=0.5`)
expected.model.size	Matching	Unified default `2` in `CausalImpact` and `ModelOptions`
expected.r2 = 0.8, prior.df = 50	Matching	Same documented residual variance prior defaults as BoomSpikeSlab / bsts
Seasonal component (`nseasons`, `season_duration`)	Matching	State-space model matching R bsts `AddSeasonal()` (±1% CI parity)
Dynamic regression	Supported	Time-varying coefficients via random-walk FFBS; `dynamic_regression=True`
Local linear trend	Supported	Opt in with `state_model="local_linear_trend"`
Placebo test	Extended	Null distribution from pre-period splits
Horseshoe prior	Extended	Continuous shrinkage alternative to spike-and-slab (Kohns & Bhattacharjee 2022)
Conformal inference	Extended	Distribution-free prediction intervals
DTW control selection	Extended	Automatic covariate selection via Dynamic Time Warping

Matching = CI-enforced numerical equivalence with R bsts (±1% or tighter). Supported = Feature implemented, no R parity fixture yet. Extended = Python-only feature with no R equivalent.

Beyond R: Python-Only Extensions

Features that go beyond R's CausalImpact. These have no R equivalent.

Feature	Method	What it does	Reference
Placebo test	`ci.run_placebo_test()`	Validates effect against null distribution from pre-period splits
Conformal inference	`ci.run_conformal_analysis()`	Distribution-free prediction intervals	Vovk et al. (2005)
DTW control selection	`select_controls()`	Automatic covariate selection via Dynamic Time Warping	Sakoe & Chiba (1978)
Horseshoe prior	`ModelOptions(prior_type='horseshoe')`	Continuous shrinkage alternative to spike-and-slab for dense DGP	Kohns & Bhattacharjee (2022), arXiv:2011.00938

API

`CausalImpact(data, pre_period, post_period, model_args=None, alpha=0.05)`

Parameter	Type	Description
`data`	`DataFrame` or `ndarray`	First column is the response variable, remaining columns are covariates
`pre_period`	`list[str \| int]`	`[start, end]` of the pre-intervention period
`post_period`	`list[str \| int]`	`[start, end]` of the post-intervention period
`model_args`	`dict` or `ModelOptions`	MCMC parameters (see below)
`alpha`	`float`	Significance level for credible intervals (default: 0.05)

Model Arguments

Key	Default	Description
`niter`	1000	Total MCMC iterations
`nwarmup`	500	Burn-in iterations to discard
`nchains`	1	Number of MCMC chains
`seed`	0	Random seed for reproducibility
`prior_level_sd`	0.01	Prior standard deviation for the local level
`standardize_data`	`True`	Standardize data before fitting
`expected_model_size`	2	Expected number of active covariates (spike-and-slab prior)
`nseasons`	`None`	Optional seasonal cycle count (R-compatible API)
`season_duration`	`None`	Optional duration of each seasonal block; defaults to `1` when `nseasons` is set
`dynamic_regression`	`False`	Enable time-varying regression coefficients (random-walk beta)
`state_model`	`"local_level"`	`"local_level"` or `"local_linear_trend"`
`prior_type`	`"spike_slab"`	`"spike_slab"` or `"horseshoe"` (continuous shrinkage for dense DGP)

Methods and Properties

Name	Returns	Description
`summary(output="summary")`	`str`	Tabular summary of causal effects
`report()`	`str`	Narrative interpretation of results
`plot(metrics=None)`	`Figure`	Matplotlib figure with original/pointwise/cumulative panels
`inferences`	`DataFrame`	Per-timestep actuals, predictions, prediction s.d., and effect intervals
`summary_stats`	`dict`	Aggregate statistics (effect mean, CI, p-value, etc.)
`posterior_inclusion_probs`	`ndarray \| None`	Posterior inclusion probability per covariate (spike-and-slab only)
`posterior_shrinkage`	`ndarray \| None`	Mean shrinkage factor per covariate (horseshoe only)
`run_placebo_test(...)`	`PlaceboTestResults`	Placebo test for effect validation
`run_conformal_analysis(...)`	`ConformalResults`	Distribution-free conformal prediction intervals

Benchmark Results

T	k	niter	This (Rust)	R (bsts)	vs R
100	0	1000	0.008s	0.213s	26x
500	0	1000	0.033s	0.997s	30x
1000	0	1000	0.069s	2.108s	31x
1000	5	1000	0.197s	2.171s	11x
5000	0	1000	0.330s	10.264s	31x

Median of 3 runs. Reproduce: python benchmarks/benchmark.py

Architecture

python/causal_impact/
    __init__.py          # Public API: CausalImpact, ModelOptions, __version__
    data.py              # DataProcessor: validation, standardization, period parsing
    main.py              # CausalImpact facade class
    options.py           # ModelOptions: typed MCMC configuration
    analysis.py          # CausalAnalysis: effect computation, CI, p-values
    summary.py           # SummaryFormatter: tabular and narrative reports
    plot.py              # Plotter: matplotlib visualization

src/ (Rust)
    lib.rs               # PyO3 entry point: run_gibbs_sampler()
    sampler.rs           # Gibbs sampler (R bsts-compatible algorithm)
    kalman.rs            # Kalman filter and simulation smoother
    state_space.rs       # State space model representation
    distributions.rs     # Posterior sampling distributions

Development

git config core.hooksPath .githooks

Running Tests

# All tests
uv run pytest tests/ -v

# Numerical equivalence only
uv run pytest tests/test_numerical_equivalence.py -v

# Rust tests
cargo test

Contributing

See CONTRIBUTING.md for development setup, PR workflow, and test requirements.

License

MIT