Recurrence-Duplication: Deterministic Parallelisation of Non-Affine Scalar Loops
deviantabstraction.com

TL;DR A loop that carries any pure scalar state can be strip-mined across p threads by having each thread privately replay ≤ p(p-1)/2 “warm-up” updates before its first public iteration. No closed-form skip-ahead, no speculation, and a few extra machine instructions in code-gen.
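
To make the TL;DR concrete, here is a minimal sketch of one possible reading of the idea, not necessarily the exact scheme the post develops. It assumes a cyclic distribution of iterations, a hypothetical quadratic (non-affine) update called `step`, and a loop body that simply records the state. Thread t replays t updates privately before its first public iteration, then re-runs the recurrence p times between its own iterations. In this reading the per-thread warm-up is at most p-1, which sits within the bound above, and the warm-ups summed over all threads come to p(p-1)/2. All names (`step`, `worker`, `parallel`, `M`) are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

M = 1_000_003  # arbitrary modulus, chosen only for illustration


def step(s):
    # Hypothetical pure, non-affine scalar update (quadratic in s).
    return (s * s + 1) % M


def serial(s0, n):
    # Reference loop: iteration i observes the state after i updates.
    out, s = [], s0
    for _ in range(n):
        out.append(s)
        s = step(s)
    return out


def worker(t, p, s0, n, out):
    s = s0
    # Warm-up: privately replay t updates to reach the state of iteration t.
    for _ in range(t):
        s = step(s)
    # Public iterations t, t + p, t + 2p, ...
    for i in range(t, n, p):
        out[i] = s  # stand-in for the real loop body
        # Duplicate the recurrence privately to reach iteration i + p.
        for _ in range(p):
            s = step(s)


def parallel(s0, n, p):
    out = [None] * n  # each thread writes a disjoint set of indices
    with ThreadPoolExecutor(max_workers=p) as pool:
        futures = [pool.submit(worker, t, p, s0, n, out) for t in range(p)]
        for f in futures:
            f.result()  # re-raise any worker exception
    return out
```

Calling `parallel(7, 50, 4)` should give the same list as `serial(7, 50)` regardless of thread scheduling, because each thread derives its states from `s0` alone. The sketch only pays off when the loop body is much more expensive than `step`, since every thread still executes the full chain of scalar updates.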