Inside SPy, part 2: Language semantics

This is the second post of the Inside SPy series. The first post was mostly about motivations and goals of SPy. This post will cover in more detail the semantics of SPy, including the parts which make it different from CPython.

We will talk about phases of execution, colors, redshifting, the very peculiar way SPy implements static typing, and we will start to dive into metaprogramming.

Before diving in, I want to express my gratitude to my employer, Anaconda, for giving me the opportunity to dedicate 100% of my time to this open-source project.

Motivation and goals, recap¶

Shameless plug: give SPy a star ⭐

I admit I never cared much about GitHub stars, but it looks like nowadays it's what you need to be considered important. We are at 678 stars at the moment of writing, let's try to get to 5000!

Part 1 describes motivations and goals in great detail, but let's do a quick recap.

The main motivation is to make Python faster; by "faster" I mean comparable to C, Rust and Go. After spending 20 years in this problem space, I am convinced that it's impossible to achieve such a goal without breaking compatibility.

The second motivation is that static typing is playing a more and more important role in the Python community, but Python is not a language designed for that, which leads to a suboptimal experience.

There are two possible definitions of SPy; both of them are accurate, although from very different perspectives:

SPy is an interpreter and a compiler for a statically typed variant of Python, with focus on performance.

and:

SPy is a thought experiment to determine how much dynamicity we can remove from Python while still feeling Pythonic.

The part about "interpreter and compiler" is fundamental: the interpreter is needed for ease of development and debugging, the compiler is needed for speed. The job of SPy is to ensure that the two pieces have the exact same semantics so that the compilation step is just a transparent speedup.

100% compatibility with Python is explicitly not a goal. The Zen of SPy contains the goals and design guidelines of SPy. This is a shortened version, see the link for full details:

Easy to use and implement.
We have an interpreter.
We have a compiler.
Static typing.
Performance matters.
Predictable performance.
Rich metaprogramming capabilities.
Zero cost abstractions.
Opt-in dynamism.
One language, two levels.

SPy version and playground

At the moment of writing, SPy is still changing very rapidly and it's very likely that some of the examples will break in the future. We don't have any official release yet, but all the following examples have been tried on SPy commit 229235b8.

All the examples have a Try it yourself button which opens the code snippet in the SPy Playground, a PyScript app to try SPy directly in the browser. The official SPy playground tracks the latest git main, while this blog post uses a custom version pinned to this exact commit.

Phases of execution and compilation pipeline¶

From the point of view of the user, SPy code runs in three distinct execution phases:

Import time: this is when we run all the module-level code, including global variable initializers, decorators, metaclasses, etc.. After this phase, all the globals are frozen.
Redshift: during this phase we apply partial evaluation to all expressions that are safe to be evaluated eagerly. This is an optional phase which happens only during compilation or when explicitly requested. The presence/absence of redshift should not have any visible effects on the behavior of the program.
Runtime: the actual execution of the program, starting from a main function.

The following is a diagram representing the compilation pipeline in the simplified case of a single .spy file:

graph TD

    subgraph FRONTEND["Import time"]
        SRC["*.spy source"]:::node
        AST["Untyped AST"]:::node
        SYMAST["Untyped AST + symtable"]:::node
        IMPORTLABEL(["import"]):::label

        SRC -- parse --> AST
        AST -- ScopeAnalyzer --> SYMAST
        SYMAST --- IMPORTLABEL
    end

    SPyVM["SPyVM"]:::node
    IMPORTLABEL --> SPyVM

    subgraph RS[" "]
        RSLABEL(["redshift"]):::label
        REDSHIFTED["Typed AST"]:::node
    end

    C["C Source (.c)"]:::node
    EXE_NAT["Native exe"]:::node
    EXE_WASI["WASI exe"]:::node
    EXE_EM["Emscripten exe"]:::node

    subgraph RT["Runtime"]
        INTERP(["interp"]):::label
        DOPPLER(["interp(doppler)"]):::label
        EXECUTE_NAT(["execute"]):::label
        EXECUTE_WASI(["execute"]):::label
        EXECUTE_EM(["execute"]):::label
    end

    OUT["Output"]:::node
    SPyVM --- INTERP --> OUT

    SPyVM --- RSLABEL --> REDSHIFTED
    REDSHIFTED -- cwrite --> C
    REDSHIFTED --- DOPPLER --> OUT

    C -- cc --> EXE_NAT --- EXECUTE_NAT --> OUT
    C -- cc --> EXE_WASI --- EXECUTE_WASI --> OUT
    C -- cc --> EXE_EM --- EXECUTE_EM --> OUT

    style FRONTEND fill:#fafaff,stroke:#9090cc,stroke-dasharray:5 5
    style RS fill:#fafaff,stroke:#9090cc,stroke-dasharray:5 5
    style RT fill:#fafaff,stroke:#9090cc,stroke-dasharray:5 5

    classDef node fill:#e8eaf6,stroke:#5c6bc0,color:#1a237e;
    classDef label fill:#fafaff,stroke:none,color:#1a237e;

The first steps up to and including ScopeAnalyzer are classical compiler stages. Contrarily to CPython, SPy doesn't produce bytecode. In SPy, executable code is kept in form of AST, which is then transformed during the various stages of the pipeline. SPy AST is used as the internal IR of both the compiler and the interpreter.

AST vs bytecode

Why using the AST instead of a bytecode as CPython does?

In the very early days of the project, SPy was bytecode based. Then PR #4 switched to the AST interpreter.

The main advantage is that it's simpler and easier to implement, especially redshifting; bytecode-based execution is more complex because you need both the bytecode compiler and the bytecode VM but tends to be faster to execute. This tradeoff makes a lot of sense for CPython, where bytecode is the only means of execution, but less for SPy where we also have a C backend.

That said, the long-term goal for SPy's interpreter is to have performance comparable to CPython.

The import step is interesting: it imports the given module and all of its dependencies in the running SPyVM instance. The dependencies are determined and resolved statically, by scanning for the presence of import statements, recursively. This means that all needed modules are imported eagerly, including modules which are imported only inside of function bodies (even if those functions are never executed). This is a big departure from CPython semantics, but it is essential to the design of SPy and enables many important features. We will talk more about it later in this series.

"Import time" is always executed by the interpreter. After that we can run the code in three different modes:

interpreted mode: the untyped AST is executed by the interpreter.
doppler mode: redshift transforms untyped ASTs into typed ASTs, which are then executed by the interpreter. This is mostly used by tests to ensure that the redshift pass produces correct code.
compiled mode: redshift transforms untyped ASTs into typed ASTs. Then we feed the typed AST to the C backend, which produces C code, which is finally compiled by gcc, clang or any other C compiler. SPy supports traditional native targets as well as the WebAssembly targets WASI and Emscripten.

Why C code and not LLVM?

At this stage we are trying to optimize for time to market. Emitting C code is much simpler, easier to develop and easier to debug, while still getting performance comparable to LLVM.

Moreover, by using C as the common ground we automatically have lots of great existing tools at our disposal, like debuggers, profilers, build systems, etc. And using C makes it very easy to target new platforms such as e.g. emscripten.

Hello world¶

Unlike in Python, the main entry point of a program is not module-level code, but it's the main function. This is needed because as we saw above, module level code is always executed "at compile time".

Thus, this is the hello world in SPy:

hello.spy

def main() -> None:
    print("Hello world!")

Motivation and goals, recap¶

Phases of execution and compilation pipeline¶

Hello world¶

Static typing¶

Operator dispatch¶

Static vs dynamic types¶

Redshifting¶

@blue functions¶

Type manipulation and generics¶

Operator dispatch, revisited.¶

Static typing as a special case of @blue evaluation¶

Next steps¶

`@blue` functions¶

Static typing as a special case of `@blue` evaluation¶