Cython is 20

blog.behnel.de

210 points by geococcyxc 4 years ago · 70 comments

physicsguy 4 years ago

I love Cython. I really feel like it's the right balance of usability and allowing you to do what you want/need. Want to make your code a bit faster? Write Python with type annotations. Want to call a C library? Just import the header, and then use it from a function.

Pybind11 is also great, but quite different in aims - I feel like it's more like a project for C++ programmers wanting to expose functionality to Python.
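
The two uses described above might look like this in a `.pyx` file — a hypothetical sketch (the file name, function names, and build commands are illustrative; assumes Cython and a C compiler are installed):

```cython
# example.pyx -- hypothetical sketch, not from this thread.

# 1. "Python with type annotations": typing the loop index and
#    accumulator lets Cython compile this down to a plain C loop.
def mean(double[:] xs):
    cdef double total = 0.0
    cdef Py_ssize_t i
    for i in range(xs.shape[0]):
        total += xs[i]
    return total / xs.shape[0]

# 2. "Just import the header": declare what you need from a C header,
#    then call it like a normal function.
cdef extern from "math.h":
    double sqrt(double x)

def hypot2(double a, double b):
    return sqrt(a * a + b * b)
```

Building is typically `cythonize -i example.pyx`; `cython --annotate example.pyx` emits an HTML report showing which lines still go through the Python C API.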

  • jokoon 4 years ago

    Any benchmarks for type annotations?

    I already wrote a few patches for pysfml, which is written in Cython. It was a bit awkward, and now I'm asking myself whether Cython is really the right tool for writing bindings, compared to the plain CPython C API, for example.

    • physicsguy 4 years ago

      Gonna depend a huge amount on what you're doing to be honest. I used it for physics modelling codes and it made a bit of a difference (comparable to Numba) but dropping to C for the main computation routines was what we ended up doing, and that worked very well for us.

      It's very fast to write for, that's the main benefit. Use it together with profiling and just pick off the slowest part first.

    • tristan957 4 years ago

      For what it's worth I wrote Python bindings using Cython for our open source C-API storage engine and the performance was fairly close to on par with C.

  • machinekob 4 years ago

    Python with type annotations isn't faster on the standard Python runtime.

    But some packages can use the annotations for higher performance. Most of the time it'll be slower, though, because the extra type information has to be parsed if you want to reuse it in pure Python.

    • quietbritishjim 4 years ago

      It's true that if your Python code is being interpreted with CPython then adding type annotations won't make it any faster. But the comment said that if you're already compiling your Python code with Cython then adding type annotations will allow Cython to make your code a little faster.

    • bobbylarrybobby 4 years ago

      You're thinking of CPython, the standard implementation of Python. Cython is a (barely) separate language that looks a lot like Python but gets compiled to something like C. When you need performance, you can drop down from (C)Python into Cython

    • machinekob 4 years ago

      My bad, I'm blind, of course Cython would be faster :P

erwincoumans 4 years ago

I would recommend considering NanoBind, the follow-up to PyBind11 by the same author (Wenzel Jakob), and moving as much performance-critical code as possible to C or C++. https://github.com/wjakob/nanobind

If you really care about the performance of code called from Python, consider something like NVIDIA Warp (Preview). Warp JITs and runs your code on CUDA or the CPU. Although Warp targets physics simulation, geometry processing, and procedural animation, it can be used for other tasks as well. https://github.com/NVIDIA/warp

Google JAX is another option, jitting and vectorizing code for TPU, GPU or CPU. https://github.com/google/jax

  • logicchains 4 years ago

    >I would recommend considering using NanoBind, the follow up of PyBind11 by the same author (Wensel Jakob), and move as much performance critical code to C or C++

    Why would you recommend that? It's all way more effort than just writing Cython, especially in a Jupyter Notebook. And Cython code can be just as fast as C/C++ code unless you're doing something really fancy. It's a bunch of work for no benefit.

    >Warp jits and runs your code on CUDA or CPU

    If someone's writing Cython it's probably because they found something that couldn't be done efficiently in Numpy because it was sequential, not easily vectorisable. Such code is going to get zero benefit from Cuda or running on the GPU.

    In general, all your jitted code is not going to be as fast as code compiled with an ahead-of-time compiler like the C compiler that Cython uses. Moreover if you use a JIT then it makes your code a pain in the ass to embed in a C/C++ application, unlike Cython code.

    • wjakob 4 years ago

      > Why would you recommend that? [..] It's a bunch of work for no benefit.

      nanobind/pybind11 (co-)author here. The space of python bindings is extremely diverse and on the whole probably looks very different from your use case. nanobind/pybind11 target the 'really fancy' case you mention specifically for codebases that are "at home" in C++, but which want natural Pythonic bindings. There is near-zero overlap with Cython.

      • erwincoumans 4 years ago

        Yes, I assumed everyone who cares about performance (or who writes large programs) is also a C++ or CUDA programmer. Don't tell me that is not the case :-)

    • erwincoumans 4 years ago

      Warp generates C/C++ code that can be trivially used in a pure C++ or CUDA project without issues. It is not strictly a JIT, since it calls a regular ahead-of-time compiler (gcc, llvm or nvcc), and only when the code changes (using hashes to detect changes), so performance is good. Also, random non-vectorizable branchy code will run fine on the CPU with Warp, but you do lose many of the benefits, indeed.

      Agreed, if you have badly performing spaghetti Python code, none of those tools are going to help. Then I would rather rewrite it all in C/C++ instead of fiddling with Cython.

  • beltsazar 4 years ago

    Or alternatively, PyO3 if you use Rust instead of C++: https://github.com/PyO3/pyo3

  • jcelerier 4 years ago

    if the object you wanna bind fits into the mold of "an algorithm with inputs and outputs, and some helper methods" I've got automatic binding of a limited set of C++ features working in https://github.com/celtera/avendish ; for now I've been using pybind11 but I guess everything I need is supported by nanobind so maybe i'll do the port...

fermigier 4 years ago

I made an "Awesome Cython" page last year. I welcome pull requests (or you can fork it as you want):

https://github.com/sfermigier/awesome-cython

cb321 4 years ago

cython --annotate is an ok way to learn the whys & whereabouts of the rather hairy CPython API. That gives you an HTML page you can click on to expand your python code into equivalent-ish C API calls. Darker yellow means more calls, too. So, it's not a terrible start to do static analysis to guide optimization, but a combination score (with a run-time profile) would be even better.

I believe there was a time very early on (like 2003) when there was discussion about maybe including Pyrex in CPython proper to get a more Common-Lisp like gradually typed system. (I mostly recall some comment of Greg's along the lines of being intimidated by such. I'm not sure how seriously the idea was entertained by PyCore.)

pjmlp 4 years ago

While it is nice that this option is available, it would be much better if Python itself embraced the necessary runtime capabilities so that we wouldn't have to rely on it.

  • fname11 4 years ago

    This is not going to happen. GvR has successfully ignored Cython and PyPy for decades and has attached himself to a JIT project at Microsoft (has anything emerged?).

    CPython is in the hands of not really productive bigcorp representatives who care about large legacy code bases. My guess is that CPython will be largely the same in 10 years, with the usual widely hyped initiatives that go nowhere ("need for speed etc.").

    • linspace 4 years ago

      > who care about large legacy code bases

      It's clear that Python's main strength is its vast set of libraries, so priority number one is not breaking them. If it were possible to speed up Python without breaking changes, I would be surprised it hasn't happened already, precisely because with so many large codebases, speed and efficiency translate directly into money.

      • dikei 4 years ago

        Yeah, it took over a decade to switch from Py2 to Py3 due to the breaking changes it brought. I'd rather not have such a large change again, ever.

      • poulpy123 4 years ago

        They really missed an opportunity in the switch from Py2 to Py3: they could have broken things a bit more in exchange for bigger improvements.

        • linspace 4 years ago

          Completely agree. I think GvR was too conservative. And yet he got a lot of backlash; it's easy for me to criticize now, years after the fact, and I only have respect for the work done. I think the Perl 5/Perl 6 transition was on everyone's mind.

    • olau 4 years ago

      While it's true speed was not a priority, I think most of those initiatives didn't try hard to work with upstream.

      The Microsoft funded project is different, they're merging things. I don't think they've started on a JIT translator yet, though, last time I looked they were busy picking lower-hanging fruit. From watching their communications, I think they might get there at some point.

      It's not as simple as just emitting machine code, though. To get something within the same order of magnitude as typical C code, you need to deduce types and peel away the boxing and unboxing layers.

  • f311a 4 years ago

    To write fast Cython, you basically need to write in C and control everything including Python API calls. No runtime will help with this.

    • dagw 4 years ago

      That is really only true if you want to squeeze every drop of performance out of Cython. For the first 80% of performance gains you don't have to go that deep.

      That is another thing that is nice about Cython: you don't have to learn all of Cython to be productive. Take your existing Python function and just add some type annotations and you'll see real performance gains. Then you can profile your code, see what the next bottleneck is, fix that, and so on.

      So, yes, Cython gives you the power to manually control the GIL and the Python API calls, and to handle your own memory management and layout for those corner cases where that is what you need. Most of the time you can happily ignore all of that and get almost all of the speedup available.

    • throwaway894345 4 years ago

      The parent is talking about Python (specifically CPython, I assume), not Cython. Moreover, performance isn't a binary and there's lots of headroom for CPython to improve if they would be willing to drive the community through a breaking change (but the Python community generally assumes that any kind of breaking change necessarily has to look like the Python 2->3 transition and there's no political will for this). Note how many interpreted dynamic languages absolutely trounce CPython for anything that isn't a microbenchmark.

    • pjmlp 4 years ago

      Common Lisp and similar languages are proof of what is possible.

      • dagw 4 years ago

        Writing a much faster language runtime for a language that looks quite a bit like Python is easy. The hard problem is writing a faster language runtime that is 100% compatible with all current python programs (and their extensions) out in the world.

        • throwaway894345 4 years ago

          I don't think 100% compatibility should be a goal. The maintainers really need to step up and guide the community through a tough but important transition (reducing the API surface for C extensions to some sane subset so the implementation can be optimized--see things like HPy), or Python will remain in its very slow local optimum while the world moves around it.

          Yes, we just went through one "tough but important" transition (Python 2->3) and it sucks to have to do this so often, but it's the price we pay for making bad bets (in this case, unnecessarily exposing the entire CPython interpreter as the C-extension API surface, on the assumption that it would never be necessary to make CPython faster because people who need speed will just write the slow parts in C).

        • coliveira 4 years ago

          But this is not necessary. The Python developers just need to specify this fast subset of the language and let people use it to create libraries. Over time, we would have a growing set of libraries written in the fast subset.

          • dagw 4 years ago

            People have. See PyPy and Pythran for two examples currently under active development. Instagram has such a project as well, which they recently released. I know there have been others. None of them seem to catch on. It seems that most people don't actually want a faster subset of Python: they want either all of Python or none of Python (by switching to another language altogether).

            • coliveira 4 years ago

              PyPy is not really practical. It requires you to stop using many features of Python. What Python needs is a fast subset that is supported within the main implementation.

              • marky1991 4 years ago

                What do you mean? PyPy supports all of Python (not necessarily all of the C API, but that's partly a work in progress and partly a result of the overly expansive C API).

            • mumblemumble 4 years ago

              I'm not and have never been a game developer, but I think that a decent analogy here might be how many game studios write the core engine in C++, and then do a lot of the high level game logic and scripting in an interpreted language such as Lua or their own dialect of lisp.

              I would guess that there's a clear separation of responsibilities, and each of the two languages is very well-suited to what it's being used for. There's not really a whole lot of anxiety about getting Lua (or whatever) to pull out all the stops you see in a compiler like SBCL or interpreter like V8, because these communities were never looking for a single language that could cover all uses cases in the first place. To steal an analogy I used the other day from myself, I'm guessing they don't want a spork all that badly because they're plenty happy with using a fork and a spoon.

              That's how the community of people doing scientific computing and suchlike in Python tends to feel about things, too.

          • travisoliphant 4 years ago

            This is related to the idea of EPython that we are working on (as we have funding): https://github.com/epython-dev/epython

            It currently emits Cython for the C backend (and Pyodide). It is very alpha currently, but if people are interested in helping, get in touch.

      • johnisgood 4 years ago

        People are starting to forget non-trendy languages, only to reinvent the wheel or solve non-issues. It is not strictly related to languages; it's such a common phenomenon, based on my observations. I didn't use to like the history of IT, but now I believe it is necessary, at least for your own sub-field.

  • chaxor 4 years ago

    Don't count on it - switch to Julia instead.

    • physicsguy 4 years ago

      Julia is great if you can afford to spend 5 minutes sitting around for your session to load, but most people have things to do.

      • ninjin 4 years ago

        Julia has its own heap of issues, but five minutes is a load of bull:

            > time julia -e 'using Plots; plot(rand(10, 5), rand(10))'
            
            ________________________________________________________
            Executed in    5.77 secs    fish           external
               usr time    5.74 secs  214.00 micros    5.74 secs
               sys time    0.57 secs    0.00 micros    0.57 secs
        
        This is also on a fairly old version at that:

            > julia -v
            julia version 1.6.3
        
        Regardless, this conversation was meant to be about Cython, which deserves a lot of praise in its own right as a tool in your toolbox to make the blasted snake run faster.

        • adgjlsfhk1 4 years ago

          Also, if you make a sysimage, startup time can be reduced to <1s. This doesn't get a ton of publicity because many of the more active Julia devs are developing lots of packages and/or developing Julia which makes this less applicable to them, but if you are waiting more than a few seconds to load packages that you aren't a developer of, sysimages are a wonderful quality of life improvement.

      • amkkma 4 years ago

        When's the last time you've tried it? Load times are way better. New improvements for code caching have been merged in 1.8, and native code caching is coming in 1.9. It's a complex problem due to Julia's aggressive specialization and composability, so you can't just expect Julia to do it like other compiled languages.

        Check out this recent writeup from a core dev: https://discourse.julialang.org/t/precompile-why/78770/8

      • pjmlp 4 years ago

        How many times a day are you starting sessions?

        • physicsguy 4 years ago

          I work in Python, but I might restart, create new sessions, etc anywhere between 1-100 times a day depending on what I am doing.

        • bigbillheck 4 years ago

          I don't use julia, but I start python sessions multiple times per day.

    • machinekob 4 years ago

      Would love to, but the community is way too small for my use case :(

    • pjmlp 4 years ago

      Yeah, looking forward to Julia putting pressure on Python.

nickmain 4 years ago

Mypyc is an alternative tool for speeding up type-annotated Python code [0]. It doesn't help with calling existing C code, unfortunately.

[0] https://mypyc.readthedocs.io/en/latest/

victoryhb 4 years ago

I tried to learn Cython last year, but was thwarted by two issues: (1) its syntax was too ugly for my taste and support for the pure Python mode was immature; (2) performance bottlenecks were opaque and hard to profile (at least for beginners). I ended up picking up Nim, a language with Python-like syntax and C-like performance, and was productive within hours (literally). I never looked back.

bminusl 4 years ago

Maybe you will also be interested in Cython+: "Multi-core concurrent programming in Python" [0].

[0]: https://www.cython.plus/en/

Bostonian 4 years ago

Is the 2015 O'Reilly book on Cython by Kurt Smith still a good starting point to learn about it, or is it outdated?

c-fe 4 years ago

I have heard about Cython before but I have never actually used it. I have however used Numpy, Scipy and Numba. Are there any reasons to also consider Cython in combination with those other libraries? E.g. in which cases would Cython be considerably better than Numpy or Numba? My workload consists mostly of data science and statistics, running models and simulations.

  • dagw 4 years ago

    Cython works great in conjunction with Numpy arrays and you can easily call numpy and scipy methods from within Cython. The big win comes when you have to do some operation to a numpy array that doesn't have a 'fast' path within numpy. If you ever find yourself in a situation where you have to loop over or apply any sort of custom operation to every element in a numpy array then Cython can be a huge win, especially since Cython also makes it possible to parallelise those loops.
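
The custom per-element loop described above, including the parallel variant, can be sketched roughly like this — a hypothetical example (invented function name; assumes Cython built with OpenMP support for `prange`):

```cython
# clip.pyx -- hypothetical sketch.
# cython: boundscheck=False, wraparound=False
from cython.parallel import prange

def clip_inplace(double[:] xs, double lo, double hi):
    cdef Py_ssize_t i
    # prange releases the GIL and spreads iterations over OpenMP
    # threads; a plain `range` here would already be a C-speed
    # serial loop.
    for i in prange(xs.shape[0], nogil=True):
        if xs[i] < lo:
            xs[i] = lo
        elif xs[i] > hi:
            xs[i] = hi
```

The `double[:]` memoryview accepts any buffer-providing object, e.g. a NumPy array, and the function mutates it in place without allocating temporaries.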

    The other place it shines is if you ever need to loop over an array of data that cannot easily be represented as numpy arrays, like strings or more complex structs. Here you can get significant speedups compared to python.

    The third use of Cython I really like is with C and C++ interop. Sure there are lots of ways of calling C code from Python, but to me Cython is probably the quickest and cleanest.

    Compared to Numba, it's harder to say. Numba, when it works, is easily as fast as Cython. However, I find Numba hard to reason about, and it's still a bit of a black box as to when and why it does and doesn't work. The nice thing about Cython is that it is pretty simple, so you can easily reason about what it will do to your code and how it will perform. It's been a long time since Cython 'surprised' me by performing much better or worse than I expected.

    If you want to see Cython in action, take a look at the source code of scikit-image or scikit-learn. They implement many of their core algorithms in Cython.

  • rich_sasha 4 years ago

    Numpy and Scipy do the heavy lifting in fast compiled C / Fortran, but if you write a for-loop doing these things, it will still be (comparatively) slow.

    Numba is a JIT, and only covers some of Numpy. I'd say it's amazing at how well it works, but it "only" covers certain aspects of the language. It's also a bit of an all-or-nothing - if it doesn't cover a certain class of syntax, it just won't JIT.

    Cython is ahead-of-time compiled, and much more comprehensive. It turns Python, effectively, into C, and compiles it as a Python extension. The possible scope is thus much greater, and although Cython comes with built-in support for Numpy, it is much more broad in principle.

    So... it's a very different set of trade-offs. Like with Numba, out of the box, with no changes, you will typically see a significant improvement (what's significant? From experience about 2x). You have much more scope for tweaking your code to speed things up - move some of the execution to C, disable bounds checking, outright call C libraries, etc. It comes with a suite of tools for analysing performance bottlenecks. It used to come with a lot of special syntax, which nowadays is done with annotations and decorators - much neater IMO. And of course, no run-time compilation delay, it's moved to, well, compilation time.
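
The annotations-and-decorators style mentioned above looks roughly like this in Cython's "pure Python" mode — a hypothetical sketch (assumes a recent Cython; the file stays valid Python and only becomes fast once compiled):

```cython
# dot.py -- hypothetical sketch in Cython pure Python mode.
import cython

@cython.boundscheck(False)  # drop per-index bounds checks
@cython.wraparound(False)   # drop negative-index handling
def dot(xs: cython.double[:], ys: cython.double[:]) -> cython.double:
    total: cython.double = 0.0
    i: cython.Py_ssize_t
    for i in range(xs.shape[0]):
        total += xs[i] * ys[i]
    return total
```

Uncompiled, this runs as ordinary (slow) Python; compiled, the annotations become C types — which is the "no special syntax" trade-off described above.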

  • nicoco 4 years ago

    Numba is better in my opinion for the use case you describe, less hassle.

    However, (I think) cython is superior when:

    - you want to distribute (eg as a pypi package) your code

    - you want to interface with C/C++ code libs

    I found out I almost never have to do this and did not touch cython since I started using numba.

gotaquestion 4 years ago

I like Cython, but I think it pigeon-holes developers: I've seen hardware modules written in Cython when they could easily have switched to C++ and provided a library usable over FFI from any language, but instead they locked themselves (and their users) into the narrow world of Python.

Is there a way to convert Cython modules to C++, or at least a .o file? They are so dang close.

blindseer 4 years ago

I wish Cython was a more popular option than choosing Julia or Go. Cython is great and you can get some real performance out of it.

The only drawback is that a Cython module still loads the CPython interpreter, so I personally prefer writing performance critical code in Rust instead. Writing in Julia has the same drawbacks of not being embeddable that writing in Cython does.

Julia has multiple dispatch and may seem more appealing but at scale it is a very slow language to develop in. And for scripts it takes FOREVER (try loading Plots, CSV, DataFrames, Makie etc every time you restart. It’s genuinely insane that that’s the norm.)

If the whole Python ecosystem was in Cython (i.e. numpy, scipy, etc) I’d never use another backend language again.

jokoon 4 years ago

Question: I found python "bindings" for SFML, written in cython, and patched them a bit.

I guess Cython is not really made for writing bindings, but is it easier to write bindings with Cython or with the plain CPython C API?

  • Galanwe 4 years ago

    As someone who has been writing python bindings regularly for 10 years:

    Writing bindings in Cython is much, much faster in terms of development time. It fits nicely and unobtrusively into an already-packaged Python library. You can gently add some C functions or call C libraries in minutes.

    You won't have full control of what's happening, though. Just have a look at the generated code and you'll see the mess of indirections it produces.

    Cython bindings become limited when you have to build more complex stuff though, going deeper than just calling some C functions. The typical case is when you have to actually handle the lifetime and borrowing of C native objects.

    At that point, CPython will be the way to go, but it's much more code, and very error prone: you have to manually keep track of reference counting.
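
The lifetime-handling case mentioned above is where `cdef class` wrappers come in — a hypothetical sketch (the `thing.h` API and its functions are invented for illustration):

```cython
# wrapper.pyx -- hypothetical sketch; thing_new/thing_free stand in
# for a real C library's constructor and destructor.
cdef extern from "thing.h":
    ctypedef struct thing_t:
        pass
    thing_t *thing_new()
    void thing_free(thing_t *t)
    int thing_value(thing_t *t)

cdef class Thing:
    cdef thing_t *_ptr

    def __cinit__(self):
        self._ptr = thing_new()
        if self._ptr is NULL:
            raise MemoryError()

    def __dealloc__(self):
        # Tied to Python's refcounting: freed exactly once,
        # when the last Python reference goes away.
        if self._ptr is not NULL:
            thing_free(self._ptr)

    def value(self):
        return thing_value(self._ptr)
```

This covers ownership the wrapper holds itself; borrowed pointers owned by other objects are where it gets hairy, and where hand-written CPython-API code gives finer control, as the comment says.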

kubb 4 years ago

and i still have no idea what i could use it for...

  • dagw 4 years ago

    It's great for speeding up 'hot' functions in your python code and makes it easy to call C libraries from python.

  • nurbl 4 years ago

    It's easy to drop in Cython in an existing project where you need some performance, and start gradually "cythonizing" modules from the inside out. The rest of the code does not need to care.

    With a bit of care (and benchmarking) you can get very respectable speed. The main drawback is that the further you go, the more C knowledge you need in order to not blast your own feet off.

    If you're just after a bit more performance in general, a drop-in solution like PyPy might be enough.

  • ergo14 4 years ago

    We sped up our ML code 40x using it.

    • DaedPsyker 4 years ago

      Is the data loading where you're getting the most benefit?

      I'm curious, since most of the big libraries are already just cuda calls anyway but I'm always interested in anything to speed up the full process.

      • microtonal 4 years ago

        I can't speak for the parent commenter, but there is often code processing the input/output of machine learning models that benefits from high-performance implementations. To give two examples:

        1. We recently implemented an edit tree lemmatizer for spaCy. The machine learning model predicts labels that map to edit trees. However, in order to lemmatize tokens, the trees need to be applied. I implemented all the tree wrangling in Cython to speed up processing and save memory (trees are encoded as compact C unions):

        https://github.com/explosion/spaCy/blob/master/spacy/pipelin...

        2. I am working on a biaffine parser for spaCy. Most implementations of biaffine parsing use a Python implementation of MST decoding, which is unfortunately quite slow. Some people have reported that decoding dominates parsing time (rather than applying an expensive transformer + biaffine layer). I have implemented MST decoding in Cython and it barely shows up in profiles:

        https://github.com/explosion/spacy-experimental/blob/master/...

      • ergo14 4 years ago

        In this case it was multicore computation without the GIL, if I remember correctly.

  • hansor 4 years ago

    We had to parse dozens of 20GB files daily, with a super complex, non-linear structure. With Cython (eventually we migrated to PyPy) we gained around a 20-60x speedup.

  • baq 4 years ago

    it's C with Python syntax and syntactic sugar for Python objects on C level, including refcounting, which is the hard part.

    if you successfully use numba, probably nothing that you couldn't already do.

    if you want something that lives much closer to C, it's perfect.
