High Performance Python Extensions: Part 1
crumpington.com

Here's how I would write this, in Cython using "pure C" arrays:
https://gist.github.com/syllog1sm/3dd24cc8b0ad925325e1
It's getting 18,000 steps/second, in the same ballpark as your C code.
I prefer to "write C in Cython", because I find it easier to read than the numpy code. This may be my bias, though --- I've been writing almost nothing but Cython for about two years now.
Btw, if anyone's interested, "cymem" is a small library I have on pip. It's used to tie memory to a Python object's lifetime. All it does is remember what addresses it gave out, and when your Pool is garbage collected, it frees the memory.
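For anyone curious, usage looks roughly like this (a hedged sketch from memory of cymem's README; check the project page for the exact API):

```cython
# Hypothetical example class -- memory allocated from the Pool is freed
# automatically when the owning Python object is garbage collected.
from cymem.cymem cimport Pool

cdef class Matrix:
    cdef Pool mem
    cdef double* data

    def __init__(self, int rows, int cols):
        self.mem = Pool()
        # alloc(number, elem_size): the Pool remembers this address and
        # frees it in its __dealloc__, so `data` lives as long as `self`.
        self.data = <double*>self.mem.alloc(rows * cols, sizeof(double))
```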
Edit: GH fork, with code to compile and run the Cython version: https://github.com/syllog1sm/python-numpy-c-extension-exampl... . I hacked his script quickly.
If I could have your permission, I'd like to incorporate this into a future post in the series. I can credit you in any way that you'd like.
That's fine, please link to http://honnibal.wordpress.com . I'll probably write a short note on it, I've been meaning to say more about "my way" of using Cython.
Edit: Submitted here. https://news.ycombinator.com/item?id=8483872
I think if you used memory views you could have the benefits of both: fast low-level access on the one hand, plus the vectorized numpy functions on the other (I'm thinking here of initializing the arrays with a single call to numpy.random.uniform). With multi-dimensional arrays it's definitely better than plain pointers.
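A hedged sketch of what that might look like (a hypothetical update function over a typed memoryview, not the gist's actual code):

```cython
# A typed 2-D memoryview gives C-speed element access, while numpy
# still handles the vectorized initialization.
import numpy as np

def step_all(double[:, :] pos, double dt):
    cdef Py_ssize_t i, j
    for i in range(pos.shape[0]):
        for j in range(pos.shape[1]):
            pos[i, j] += dt      # plain C indexing, no Python overhead

# Initialization stays as a single vectorized numpy call:
# pos = np.random.uniform(-1.0, 1.0, size=(1000, 2))
# step_all(pos, 0.01)
```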
Well, as you know, if I'm using a multi-dimensional array, it's usually super sparse! (Because NLP). So I want to define those myself, not use the numpy ones.
Maybe I just never learned numpy. But I had to go and look up what that stuff did, and it wasn't obvious to me what the data types of those arrays would be. So, I like the C-style initialization actually --- just because it's more obvious to me.
If I understand correctly, the `pypy` people strongly encourage the use of `cffi` instead of the CPython API, as the latter is tied too much to CPython and does not permit efficient JITing.
It's not so much about efficient JITing, and more about the fact that Pypy is written in Python and does not have a C-backend.
That said, I advocate CFFI even in CPython. CFFI is a clean way of calling C libraries without having to:

- write interface code in C (CPython extensions)
- write something that is not C or Python (Cython)
- use a compiler or linker (CPython extensions and Cython)
I have personally written nontrivial CPython extensions, Cython extensions and numerous CFFI bindings, and I vastly prefer CFFI over any of the alternatives. With CFFI, you write your interface code in Python, and dynamically load C libraries at runtime with no compiler/linker required. It's fast and easy.
Plus, it's all pure Python and thus one source file works across Windows/OSX/Linux and CPython/Pypy without modification or compilation. I really can't say enough good things about CFFI!
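A minimal sketch of that workflow, using CFFI's ABI mode to call libc's strlen (assumes the cffi package is installed; passing None to dlopen works on POSIX, on Windows you would name a specific DLL):

```python
from cffi import FFI

ffi = FFI()
# Declare the C signature we want to call -- copied from the man page.
ffi.cdef("size_t strlen(const char *s);")
# Load the C standard library at runtime: no compiler or linker involved.
C = ffi.dlopen(None)

print(C.strlen(b"hello, cffi"))  # 11
```

Since the interface declaration is just a Python string, the same file runs unchanged on CPython and PyPy.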
The PyPy people also happily encourage you to just use Python and not rewrite anything in C (or at least not most of it). Btw, those loops should be really, really fast on PyPy when written in pure Python (with numpy arrays).
Everybody should prefer CFFI over the CPython API unless you're working on some very specific corner cases or need deep integration with the Python object system. If nothing else, the fact that you can write your C code without having to care about Python is a huge win. This way you can use the same C code in other projects, or keep it around if you ever end up moving from Python to some other language.
I recently cythonized the performance critical parts of a numpy/scipy based project with much success.
You don't necessarily need to get your hands dirty writing C extensions (although it can be a good exercise to learn the CPython API).
In Cython, you just need to sprinkle some static types onto the inner loops to bump the speed up.
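As a sketch of what "sprinkling static types" means in practice (a hypothetical function, not the project's actual code):

```cython
# Typing the argument and the loop variables is usually all it takes
# for Cython to compile the inner loop down to plain C.
def total(double[:] x):           # typed argument (memoryview)
    cdef double s = 0.0           # typed locals...
    cdef Py_ssize_t i
    for i in range(x.shape[0]):   # ...turn this into a raw C loop
        s += x[i]
    return s
```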
There are also other approaches, like the HOPE JIT (https://github.com/cosmo-ethz/hope) and Theano, which is more about expression optimization and compilation: http://deeplearning.net/software/theano/
I wrote a blog post about HOPE http://blog.goranrakic.com/archives/2014/10/evaluating_hope....
Have you tried using ctypes? In particular ctypes.from_buffer is such a useful function.
https://docs.python.org/2/library/ctypes.html
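For instance, from_buffer gives you a zero-copy ctypes view over any writable Python buffer (a small stdlib-only sketch):

```python
import array
import ctypes

# A writable Python buffer: four C ints.
buf = array.array("i", [1, 2, 3, 4])

# Build a ctypes array type and map it over the SAME memory (no copy).
IntArray4 = ctypes.c_int * 4
view = IntArray4.from_buffer(buf)

# Writes through the ctypes view are visible in the original array...
view[0] = 42
print(buf[0])  # 42

# ...so `view` can be handed straight to a C function expecting int*.
```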
Unfortunately it doesn't work in PyPy.
Part 2 of this series is posted here:
https://www.crumpington.com/blog/2014/10-21-high-performance....
Hacker News thread:
Why not Cython?
Cython is a code generation tool, so it is often much harder to understand what the (intermediate) code is doing in more complex cases. This is especially true with numpy-specific Cython code. And with Cython, your debugging story is more problematic. Cython is great but has some interesting performance corner cases, for example see: https://github.com/paulross/NotesOnCython
My preference is to work in C directly. That being said, I may look into a Cython implementation for a future post.
Cython lets you avoid a lot of C API boilerplate. The code to parse function arguments is quite distracting, and managing reference counts manually is especially error-prone.
With Cython you get to choose whether to work in a purely low-level style that maps directly to C code, or whether you want to mix in Python code. This lets you speed up existing code gradually.