The first stable release of PyPy3

morepypy.blogspot.com

299 points by pjenvey 12 years ago · 91 comments

Buetol 12 years ago

Wow, this is a very exciting moment for the Python world.

And they hadn't even reached their funding goal for "py3k in pypy" [1]. That's dedication. I encourage everyone to fund this incredible project!

[1]: http://pypy.org/py3donate.html

  • chrismonsanto 12 years ago

    I have been checking the py3k branch on Hg every other day waiting for this moment, what a pleasant surprise. Very, very exciting. Thanks all.

    I donated a while back, will make another donation soon.

    I would like to start using this immediately but I think I'll have to wait until a 3.3 release for "yield from".
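
    For reference, "yield from" (PEP 380, new in 3.3) lets a generator delegate to a sub-generator. A minimal sketch:

        def inner():
            yield 1
            yield 2

        def outer():
            yield 0
            yield from inner()  # delegates; inner's values are yielded directly
            yield 3

        print(list(outer()))  # [0, 1, 2, 3]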

rectangletangle 12 years ago

Awesome, I hadn't realized this project was quite this far along. If they get PyPy 3.4/3.5 going with NumPy, it will make a really nice package. Fast Python code for the high-level logic, paired with fast low-level number crunching. This could also help speed up the adoption of Python 3.

thomasahle 12 years ago

I wish the community would just switch entirely to pypy. Being able to just write slightly performance sensitive code in python is a huge win.

  • ngoldbaum 12 years ago

    It makes sense to use pypy if you're writing pure python code. The second you need a C extension, you're pretty much out of luck. This kills a lot of the appeal for people in the scientific/analytics side of things, who make heavy use of legacy C and Fortran routines.

    • dragonwriter 12 years ago

      > The second you need a C extension, you're pretty much out of luck.

      In theory, shouldn't CFFI be the foundation of the solution to that problem?
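
      For the curious, cffi's ABI-level mode looks roughly like this (adapted from the overview example in cffi's docs; it calls libc's printf, so a POSIX system is assumed):

          from cffi import FFI

          ffi = FFI()
          ffi.cdef("int printf(const char *format, ...);")  # declare the C signature
          C = ffi.dlopen(None)                   # load the standard C library (POSIX)
          arg = ffi.new("char[]", b"world")      # a C char*, initialized from the bytes
          C.printf(b"hi there, %s.\n", arg)      # prints "hi there, world."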

    • rguillebert 12 years ago

      You can use these C and Fortran routines on PyPy, just not with the CPython C extension API.

      • takluyver 12 years ago

        There's a lot of established code built around the C API. It's not like it can just be rewritten using CFFI over a weekend.

        • rguillebert 12 years ago

          Well, I'm not talking about every C extension in the world; I'm talking about thin bindings around C routines, which is what numpy has for fft, for example. Someone wrote a basic equivalent for numpypy in a few hours with no prior numpypy/cffi knowledge.

          • onalark 12 years ago

            Unfortunately, NumPy uses deep knowledge of the CPython API in quite a few places, which is one of the reasons implementing NumPyPy has been so challenging.

            • rguillebert 12 years ago

              Another approach should be used for those, but to be clear, I wasn't talking about the entire numpy library; I'm talking about things like numpy.fft.

    • sitkack 12 years ago

      No, you are wrong. It supports both ctypes and cffi, both of which should be the go-to for calling native code. Using PyObject has been the stupid choice for over 4 years.

  • masklinn 12 years ago

    > I wish the community would just switch entirely to pypy.

    What purpose would that serve?

    > Being able to just write slightly performance sensitive code in python is a huge win.

    Pypy does not work for everybody and everything (e.g. at best it's no slower for sphinx; it really doesn't like the way docutils works). It's not like pypy's a magic wand.

    • quacker 12 years ago

      > What purpose would that serve?

      If PyPy became the official/canonical implementation, PyPy would receive more attention and third-party library compatibility would be a requirement. Complaints about Python's slowness would be somewhat less relevant, and Python might see wider adoption. The RPython toolchain would receive more attention, and that could be useful to other languages. There are plenty of reasons; besides, PyPy is usually a free speedup for your Python application. Who's going to complain about that?

      > pypy does not work for everybody and everything

      True, but as the official implementation of Python, compatibility with PyPy would then be a must, and this situation would be greatly improved.

      • BuckRogers 12 years ago

        I agree with you, but it will never happen. GvR wants an as-simple-as-possible reference implementation; for one, he has to maintain it with a volunteer dev team. Also, there's a split in the Python community between guys like me and you, and the scientific squad. Until the scientific stuff works 100% in PyPy, you'd lose a significant portion of the Python userbase by dumping CPython.

        GvR has done enough damage to Python with Python 3. I don't intend to encourage him to make any more changes. We Python web developers are better off using what we have (non-reference implementations, which don't hurt anyone), or just using Node.js.

        • michh 12 years ago

          I don't think it's fair to put the blame for the unfortunate way things have gone with Python 3 solely on the shoulders of GvR. Afaik, a huge part of the community felt this was the way to go. Unfortunately, it wasn't.

          • pekk 12 years ago

            Killing Python 3 isn't something a majority of the community wants and it isn't objectively better either.

      • Fede_V 12 years ago

        I agree entirely. What's kind of a pity is that until NumPy is ported over, all of the scientific stack is basically unusable on PyPy - and right now, there are several incredibly good NumPy specific JITs (numexpr, numba, parakeet).

  • jamespo 12 years ago

    Maybe moving libraries & code to Python 3 should be the priority.

    • cookiecaper 12 years ago

      Convincing distros to package it as the default "python" should be the priority. Until that happens, Python 3 will see limited adoption. The path of least resistance will always have the most traffic.

      • _delirium 12 years ago

        Ubuntu seems to have that as a near-term goal: https://wiki.ubuntu.com/Python/3

      • keeperofdakeys 12 years ago

        The first step is to get everything python3 compatible, and have it use a hashbang or other mechanism to select the right interpreter. After this happens, the default interpreter has no real meaning: everything will use the right interpreter.
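
        For example, a script that opts into Python 3 explicitly, regardless of what plain "python" points to:

            #!/usr/bin/env python3
            # picks up whichever python3 is on PATH; plain "python" may still be 2.x
            import sys
            print(sys.version_info[:2])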

      • cwyers 12 years ago

        It's a chicken and egg problem. So long as most Python libraries run on Python 2 but not Python 3, distros are going to package Python 2.

        • cookiecaper 12 years ago

          Someone has to take the first step and break the chicken-and-egg cycle. Arch has had Python 3 as the default Python interpreter for a couple of years now and it's been working pretty much fine. Many libraries now support Python 3, and I've done full sites in Py 3. Almost all Python scripts I write these days are Python 3. I don't think I've had to downgrade a script that started in 3 down to 2 for a couple of years now. It's as ready as it's going to get.

          The groundwork is done, and I think everyone who is going to support Py 3 without any extra prodding has already done so. Now we need the distros to come through and give that extra nudge to the maintainers that are still slacking, or encourage people to replace those libraries that refuse to update.

        • Luyt 12 years ago

          Actually, the majority of PyPI packages are python-3 compatible. For a status overview, see http://python3wos.appspot.com/

          • briancurtin 12 years ago

            That's not what that site is saying.

            For my PyCon Russia talk, I pulled down the data for all 44,402 packages (as of May 31). 13.5% of all packages on PyPI support some version of Python 3. 75.5% of the top 200 packages by download count claim to support some Python 3 version (according to their setup.py classifiers). Additionally, 64% of the top 500 support some Python 3 version.

            Another interesting thing I saw was that of those 44K packages, 44% of them have seen a release within the last 12 months (representing 82% of the last month's download share), and 22% of those packages released in the last year support some version of Python 3.

  • ris 12 years ago

    How much memory do you have?

    • dagw 12 years ago

      For a sufficient performance increase? As much as it takes. Memory is cheap.

      • ris 12 years ago

        On servers and especially on virtualized servers it is absolutely not.

tedunangst 12 years ago

Minor note: the openbsd support (at least for 2.x) is amd64 only. Building for i386 at some point requires running a bootstrap process that doesn't fit in memory.

  • hcarvalhoalves 12 years ago

    > Building for i386 at some point requires running a bootstrap process that doesn't fit in memory.

    Seriously, it takes more than 4 gigs to build PyPy? Is that also necessary for other platforms besides OpenBSD?

    • thristian 12 years ago

      When you're compiling CPython, it's neatly broken into little bite-sized chunks (.c files), each of which has all the type-annotations and such that the compiler needs to produce efficient code.

      When you're compiling PyPy, it basically has to load the entire Python interpreter structure into memory so it can do its various analyses and annotations, so compiling PyPy takes a long time. I think for a while it was excluded from certain Linux distros because their package-build-farm machines wouldn't handle it.

    • keeperofdakeys 12 years ago

      http://stackoverflow.com/questions/8452396/does-pypy-transla...

      PyPy is written in RPython, a subset of the Python language. When it's 'compiled', the RPython source is run under CPython (or PyPy) by a translation toolchain that turns it into C code and then into a binary. Lots of tuning happens at the same time so the JIT runs well on the target machine, which is why translation takes a long while and lots of memory.

      The build also prints a fractal while compiling. http://pypy.readthedocs.org/en/latest/faq.html#why-does-pypy...
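
      To make the "subset" part concrete, a hypothetical snippet (not from the PyPy source): RPython code must be statically typeable, so a variable can't hold an int on one branch and a str on another, even though that's fine in ordinary Python:

          def f(flag):
              if flag:
                  x = 1      # x is an int on this branch...
              else:
                  x = "one"  # ...and a str on this one: valid Python,
              return x       # but RPython's type annotator rejects the mix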

    • tedunangst 12 years ago

      I think it's 2 gigs and some change, but yeah. I don't know the specifics. Once bootstrapped, it's more reasonable, but building from source is pretty wicked.

    • sitkack 12 years ago

      4GB is literally nothing. My laptop has 16, most servers I use have 128+. 4GB is netbook territory.

      • tekacs 12 years ago

        ... I think the implication is that more than 4GB would exceed the pre-[PAE][1] memory limit[2]. A form of cross-compilation might work, though PyPy's build isn't exactly a simple, 'classical' build process. :P

        Edit: also, looking at your comments[3] it looks like you surely know this (sorry), so now I'm really not sure what you're getting at... :P

        [1]: http://en.wikipedia.org/wiki/Physical_Address_Extension

        [2]: and even with PAE you still need to split into multiple processes/address spaces to do anything useful

        [3]: https://news.ycombinator.com/threads?id=sitkack

        • sitkack 12 years ago

          My point is that requiring a lot of RAM for a build is not a problem. Yes, it would be nice to support low-end devices for PyPy compilation, but the set of people on extremely constrained hardware who are also doing PyPy development and would need to build from source is, well, by definition zero.

          32 bit is dead except for ARM, and it will be dead on ARM in 4 years.

          • tekacs 12 years ago

            > 32 bit is dead except for ARM, and it will be dead on ARM in 4 years.

            Uh... sure? ... but the parent post was about how building for 32 bit _today_ simply does not work and will not work.

            Whilst it's not necessarily best to build for technology that's almost gone, there will definitely continue to exist 32 bit devices that people would expect to run Python on for quite a number of years yet; today's 32 bit ARM chips aren't going anywhere for a while, and not every form factor (say, non-desktop) is well suited to a 64-bit architecture. :/

            • sitkack 12 years ago

              Remember we are talking about _building_, actually JITing a JIT using a dynamic language _for_ a dynamic language.

              I haven't run a 32 bit desktop or server system since 2004. 32 bit is quite dead. In 4 years, only the cheapest ARM SoCs will be 32 bits. In embedded devices, yes 32 bits will be around for a great long while.

      • cookiecaper 12 years ago

        4GB is not literally nothing, it's 25% of the memory available on your laptop. That's a significant chunk.

      • girvo 12 years ago

        That's not true. $1000 ultrabooks often have 4GB. Hell, the base model rMBP has 4GB (I paid the extra for 8GB).

        • sitkack 12 years ago

          The people surfing the web and buying some music on iTunes are not building PyPy from source. It makes no sense to put the engineering work into supporting such memory constrained dev environments.

zyngaro 12 years ago

I've just made a small donation.

wldcordeiro 12 years ago

This is awesome, now just to wait for a Python 3.4 PyPy release :D

johnrds 12 years ago

I created a simple Terminal instance that compares Python and PyPy in a performance test:

https://terminal.com/tiny/shkhWWkcEV

(this lets you compare the performance on a real Linux system, without installing anything)

  • mineo 12 years ago

    The PyPy people themselves have a benchmark portal at http://speed.pypy.org/ with graphs and everything.

  • codiator 12 years ago

    PyPy seems to be 7x faster!

    • hyperbovine 12 years ago

      On a silly piece of code that nobody would ever have any use for. I have tried PyPy for "real" data and numerical tasks from time to time, and never have I noticed any sort of speedup. Usually it's slower than CPython. Perhaps this latest version will be different, who knows.

      • apendleton 12 years ago

        I'm using it in production, and speedups tend to be on the order of 4-5x for my app (the compute-intensive part involves hierarchical agglomerative clustering of documents by text similarity, so it's data/numbers-heavy). Obviously it'll depend on your individual application (and non-CPU-bound tasks won't benefit much), but we switched to PyPy because it showed major improvements in profiling of our app on production data (and we switched around PyPy's 1.9 release, so it's even better now). It's not like everyone's just imagining the speed improvements...

        • IanOzsvald 12 years ago

          I've just finished writing "High Performance Python" for O'Reilly (due August). We have a chapter on Lessons from the Field, and one chap talks about his successful many-machine rollout of a complex production system using PyPy for a 2x overall speed gain. We also cover Numba, Cython, profiling, numpy etc - all the topics you'd expect.

        • illumen 12 years ago

          It's not like everyone's just imagining that it's slower for many workloads either.

          • apendleton 12 years ago

            Not disagreeing, but they implied that this benchmark only showed a speed improvement because it's a toy, and that real workloads with real data are usually slower. That hasn't been the case in my experience.

      • rguillebert 12 years ago

        Help us make your code faster, report it please :)

      • wolf550e 12 years ago

        You do remember that it's a JIT and the first run is not fast? You have to let it run for a while to generate fast code, and only benchmark after that.
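
        A rough benchmarking sketch that discards the warmup iterations (names are just illustrative):

            import time

            def bench(f, warmup=5, runs=20):
                for _ in range(warmup):
                    f()                    # early runs include tracing/compilation
                best = float("inf")
                for _ in range(runs):
                    t0 = time.time()
                    f()
                    best = min(best, time.time() - t0)
                return best                # steady-state time, after the JIT has warmed up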

      • pekk 12 years ago

        You might try again since things have changed. If you don't get any kind of speedup, the PyPy project would likely consider it a bug and it would be helpful to document that it was slower. Please consider finding some way of reporting the specific measurable issues you find!

chris_mahan 12 years ago

Excellent. I've been waiting for this for a long time.

voidlogic 12 years ago

How does the performance of PyPy and Jython compare?

  • pipeep 12 years ago

    According to Jython's (a little dated) FAQ <https://wiki.python.org/jython/JythonFaq/GeneralInfo>, "Jython is approximately as fast as CPython--sometimes faster, sometimes slower. Because most JVMs--certainly the fastest ones--[optimize] long running, hot code, [it] will run faster over time."

    PyPy aims to be (and is in many cases) faster than CPython.

    The advantage with Jython isn't a performance one: it's the ability to call Java code directly.
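
    A minimal sketch of that Java interop (runs under Jython, not CPython):

        from java.util import HashMap  # a plain Java class, imported like a module

        m = HashMap()
        m.put("answer", 42)
        print(m.get("answer"))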

  • rguillebert 12 years ago

    Jython is usually slower than CPython, I believe, though it has no GIL.

    • rdtsc 12 years ago

      Wonder if it can be faster under higher parallelism conditions. Multiple threads doing some CPU intensive work?

      • sitkack 12 years ago

        Jython can utilize threads as well as Java can, so on a many-core machine Jython wins by a pretty large margin.
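
        A toy illustration (hypothetical code, nothing Jython-specific): on CPython the GIL keeps the CPU-bound threads below from running in parallel, while on Jython they can occupy separate cores.

            import threading

            def burn():
                total = 0
                i = 0
                while i < 10**7:   # pure CPU work; nothing here releases the GIL
                    total += i
                    i += 1

            threads = [threading.Thread(target=burn) for _ in range(4)]
            for t in threads:
                t.start()
            for t in threads:
                t.join()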

husio 12 years ago

Thank you.

derengel 12 years ago

I don't know or use Python, but why does an implementation that is trying to be "superior" still have the GIL?

  • huxley 12 years ago

    This donation page has some background on how PyPy is proposing to replace the GIL with software transactional memory:

    http://pypy.org/tmdonate2.html#introduction

  • pekk 12 years ago

    What DO you know or use? Did you think that the GIL was an obvious and stupid oversight made by stupid people for no good reason?

  • meowface 12 years ago

    Because it's very tricky to remove.

    Ruby also has a GIL.

    • dragonwriter 12 years ago

      > Ruby also has a GIL.

      MRI has a GIL; major alternative implementations (JRuby, Rubinius) do not.

      OTOH, addressing the downsides of a GIL is not the only reasonable motivation for an alternative implementation, so there's no reason that a better-than-stock Python (or Ruby) fundamentally must remove the GIL (the current "MRI" used to be an alternative, YARV, to the old MRI, and both had GILs).
