PyPy is a Python interpreter written in Python. It claims to be faster than CPython for certain benchmark tests.
How can Python — not particularly known for its speed — interpret Python source code faster than an interpreter written in C?
Also, an interpreter written in Python sounds like a neat exercise, but what’s the point of doing this?
How can PyPy be faster?
How can Python interpret Python source code faster than an interpreter written in C?
The short answer is it can’t. The long answer is that PyPy is actually two pieces of software:
- an interpreter
- a translator
The interpreter understands the full Python language and is written in RPython (Restricted Python). RPython is a subset of Python restricted so that the type of every variable can be inferred statically, which makes it much easier to compile into efficient code. This characteristic is very important for the next step.
The translator takes RPython code as its input and translates it to a lower-level language like C. Although the interpreter is written in Python (specifically RPython), the translator turns it into C, which can then be compiled into a much faster interpreter that could rival or even surpass CPython.
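To make the "restricted" part a little more concrete, here is a rough sketch in ordinary Python. It is not PyPy source code, and both function names are made up for illustration. The first function keeps every variable at a single type, which is the style a type-inferring translator can work with. The second is perfectly legal Python, but I would expect a translator like RPython's to reject it, because `value` has no single static type.

```python
def rpython_friendly(n):
    # 'total' and 'i' are integers on every path, so a translator can
    # infer one static type for each variable.
    total = 0
    for i in range(n):
        total += i * i
    return total


def plain_python_only(flag):
    # 'value' is an int on one path and a str on the other. Ordinary
    # Python accepts this, but a type-inferring translator cannot give
    # 'value' a single static type (assumption: this is the kind of
    # code the RPython restrictions rule out).
    if flag:
        value = 42
    else:
        value = "forty-two"
    return value
```

Both functions run fine under a normal Python interpreter; the difference only matters once something tries to translate them to C.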
What’s the point of doing this in RPython?
Why not just code it in C like CPython?
It boils down to the perceived difference between hacking away at an interpreter written in RPython (PyPy) and hacking away at one written in C (CPython). The claim is that an interpreter coded in RPython allows for faster experimentation.
Can it rival CPython?
Can a Python interpreter based on RPython code translated into C rival or exceed the performance of CPython?
At this moment (April 7, 2010), the answer is no. Default CPython bests its PyPy-C equivalent on just about every benchmark. On some benchmarks, the measurements are close. On others, CPython wins by a wide margin.
This is false advertising!
How can PyPy claim to be faster?
…especially if CPython does better on most benchmarks.
The answer is JIT! The developers of PyPy created a JIT compiler for Python by adding a few hints to their RPython interpreter. At run time, the PyPy JIT compiles the frequently executed parts of the Python program to native code, which is why it's able to get such fantastic speed gains when compared to CPython. PyPy-C-JIT crushes CPython on the majority of the speed tests at the PyPy Speed Center.
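If you want to check this kind of claim yourself, a tight loop is the sort of code a JIT should help with most. Below is a minimal sketch, assuming you have both interpreters installed; `hot_loop` and `time_hot_loop` are names I made up, and this is a toy, not one of the Speed Center benchmarks. Run the same file under each interpreter and compare the printed times.

```python
import sys
import timeit


def hot_loop(n):
    # A small arithmetic loop that runs many times: the kind of
    # frequently executed code a JIT can compile to native code.
    total = 0
    for i in range(n):
        total += i % 7
    return total


def time_hot_loop(n=1_000_000, repeat=3):
    # Best-of-'repeat' wall-clock time for one call to hot_loop(n).
    return min(timeit.repeat(lambda: hot_loop(n), number=1, repeat=repeat))


if __name__ == "__main__":
    # Print the interpreter version so the two runs can be told apart.
    print(sys.version)
    print("best of 3: %.4f s" % time_hot_loop())
```

A single micro-benchmark like this says little about real programs, so treat the numbers as a sanity check rather than a verdict.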
That doesn’t seem fair… Of course native code will be faster than interpreted code.
What if CPython is coupled with JITting?
Turns out PyPy still comes out ahead. CPython can be coupled with Psyco, a Python extension module that can JIT compile Python code. The Psyco JIT approach differs from the PyPy approach, because the former focuses on partial evaluation and the latter on tracing. I don’t know the difference yet (maybe another blog post?), but what we do know is this tracing JIT approach outperforms the Psyco approach.
So there it is… PyPy in a nutshell.
UPDATE: It seems I’m not the only one interested in PyPy’s claims. I came across a related thread on Stack Overflow: PyPy – How can it possibly beat CPython?
DISCLAIMER: I have not used PyPy — just very curious about the problems it solves.