Announcing Topaz, an RPython-powered Ruby interpreter
Hello everyone
Last week, Alex Gaynor announced the first public release of Topaz, a Ruby interpreter written in RPython. This is the culmination of a part-time effort over the past 10 months to build a Ruby interpreter that implements enough of Ruby's interesting constructs to show that the RPython toolchain can produce a Ruby implementation fast enough to beat what is out there.
Disclaimer
Obviously, the implementation is currently very incomplete in terms of the available standard library. We are working on getting it usable. If you want to try it, grab a nightly build.
We have run some benchmarks from the Ruby benchmark suite and the metatracing VMs experiment. The preliminary results are promising, but at this point we are missing so many method implementations that most benchmarks won't run yet. So instead of performance, I'm going to talk about the high-level structure of the implementation.
Architecture
Topaz interprets a custom bytecode set. The basics are similar to Smalltalk VMs, with bytecodes for loading and storing locals and instance variables, sending messages, and stack management. Some syntactic features of Ruby, such as defining classes and modules, literal regular expressions, hashes, ranges, etc., also have their own bytecodes. A third kind of bytecode covers control flow constructs in Ruby, such as loops, exception handling, break, continue, etc.
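To make the idea of a stack-based bytecode loop concrete, here is a minimal sketch in plain Python. The opcode names, the `run` function, and the `PROGRAM` encoding are all made up for illustration and are not Topaz's actual bytecode set; message sends are simulated by calling Python methods on the receiver.

```python
# Hypothetical opcodes for illustration; Topaz's real bytecode set differs.
LOAD_CONST, LOAD_LOCAL, STORE_LOCAL, SEND, JUMP_IF_FALSE, JUMP, RETURN = range(7)


def run(code, consts, num_locals):
    """Interpret a list of (opcode, argument) pairs on an operand stack."""
    stack = []
    local_vars = [None] * num_locals
    pc = 0
    while True:
        op, arg = code[pc]
        pc += 1
        if op == LOAD_CONST:
            stack.append(consts[arg])
        elif op == LOAD_LOCAL:
            stack.append(local_vars[arg])
        elif op == STORE_LOCAL:
            local_vars[arg] = stack.pop()
        elif op == SEND:
            # arg is (method name, argument count); the receiver sits below the args.
            name, argc = arg
            args = [stack.pop() for _ in range(argc)][::-1]
            receiver = stack.pop()
            stack.append(getattr(receiver, name)(*args))
        elif op == JUMP_IF_FALSE:
            if not stack.pop():
                pc = arg
        elif op == JUMP:
            pc = arg
        elif op == RETURN:
            return stack.pop()
        else:
            raise ValueError("unknown opcode %r" % (op,))


# Roughly: i = 0; while i < 3; i = i + 1; end; return i
PROGRAM = [
    (LOAD_CONST, 0), (STORE_LOCAL, 0),           # i = 0
    (LOAD_LOCAL, 0), (LOAD_CONST, 1),            # push i, push 3
    (SEND, ("__lt__", 1)), (JUMP_IF_FALSE, 11),  # leave the loop unless i < 3
    (LOAD_LOCAL, 0), (LOAD_CONST, 2),            # push i, push 1
    (SEND, ("__add__", 1)), (STORE_LOCAL, 0),    # i = i + 1
    (JUMP, 2),                                   # back to the loop condition
    (LOAD_LOCAL, 0), (RETURN, None),             # return i
]
```

Calling `run(PROGRAM, [0, 3, 1], 1)` executes the loop with the constants 0, 3 and 1 and a single local variable slot.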
In trying to get from Ruby source code to bytecode, we found that the easiest way to support all of the Ruby syntax is to write a custom lexer and use an RPython port of PLY (fittingly called RPly) to create the parser from the Ruby yacc grammar.
The Topaz interpreter uses an ObjectSpace (similar to how PyPy does it) to interact with the Ruby world. The object space contains all the logic for wrapping and interacting with Ruby objects from the VM. Its __init__ method sets up the core classes and initial globals, and creates the main thread (the only one right now, as we do not have threading yet).
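As a rough illustration of what "wrapping and interacting with Ruby objects" means, the following sketch shows a toy object space in plain Python. The class names (`W_Root`, `W_Int`) and the methods `newint`, `int_w`, `set_global` and `get_global` are assumptions chosen for this example and should not be read as Topaz's actual interface.

```python
class W_Root(object):
    """Base class of every wrapped (interpreter-level) object."""


class W_Int(W_Root):
    def __init__(self, intvalue):
        self.intvalue = intvalue


class ObjectSpace(object):
    def __init__(self):
        # Set up state the rest of the VM relies on. Here this is only a
        # table of globals; a real space would also create the core classes.
        self.globals = {}

    def newint(self, intvalue):
        """Wrap a plain integer for use in the guest language."""
        return W_Int(intvalue)

    def int_w(self, w_obj):
        """Unwrap a guest-language integer back into a plain integer."""
        if not isinstance(w_obj, W_Int):
            raise TypeError("expected a wrapped integer")
        return w_obj.intvalue

    def set_global(self, name, w_value):
        self.globals[name] = w_value

    def get_global(self, name):
        return self.globals[name]
```

The point of the design is that the interpreter never touches guest-language values directly: everything goes through the space, so wrapping and unwrapping rules live in one place.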
Classes are mostly written in Python. We use ClassDef objects to define the Ruby hierarchy and attach RPython methods to Ruby via ClassDef decorators. These two points warrant a little explanation.
Hierarchies
All Ruby classes ultimately inherit from BasicObject. However, most objects are below Object (which is a direct subclass of BasicObject). This includes objects of type Fixnum, Float, Class, and Module, which may not need all of the facilities of full objects most of the time.
Most VMs treat such objects specially, using tagged pointers to represent Fixnums, for example. Other VMs (for example from the SOM Family) don't. In the latter case, the implementation hierarchy matches the language hierarchy, which means that objects like Fixnum share a representation with all other objects (e.g. they have class pointers and some kind of instance variable storage).
In Topaz, implementation hierarchy and language hierarchy are separate. The first is defined through Python inheritance. The other is defined through the ClassDef for each Python class, where the appropriate Ruby superclass is chosen. The diagram below shows how the implementation class W_FixnumObject inherits directly from W_RootObject. Note that W_RootObject doesn't have any attributes, specifically no storage for instance variables and no map (for determining the class - we'll get to that). These attributes are instead defined on W_Object, which is what most other implementation classes inherit from. However, on the Ruby side, Fixnum correctly inherits (via Numeric and Integer) from Object.
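The separation of the two hierarchies can be sketched in plain Python as follows. This is a simplified model under stated assumptions: the `ClassDef` constructor signature, the `ancestors` helper, the `NUMERIC` and `INTEGER` names, and the use of `__slots__` to show which attributes exist are all invented for the example, and only the class names mentioned in the text are taken from the post.

```python
class ClassDef(object):
    """Describes a Ruby class: its name and its Ruby superclass."""

    def __init__(self, name, superclassdef=None):
        self.name = name
        self.superclassdef = superclassdef

    def ancestors(self):
        # Walk the Ruby-side superclass chain.
        names, cdef = [], self
        while cdef is not None:
            names.append(cdef.name)
            cdef = cdef.superclassdef
        return names


class W_RootObject(object):
    # No per-instance storage at all.
    __slots__ = ()
    classdef = ClassDef("BasicObject")


class W_Object(W_RootObject):
    # Full objects carry a map and instance variable storage.
    __slots__ = ("map", "storage")
    classdef = ClassDef("Object", W_RootObject.classdef)

    def __init__(self):
        self.map = None
        self.storage = []


NUMERIC = ClassDef("Numeric", W_Object.classdef)
INTEGER = ClassDef("Integer", NUMERIC)


class W_FixnumObject(W_RootObject):
    # The Python-side parent is W_RootObject, so only the integer is stored ...
    __slots__ = ("intvalue",)
    # ... while the Ruby-side superclass chain still goes through Object.
    classdef = ClassDef("Fixnum", INTEGER)

    def __init__(self, intvalue):
        self.intvalue = intvalue
```

In this sketch, `W_FixnumObject.classdef.ancestors()` walks Fixnum, Integer, Numeric, Object and BasicObject, even though a `W_FixnumObject` instance is not a `W_Object` and carries neither a map nor instance variable storage.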
This simple structural optimization gives a huge speed boost, but there are VMs out there that do not have it and suffer performance hits for it.
Decorators
Ruby methods can have symbols in their names that are not allowed as part of Python method names, for example !, ?, or =, so we cannot simply define Python methods and expose them to Ruby by the same name.
For defining the Ruby method name of a function, as well as argument number checking, Ruby type coercion and unwrapping of Ruby objects to their Python equivalents, we use decorators defined on ClassDef. When the ObjectSpace initializes, it builds all Ruby classes from their respective ClassDef objects. For each method in an implementation class that has a ClassDef decorator, a wrapper method is generated and exposed to Ruby. These wrappers define the name of the Ruby method, coerce Ruby arguments, and unwrap them for the Python method.
Here is a simple example:
    @classdef.method("*", times="int")
    def method_times(self, space, times):
        return self.strategy.mul(space, self.str_storage, times)
This defines the method * on the Ruby String class. When this is called, the first argument is converted into a Ruby Fixnum object using the appropriate coercion method, and then unwrapped into a plain Python int and passed as argument to method_times. The wrapper method also supplies the space argument.
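How such a decorator might generate its wrapper can be sketched in a few lines of plain Python. Everything here is a simplified assumption for illustration: the `Space.int_w` stand-in, the `methods` table on `ClassDef`, the `send` helper, and the string multiplication in the method body (the real method delegates to a string strategy) are not Topaz's actual code, and the coercion step is reduced to a type check.

```python
class W_Fixnum(object):
    def __init__(self, intvalue):
        self.intvalue = intvalue


class Space(object):
    """Stand-in for the object space: only unwrapping is modelled."""

    def int_w(self, w_obj):
        if not isinstance(w_obj, W_Fixnum):
            raise TypeError("can't convert %s into Integer" % type(w_obj).__name__)
        return w_obj.intvalue


class ClassDef(object):
    def __init__(self, name):
        self.name = name
        self.methods = {}  # Ruby method name -> generated wrapper

    def method(self, ruby_name, **argspec):
        def decorator(func):
            # Names of the positional parameters after (self, space).
            code = func.__code__
            argnames = code.co_varnames[2:code.co_argcount]

            def wrapper(w_self, space, args_w):
                # Argument number checking.
                if len(args_w) != len(argnames):
                    raise TypeError("wrong number of arguments (%d for %d)"
                                    % (len(args_w), len(argnames)))
                args = []
                for argname, w_arg in zip(argnames, args_w):
                    if argspec.get(argname) == "int":
                        args.append(space.int_w(w_arg))  # unwrap to a Python int
                    else:
                        args.append(w_arg)  # pass the wrapped object through
                return func(w_self, space, *args)

            self.methods[ruby_name] = wrapper
            return func
        return decorator


class W_StringObject(object):
    classdef = ClassDef("String")

    def __init__(self, value):
        self.value = value

    @classdef.method("*", times="int")
    def method_times(self, space, times):
        return W_StringObject(self.value * times)


def send(space, w_receiver, ruby_name, args_w):
    """Look up a Ruby method by name on the receiver's ClassDef and call it."""
    return w_receiver.classdef.methods[ruby_name](w_receiver, space, args_w)
```

With these definitions, `send(Space(), W_StringObject("ab"), "*", [W_Fixnum(3)])` reaches `method_times` with `times` already unwrapped to the Python int 3, while the Python-level name `method_times` never leaks into the Ruby-visible method table.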
Object Structure
Ruby objects have dynamically defined instance variables and may change their class at any time in the program (a concept called singleton class in Ruby - it allows each object to have unique behaviour). To still access instance variables efficiently, you want to avoid dictionary lookups and let the JIT know about objects of the same class that have the same instance variables. Topaz, like PyPy (which got it from Self), implements instances using maps, which transform dictionary lookups into array accesses. See the blog post for the details.
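The core idea behind maps can be sketched in plain Python. This is a minimal illustration under assumptions, not Topaz's or PyPy's code: the `Map` class, its `find` and `add` methods, and the `set_ivar`/`get_ivar` helpers are hypothetical names, and the role of the map in determining an object's class is left out.

```python
class Map(object):
    """Shared description of an object layout; never mutated after creation
    except for caching transitions to larger layouts."""

    def __init__(self, indexes=None):
        self.indexes = indexes or {}  # instance variable name -> storage index
        self.transitions = {}         # name -> Map with that variable added

    def find(self, name):
        return self.indexes.get(name, -1)

    def add(self, name):
        # Reuse an existing transition so that equal layouts share one map.
        if name not in self.transitions:
            indexes = dict(self.indexes)
            indexes[name] = len(indexes)
            self.transitions[name] = Map(indexes)
        return self.transitions[name]


EMPTY_MAP = Map()


class W_Object(object):
    def __init__(self):
        self.map = EMPTY_MAP
        self.storage = []  # values only; the names live in the shared map

    def set_ivar(self, name, value):
        idx = self.map.find(name)
        if idx == -1:
            # New variable: move to the map that includes it and grow storage.
            self.map = self.map.add(name)
            self.storage.append(value)
        else:
            self.storage[idx] = value

    def get_ivar(self, name):
        idx = self.map.find(name)
        return None if idx == -1 else self.storage[idx]
```

Two objects that set `@x` and then `@y` end up sharing the same `Map` instance. Because the name-to-index part of a map does not change, a JIT that knows an object's map can, in principle, treat the result of `find` as a constant and compile the access down to a plain array read.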
This is only a rough overview of the architecture. If you're interested, get in touch on #topaz.freenode.net, follow the Topaz Twitter account or contribute on GitHub.
Tim Felgentreff
Comments
Interesting. Although I code a lot in python but still quite like Ruby. Am looking forward for a fast ruby...
Does this mean that JVM is now obsolete?
Don't worry. JVM will outlive you and your grandgrandchildren.
"Its __init__ method", not "It's".