Garbage Collection in Ruby 2.1

tmm1.net

163 points by tmm1 12 years ago · 30 comments

AlexanderDhoore 12 years ago

It's interesting to compare this to how Python does it (reference counting with generations): http://patshaughnessy.net/2013/10/30/generational-gc-in-pyth...

"At first glance, Ruby and Python seem to implement garbage collection very differently. Ruby uses John McCarthy’s original mark and sweep algorithm, while Python uses reference counting. But when we look more closely, we see that Python uses bits of the mark and sweep idea to handle cyclic references, and that both Ruby and Python use generational garbage collection in similar ways. Python uses three separate generations, while Ruby 2.1 uses two.

This similarity should not be a surprise. Both languages are using computer science research that was done decades ago – before either Ruby or Python were even invented. I find it fascinating that when you look “under the hood” at different programming languages, you often find the same fundamental ideas and algorithms are used by all of them. Modern programming languages owe a great deal to the ground breaking computer science research that John McCarthy and his contemporaries did back in the 1960s and 1970s."

  • oleganza 12 years ago

    I think Henry Baker had a paper on why generational and refcounting GCs are ultimately equivalent. Couldn't find it quickly, though, to give you a link.

  • iagooar 12 years ago

    Isn't it actually kind of sad that such important research doesn't get considered until 40 or 50 years later?

    • riffraff 12 years ago

      I don't think that research went unconsidered; it's been used constantly in many projects since.

      The reason why some platforms have "silly" choices like a stop-the-world mark-and-sweep GC or AST-walking interpreters is, most likely, that optimization wasn't a goal in the original implementation, and it's hard to retrofit while keeping compatibility.

RyanZAG 12 years ago

Also interesting to compare this with how the G1 collector for the JVM handles it: http://www.infoq.com/articles/G1-One-Garbage-Collector-To-Ru...

EDIT: Better link: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.63....

stcredzero 12 years ago

So Ruby objects occupy 40 bytes each, inside pages on the eden heap.

That's pretty huge. That's several times larger than some other dynamic languages -- languages that have essentially the same features as Ruby.
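
A rough way to check the per-object figure on your own interpreter is the `objspace` extension that ships with MRI. This is only a sketch: `ObjectSpace.memsize_of` is documented as a hint, the numbers depend on the Ruby version and platform, and the labels and variable names below are made up for illustration.

```ruby
require "objspace"

# Sample objects; the names are illustrative only.
plain_object = Object.new
short_string = "hi"
long_string  = "x" * 1_000

# memsize_of reports an implementation-specific estimate in bytes.
sizes = {
  "Object.new"       => ObjectSpace.memsize_of(plain_object),
  "short String"     => ObjectSpace.memsize_of(short_string),
  "1000-char String" => ObjectSpace.memsize_of(long_string)
}

sizes.each { |label, bytes| puts format("%-18s %d bytes", label, bytes) }
```

Short strings are typically stored inside the object slot itself, while the long string needs a separate buffer, so its reported size should be noticeably larger.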

  • phamilton 12 years ago

    Not really. Python primitives are smaller (24 bytes). Python strings are 37 bytes + 1 byte per character. Python objects are 72 bytes. Python classes are 104 bytes. (Got these numbers playing around with sys.getsizeof in the Python console.)

    Primitives are a bit bigger in Ruby, but they aren't really primitives anyway (everything is an object).

  • rurban 12 years ago

    Perl objects are even bigger and need one indirection per access. A Perl head is 32 bytes plus a 48-byte body, i.e. 80 bytes on 64-bit systems.

    For my Perl rewrite p2 I need one single word per object on the stack. Common Lisp implementations usually need two words, the class pointer and the tagged value.

    I really liked the visualization of the various GCs, btw. Excellent work!

  • riffraff 12 years ago

    I have no idea of the size in other language implementations, but note that 40 bytes is on x86-64; on 32-bit it's either 20 or 24.

    • joevandyk 12 years ago

      This is why running large Ruby apps on 64-bit OSes can take up a lot more memory than on 32-bit.

      • riffraff 12 years ago

        Isn't that true of most platforms? I know the JVM has compressed oops to limit this issue, but I am not aware of similar things elsewhere.

jeffdavis 12 years ago

To me, it's interesting that after all this time and with the prevalence of languages that rely heavily on garbage collection, garbage collectors still don't seem to scale to large heaps.

Large heaps mean long pauses when doing the "mark" phase of the generation holding most of the objects. That seems like a big problem that will increase as memory sizes increase.

Maybe the way I'm looking at it is too simplistic and there are better methods now. But even in Java, which has had plenty of time to work these issues out, I have seen issues with long GC pauses.

My impression right now is that GCs just don't scale to large heaps. To use more memory, you need to either manage memory yourself (which not a lot of modern languages allow), or increase the number of independent heaps (by using more tasks/processes).

Please enlighten me if I'm wrong here.

  • eonwe 12 years ago

    That seems to be the case in Oracle Java at least. With 80+ GB heaps, you can experience multi-minute stop-the-world collections when a full GC strikes. Luckily, that can be tuned to be very rare (for the usual load).

    For better latency without such hiccups, the only solution I know of is Azul's Zing (http://www.azulsystems.com/zing/pgc) which really did away with larger pauses at least with our software.

    • jeffdavis 12 years ago

      What is "the usual load"?

      Interesting; I'll check out the Azul one. I'm a little skeptical, but it might be an improvement. Did you see better throughput or just reduced latency?

      • eonwe 12 years ago

        By usual I mean what our application was tuned for.

        Which, sadly, may or may not be what it encounters. There are still some usage patterns that can cause a large amount of time to be spent doing GC (without Zing).

        Latency was greatly reduced; I cannot really say anything about throughput (except that it was not clearly worse).

rurban 12 years ago

The important number is the avg pause time. Here, with Ruby 2.1, that is 7ms for a minor and 58ms for a major sweep in a typical app with a ~500K heap.

In my potion-based GC (a stack-scanning, compacting, Cheney two-finger GC) the avg GC needs 3ms, but it's not concurrent (multi-threaded) yet.

All of these can be considered real-time, i.e. < 15ms. For bigger heaps the scans need to be incremental (saving and restoring GC state, which is easy with libgc or Cheney).

The best Java GC I found needed 150ms pause time. Good Lisps have real-time GCs.
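
To get minor and major pause figures like the ones quoted above for your own app, one option is to time explicit collections. The sketch below assumes MRI 2.1 or later, where `GC.start` accepts `full_mark:` and `GC.stat` exposes `:minor_gc_count` and `:major_gc_count`. The helper name `timed_gc` and the garbage-generating loop are made up for illustration, and a requested minor GC may still be escalated to a major one by the interpreter.

```ruby
# Hypothetical helper: run one explicit GC and report how long it took
# and how the minor/major counters moved.
def timed_gc(full_mark)
  before  = GC.stat
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  GC.start(full_mark: full_mark, immediate_sweep: true)
  elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000.0
  after = GC.stat
  {
    ms:    elapsed_ms,
    minor: after[:minor_gc_count] - before[:minor_gc_count],
    major: after[:major_gc_count] - before[:major_gc_count]
  }
end

# Create some short-lived objects so the collector has work to do.
junk = Array.new(200_000) { |i| "obj #{i}" }
junk = nil

minor_run = timed_gc(false) # ask for a minor (young-generation) collection
major_run = timed_gc(true)  # ask for a full mark

puts format("minor request: %.2f ms (minor +%d, major +%d)", minor_run[:ms], minor_run[:minor], minor_run[:major])
puts format("major request: %.2f ms (minor +%d, major +%d)", major_run[:ms], major_run[:minor], major_run[:major])
```

Explicit collections in a toy script will not match production pause times; in a real app, comparing `GC.stat` counters alongside request timings is likely to be more representative.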

venus 12 years ago

What's this RailsApp.preload_all method mentioned in this post? Is that a GitHub thing?

Great post btw. I am looking forward to getting some production apps onto 2.1 in the coming weeks and seeing how performance characteristics have changed in the new release.

  • tmm1 (OP) 12 years ago

    We implement preload_all, which loops over app/{models,controllers}/ and requires everything. This method is generally called from config.ru, and happens in the unicorn master before it forks off workers.

    I recently upstreamed a warmup method for Rack::Builder you can also use for this purpose: https://github.com/rack/rack/pull/617
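
A minimal sketch of what a `preload_all`-style helper could look like, based only on the description above (loop over app/{models,controllers}/ and require everything). The signature, the `root` argument, and the demo directory are assumptions for illustration, not the actual implementation.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: require every .rb file under app/models and
# app/controllers so the code is loaded before worker processes fork.
def preload_all(root, subdirs = %w[models controllers])
  files = subdirs.flat_map do |dir|
    Dir.glob(File.join(root, "app", dir, "**", "*.rb")).sort
  end
  files.each { |file| require File.expand_path(file) }
  files
end

# Demo against a throwaway directory standing in for a Rails app.
demo_root = Dir.mktmpdir("preload_demo")
FileUtils.mkdir_p(File.join(demo_root, "app", "models"))
File.write(File.join(demo_root, "app", "models", "demo_widget.rb"),
           "class DemoWidget; end\n")

loaded = preload_all(demo_root)
puts "preloaded #{loaded.size} file(s)"
```

Calling something like this in the master process before forking means workers start with the classes already loaded, so those pages can be shared copy-on-write instead of being loaded again in each worker.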

    • venus 12 years ago

      Thanks for that link (and for submitting it!). I've done something similar in deploy scripts, to warm up a fresh instance before bringing it into rotation - even better having it able to be baked into rack like that.

    • purephase 12 years ago

      That's interesting. Thanks for sharing. I'll have to check this out.

ksec 12 years ago

All of these are very similar to Google Chrome's V8 and Mozilla's SpiderMonkey, although both of those already have incremental collection; for Ruby that is scheduled for v2.2.

Now just when will Ruby get a Method or Tracing JIT?
