Amazon.com says I bought Programming Ruby (2nd edition) on November 4, 2004. Seven short years of Java later I'm finally putting it to use. One immediate question is, how does garbage collection work in Ruby? Is it a tracing or reference counting collector? Is it generational, and if so, how many generations? These questions eventually led me to an interesting paper by David Bacon, Perry Cheng and VT Rajan on A Unified Theory of Garbage Collection.
Universally, garbage collectors are viewed as either tracing or reference counting. The Sun HotSpot VM, for example, is a tracing collector. It traverses the object graph from the roots, finding reachable live objects and reclaiming unreachable dead objects. PHP, on the other hand, uses reference counting to decide which objects can be safely garbage collected. This table summarizes the typical characterization of each collector type,
Tracing |
Reference Counting | |
Collection Style |
Batch |
Incremental |
Cost per Mutation |
None |
High |
Throughput |
High |
Low |
Pause Times |
Long |
Short |
Real Time |
No |
Yes |
Collects Cycles |
Yes |
No |
The first observation is, once optimizations are added to the basic tracing and reference counting collectors, they begin to converge. For example, with deferred reference counting there is a periodic scanning of stack references to reduce per-mutation cost. A generational tracing collector, on the other hand, imposes a per-mutation overhead but gains shorter pause times. So even though collectors are typically described as one or the other, we should really view them as hybrids.
The second interesting observation is, tracing algorithms are the duals of the reference counting algorithms. Both types of algorithms share the same structure.
Tracing |
Reference Counting | |
Starting Point |
Roots |
Anti-Roots |
Graph Traversal |
Forward from roots |
Forward from anti-roots |
Objects Traversed |
Live |
Dead |
Initial Reference Count |
Low (0) |
High |
Reference Count Reconstruction |
Addition |
Subtraction |
Extra Iteration |
Sweep Phase |
Trial Deletion |
Tracing collectors start at the root, traverse foward, finding live objects. Initially all reference counts are zero (an underestimate) but live objects have their reference count incremented to their true value (keeping a bit flag is just an optimization of actual count). Reference counting collectors start from non-root objects, traversing forward to find dead objects. Reference counts are initially high (an overestimation that includes references from dead objects) but decremented until the true reference count is obtained. Ultimately, the difference is just the initial value (an underestimate or overestimate) and whether we increment or decrement to obtain the true reference count.
So that was interesting, but lets return my the original question about Ruby garbage collection and add a few other languages in for fun. Here's a quick summary. Don't quote me on it, corrections appreciated.
Language | Garbage Collection (for reference/popular implementation - others may be available) |
Erlang | tracing - per process (very small processes), compacting then generational optimization |
Haskell | tracing - generational (3), 512KB nursery |
Java | tracing - generational (2), serial, concurrent, concurrent + incremental |
Perl | reference counting - does not handle cycles |
PHP | reference countng - 5.3 handles cycles w/ "on the fly" cycle detector, prior versions did not |
Python | reference counting - handles cycles w/ tracing cycle detector |
Ruby | tracing - non-generational |
Scala | see Java |
Comments