TL;DR Writing tri-color (actually 4-color) incremental generational garbage collector in C
Previously we’ve talked about manual memory allocation and unraveled some shadows around it. Now let’s talk about automatic memory allocation, more specifically about Garbage Collection. Most if not all modern Programming Languages contain a mechanism that handles allocation of memory automatically at precisely the moment application needs it and deallocation of that memory at the moment it’s not needed anymore. However determining when memory region (or object) is safe to deallocate is not an easy feat as we need to somehow determine if the region is needed or not. And here is where Garbage Collection comes into a play. Simply speaking Garbage Collection is an algorithm that checks entire memory of an application for regions that aren’t referenced anywhere inside application by other memory regions and deallocates these regions because they aren’t needed anymore.
Garbage Collection was invented by John McCarthy back in 1959 for Lisp and is still one of the hottest evolving topics in Computer Science nowadays, especially with Java’s GC becoming more famous with tons of blows and whistles which can be tuned to practically any application memory usage scenario. There are also alternative models to automatic memory handling such as newLisp’s ORO, Rust’s ownership based life cycle management and Objective-C’s ARC but we won’t be talking about those. The cost of checking entire memory region is not a cheap one so people realized ways to make it more reasonable and thus nowadays we have various types of Garbage Collection algorithms. There are tracing garbage collectors and reference counting ones, copying (moving) and non-copying (non-moving) ones, incremental and non-incremental, generational, concurrent, parallel and real-time ones. To learn more about all those types of Garbage Collectors I suggest you try reading The Garbage Collection Handbook which is a fundamental must read handbook for all those interested in the topic. But for rest of you lazy bunch let’s start collecting all the garbage in the world by writing one garbage collector ourselves.
Let’s write a single threaded tracing tri-color (actually 4-color, I’ll explain later why) non-copying incremental n-generational (n-generational meaning that we can configure any number of generations dynamically on startup, but our implementation will have limitation of maximum 64 generations) Garbage Collector in C. We won’t tackle multithreading, concurrency and parallel garbage collection for sake of simplicity as those are more complex to implement but maybe I’ll do another blog post about them later this year. Our simple garbage collector will be single threaded only, meaning it will correctly work only in applications that don’t use multiple threads or somehow allocate memory in parallel. This kind of Garbage Collector is sufficient for languages that aren’t natively multithreaded, such as Node.js and Pharo, so if you are planning to write such language yourself this might be a useful exercise for you. For those of you impatient ones there is a simplegc repo with all the source code to study. So let’s get started.
We need a way to represent our memory regions so we’ll go for the most obvious object oriented representation, meaning we’ll just write a garbage collector for object-oriented programming language of ours. Each object will be part of doubly linked list so we could trace it and hence it will contain a pointer to a pointer of a previous object pointing to it called gc_prev and a pointer to the next object called gc_next. Each object will have a pointer to a class description which we’ll talk about later. Each object will also have 32-bit unsigned integer called gc_mark which will serve three purposes: to mark object’s color, gc generation and count root references (I’ll explain those later). Each object might also reference other objects for example we might have fields inside object which could contain other objects so hence we need refs_count variable to count number of fields object has. Since each object will be of a class we also need to represent the class itself. It will contain function pointers that will be used during garbage collection. One of them is the most obvious function pointer which is a gc_finalize function. This function will be called precisely before the moment of freeing memory region which is not used anymore or simply speaking just before the object will be deleted by the Garbage Collector. gc_mark_black function pointer will point to a function used to mark object from grey to black and mark all objects it’s referring (or contains) as grey for further processing. gc_contains will be a function used to check if an object is referenced (or contained) by another object.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Now let’s talk about gc_mark. As I’ve said earlier it’ll serve three purposes for our object: root reference count, object’s color marking, and gc generation. We’ll use first 24 bits for root reference counting, next 2 bits will be used for color (as we’ll have 4 possible colors we need 2 bits since 22 is 4) and remaining 6 bits will be used for gc generation (meaning we can’t have more than 64 generations since 26 is 64, but that is enough for any use case).
Let’s define some useful macros to work with gc_mark. Most of them are related to bitwise magic so I won’t explain them in details but instead leave that as self-study for you.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
Now let’s describe configuration for our garbage collector. First thing first we need to limit duration of our garbage collection cycle since we don’t want it to be too long and are writing an incremental garbage collector after all. For this we’ll have max_pause configuration property which will define maximum amount of nanoseconds gc cycle can run. Also since we need to somehow check if our cycle is exceeding it’s pause limit or not we just can’t do it after every object we check during gc cycle, thus we need pause_threshold which is number of objects checked after which gc pause check occurs. Next we define number of generations our garbage collector will have using gens_count and provide generation configurations array using gens. Each generation configuration will have refresh_interval which is number of nanoseconds passed after last full gc cycle after which entire generation of objects should be rechecked. Now we need to clarify what is full gc cycle. Full gc cycle is garbage collection cycle that is not interrupted after hitting max pause threshold. Last thing our generation configuration has is promotion_interval which is number of nanoseconds passed after last full gc cycle after which our objects in generation are promoted to higher generation. This configuration property is not relevant for the last generation since there is no next generation to promote to.
1 2 3 4 5 6 7 8 9 10 11 12 13
Let’s talk about why we need 4 colors for describing the state of the object in relevance to current garbage collection cycle. In traditional tri-color garbage collector objects can be white, grey and black. Newly created objects are marked as white. After object is identified to have a root reference it’s color is changed from white to grey. Next all grey objects are checked for references of other objects that they contain. For each contained reference if reference is white it’s marked as grey for future checking after which the container object is marked as black. All remaining grey objects are subsequently checked for references of objects they contain. After this checks we have only black and white objects. Objects marked as black are reachable from the root and hence don’t need to be garbage collected. Remaining white objects on the other hand are considered garbage and freed in the end. You can read about this in more detail here.
Traditional tri-color algorithm doesn’t has a notion of generation in mind because checking objects only in some specific limited generation won’t be correct for determining if objects are garbage or not as some of those objects might have references from other generations. Let’s consider that you have a generation of objects and you suddenly want to recheck them all. What you will probably do is mark all objects as white again and then mark objects that have root references as grey and go from there. However the problem with this approach is that in the end we might have white objects that don’t have direct references in generation they are part of but have indirect reference from root objects in another generation. This is why we need 4 colors. We will treat those objects as silver and won’t consider them as white yet. For each such object we’ll check all black objects in all generations for possible references. If such reference will be found we’ll mark that object as grey and will promote it to the generation that has reference to it. This might introduce some generation bouncing but since all live objects in all generations are subsequently promoted after some period of time to last generation this won’t be a problem.
Since we need to be incremental during our garbage collection cycle for all white objects that are considered garbage we’ll have a separate color that we’ll call transparent, but we won’t need additional bit in gc_mark to identify objects with such color. This kind objects won’t be marked directly but instead will be part of transparent list that will be used during last step in our incremental garbage collection cycle to call gc_finalize function for each object in that list and free it from memory. This is needed because gc_finalize and free calls can be time costly operations hence we need to delay them until our gc cycle is fully over so they could be executed in the remaining cycle time. This way we can have a nearly timely precise incremental garbage collector that won’t ever exceed our predefined max_pause time.
Since we are going to count time using nanoseconds we need a way to measure nanoseconds passed between execution of arbitrary code. For this we’ll use Linux syscall clock_gettime.
1 2 3 4 5 6
Now let’s define some global constants and variables for our garbage collector. Note how we are going to have single double linked lists for each gc color except black. Each generation will have it’s own black objects list and hence we’ll need an array of doubly linked objects lists.
1 2 3 4 5 6 7 8 9 10 11 12 13
Finally we can start writing some functions. Let’s start with Garbage Collector initialization function first. This will be called on startup and only once. It checks if generation count is correct and next thing is it just copies over configuration object including the contained generation configurations.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
We also need a function to destroy our garbage collector when it’s not needed anymore. This needs to free up all doubly linked lists as those can be considered part of garbage collector. Note how we call gc_finalize function before freeing up object.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
Now let’s define some helper functions for working with doubly linked lists. We’ll use those to add, remove, and move objects from one list to another.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
Now in order to determine if object is reachable or not we’ll just use root reference counting instead of going over the stack and marking objects each time. This might be considered more expensive as each time we set a reference to some object on the stack we need to increment it’s root reference count. And every time we pop a stack frame or modify a reference on the stack we need to decrement root reference count. Doing this many times is expensive indeed however from my point of view this approach is more faster in long term compared to scanning the stack each time during garbage collection. But if you really want it our simple garbage collector can be easily modified to use the scanning approach if needed.
As you can see from code when we increment root reference count we also check if object is white and mark it as grey for future check by Garbage Collector. We could omit this step, however in doing so we would require to go through entire white objects set and check their root reference count each time during Garbage Collector cycle. As you can see there are no additional steps during decrement of root reference count, and that is because we don’t need to do anything since when root reference count becomes 0 our object will still be refreshed during generation refresh phase and during it the root reference count will be checked anyway.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Next thing we need to do is to actually allocate the object itself and initialize it’s gc_mark. We also need to add it to white objects list as all new objects should be added to white list. Without it we won’t be able to track allocated objects. Look at how we also initialize references to null based on how much references an object could contain. This is specified using refs_count argument to the function.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Since our object could contain references we need a way to be able to modify those references. Since we are introducing mutation we need to be able to revert the color of black object back to grey in order to be able to recheck it during gc cycle. This operation is not expensive and is executed only once during mutation between gc cycles if mutation of the object occurs thus we don’t get any performance penalties by doing so each time on reference modification.
1 2 3 4 5 6 7 8 9 10
Let’s start writing gc function. First we need to initialize cycle counters which we’ll be used during our gc cycle. I’ve added this variables to gc_config struct just for the sake of simplicity. First thing we are going to do during our gc cycle is to free all transparent objects that might have been left over from the previous gc cycle. Before freeing each object we call gc_finalize function from it’s class. Note however that since we are writing an incremental garbage collector we need to be able to end the cycle prematurely if it takes too much time. To do so we’ll be using gc_cycle_check macro. In order for gc_cycle_check to determine if the check for threshold should occur or not we increment cycle_threshold counter. During gc_cycle_check we just check if cycle_threshold is bigger than configured pause_threshold and if it is we just check the amount of passed time in nano seconds since the start of the gc cycle. If the time passed exceeds max_pause configuration value we just call gc_cycle_end function and end cycle prematurely. We’ll rely on gc_cycle_check macro heavily through our entire gc cycle.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
After cleaning up transparent list next thing what we are going to do is promote generations that need promotion and refresh those that need refreshment. Now this might sound something complex but it’s pretty much straight forward. We just track time of last promotion and refreshment for each generation in promotion_time and refresh_time variables and compare them with configuration intervals called promotiona_interval and refresh_interval configured individually for each generation. Note how we skip promotion of the last generation as there is no generation to promote to. During promotion what we are doing is just incrementing generation setting and moving object from one generation black list to another. Note that while doing that we also check gc pause threshold using gc_cylce_check macro. During refresh we are checking root reference count and based on that move objects either to grey list or silver.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
Next phase after promotion/refreshment is marking phase. During this phase we go over all grey objects and mark them as black using class assigned gc_mark_black function pointer. Now in our gc_object’s class we assigned it to gc_object_mark_black function which is straightforward as we just iterate over references object contains, mark them as grey if they are white or silver and after that mark object as black. Note how we are using gc_cycle_check_no_return macro which is almost the same as gc_cycle_check macro except that it doesn’t ends the cycle by calling gc_cycle_end as we are doing the same check in caller code using gc_cycle_check macro. This is needed to end cycle prematurely from callee gc_mark_black function. This might sound as overly complex but this way we can extend our garbage collector to support objects with any kind of reference layout which might come handy for future.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38
Next comes the tricky part. Now we are left with silver objects that aren’t entirely white or grey yet. What we are going to do is to go through all generations starting with older ones and check if objects in that generation might contain our silver object. This might sound trivial but it’s more tricky. We are going to check our silver objects using gc_contains function pointer in the class which in our case will point to gc_object_contains function. This is all for the same sake of extensibility of our garbage collector. Now if the object is not found in all black objects of all generations we can safely mark it as white. On the other hand if it’s found we need to first correct it’s generation, mark it as grey and recheck all left over grey objects all over again in order to safely trace references that this object might contain which on the other hand might be some silver objects too. This way we decrease the amount of silver objects that we need to check to essential minimum as the operation itself is very expensive.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
Finally we need to sweep all white objects, by moving them to transparent list. Next we just cleanup the transparent list again while also checking for the pause threshold using gc_cycle_check macro again. And finally we mark cycle as full and end it using gc_cycle_end function.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Now that we have a garbage collector we need a way to test it to be sure it’s working correctly. To do that we’ll create a micro DSL test language that will look like this. This language is inspired by a similar language that I’ve created to test memory allocator in previous article. This language is fairly compact. Note that putting an object into the slot doesn’t means it automatically has a root reference. Objects that don’t have root reference or are referenced by root objects might get deleted after gc but our test compiler will check that kind of objects during compilation so we won’t have any Segmentation Fault.
1 2 3 4 5 6 7
In order to actually write tests for that I’ve created gctestgen.rb Ruby script that can generate random test files of any size. This test have iterations. In each iteration first we allocate objects, then we decrement root reference counts of some random objects from previous iteration. After that we initialize references in objects randomly and run garbage collection. This way we simulate behaviour of an application allocating objects and running garbage collection.
1 2 3 4 5 6 7
Now in order to compile our test I wrote a mini meta-compiler gctest.rb in Ruby that generates .c file for testing and links it against our garbage collector and creates executable binary gctest. While compiling it also simulates garbage collection (simple full garbage collection each time) and counts expected number of objects to survive entire test. After that it runs that executable and compares it’s output to expected count to proof correctness of the garbage collector. The source code is in repository so you could all study it. The following initial configuration is provided for testing which is just 3 generations.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
As you can see test took 250ms to allocate 60000 objects. Garbage collector was called 60 times and overall took 53.74ms after which only 4606 objects survived. We wrote a Garbage Collector, finally. That’s it folks, have a nice garbage collector hacking time!!!