.NET Garbage Collection

The CLR comes in two builds: a workstation build and a server build. As the names indicate, the workstation build targets client workstations (and is the only build used on single-CPU machines), while the server build targets multi-CPU server machines. No matter how many CPUs a machine has, the default build is always the workstation build.

If you are running the workstation build on a machine with more than one CPU, you can also configure garbage collection to run concurrently. That means that collection happens on a background thread while user code keeps running on foreground threads. On a single-CPU machine, garbage collection always happens on foreground threads. Concurrent collection is a good fit for highly interactive applications, because shorter pauses keep the machine responsive to the user.

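If you are running the server build, garbage collection is always non-concurrent (this holds for the early CLR versions described here; .NET Framework 4.5 added background collection for the server build). Non-concurrent collection has much higher throughput, because the collector does not need to synchronize with running user code during a collection.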

When a managed process starts, the CLR reserves a region of memory called the managed heap. When managed code allocates memory, e.g., with the C# new operator, it is allocated from this managed heap. The CLR maintains a pointer, which we shall call NextAllocPtr, that points to the next free position in the managed heap. A managed heap with two objects, A and B, allocated in it looks like this:

Managed heap with 2 objects allocated.

When an object, say object C, is allocated from the managed heap, the CLR first calculates the size of the object's fields and then adds the overhead members: the sync block index and the type handle pointer. These two extra members are 4 bytes each in a 32-bit process and 8 bytes each in a 64-bit process. The CLR then checks whether there is room for the object on the managed heap. If there is enough free space, the bytes from NextAllocPtr to NextAllocPtr+ObjectSize are zeroed out, the type's instance constructor is called, and NextAllocPtr is set to point to the next available free space in the managed heap. See picture below.

Managed heap with 3 objects allocated.
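The allocation steps above can be sketched as a toy model. This is only an illustration of the bump-pointer idea, not the CLR's actual implementation: the ManagedHeap class, its allocate method and the field sizes are made up for the example, and the heap is modelled as a plain byte array.

```python
class ManagedHeap:
    """Toy model of a bump-pointer heap (hypothetical, not the CLR's real structures)."""

    def __init__(self, size, pointer_size=4):
        self.memory = bytearray(b"\xff" * size)  # pretend the region holds stale data
        self.next_alloc_ptr = 0                  # offset of the next free byte
        self.overhead = 2 * pointer_size         # sync block index + type handle pointer

    def allocate(self, field_bytes):
        """Reserve room for an object and return its start offset."""
        object_size = field_bytes + self.overhead
        start = self.next_alloc_ptr
        end = start + object_size
        if end > len(self.memory):
            # A real runtime would try a garbage collection before giving up.
            raise MemoryError("managed heap is full")
        self.memory[start:end] = bytes(object_size)  # zero out the object's bytes
        self.next_alloc_ptr = end                    # bump the pointer past the object
        return start


heap = ManagedHeap(size=64, pointer_size=4)  # 32-bit process: 8 bytes of overhead
a = heap.allocate(8)    # object A: 8 bytes of fields
b = heap.allocate(4)    # object B: 4 bytes of fields
c = heap.allocate(12)   # object C: 12 bytes of fields
print(a, b, c, heap.next_alloc_ptr)  # 0 16 28 48
```

Note that an allocation is just a size check, a memory clear and a pointer increment, which is why allocating from the managed heap is cheap.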

Allocating objects this way gives .NET good performance through locality of reference: objects that are allocated together sit next to each other in memory. There is also a good chance that the process' working set will be small. The allocator behaves as if memory were infinite, but it is not, so the CLR needs a mechanism for collecting garbage. The CLR garbage collector is an ephemeral/generational garbage collector. A generational garbage collector makes the following assumptions:

  • The newer an object is, the shorter life it will have.
  • The older an object is, the longer life it will have.
  • Collecting parts of the heap is faster than collecting the whole heap.

The CLR decides whether an object is a candidate for collection by checking whether any root leads to it. A root is a valid reference to an object on the managed heap, held for example in a local variable on a thread's stack, a CPU register or a static field. Starting from the roots, the collector also follows the reference fields inside every object it reaches. Objects that are not reachable through any of these references are garbage and can be collected. A bit in the sync block index field of the managed object tells whether the object has already been checked, so circular references cannot send the collector into an endless loop.
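The reachability check can be sketched as a small graph walk. This is a minimal illustration under assumptions, not the CLR's code: the Obj class, the mark function and the object graph are hypothetical, and a plain boolean attribute stands in for the "already checked" bit.

```python
class Obj:
    """Hypothetical stand-in for a managed object."""

    def __init__(self, name):
        self.name = name
        self.fields = []     # references to other objects
        self.marked = False  # plays the role of the "already checked" bit


def mark(roots):
    """Mark every object that can be reached from the given roots."""
    pending = list(roots)
    while pending:
        obj = pending.pop()
        if obj.marked:
            continue             # already visited: this is what stops cycles
        obj.marked = True
        pending.extend(obj.fields)


a, b, c, d, e = (Obj(name) for name in "ABCDE")
a.fields.append(c)   # A -> C
c.fields.append(a)   # C -> A: a cycle between reachable objects
b.fields.append(d)   # B -> D
d.fields.append(b)   # D -> B: a cycle that no root reaches

mark(roots=[a, e])
garbage = [o.name for o in (a, b, c, d, e) if not o.marked]
print(garbage)  # ['B', 'D']
```

B and D reference each other, yet both are garbage because no root leads to either of them. This is why a tracing collector has no problem with circular references.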

The CLR garbage collector uses three generations: 0, 1 and 2. New objects are allocated in generation 0. If we allocate two more objects, D and E, the heap will look like this:

Managed heap with five objects.

Let us say that generation 0 is now full. We add one more object, F, to the managed heap, and objects B and D have become unreachable, which makes them garbage. The allocation triggers a collection of generation 0, after which the managed heap looks like this:

Managed heap with three objects in generation 1 and one object in generation 0.

As the picture above shows, all the objects that were reachable are now in generation 1, and the new object has ended up in generation 0. Memory has been compacted, and all references to the affected objects have been updated to point to the objects' new locations. This is a lot of work, and that work takes precious time. This time is the major downside of using a garbage collector. Even so, .NET performance is good.
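The compaction and reference-fixup step described above can be sketched as follows. This is a simplified model under assumptions: the compact function, the object sizes and the reference names (root1, root2, A.field) are invented for the example, and the heap is represented as a list of (name, size) pairs in address order.

```python
def compact(heap, live, references):
    """Slide live objects towards the start of the heap.

    heap:       list of (name, size) pairs in address order
    live:       set of names that survived the marking phase
    references: dict mapping a reference holder to the old address it points at
                (every holder is assumed to point at a live object)
    Returns (new_heap, new_references, next_alloc_ptr).
    """
    forwarding = {}          # old address -> new address
    new_heap = []
    old_addr = new_addr = 0
    for name, size in heap:
        if name in live:
            forwarding[old_addr] = new_addr
            new_heap.append((name, size))
            new_addr += size
        old_addr += size     # dead objects are skipped, leaving no gap behind
    new_references = {holder: forwarding[addr] for holder, addr in references.items()}
    return new_heap, new_references, new_addr


# Objects A..E at addresses 0, 16, 28, 48 and 64; B and D are unreachable.
heap = [("A", 16), ("B", 12), ("C", 20), ("D", 16), ("E", 24)]
refs = {"root1": 0, "root2": 28, "A.field": 64}   # point at A, C and E

new_heap, new_refs, next_alloc_ptr = compact(heap, {"A", "C", "E"}, refs)
print(new_heap)        # [('A', 16), ('C', 20), ('E', 24)]
print(new_refs)        # {'root1': 0, 'root2': 16, 'A.field': 36}
print(next_alloc_ptr)  # 60
```

Every reference to a moved object has to be rewritten, which is where much of the cost of a compacting collection comes from.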

Let us allocate four more objects in generation 0 (G-J), while objects A and F become unreachable. Let us also say that generation 0 passes its threshold size when object J is allocated, so a garbage collection occurs. Afterwards the managed heap would look like this:

Managed heap.

As you can see, object A is still in generation 1. That is because a generational garbage collector only collects generation 1 when generation 1's threshold size has been reached, so unreachable objects in generation 1 stay where they are until then. Let us now say that generation 1's threshold size has been reached. We allocate objects K through N, and allocating object N forces a garbage collection (generation 0's threshold size is reached). This time the collector examines both generation 0 and generation 1, and afterwards the managed heap will look like this:

Managed heap with objects now in all generations.
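The whole walkthrough can be replayed with a toy model. It is only a sketch under several assumptions: the GenerationalHeap class is hypothetical, collections are triggered by hand instead of by real threshold sizes, the object that triggers a collection is allocated after that collection finishes, and no object other than A becomes unreachable in the last step (the text does not say).

```python
class GenerationalHeap:
    """Toy model with three generations, each a list of object names."""

    def __init__(self):
        self.generations = [[], [], []]    # generation 0, 1 and 2

    def allocate(self, name):
        self.generations[0].append(name)   # new objects start in generation 0

    def collect(self, max_generation, unreachable):
        """Collect generations 0..max_generation; survivors move up one generation."""
        # Walk from oldest to youngest so that no object is promoted twice.
        for gen in range(max_generation, -1, -1):
            survivors = [o for o in self.generations[gen] if o not in unreachable]
            target = min(gen + 1, 2)       # generation 2 is the oldest
            if target == gen:
                self.generations[gen] = survivors
            else:
                self.generations[target].extend(survivors)
                self.generations[gen] = []


heap = GenerationalHeap()
for name in "ABCDE":
    heap.allocate(name)
heap.collect(0, unreachable={"B", "D"})    # generation 0 is full
heap.allocate("F")
print(heap.generations)  # [['F'], ['A', 'C', 'E'], []]

for name in "GHI":
    heap.allocate(name)
heap.collect(0, unreachable={"A", "F"})    # triggered by allocating J
heap.allocate("J")
print(heap.generations)  # [['J'], ['A', 'C', 'E', 'G', 'H', 'I'], []]

for name in "KLM":
    heap.allocate(name)
heap.collect(1, unreachable={"A"})         # generation 1 is collected as well
heap.allocate("N")
print(heap.generations)  # [['N'], ['J', 'K', 'L', 'M'], ['C', 'E', 'G', 'H', 'I']]
```

In the second collection object A is unreachable but survives, because only generation 0 is examined. It is reclaimed in the third collection, when generation 1 is included.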
