Garbage-First Collector (G1)

History and Introduction

With the release of JDK 7, Oracle also introduced a new garbage collection algorithm, the Garbage-First Collector (G1). This changes memory management in the JVM all over again.

Garbage-First (G1) Collector is a server-style garbage collector, targeted for multi-processors with large memories, that meets a soft real-time goal with high probability, while achieving high throughput. G1 is the long term replacement of the Concurrent Mark-Sweep Collector (CMS). Whole-heap operations, such as global marking, are performed concurrently with the application threads, to prevent interruptions proportional to heap or live-data size. Concurrent marking provides both collection “completeness” and identifies regions ripe for reclamation via compacting evacuation. This evacuation is performed in parallel on multi-processors, to decrease pause times and increase throughput.

G1 was first introduced, as an experimental feature, in Java SE 6 Update 14. There it can be enabled with the following two command-line parameters:


-XX:+UnlockExperimentalVMOptions

-XX:+UseG1GC
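To check which collector a running JVM actually picked up, the standard java.lang.management API can list the active collector beans. This is only a small sketch: the class name GcCheck is made up for illustration, and the exact bean names reported differ between JVM versions and vendors.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the JDK): lists the collectors in use.
final class GcCheck {
    static List<String> collectorNames() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // When G1 is active, the reported names usually start with "G1".
        System.out.println(collectorNames());
    }
}
```

Running it once with and once without the flags above should show different collector names.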

 

What is Garbage Collection?

A garbage collector works to reclaim areas of memory within an application that will never be accessed again. At the most fundamental level, garbage collection involves two deceptively simple steps:

Determine which objects can no longer be referenced by an application. This is done either via object reference counting or by tracing the object graph.

Reclaim the memory occupied by dead objects (the garbage).
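The two steps can be sketched with a toy tracing collector. Everything here (TracingSketch, Node, the list standing in for the heap) is invented for illustration and has nothing to do with how HotSpot stores objects. Note that an unreachable cycle is still collected, which plain reference counting would miss.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Toy model of a tracing collector; all names are illustrative.
final class TracingSketch {
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<Node>();
        Node(String name) { this.name = name; }
    }

    // Step 1: find every object reachable from the roots.
    static Set<Node> mark(List<Node> roots) {
        Set<Node> live = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> pending = new ArrayDeque<Node>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (live.add(n)) {          // first visit: follow its references
                pending.addAll(n.refs);
            }
        }
        return live;
    }

    // Step 2: "reclaim" memory by keeping only the marked objects.
    static List<Node> sweep(List<Node> heap, Set<Node> live) {
        List<Node> survivors = new ArrayList<Node>();
        for (Node n : heap) {
            if (live.contains(n)) survivors.add(n);
        }
        return survivors;
    }
}
```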

Garbage Collection till JDK 6

With JDK 7, G1 is planned to replace CMS in the HotSpot JVM. There are two major differences between CMS and G1. The first is that G1 is a compacting collector: it compacts sufficiently to completely avoid the use of fine-grain free lists for allocation, which considerably simplifies parts of the collector and mostly eliminates potential fragmentation issues. The second is that G1 offers more predictable garbage collection pauses than the CMS collector and allows users to set their desired pause targets.

Let’s discuss how it’s designed.

Data Structure

The Garbage-First heap is divided into equal-sized heap regions, each a contiguous range of virtual memory. Allocation in a heap region consists of incrementing a boundary, top, between allocated and unallocated space. One region is the current allocation region from which storage is being allocated.
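Allocation by "incrementing a boundary" is often called bump-pointer allocation. A minimal sketch, assuming a region modelled as a range of byte offsets (RegionSketch and its methods are illustrative names, not HotSpot code):

```java
// Toy heap region with bump-pointer allocation; names are illustrative only.
final class RegionSketch {
    private final int end;   // first offset past the end of the region
    private int top = 0;     // boundary between allocated and unallocated space

    RegionSketch(int sizeBytes) { this.end = sizeBytes; }

    // Returns the start offset of the new object, or -1 if it does not fit.
    int allocate(int sizeBytes) {
        if (sizeBytes <= 0 || sizeBytes > end - top) return -1;
        int start = top;
        top += sizeBytes;    // allocating is just moving the boundary forward
        return start;
    }

    int top() { return top; }
}
```

When allocate returns -1, a real collector would retire this region and pick a new current allocation region.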

Objects whose size exceeds 3/4 of the heap region size, however, are termed humongous. Humongous objects are allocated in dedicated (contiguous sequences of) heap regions; these regions contain only the humongous object.
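The humongous test and the number of regions such an object occupies come down to simple arithmetic. The sketch below uses the three-quarters threshold stated above; the class and method names are made up, and the region size is passed in as a parameter rather than assumed.

```java
// Arithmetic for humongous objects; illustrative names, threshold from the text.
final class HumongousSketch {
    // True if the object is larger than 3/4 of a region.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes * 4 > regionBytes * 3;
    }

    // Contiguous regions needed to hold the object (ceiling division).
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }
}
```

For example, with 1 MB regions a 2.5 MB array would be humongous and would occupy three contiguous regions.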

Regions are further broken down into 512 byte sections called cards. Each card has a corresponding one-byte entry in a global card table, which is used to track which cards are modified by mutator threads. Subsets of these cards are tracked, and referred to as Remembered Sets (RS).
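The mapping from a heap address to its card-table entry is a shift by 9 bits, since 2^9 = 512. A small sketch, assuming heap addresses are modelled as offsets from the heap base; the CLEAN and DIRTY values and all names are assumptions made for the example:

```java
// Toy card table: one byte per 512-byte card. Names and values are illustrative.
final class CardTableSketch {
    static final int CARD_SHIFT = 9;          // 2^9 = 512 bytes per card
    static final byte CLEAN = 0, DIRTY = 1;   // assumed encoding of an entry

    private final byte[] cards;

    CardTableSketch(long heapBytes) {
        // Round up so a partial last card still gets an entry.
        cards = new byte[(int) ((heapBytes + 511) >>> CARD_SHIFT)];
    }

    static int cardIndex(long heapOffset) {
        return (int) (heapOffset >>> CARD_SHIFT);
    }

    void markDirty(long heapOffset) { cards[cardIndex(heapOffset)] = DIRTY; }

    boolean isDirty(long heapOffset) { return cards[cardIndex(heapOffset)] == DIRTY; }
}
```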

Stages

The G1 collector works in stages. The main stages consist of remembered set (RS) maintenance, concurrent marking, and evacuation pauses.

RS Maintenance

Each region has an associated remembered set, which indicates all locations that might contain pointers to (live) objects within the region. Maintaining these remembered sets requires that mutator threads inform the collector when they make pointer modifications that might create inter-region pointers. This notification uses a card table: every 512-byte card in the heap maps to a one-byte entry in the card table. Each thread has an associated remembered set log, a current buffer or sequence of modified cards. In addition, there is a global set of filled RS buffers.

For a particular region (i.e., region a), only cards that contain pointers from other regions to an object in region a are recorded in region a’s RS. A region’s internal references, as well as null references, are ignored.

In reality, each region’s remembered set is implemented as a group of collections, with the dirty cards distributed amongst them according to the number of references contained within. Three levels of coarseness are maintained: sparse, fine, and coarse. It’s broken up this way so that parallel GC threads can operate on one RS without contention, and can target the regions that will yield the most garbage. However, it’s best to think of the RS as one logical set of dirty cards.
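Treating the RS as one logical set of cards per region, the filtering rule (record cross-region pointers, ignore internal and null references) can be sketched as follows. The 1 MB region size, the NULL_REF stand-in, and every name here are assumptions for the example; the per-thread logs and the sparse/fine/coarse levels are left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy remembered sets keyed by target region; all names are illustrative.
final class RememberedSetSketch {
    static final int REGION_SHIFT = 20;   // assumes 1 MB regions
    static final int CARD_SHIFT = 9;      // 512-byte cards
    static final long NULL_REF = -1L;     // stand-in for a null reference

    // target region index -> cards that hold pointers into that region
    private final Map<Long, Set<Long>> rsByRegion = new HashMap<Long, Set<Long>>();

    // Called after the field at fieldAddr has been set to point at targetAddr.
    void recordStore(long fieldAddr, long targetAddr) {
        if (targetAddr == NULL_REF) return;            // null references are ignored
        long from = fieldAddr >>> REGION_SHIFT;
        long to = targetAddr >>> REGION_SHIFT;
        if (from == to) return;                        // region-internal, ignored
        Set<Long> cards = rsByRegion.get(to);
        if (cards == null) {
            cards = new TreeSet<Long>();
            rsByRegion.put(to, cards);
        }
        cards.add(fieldAddr >>> CARD_SHIFT);           // remember the source card
    }

    Set<Long> rememberedSet(long region) {
        Set<Long> cards = rsByRegion.get(region);
        return cards == null ? new TreeSet<Long>() : cards;
    }
}
```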

Evacuation Pauses

At appropriate points, G1 stops the mutator threads and performs an evacuation pause. During the pause it chooses a collection set of regions and evacuates them by copying all their live objects to other locations in the heap, thus freeing the collection set regions. Evacuation pauses exist to allow compaction: object movement must appear atomic to mutators. This atomicity is costly to achieve in truly concurrent systems, so G1 moves objects during incremental stop-the-world pauses instead.
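A heavily simplified picture of evacuation: copy each live object once, remember where it went, and redirect references to the copy. The sketch below is single-threaded, assumes every reachable object is in the collection set, and keeps forwarding information in a side map rather than in the objects themselves. All names are invented for the example.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Toy copying evacuation; illustrative only, not how HotSpot moves objects.
final class EvacuationSketch {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<Obj>();
        Obj(String name) { this.name = name; }
    }

    // Copies everything reachable from roots into toRegion; returns the new roots.
    static List<Obj> evacuate(List<Obj> roots, List<Obj> toRegion) {
        Map<Obj, Obj> forwarding = new IdentityHashMap<Obj, Obj>();
        List<Obj> newRoots = new ArrayList<Obj>();
        for (Obj root : roots) {
            newRoots.add(copy(root, forwarding, toRegion));
        }
        return newRoots;
    }

    private static Obj copy(Obj old, Map<Obj, Obj> forwarding, List<Obj> toRegion) {
        Obj moved = forwarding.get(old);
        if (moved != null) return moved;    // already evacuated: reuse the copy
        moved = new Obj(old.name);
        forwarding.put(old, moved);         // record before recursing so cycles terminate
        toRegion.add(moved);
        for (Obj ref : old.refs) {
            moved.refs.add(copy(ref, forwarding, toRegion));
        }
        return moved;
    }
}
```

Objects that are never reached are simply not copied, so the old regions can be reused as a whole afterwards.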

To help limit the total pause time, much of the evacuation is done in parallel with multiple GC threads. The strategy for parallelization involves the following techniques:

GC TLABs: The use of thread local allocation buffers (TLABs) for the GC threads eliminates memory-related contention amongst the GC threads. Forwarding pointers are inserted in the GC TLABs for evacuated live objects.

Work Competition: GC threads compete to perform any of a number of GC-related tasks, such as maintaining remembered sets, root object scanning to determine reachability (dead objects are ignored), and evacuating live objects.

Work Stealing: Each GC thread works through its own queue of tasks without waiting on the others, and a thread that runs out of work takes (steals) pending tasks from another thread’s queue. There is no central scheduler, so the order in which tasks run is arbitrary, but the load stays balanced and the threads finish the list of GC-related tasks as quickly as they can. The end result is a properly collected group of heap regions.
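One common way to arrange work stealing is a queue per worker: a worker takes tasks from its own queue and, when that is empty, takes from the others. The sketch below assumes a fixed list of tasks that never create new tasks, which keeps termination trivial. It shows the general technique only, with made-up names, and is not G1's actual task queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.atomic.AtomicInteger;

// Toy work-stealing pool; illustrative only.
final class WorkStealingSketch {
    // Runs every task on `workers` threads and returns how many tasks ran.
    static int runAll(List<Runnable> tasks, final int workers) {
        final List<ConcurrentLinkedDeque<Runnable>> queues =
                new ArrayList<ConcurrentLinkedDeque<Runnable>>();
        for (int i = 0; i < workers; i++) {
            queues.add(new ConcurrentLinkedDeque<Runnable>());
        }
        // Deal the tasks out round-robin before any worker starts.
        for (int i = 0; i < tasks.size(); i++) {
            queues.get(i % workers).add(tasks.get(i));
        }
        final AtomicInteger done = new AtomicInteger();
        List<Thread> threads = new ArrayList<Thread>();
        for (int w = 0; w < workers; w++) {
            final int self = w;
            Thread t = new Thread(new Runnable() {
                public void run() {
                    while (true) {
                        // Own queue first (newest end), then steal from the oldest end of others.
                        Runnable task = queues.get(self).pollLast();
                        for (int i = 1; task == null && i < workers; i++) {
                            task = queues.get((self + i) % workers).pollFirst();
                        }
                        if (task == null) return;   // nothing left anywhere: finished
                        task.run();
                        done.incrementAndGet();
                    }
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return done.get();
    }
}
```

Because each poll removes a task atomically, every task runs exactly once no matter which thread ends up with it.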

Concurrent Marking

Concurrent marking provides collector completeness without imposing any order on region choice for collection sets. It also provides the live-data information that allows regions to be collected in garbage-first order. G1 uses a form of snapshot-at-the-beginning concurrent marking. In this style, marking is guaranteed to identify garbage objects that exist at the start of marking, by marking a logical snapshot of the object graph existing at that point.
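To see why a snapshot needs help from the mutator, consider a reference that is overwritten while marking is in progress. One way to preserve the snapshot, assumed here and following the snapshot-at-the-beginning technique as it is generally described, is a write barrier that logs the old value before it is overwritten. The sketch is single-threaded, gives each object a single reference field, and uses invented names throughout.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;

// Toy snapshot-at-the-beginning marking; illustrative only.
final class SatbSketch {
    static final class Obj {
        Obj ref;   // a single reference field keeps the example short
    }

    private final Set<Obj> marked =
            Collections.newSetFromMap(new IdentityHashMap<Obj, Boolean>());
    private final Deque<Obj> overwritten = new ArrayDeque<Obj>();
    private boolean marking = false;

    void startMarking() { marking = true; }

    // Write barrier: while marking, log the value that is about to be lost.
    void write(Obj holder, Obj newValue) {
        if (marking && holder.ref != null) overwritten.push(holder.ref);
        holder.ref = newValue;
    }

    // Marks from the root, then from every logged old value.
    Set<Obj> finishMarking(Obj root) {
        trace(root);
        while (!overwritten.isEmpty()) trace(overwritten.pop());
        marking = false;
        return marked;
    }

    private void trace(Obj o) {
        while (o != null && marked.add(o)) o = o.ref;
    }
}
```

An object that was reachable at the snapshot but is unlinked during marking is still marked. It survives this cycle and is reclaimed by a later one.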

Marking is done in three stages:

Marking Stage. The heap regions are traversed and live objects are marked:

First, since this is the beginning of a new collection, the current marking bitmap is copied to the previous marking bitmap, and then the current marking bitmap is cleared.

Next, all mutator threads are paused while the current TAMS pointer is moved to point to the same byte in the region as the top (next free byte) pointer.

Next, all objects are traced from their roots, and live objects are marked in the marking bitmap. We now have a snapshot of the heap.

Next, all mutator threads are resumed.

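Next, a write barrier is enabled for all mutator threads. This barrier records, in change buffers, the reference values that mutators overwrite after the snapshot, so that objects reachable at snapshot time are not missed. Objects allocated after the snapshot lie above TAMS and are treated as live.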

Re-marking Stage. When the heap reaches a certain percentage filled, as indicated by the number of allocations since the snapshot in the Marking Stage, the heap is re-marked:

As buffers of changed objects fill up, the contained objects are marked in the marking bitmap concurrently.

When all filled buffers have been processed, the mutator threads are paused.

Next, the remaining (partially filled) buffers are processed, and those objects are marked also.

Cleanup Stage. When the Re-marking Stage completes, counts of live objects are maintained:

All live objects are counted and recorded, per region, using the marking bitmap.

Next, all mutator threads are paused.

Next, all live-object counts are finalized per region.

The TAMS pointer for the current collection is copied to the previous TAMS pointer (since the current collection is basically complete).

The heap regions are sorted for collection priority according to a cost algorithm. As a result, the regions that will yield the highest numbers of reclaimed objects, at the smallest cost in terms of time, will be collected first. This forms what is called a collection set of regions.

All mutator threads are resumed.
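The sorting step at the end of cleanup can be pictured as ranking regions by garbage reclaimed per unit of collection cost and taking regions from the top of that ranking. The sketch below adds one assumption of its own: it stops adding regions once an estimated pause budget is used up. The cost figures are plain inputs, the names are invented, and G1's real cost model is more involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Toy collection-set selection; illustrative only.
final class CollectionSetSketch {
    static final class RegionInfo {
        final int id;
        final long garbageBytes;   // region size minus live bytes found by marking
        final double costMs;       // estimated time to evacuate; must be > 0
        RegionInfo(int id, long garbageBytes, double costMs) {
            this.id = id;
            this.garbageBytes = garbageBytes;
            this.costMs = costMs;
        }
        double efficiency() { return garbageBytes / costMs; }
    }

    // Picks the most efficient regions whose total cost fits the pause budget.
    static List<RegionInfo> choose(List<RegionInfo> regions, double pauseBudgetMs) {
        List<RegionInfo> sorted = new ArrayList<RegionInfo>(regions);
        Collections.sort(sorted, new Comparator<RegionInfo>() {
            public int compare(RegionInfo x, RegionInfo y) {
                return Double.compare(y.efficiency(), x.efficiency());   // best first
            }
        });
        List<RegionInfo> collectionSet = new ArrayList<RegionInfo>();
        double spentMs = 0;
        for (RegionInfo r : sorted) {
            if (spentMs + r.costMs > pauseBudgetMs) break;   // budget exhausted
            collectionSet.add(r);
            spentMs += r.costMs;
        }
        return collectionSet;
    }
}
```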

Conclusion

Garbage-First is a garbage collector that combines concurrent and parallel collection and is targeted for large multi-processor machines with large memories.

If your application requires guaranteed real-time behavior even with garbage collection, your only choice is a real-time garbage collector such as those that come with Sun’s Java RTS or IBM’s WebSphere RT products. However, if low pause times and soft real-time behavior are your goal, the G1 collector should suit it well.

 

References

Garbage-First Garbage Collection (paper): http://labs.oracle.com/jtech/pubs/04-g1-paper-ismm.pdf

Sun Java RTS: http://java.sun.com/javase/technologies/realtime/reference.jsp

Real-time Garbage Collection: http://www.ibm.com/developerworks/java/library/j-rtj4/index.html?S_TACT=105AGX02&S_CMP=EDU


