Friday, 18 September 2015

Garbage First (G1): Garbage Collector Overview

This overview covers the basics of the G1 garbage collector and how to use it with the HotSpot JVM. You will learn how the G1 collector functions internally, the key command line switches for using G1, and options for logging its operation.

Key Hotspot Components

The key components of the JVM that relate to performance are highlighted in the following image.

    The G1 Garbage Collector

    The Garbage-First (G1) collector is a server-style garbage collector, targeted for multi-processor machines with large memories. It meets garbage collection (GC) pause time goals with a high probability, while achieving high throughput. The G1 garbage collector is fully supported in Oracle JDK 7 update 4 and later releases. The G1 collector is designed for applications that:
    • Need a collector that can operate concurrently with application threads, like the CMS collector.
    • Need free space compacted without lengthy GC-induced pause times.
    • Need more predictable GC pause durations.
    • Do not want to sacrifice a lot of throughput performance.
    • Do not require a much larger Java heap.


    When performing garbage collections, G1 operates in a manner similar to the CMS collector. G1 performs a concurrent global marking phase to determine the liveness of objects throughout the heap. After the mark phase completes, G1 knows which regions are mostly empty. It collects in these regions first, which usually yields a large amount of free space. This is why this method of garbage collection is called Garbage-First. As the name suggests, G1 concentrates its collection and compaction activity on the areas of the heap that are likely to be full of reclaimable objects, that is, garbage. G1 uses a pause prediction model to meet a user-defined pause time target and selects the number of regions to collect based on the specified pause time target.
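The region selection described above can be sketched roughly as follows. This is a toy model, not HotSpot's implementation: the `Region` class, the linear cost model in `predictedMillis`, and the `millisPerLiveMb` parameter are all assumptions made for illustration. It sorts candidate regions so that those with the most garbage come first and keeps adding regions while the predicted pause stays within the target.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of "garbage first" region selection under a pause-time budget. */
final class GarbageFirstSelection {

    /** Hypothetical region descriptor; real G1 tracks far more state per region. */
    static final class Region {
        final int id;
        final long liveBytes;
        final long capacityBytes;

        Region(int id, long liveBytes, long capacityBytes) {
            this.id = id;
            this.liveBytes = liveBytes;
            this.capacityBytes = capacityBytes;
        }

        long garbageBytes() {
            return capacityBytes - liveBytes;
        }
    }

    /** Assumed cost model: evacuation time grows linearly with the live bytes copied. */
    static double predictedMillis(Region region, double millisPerLiveMb) {
        return (region.liveBytes / (1024.0 * 1024.0)) * millisPerLiveMb;
    }

    /** Picks the regions with the most garbage until the pause budget is used up. */
    static List<Region> chooseCollectionSet(List<Region> candidates,
                                            double pauseTargetMillis,
                                            double millisPerLiveMb) {
        List<Region> sorted = new ArrayList<>(candidates);
        // Most garbage first.
        sorted.sort((a, b) -> Long.compare(b.garbageBytes(), a.garbageBytes()));

        List<Region> chosen = new ArrayList<>();
        double budget = pauseTargetMillis;
        for (Region region : sorted) {
            double cost = predictedMillis(region, millisPerLiveMb);
            if (cost > budget) {
                break; // adding this region would exceed the pause target
            }
            budget -= cost;
            chosen.add(region);
        }
        return chosen;
    }
}
```

In this model a lower pause target simply means fewer regions are collected per pause, which mirrors the trade-off the text describes.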

    The regions identified by G1 as ripe for reclamation are garbage collected using evacuation. G1 copies objects from one or more regions of the heap to a single region on the heap, and in the process both compacts and frees up memory. This evacuation is performed in parallel on multiprocessors to decrease pause times and increase throughput. Thus, with each garbage collection, G1 continuously works to reduce fragmentation while staying within the user-defined pause times. This is beyond the capability of both earlier collectors: the CMS (Concurrent Mark Sweep) garbage collector does not compact, and ParallelOld garbage collection performs only whole-heap compaction, which results in considerable pause times.

    The G1 Garbage Collector Step by Step

    The G1 collector takes a different approach to allocating the heap. The pictures that follow review the G1 system step by step.
    G1 Heap Structure
    The heap is one memory area split into many fixed sized regions.
    Region size is chosen by the JVM at startup. The JVM generally targets around 2000 regions, each between 1 MB and 32 MB in size.
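As a rough illustration of the sizing rule above, the sketch below derives a region size from a heap size. The target of 2048 regions and the rounding down to a power of two are assumptions of this sketch (the text only says "around 2000"), and the real JVM heuristic may differ between releases.

```java
/** Sketch of how a region size could be derived from the heap size. */
final class G1RegionSizing {
    static final long MB = 1024L * 1024L;
    static final long MIN_REGION_BYTES = 1 * MB;   // lower bound stated in the text
    static final long MAX_REGION_BYTES = 32 * MB;  // upper bound stated in the text
    static final long TARGET_REGION_COUNT = 2048;  // assumption: "around 2000" regions

    /** Returns a power-of-two region size between 1 MB and 32 MB. */
    static long regionSizeFor(long heapBytes) {
        long candidate = Math.max(heapBytes / TARGET_REGION_COUNT, MIN_REGION_BYTES);
        long powerOfTwo = Long.highestOneBit(candidate); // round down to a power of two
        return Math.min(powerOfTwo, MAX_REGION_BYTES);
    }

    public static void main(String[] args) {
        long[] heapsInMb = {50, 4096, 6144, 131072};
        for (long heapMb : heapsInMb) {
            long region = regionSizeFor(heapMb * MB);
            System.out.println(heapMb + " MB heap: " + (region / MB) + " MB regions, "
                    + (heapMb * MB / region) + " regions");
        }
    }
}
```

Note that small heaps end up with far fewer than 2000 regions because the region size cannot drop below 1 MB, and very large heaps end up with more because it cannot exceed 32 MB.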
    G1 Heap Allocation
    In reality, these regions are mapped into logical representations of Eden, Survivor, and old generation spaces.
    The colors in the picture show which region is associated with which role. Live objects are evacuated (i.e., copied or moved) from one region to another. Regions are designed to be collected in parallel, with or without stopping all other application threads.
    As shown, regions can be allocated as Eden, survivor, and old generation regions. In addition, there is a fourth type of region known as Humongous regions. These regions are designed to hold objects that are 50% the size of a standard region or larger. Such objects are stored in a set of contiguous regions. Finally, the last type of region is the unused area of the heap.
    Note: At the time of this writing, collecting humongous objects has not been optimized. Therefore, you should avoid creating objects of this size.
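The humongous rule can be written down as simple arithmetic. The sketch below follows the wording of the text (50% of a region or larger); the exact boundary condition inside the JVM is not confirmed here, and both method names are hypothetical.

```java
/** Sketch of the humongous-object rule as described in the text. */
final class HumongousCheck {

    /** True when the object is at least half the size of one region. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes * 2 >= regionBytes;
    }

    /** Number of contiguous regions needed to hold the object. */
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes; // ceiling division
    }
}
```

With 1 MB regions, for example, a 512 KB array already counts as humongous under this rule, and a 2.5 MB array would occupy three contiguous regions.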
    Young Generation in G1
    The heap is split into approximately 2000 regions. The minimum region size is 1 MB and the maximum is 32 MB. Blue regions hold old generation objects and green regions hold young generation objects.
    Note that, unlike the generations in older garbage collectors, the regions are not required to be contiguous.
    A Young GC in G1
    Live objects are evacuated (i.e., copied or moved) to one or more survivor regions. If the aging threshold is met, some of the objects are promoted to old generation regions.
    This is a stop the world (STW) pause. Eden size and survivor size are calculated for the next young GC. Accounting information is kept to help calculate the size. Things like the pause time goal are taken into consideration.
    This approach makes it very easy to resize the young generation, growing or shrinking it as needed by adding or removing regions.
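The evacuation and aging behavior of a young GC can be modeled in a few lines. This is a simplified sketch: the `Obj` and `Result` classes are hypothetical, and the exact point at which an object's age is compared with the threshold is an assumption of this model rather than a statement about HotSpot.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a young GC: live objects are copied, aged, and possibly promoted. */
final class YoungGcSketch {

    /** Hypothetical object record with a liveness flag and an age counter. */
    static final class Obj {
        final String name;
        final boolean live;
        int age;

        Obj(String name, boolean live, int age) {
            this.name = name;
            this.live = live;
            this.age = age;
        }
    }

    /** Where the live objects ended up after the collection. */
    static final class Result {
        final List<Obj> survivor = new ArrayList<>();
        final List<Obj> promoted = new ArrayList<>();
    }

    static Result youngGc(List<Obj> youngObjects, int tenuringThreshold) {
        Result result = new Result();
        for (Obj obj : youngObjects) {
            if (!obj.live) {
                continue; // garbage is never copied; its region is simply freed
            }
            obj.age++; // the object survived one more collection
            if (obj.age >= tenuringThreshold) {
                result.promoted.add(obj);  // moves to an old generation region
            } else {
                result.survivor.add(obj);  // moves to a survivor region
            }
        }
        return result;
    }
}
```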
    End of a Young GC with G1
    Live objects have been evacuated to survivor regions or to old generation regions.
    Recently promoted objects are shown in dark blue. Survivor regions are shown in green.
    In summary, the following can be said about the young generation in G1:
    • The heap is a single memory space split into regions.
    • Young generation memory is composed of a set of non-contiguous regions. This makes it easy to resize when needed.
    • Young generation garbage collections, or young GCs, are stop the world events. All application threads are stopped for the operation.
    • The young GC is done in parallel using multiple threads.
    • Live objects are copied to new survivor or old generation regions.

    Old Generation Collection with G1

    Like the CMS collector, the G1 collector is designed to be a low pause collector for old generation objects. The following table describes the G1 collection phases on old generation.

    G1 Collection Phases - Concurrent Marking Cycle Phases

    The G1 collector performs the following phases on the old generation of the heap. Note that some phases are part of a young generation collection.
    Phase / Description
    (1) Initial Mark
    (Stop the World Event)
    This is a stop the world event. With G1, it is piggybacked on a normal young GC. Marks survivor regions (root regions) which may have references to objects in the old generation.
    (2) Root Region Scanning
    Scans survivor regions for references into the old generation. This happens while the application continues to run. The phase must be completed before a young GC can occur.
    (3) Concurrent Marking
    Finds live objects over the entire heap. This happens while the application is running. This phase can be interrupted by young generation garbage collections.
    (4) Remark
    (Stop the World Event)
    Completes the marking of live objects in the heap. Uses an algorithm called snapshot-at-the-beginning (SATB), which is much faster than what was used in the CMS collector.
    (5) Cleanup
    (Stop the World Event and Concurrent)
    • Performs accounting on live objects and completely free regions. (Stop the world)
    • Scrubs the Remembered Sets. (Stop the world)
    • Resets the empty regions and returns them to the free list. (Concurrent)
    (*) Copying
    (Stop the World Event)
    These are the stop the world pauses to evacuate or copy live objects to new unused regions. This can be done with young generation regions only, which is logged as [GC pause (young)], or with both young and old generation regions, which is logged as [GC pause (mixed)].

    G1 Old Generation Collection Step by Step

    With the phases defined, let's look at how they interact with the old generation in the G1 collector.
    Initial Marking Phase
    Initial marking of live objects is piggybacked on a young generation garbage collection. In the logs this is noted as GC pause (young) (initial-mark).
    Concurrent Marking Phase
    If empty regions are found (as denoted by the "X"), they are removed immediately in the Remark phase. Also, "accounting" information that determines liveness is calculated.
    Remark Phase
    Empty regions are removed and reclaimed. Region liveness is now calculated for all regions.
    Copying/Cleanup Phase
    G1 selects the regions with the lowest "liveness", those regions which can be collected the fastest. Then those regions are collected at the same time as a young GC. This is denoted in the logs as [GC pause (mixed)]. So both young and old generations are collected at the same time.
    After Copying/Cleanup Phase
    The regions selected have been collected and compacted into the dark blue region and the dark green region shown in the diagram.

    Summary of Old Generation GC

    In summary, there are a few key points we can make about the G1 garbage collection on the old generation.
    • Concurrent Marking Phase
      • Liveness information is calculated concurrently while the application is running.
      • This liveness information identifies which regions will be best to reclaim during an evacuation pause.
      • There is no sweeping phase like in CMS.
    • Remark Phase
      • Uses the Snapshot-at-the-Beginning (SATB) algorithm, which is much faster than what was used with CMS.
      • Completely empty regions are reclaimed.
    • Copying/Cleanup Phase
      • Young generation and old generation are reclaimed at the same time.
      • Old generation regions are selected based on their liveness.

    Setting the Log Detail

    You can set the logging detail to three different levels.
    (1) -verbose:gc (which is equivalent to -XX:+PrintGC) sets the detail level of the log to fine.
    Sample Output
    [GC pause (G1 Humongous Allocation) (young) (initial-mark) 24M->21M(64M), 0.2349730 secs]
    [GC pause (G1 Evacuation Pause) (mixed) 66M->21M(236M), 0.1625268 secs]   
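Lines in this format are easy to pick apart programmatically. The sketch below parses only the two line shapes shown in the sample output; the class name and field names are hypothetical, sizes are assumed to be reported in megabytes (M), and the pattern tolerates a stray space inside the arrow. Other log levels and other JDK versions produce different formats that this pattern does not cover.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Parses one "[GC pause ...]" line of the fine-level G1 log shown above. */
final class G1PauseLine {
    private static final Pattern PAUSE = Pattern.compile(
            "\\[GC pause \\((.+?)\\) \\((young|mixed)\\)(?: \\((initial-mark)\\))? "
            + "(\\d+)M-\\s*>(\\d+)M\\((\\d+)M\\), ([0-9.]+) secs\\]");

    final String cause;        // e.g. "G1 Evacuation Pause"
    final String kind;         // "young" or "mixed"
    final boolean initialMark; // true when the pause also started a marking cycle
    final int beforeMb;        // heap occupancy before the pause
    final int afterMb;         // heap occupancy after the pause
    final int heapMb;          // heap capacity
    final double seconds;      // pause duration

    private G1PauseLine(Matcher m) {
        cause = m.group(1);
        kind = m.group(2);
        initialMark = m.group(3) != null;
        beforeMb = Integer.parseInt(m.group(4));
        afterMb = Integer.parseInt(m.group(5));
        heapMb = Integer.parseInt(m.group(6));
        seconds = Double.parseDouble(m.group(7));
    }

    /** Returns the parsed line, or null when the line has a different shape. */
    static G1PauseLine parse(String line) {
        Matcher m = PAUSE.matcher(line.trim());
        return m.matches() ? new G1PauseLine(m) : null;
    }
}
```

From the parsed values, the memory reclaimed by a pause is simply `beforeMb - afterMb`, which makes it straightforward to compare young and mixed pauses over a whole log.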

    Basic Command Line

    To enable the G1 Collector use: -XX:+UseG1GC
    Here is a sample command line for starting the Java2Demo included in the JDK demos and samples download:
    java -Xmx50m -Xms50m -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -jar c:\javademos\demo\jfc\Java2D\Java2demo.jar
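To confirm from inside a running application which collector was actually selected, one option is to list the garbage collector MXBeans through the standard `java.lang.management` API. The sketch below assumes that HotSpot reports G1 beans under names beginning with "G1 " (such as "G1 Young Generation"); those names are an observed convention, not part of the API contract, so treat the check as a heuristic.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Reports which garbage collectors the running JVM exposes. */
final class WhichCollector {

    /** Names of all collector beans registered in this JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    /** Heuristic: assumes G1 beans are named with a "G1 " prefix. */
    static boolean looksLikeG1(List<String> names) {
        for (String name : names) {
            if (name.startsWith("G1 ")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> names = collectorNames();
        System.out.println(names + " G1 in use: " + looksLikeG1(names));
    }
}
```

Running this class with and without -XX:+UseG1GC on the command line should show the difference.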

    Complete List of G1 GC Switches

    This is the complete list of G1 GC switches.

    Option and Default Value / Description
    -XX:+UseG1GC
    Use the Garbage First (G1) Collector.
    -XX:MaxGCPauseMillis=n
    Sets a target for the maximum GC pause time. This is a soft goal, and the JVM will make its best effort to achieve it.
    -XX:InitiatingHeapOccupancyPercent=n
    Percentage of the (entire) heap occupancy to start a concurrent GC cycle. It is used by GCs that trigger a concurrent GC cycle based on the occupancy of the entire heap, not just one of the generations (e.g., G1). A value of 0 denotes 'do constant GC cycles'. The default value is 45.
    -XX:NewRatio=n
    Ratio of old/new generation sizes. The default value is 2.
    -XX:SurvivorRatio=n
    Ratio of eden/survivor space size. The default value is 8.
    -XX:MaxTenuringThreshold=n
    Maximum value for tenuring threshold. The default value is 15.
    -XX:ParallelGCThreads=n
    Sets the number of threads used during parallel phases of the garbage collectors. The default value varies with the platform on which the JVM is running.
    -XX:ConcGCThreads=n
    Number of threads concurrent garbage collectors will use. The default value varies with the platform on which the JVM is running.
    -XX:G1ReservePercent=n
    Sets the amount of heap that is reserved as a false ceiling to reduce the possibility of promotion failure. The default value is 10.
    -XX:G1HeapRegionSize=n
    With G1 the Java heap is subdivided into uniformly sized regions. This sets the size of the individual sub-divisions. The default value of this parameter is determined ergonomically based upon heap size. The minimum value is 1 MB and the maximum value is 32 MB.
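To make the percentage-based switches more concrete, the sketch below turns -XX:InitiatingHeapOccupancyPercent and -XX:G1ReservePercent into byte values for a given heap. It follows the descriptions in the list above; the method names are hypothetical, and whether the JVM compares with "greater than" or "greater than or equal" is an assumption of this sketch.

```java
/** Arithmetic behind two percentage-based G1 switches, as described in the list above. */
final class G1Thresholds {

    /** Heap occupancy, in bytes, at which a concurrent cycle would be started. */
    static long markingThresholdBytes(long heapBytes, int initiatingHeapOccupancyPercent) {
        return heapBytes * initiatingHeapOccupancyPercent / 100;
    }

    /** Assumed comparison: start marking once occupancy reaches the threshold. */
    static boolean shouldStartMarking(long usedBytes, long heapBytes,
                                      int initiatingHeapOccupancyPercent) {
        return usedBytes >= markingThresholdBytes(heapBytes, initiatingHeapOccupancyPercent);
    }

    /** Bytes kept back as a "false ceiling" by G1ReservePercent. */
    static long reservedBytes(long heapBytes, int reservePercent) {
        return heapBytes * reservePercent / 100;
    }
}
```

With the default values and a 1000 MB heap, for example, this gives a marking threshold of 450 MB and a reserve of 100 MB.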
