Analyzing Kernel Times during GC

Today we want to introduce quite simple yet very valuable feature - Kernel and User Times graphs in the reports. Potentially it can significantly increase the quality of your expertise of the garbage collector work.

Why it’s important?

During the garbage collection process the CPU time is separated into two groups: GC Threads (user) and OS system calls (kernel). Both are done using multiple threads (except Serial or non-parallel OldGen collector is in use), which amount is usually equals to the number of available cores (unless -XX:ParallelGCThreads=<value> flag is set).

Then, the real pause time can be roughly calculated like this: (user_CPU + kernel_CPU) / ParallelGCThreads.

The problem with analyzing only real value is essential - you don’t know whether this pause was caused by inappropriate GC configuration or some OS delays/problems.

Consider the following example:

Young pause in a red rectangle took ~508 ms, which stands out strongly from the rest. The question is how this happened? Why the garbage collector stuck on this?

It turns out that this pause has nothing to do with the concrete GC algorithm. By plotting the work of Kernel, we simply see that it was the root cause of it:

Which is a good indicator that the issue lays in underlying OS work rather than JVM.

Solution

We at GCPlot GC Logs Analyzer platform considered this as a major flaw of today’s GC logs analysis approaches and started to support separate graphs for both Kernel and User CPU usages. This would open up the possibility of much more verbose analysis on a distinct pause level.

Apart from that, our platform can still produce tons of other graphs and statistics, which would tell nearly everything about how your GC flies.

Java GC Causes Distilled

Introduction

As it’s quite obvious, every GC collection in Hotspot JVM has a good reason to start. They might vary with respect to the current situation in the system and memory state. They all are known as GC Causes. Here we will try to highlight and explain all the possible reasons for GC to start in Java 7/8.

GC Causes

Following is a table which describes each possible GC Cause. It is ordered by an occurrence frequency in an average application.

GC Cause Description
Allocation Failure Application tried to make a new allocation and failed due to lack of available space in Young generation; hence Minor GC is required. On Linux, the JVM can trigger a GC if the kernel notifies there isn’t much memory left via mem_notify.
GC Locker GC is started after all threads leaving JNI Critical region. For more information on JNI, refer to the Java Native Interface documentation web site. GC is blocked when any thread is in the JNI Critical region and can start only when all of them outside of it.
G1 Evacuation Pause This is actual only for G1 collector. It indicates that it is copying live objects from one set of regions (Young and sometimes Young + Tenured, which is called Mixed) to another set of regions.
G1 Humongous Allocation This is actual only for G1 collector. Humongous is an allocation when its size is greater than 50% of one region size; the object then allocated in special space. Nevertheless, it also causes normal GC collection to get more (possibly continuous also) space for such objects.
CMS Initial Mark Initial mark phase of CMS, for more details, see Phases of CMS. It also triggers Young Space collection.
System.gc() There was System.gc() call in the application code. You can start JVM with -XX:+DisableExplicitGC flag to disable such behavior.
Adaptive Size Ergonomics Indicates you are using the adaptive heap size policy (ability to change Young and Tenured spaces size at runtime), enabled via -XX:+UseAdaptiveSizePolicy flag. By default it is enabled in the recent versions of JVM.
Allocation Profiler This is actual only for versions of Java before 8 and only when -Xaprof is set. It triggers just before JVM exits.
Heap Inspection GC was triggered by an inspection operation on the heap, most probably by jmap tool with -histo:live flag set.
Heap Dump GC was initiated before heap dump is made by some profiling instrument.
No GC Normally, you shouldn’t see this reason. It was occurring in older Java versions, in case jstat command was started before any collection occurred. Other case is when jstat checks GC without any GC activity.
Last Ditch Collection When Metaspace (Java 8+) or PermGen (Java 7-) is full and you can’t allocate a new object here, JVM first tries to clean it, triggering appropriate collector. If that’s not possible, it then tries to expand it. If that doesn’t work as well, it triggers Full GC with this cause name. Soft references are being cleaned during it as well.
Perm Generation Full Triggered as a result of an allocation failure in PermGen. Actual for Java versions prior to 8.
Metadata GC Threshold Triggered as a result of an allocation failure in Metaspace. Metaspace is a replacement for PermGen in Java 8+.
JvmtiEnv ForceGarbageCollection Something called the JVM tool interface function ForceGarbageCollection.

GCPlot Beta Release

We are glad to announce that GCPlot is finally available as a beta version! While we expect Realtime Connection and Java 9 support at Q1 2017, you can already start processing your GC log files at full scale.

Our purpose is to provide a universal and most comprehensive tool for JVM GC monitoring and verbose analyzing.

Currently, GCPlot already keeps dozens of stats and graphs, so check it out - https://gcplot.com