Java 101: Trash talk, Part 1

how-to
Dec 07, 200124 mins
Core JavaDevelopment ToolsDevops

Java recycles its memory through garbage collection

Take out the trash! When it’s applied to computer languages, that command conjures up a different meaning than the one you’re used to. In Java, trash, or garbage, is heap memory that a program allocates for objects but no longer references; such memory serves no useful purpose. Just as real-world garbage clogs a trash bin, as Java’s garbage piles up, it reduces the total amount of heap memory. If the JVM does not remove that garbage, the JVM eventually runs out of heap memory and can’t fulfill future program requests to allocate memory for new objects. To prevent that from happening, the JVM takes out the trash through its use of garbage collection.

In this article, I introduce you to garbage collection, a term computer developers commonly use to refer to memory recycling — that is, the reuse of heap memory. After learning important details about garbage collection and its various algorithms, you’ll learn the practical side of garbage collection from Java’s perspective: how to ask the JVM to run the garbage collector, how to finalize objects, and how to resurrect finalized objects (and why that is a no-no!). We start exploring garbage collection by defining that term.

Read the whole series on garbage collection:

What is garbage collection?

From a source code perspective, you use the keyword new to create an object. After you compile that source code into a classfile, the JVM eventually encounters an equivalent byte code instruction to create that object and initialize the object’s instance fields to default values. If the object is a nonarray object, such as a single String object, the JVM executes a new byte code instruction to allocate memory for that object in the JVM’s object heap, a pool of JVM memory that stores objects. However, if the object is an array object, the JVM executes one of the newarray, anewarray, or multianewarray byte code instructions to perform the same tasks. In any case, the JVM reserves memory for the new object in its object heap. As long as that object exists, the JVM does not give that memory to some other object.

Note
Java’s garbage collection activity is JVM-specific. As a result, the output from various programs that appear in this article will not necessarily match your output. Keep that in mind when you start reading the section entitled “Run Java’s Garbage Collector.”

Each of the aforementioned byte code instructions returns a reference to the just-allocated memory chunk. That reference might be a C++-style pointer or some arbitrary value identifying that memory. What constitutes the reference depends on the JVM implementation; you do not need to know about it for the purposes of this discussion.

Typically, you assign the just-returned reference to some kind of reference variable whose type is either the same as or similar to the type of the just-created object. Alternatively, you might create a temporary object (making it temporary by not assigning its reference to any reference variable) to call one of that object’s methods. For example, you might execute the following code fragment to print a circle’s area by first creating a Circle object with (10.0, 10.0) as the center and 25.0 as the radius, and then calling Circle‘s area() method via the newly returned reference:

System.out.println ("Area = " + new Circle (10.0, 10.0, 25.0).area ());

The code fragment creates a Circle object, calls its constructor to initialize that object to a specific center and radius, returns a Circle reference, and uses that reference to call Circle‘s area() method, whose return value subsequently prints. After area() returns, the Circle‘s reference disappears from the program. That is why Circle is known as a temporary object; without the reference, you can no longer call that object’s instance methods.

What happens to an object when you lose its reference? In a language like C++, you have just tied up heap memory. Without the object’s reference, that memory remains occupied until the program exits. In Java, however, the reference becomes eligible for garbage collection — the process by which some executable, such as the JVM, automatically frees an object’s memory when a single reference to that memory no longer exists.

When a Java program runs, a portion of most JVMs known as the garbage collector runs as a background thread to perform garbage collection. Because that thread runs occasionally at nondeterministic times, you do not know when it will run. (I will cover threads in a future article.)

To carry out garbage collection, the garbage collector thread performs at least the first of the following two main tasks:

  • It frees an object’s heap memory for later use by a subsequently created object. After all, heap memory is not infinite, and a moment comes when the JVM either needs to free object memory or shut down because all available heap memory has run out.
  • It defragments the heap. As the JVM creates objects and its garbage collector frees the memory of unreferenced objects, the heap fragments. Free memory holes appear among blocks of memory assigned to objects. When the JVM attempts to create a new object, enough free memory, from the sum total of all holes, might be available to accommodate the object. However, there might not be a free memory hole large enough to hold the object’s instance fields. Defragmentation moves all occupied objects’ memory to one end of the heap. That memory serves as one large hole that the JVM can use to allocate memory for new objects.

Garbage collection algorithms

Not only does the JLS not mandate a garbage collector for a JVM, it also does not specify an algorithm that a JVM implementer must follow to implement a garbage collector. A garbage collector must only detect objects no longer needed and reclaim the heap space that those objects occupy. Over the years, computer scientists have devised many garbage collection algorithms; each offers some advantage in terms of memory use or performance.

Note
The Java Language Specification (JLS), which is the official word on the Java language, does not state that a JVM implementation must support a garbage collector. For example, a JVM implementation on a smart card — a plastic card with an embedded processor — might require all Java programs to fit within the confines of that card’s memory. As a result, the smart card JVM would not need a garbage collector. However, most JVMs need garbage collectors.

Many garbage collection algorithms identify a root set of references and determine an object’s reachability from those roots. The root set of references is a reference variable collection always accessible to an executing Java program. Each variable’s reference value allows the program to access an object’s fields and call the object’s methods. Root-set variables include local variables, parameters, and class fields. A garbage collector regards all objects reachable from root-set variables as live objects and cannot collect them as garbage. That includes objects indirectly reachable from root-set variables, meaning the objects are reachable through live objects that are directly reachable from the root-set variables. In contrast, an object that is unreachable by any path from any root-set variable is eligible for garbage collection. For example, in the earlier code fragment, because no root-set variables reference Circle after area() finishes, Circle is no longer reachable and can be garbage collected.

The following sections examine various categories of garbage collection algorithms. The first category — reference counting algorithms — distinguishes itself as the only category not requiring a root set of references to detect unreachable objects.

Reference counting algorithms

The earliest garbage collection algorithms used reference counting to distinguish live objects from objects no longer in use. Basically, each object on the heap associates with a reference count. When that object first comes into existence and its reference assigns to a variable, the reference count initializes to one. When that object’s reference assigns to any other variable, the reference count increments for each assignment. When a variable containing the object’s reference goes out of scope (e.g., a method returns, and the variable was local to that method), the reference count decrements. Once the reference count reaches zero, the object becomes eligible for garbage collection. During garbage collection, the garbage collector decrements the reference count for each object that a garbage-collected object references. If any of those objects’ reference counts reach zero, the reference counting-based garbage collector can also simultaneously collect the associated objects.

Reference counting has the advantage over other algorithm categories in that it does not interrupt program execution for lengthy periods. Because a reference counting-based garbage collector can run in short time periods that carefully coordinate with a program’s execution, it is ideally suited for those programs that must run in real time. On the flip side, there are two disadvantages to this kind of algorithm:

  1. Reference counting adds some overhead to program execution. That overhead is due to the incrementing and decrementing of a counter each time an object’s reference assigns to a new variable or each time an existing variable goes out of scope.
  2. Reference counting cannot detect cycles. For example, imagine a parent object that contains a variable that references a child object; this child object contains a variable that references the parent object. Such objects will never have a zero reference count — even when no other variable exists that can reach either object. Hence, a reference counting-based garbage collector can never collect such objects.

Tracing algorithms

To overcome the aforementioned problems with reference counting algorithms, computer scientists devised the concepts of a root set of references (discussed above) and tracing algorithms. A tracing-based garbage collector identifies all objects (in a recursive manner) reachable from root-set variables and marks those objects in some fashion, such as setting one or more bits that associate with each object. Following the marking phase, the simplest form of tracing-based collector “sweeps” through all objects and garbage collects those objects that it did not mark. This tracing-based garbage collector is known as a mark-and-sweep garbage collector.

Compacting algorithms

A mark-and-sweep garbage collector should have some means to combat heap fragmentation. One way to accomplish that task is for the mark-and-sweep garbage collector to incorporate a compacting algorithm as part of the mark-and-sweep garbage collector’s logic.

A compacting mark-and-sweep garbage collector moves all objects to one side of the heap during the sweep phase. The other end of the heap becomes one contiguous region of free memory. The collector updates all references to all objects it moves so those references identify the same objects in their new locations.

To simplify the object reference updates, some JVM garbage collector implementations that use compacting mark-and-sweep add a level of indirection. That level of indirection manifests itself through handles and a handle table. Under that scenario, an object reference always refers to the same handle entry in the handle table. In turn, the handle entry contains the object’s actual reference. When a compacting mark-and-sweep garbage collector moves an object, the collector only modifies the actual object reference in the handle entry. All references to the handle entry remain undisturbed. Although the use of a handle table simplifies heap fragmentation, accessing each object adds the disadvantage of additional overhead, which can slow program execution.

Copying algorithms

To overcome the handles overhead, computer scientists invented copying algorithms. A copying-based garbage collector is a tracing-based garbage collector that combats heap fragmentation as follows: The heap initially divides into an object area and one or more free space areas. A program allocates memory for objects from the object area and leaves the free space areas alone. Once the object area is full, a copying-based garbage collector traces all live objects from root-set variables and copies each live object to a free space area. In the free space area, the collector positions each live object immediately next to another live object so no holes exist between objects. The program now allocates memory for objects from the chosen free space area, which is now known as the object area, and leaves the old object area, now a free space area, alone. The next time the copying-based garbage collector runs, it copies live objects from that object area to a new free space area. This back-and-forth copying cycle between object and free space areas continues each time the copying-based garbage collector runs.

One of the more common copying-based garbage collectors is known as the stop-and-copy garbage collector. That collector divides the heap into an object area and one free space area and stops program execution until it finishes copying all objects from the object area to the free space area.

Generational algorithms

A disadvantage of stop-and-copy garbage collectors is that the collector must copy all live objects, which increases the time a program must wait until it can resume execution. Fortunately, you can reduce the wait time because computer scientists have discovered the following:

  • Most program objects exist for short time periods
  • Most programs create few objects that exist for long time periods — and copying long-lived objects over and over accounts for most of a copying collector’s inefficiency

Generational algorithms overcome the inefficiency of repeatedly copying long-lived objects as follows: They divide the heap into two or more subheaps, where each subheap serves a generation of objects. Garbage collection takes place most frequently on the subheap that represents the youngest generation. Because most program objects exist for short time periods, the program will eventually not reference those objects and the garbage collector will collect them from the youngest subheap. After the generational garbage collector runs a few times, those objects that survived the previous runs move to the next highest generation subheap. Each progressively older generation subheap gets garbage-collected less often. And that saves time.

Note
Sun’s HotSpot JVM uses a modified form of a generational collector. That modified collector manages the mature object space through the Train algorithm first proposed by Richard Hudson and Eliot Moss. I leave it as an exercise for you to research that algorithm. Check out Resources to get started.

Adaptive algorithms

Some garbage collection algorithms are better than others for certain heap situations. An adaptive algorithm monitors the current heap situation and selects an appropriate garbage collector on a situation-by-situation basis. Alternatively, an adaptive algorithm might divide the object heap into subheaps and assign a different garbage collector to each subheap. Each garbage collector might run simultaneously with the other collectors.

That’s enough theory for now. It’s time to start exploring the practicalities of garbage collection from Java’s perspective. To begin, we investigate how to run Java’s garbage collector.

Run Java’s garbage collector

Regardless of the garbage collection algorithm(s) that a JVM uses, you can always request that the JVM perform garbage collection by executing System.gc ();. According to the SDK documentation, System‘s gc() method:

… suggests that the Java Virtual Machine expend effort toward recycling unused objects in order to make the memory they currently occupy available for quick reuse. When control returns from the method call, the Java Virtual Machine has made a best effort to reclaim space from all discarded objects.

And why might you want to explicitly ask the JVM to run the garbage collector? Suppose you need to execute a long-running series of calculations that require a large amount of heap memory. Having the garbage collector clear out unreachable objects before those calculations begin yields more free heap memory for those calculations. The result: they complete much more quickly.

Note
System‘s gc() method calls Runtime.getRuntime ().gc ();.

Finalize objects

Before a garbage collector can collect a Java object, it must finalize that object, meaning it must allow the object to release any acquired finite resources. Finite resources are those resources in short supply, such as file and socket handles, graphics contexts, or a sound card. Think of finalizing objects as giving an object a second chance to release finite resources. And when does the object receive its first chance to release those resources? For an answer to that question, consider the following three finite resource acquisition and release scenarios:

  1. You can acquire and release a resource in the same method. Although that guarantees that resource’s release, a time cost results when acquiring the resource. Because you must acquire the resource each time you call the method, that time cost can turn into a significant performance penalty.
  2. You can acquire a resource in one method and free that resource in another. Although this solves the performance problem (because you do not need to repeatedly acquire the resource), you might forget to call the method that acquires the resource but call the method that releases the resource anyway. That forgetfulness can lead to subtle bugs in your code, and those bugs might be hard to find.
  3. You can acquire the resource in a constructor and release that resource in another method. Although this guarantees that the resource is present before you begin to work with the object, you can acquire the resource only once during the object’s life. Despite that limitation, Java’s standard class library is replete with examples that acquire finite resources in constructors and provide release mechanisms via other methods. For example, the FileInputStream class acquires a file handle in either of its constructors and releases that file handle through its close() method.

The third approach to resource acquisition and release seems ideal. However, you might forget to call a release method or an exception might prevent a call to a release method from occurring. (I explore exceptions in an upcoming article.) Either way, your code fails to release a previously acquired resource, and that failure leads to a resource leak. For example, you probably will not be able to reopen a file that is already open. Although you should make every effort to ensure that your code always calls appropriate release methods, in the event that you fail to explicitly release those resources, Java provides a fallback mechanism (the object’s second chance), known as finalization, to help ensure the release of finite resources.

Finalization involves the garbage collector running an object’s finalize() method prior to that object’s collection. Within the finalize() method, the object can release all acquired finite resources. Recall from “Object-Oriented Language Basics, Part 5” that finalize() is one of Object‘s 11 methods and features the following signature:

protected void finalize() throws Throwable

The finalize() method does not return anything; why should it? After all, the object should cease to exist after finalize() returns. As a result, finalize()‘s signature specifies the void keyword to mark its return type. Notice the throws Throwable clause that attaches to the signature. That clause signifies that code within finalize() can throw any kind of exception. However, if that code should throw an exception, the garbage collector will ignore it.

For your first example of a program that overrides finalize(), check out Listing 1:

Listing 1. FinalizeDemo1.java

// FinalizeDemo1.java
class FinalizeDemo1
{
   public static void main (String [] args)
   {
      new FinalizeDemo1 ();
   }
   public void finalize () throws Throwable
   {
      System.out.println ("finalize() called");
      super.finalize ();
   }
}

FinalizeDemo1 is a simplistic application. Its source code includes a finalize() method that prints a message and, when run, calls super.finalize (); to allow its superclass to release any acquired finite resources. That is necessary because the garbage collector does not automatically call any superclass’s finalize() method in a class hierarchy. Because FinalizeDemo1 inherits from Object, and because Object allocates no finite resources, you do not need to include super.finalize (); in FinalizeDemo1‘s finalize() method. However, you should end your finalize() methods with calls to super.finalize ();. During the class hierarchy design, you might insert a superclass — one that declares its own finalize() method — above the class that contains finalize(). (Listing 1 requires super.finalize (); because of the throws Throwable clause.)

Tip
Get into the habit of ending your finalize() methods with super.finalize(); method calls. That way, if you insert a superclass above your class and if that superclass has its own finalize() method, you ensure that the garbage collector calls the superclass’s finalize() method.

Let’s look at FinalizeDemo1‘s execution. When run, FinalizeDemo1‘s main() method creates a FinalizeDemo1() object and promptly discards that object’s only reference, thus making that object unreachable. As a result, the object becomes eligible for garbage collection. However, finalize() does not run (at least on my platform) because the garbage collector only runs at intervals, and the program exits before the garbage collector has a chance to run. As a result, one unreachable FinalizeDemo1 object remains on the heap when the program exits. That behavior proves that finalize() is not Java’s equivalent of a destructor. After all, a language must guarantee that a call is made to every object’s destructor (when that destructor is present). In contrast, Java does not guarantee that a call is made to every object’s finalize() method.

How do you get the garbage collector to run? Remember System.gc ();? You can increase the possibility of the garbage collector running by inserting that method call into FinalizeDemo1‘s main() method, which is what Listing 2 accomplishes:

Listing 2. FinalizeDemo2.java

// FinalizeDemo2.java
class FinalizeDemo2
{
   public static void main (String [] args)
   {
      new FinalizeDemo2 ();
      System.gc ();
   }
   public void finalize () throws Throwable
   {
      System.out.println ("finalize() called");
      super.finalize ();
   }
}

Does the garbage collector run now? Usually when I run FinalizeDemo2 on my Windows 98 SE platform using Sun’s Java 2 Platform, Standard Edition (J2SE) SDK 1.4, I see the following output:

finalize() called

The garbage collector runs finalize(). If you do not see the output on your platform, keep in mind that, for whatever reason, your JVM might not run the garbage collector. That means you should not rely on finalize() to release finite resources. You should make every effort to explicitly release those resources; consider finalize() as only a fallback mechanism.

There is a greater probability that the garbage collector will recycle an unreachable object after calling its finalize() method in either a continuously running program — like a Web server program — or a long-running program that creates many objects. To see that for yourself, compile and run Listing 3:

Listing 3. FinalizeDemo3.java

// FinalizeDemo3.java
class FinalizeDemo3
{
   private static int globalID;
   private int localID;
   FinalizeDemo3 ()
   {
      localID = globalID++;
      System.out.println ("Initializing object #" + localID);
   }
   protected void finalize () throws Throwable
   {
      System.out.println ("Finalizing object #" + localID);
      super.finalize ();
   }
   public static void main (String [] args)
   {
      FinalizeDemo3 fd;
      for (int i = 0; i < 10000; i++)
           fd = new FinalizeDemo3 ();
   }
}

FinalizeDemo3 introduces a loop that repeatedly creates an object and assigns that object’s reference to the same object reference variable. As an object’s reference replaces a previously created object’s reference, the previously created object becomes unreachable. At various points during the program’s execution, the garbage collector runs some of the unreachable objects’ finalize() methods. It then collects those objects.

The previous paragraph implies that not all finalize() methods for all unreachable objects run before a program exits. If you strive to have the garbage collector run the finalize() methods for all unreachable objects (such as FinalizeDemo3 objects), try placing the following two method calls at the end of the main() method:

System.gc ();
System.runFinalization ();

According to the SDK documentation, following the System.gc (); method call, System.runFinalization (); :

… suggests that the Java Virtual Machine expend effort toward running the finalize methods of objects that have been found to be discarded but whose finalize methods have not yet been run. When control returns from the method call, the Java Virtual Machine has made a best effort to complete all outstanding finalizations.

Does that work? On my platform, running the modified FinalizeDemo3 shows that, for every constructor call, there is a matching finalize() call. As a result, the answer — for me, and most likely for you — is yes.

Caution
You should call System.runFinalization (); immediately after the call to System.gc ();. If you place System.runFinalization (); before System.gc (); or omit the System.gc (); call, not all finalize() methods will run. (I do not know the reason for this.)

Resurrection

You would think that after an object’s finalize() method runs, the garbage collector would always collect that object. However, this is not always true. Within a finalize() method, you can assign the dying object’s reference to a reference variable and prevent that object’s collection, an act many developers call resurrection. Listing 4 demonstrates resurrection:

Listing 4. Resurrection.java

// Resurrection.java
class Resurrection
{
   static Resurrection r;
   public static void main (String [] args)
   {
      new Resurrection ().hello ();
      System.gc ();
      r.hello ();
      r = null;
      System.gc ();
   }
   void hello ()
   {
      System.out.println ("hello");
   }
   public void finalize () throws Throwable
   {
      System.out.println ("finalize() called");
      super.finalize ();
      r = this;
   }
}

Resurrection creates a Resurrection object, calls that object’s hello() method, and promptly discards its object reference. Then it asks the JVM to run the garbage collector. On my platform, the garbage collector usually runs. However, there is no guarantee that it will run on your platform. If it fails to run, the subsequent r.hello (); method call throws a NullPointerException object because r contains null. But if the garbage collector does run, r contains a reference to the resurrected object (after finalize() runs), and the hello method executes. Subsequently, null assigns to r, and the JVM makes a second attempt to run the garbage collector. On my platform, that attempt shows that finalize() is not called a second time. Assuming that everything works, you should see the following output; if not, try running the program a few times in a row:

hello
finalize() called
hello

Developers typically use resurrection to maintain a pool of commonly used objects. However, resurrection can obscure source code and confuse what that code tries to accomplish. Therefore, instead of using resurrection, you should examine other techniques, such as cloning and the Reference API. We’ll explore the Reference API in Part 2 of this series.

Caution
If you resurrect an object and then make that object unreachable, the next time the garbage collector runs, it will collect that object without calling the object’s finalize() method.

Review

As a Java program runs, unreferenced objects accumulate on the object heap. The JVM’s garbage collector automatically recycles unreferenced objects’ memory each time it runs. Java’s System class supplies a gc() method that you can call to ask the JVM to run the garbage collector. Your code should typically call that method before entering a long-running sequence of calculations that results in the allocation of much heap memory.

Many objects have overridden finalize() methods that the garbage collector calls prior to collecting those objects. Code within that method typically releases any finite resources held by the object. However, an object can also resurrect itself in its finalize() method. That happens when code within finalize() assigns the current object’s reference to some root-set reference variable. Although tempting, resurrection is not a good idea because it can lead to complex code.

I strongly encourage you to review the various resources on garbage collecting as they expand on this article’s concepts. Without that additional knowledge, you might be left wondering why certain things work the way they do.

I also encourage you to email me with any questions you might have involving either this or any previous article’s material. (Please, keep such questions relevant to material discussed in this column’s articles.) Your questions and my answers will appear in the relevant study guides.

Jeff Friesen has been involved with computers for the past 20 years. He holds a degree in computer science and has worked with many computer languages. Jeff has also taught introductory Java programming at the college level. In addition to writing for JavaWorld, he has written his own Java book for beginners — Java 2 By Example, Second Edition (Que Publishing, 2001; ISBN: 0789722666) — and helped write Special Edition Using Java 2 Platform (Que Publishing, 2001; ISBN: 0789720183). Jeff goes by the nickname Java Jeff (or JavaJeff). To see what he’s working on, check out his Website at http://www.javajeff.com.