Design patterns in the real world: Flyweight

Quite a few software engineers think that design patterns are overly complicated, mythical, abstract things that bring no practical value to software development. This is unfortunate. To show that they are indeed something real, in this post (and some upcoming ones) we are going to take a look at a few examples of how real software products implement some of the GoF design patterns. The first one to be examined is Flyweight.

Flyweight defined

According to Wikipedia, Flyweight is defined as follows:

A flyweight is an object that minimizes memory use by sharing as much data as possible with other similar objects; it is a way to use objects in large numbers when a simple repeated representation would use an unacceptable amount of memory.

Reading the definition above carefully, one can see the obvious similarities with what we call a “cache” in software engineering. As such, two important aspects should be considered:

  • implementations of this design pattern may lead to garbage-collection-unfriendly solutions, as shared objects retained by the cache may never become eligible for garbage collection.
  • it is not stated explicitly, but it makes sense to define those shared objects as stateless/immutable. This way we avoid some evident problems, such as data races and objects left in an illegal state.
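To make the two points above a bit more tangible, here is a minimal sketch of the classic, explicit form of the pattern. The names Glyph, GlyphFactory and FlyweightSketch are hypothetical and exist only for this illustration; they are not part of the JDK. The shared object is immutable, and the factory's map keeps every instance reachable, which is exactly the garbage collection caveat mentioned above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo: all type names here are made up for illustration.
class FlyweightSketch {
    public static void main(String[] args) {
        Glyph a1 = GlyphFactory.of('a');
        Glyph a2 = GlyphFactory.of('a');
        Glyph b = GlyphFactory.of('b');

        System.out.println(a1 == a2); // true: both clients share one instance
        System.out.println(a1 == b);  // false: different key, different flyweight
    }
}

// The flyweight: immutable, so it is safe to share between clients.
final class Glyph {
    private final char symbol;

    Glyph(char symbol) {
        this.symbol = symbol;
    }

    char symbol() {
        return symbol;
    }
}

// The factory: hands out one shared Glyph per character.
final class GlyphFactory {
    // Entries are never removed, so cached glyphs are never garbage collected.
    private static final Map<Character, Glyph> CACHE = new ConcurrentHashMap<>();

    private GlyphFactory() {
    }

    static Glyph of(char symbol) {
        // Creates the glyph on first request, returns the cached one afterwards.
        return CACHE.computeIfAbsent(symbol, Glyph::new);
    }
}
```

Unlike this sketch, which fills its cache lazily, a factory may also create its instances eagerly if the set of popular values is known up front.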

Now, let’s take a look at HotSpot’s Flyweight implementations.

The Integer class

Storing commonly used (immutable) objects for sharing between different clients? Sounds just like what Integer does internally. Of course, it does not really make sense to cache every possible value, since Integer’s range is pretty large. The idea is to cache only the most commonly used values, so that one eager object creation outperforms repeated creations of identical objects. Those most frequently used values happen to be the numbers in the byte range. The following code snippet demonstrates this behavior:

    for (int i = -129; i <= 128; i++) {
        Integer i1 = i;
        Integer i2 = i;

        System.out.println(i1 + " == " + i2 + " : " + (i1 == i2));
    }

The output of this loop is the following:

-129 == -129 : false
-128 == -128 : true
-127 == -127 : true
...
126 == 126 : true
127 == 127 : true
128 == 128 : false

As we can see, in the byte range we get back the very same object instance for the same int value (that’s why the identity operator (==) returns true). If you take a look at Integer’s source code, you can see that it defines a private static class called IntegerCache; this is the place where the “magic” happens, the one responsible for pre-creating Integer objects. In this respect, this Flyweight implementation is a bit different from the original one; there the client explicitly turns to a FlyweightFactory in order to receive a cached instance, while in the case of Integer the factory is hidden, as the diagram below illustrates:

[Figure: flyweight]

But hey, what if we are extensively using integer values up to, say, 255? Is there a way we can tell the JVM to cache those values for us? Turns out the answer is yes. The only thing we have to do is start our JVM with either of the following arguments: -Djava.lang.Integer.IntegerCache.high=255 or -XX:AutoBoxCacheMax=255 (prefer the latter). Now, if we run the test application again, with slightly modified loop bounds, we get the following results:

-129 == -129 : false
-128 == -128 : true
...
127 == 127 : true
128 == 128 : true
...
255 == 255 : true
256 == 256 : false

Yaaaay, we’ve just created a larger cache for Integers. Aha! So maybe if we don’t want the JVM to apply all these smarts on our behalf, all we have to do is set -XX:AutoBoxCacheMax to -128 and the whole caching mechanism goes away? It turns out Java does not behave that way. The Java Language Specification requires that boxing an int between -128 and 127 always yields the same cached object. If we set the upper bound to a number less than 127, that setting is silently ignored. Another thing to note is that there is no way to configure the lower bound of the cache; it will always stay at -128.
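If you want to check what upper bound your JVM actually ended up with, a small probe can help. The sketch below is only an illustration (IntegerCacheProbe and highestCachedValue are made-up names): it relies on nothing but the observable behavior described above, namely that boxing a cached value twice returns the same instance.

```java
// Hypothetical helper for probing the effective Integer cache upper bound.
class IntegerCacheProbe {

    // Walks upwards from 128 until two boxings of the same value
    // stop returning the same instance, or until limit is reached.
    static int highestCachedValue(int limit) {
        int highest = 127; // values up to 127 are always cached
        for (int v = 128; v <= limit; v++) {
            if (Integer.valueOf(v) != Integer.valueOf(v)) {
                break;
            }
            highest = v;
        }
        return highest;
    }

    public static void main(String[] args) {
        // The result depends on the flags the JVM was started with.
        System.out.println("Highest cached value: " + highestCachedValue(1_000_000));
        System.out.println("-129 cached: "
                + (Integer.valueOf(-129) == Integer.valueOf(-129)));
    }
}
```

Started without any flags, this should report 127 on HotSpot; started with -XX:AutoBoxCacheMax=255, it should report 255.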

Usually there is no reason to resize the integer pool. However, if you – for some strange reason – need to modify the upper bound, be careful. Technically it is possible to set it to a very large number, but that:

  1. will result in more delay in JVM startup than speedup gained throughout the application’s lifecycle
  2. can lead to JVM initialization error, especially if your heap size is not large enough. This is what happened to my JVM with an upper bound of 200 million:

Error occurred during initialization of VM
GC triggered before VM initialization completed. Try increasing NewSize, current value 85M.

One more important thing to keep in mind is that caching can only kick in when we let the JVM do the object creation. In the example above, we leveraged Java’s autoboxing capability to wrap a primitive int in an Integer object. Let’s now modify this application a bit, so that we use the new operator for creating the objects:

    for (int i = -129; i <= 128; i++) {
        Integer i1 = new Integer(i);
        Integer i2 = new Integer(i);

        System.out.println(i1 + " == " + i2 + " : " + (i1 == i2));
    }

Now, we get an entirely different result:

-129 == -129 : false
-128 == -128 : false
-127 == -127 : false
...
126 == 126 : false
127 == 127 : false
128 == 128 : false

As we’ve taken object creation out of the VM’s hands and made it explicit (using the new Integer(int) constructor), we’ve explicitly asked the virtual machine to create a brand new Integer object on every single occasion. Even though there were perfectly fine instances in the cache, we’ve forbidden their use. Autoboxing and the static factory methods that return an Integer (valueOf() and decode()) are able to use the cached values. parseInt() returns a primitive int, so the cache only comes into play once its result is boxed. Instances created using new will always be newly created, distinct objects.
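The different creation paths can be compared side by side. The following is a small sketch (IntegerFactoryDemo and smallValueViaFactories are names made up for this post) that obtains the value 100 through valueOf(), decode() and a boxed parseInt() result. Note that parseInt() itself returns a primitive int, so it is the boxing on assignment that consults the cache. The expected output assumes an OpenJDK/HotSpot class library.

```java
// Hypothetical demo of the creation paths that can use the Integer cache.
class IntegerFactoryDemo {

    // Obtains the same small value through four different routes.
    static Integer[] smallValueViaFactories() {
        Integer viaValueOf = Integer.valueOf(100);
        Integer viaString = Integer.valueOf("100");
        Integer viaDecode = Integer.decode("0x64"); // 0x64 is 100 in hex
        Integer viaParse = Integer.parseInt("100"); // primitive int, autoboxed here
        return new Integer[] {viaValueOf, viaString, viaDecode, viaParse};
    }

    public static void main(String[] args) {
        Integer[] values = smallValueViaFactories();
        System.out.println(values[0] == values[1]); // true
        System.out.println(values[0] == values[2]); // true
        System.out.println(values[0] == values[3]); // true

        // Outside the cached range, identity is not something to rely on;
        // equals() compares the numeric values instead.
        Integer big1 = Integer.valueOf(100_000);
        Integer big2 = Integer.valueOf(100_000);
        System.out.println(big1.equals(big2)); // true
    }
}
```

In application code, comparing wrapper objects with equals() rather than == is the safer habit, since it works regardless of whether a value happens to be cached.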

The Character, Byte, Short and Long classes

These classes are very similar to Integer from a Flyweight perspective. They also contain private static classes, called CharacterCache, ByteCache, ShortCache and LongCache respectively. A small difference in implementation is that Character caches the first 128 characters (from 0 to 127), while Byte, Short and Long cache the very same range as Integer does (from -128 to 127).

These four classes offer no way of modifying the cache boundaries; the predefined values cannot be overridden. In order to use the cached values, the object creation rules mentioned in the previous section apply.
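The same identity check we used for Integer works for these wrappers too. Below is a minimal sketch (WrapperCacheDemo and its helper methods are hypothetical names) that goes through the valueOf() factories, so that the caches are allowed to kick in.

```java
// Hypothetical demo: identity checks against the other wrapper caches.
class WrapperCacheDemo {

    static boolean sharedByte(byte v) {
        return Byte.valueOf(v) == Byte.valueOf(v);
    }

    static boolean sharedShort(short v) {
        return Short.valueOf(v) == Short.valueOf(v);
    }

    static boolean sharedLong(long v) {
        return Long.valueOf(v) == Long.valueOf(v);
    }

    static boolean sharedChar(char v) {
        return Character.valueOf(v) == Character.valueOf(v);
    }

    public static void main(String[] args) {
        // Inside the cached ranges: always the same instance.
        System.out.println("Byte -128: " + sharedByte((byte) -128));     // true
        System.out.println("Short 127: " + sharedShort((short) 127));    // true
        System.out.println("Long 127: " + sharedLong(127L));             // true
        System.out.println("Character 127: " + sharedChar((char) 127));  // true

        // Just outside the cached ranges: no sharing is guaranteed here.
        System.out.println("Short 128: " + sharedShort((short) 128));
        System.out.println("Long 128: " + sharedLong(128L));
        System.out.println("Character 128: " + sharedChar((char) 128));
    }
}
```

Since the whole byte range is cached, Byte.valueOf() never has to create a new object at all.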

The String class

The number of possible Strings is close to infinite, so there is no point in eagerly caching a fixed set of values at startup, as the previous classes do. However, the virtual machine still maintains a cache for strings, called the string constant pool. It holds none of our application’s strings at startup and is filled continuously during the lifecycle of the JVM.

The key to string pooling – just like in the previous cases – is object creation. Let’s create a small test:

String s1 = "hello world";
String s2 = "hello world";
String s3 = new String("hello world");
		
System.out.println(s1 == s2);
System.out.println(s1 == s3);

This code prints true and false, just as we expect. Exactly like in the previous cases, we have to watch object construction closely. The first string (s1) is created as a string literal, and so is the second one (s2). s1 is stored in the cache once created, so when s2 is constructed, it can be resolved from the cache. The two variables will share the very same String instance from the pool (note that this is only possible because Strings are immutable, so one instance can safely be shared between two different variables). The third string (s3) is created using the new keyword, so it will be a new String instance and will not be served from the cache.

String objects created using the new keyword can still be made eligible for pooling. String defines a native method called intern() that does just that; it searches for a specific string in the pool and returns the cached version of it if found, otherwise stores the string into the pool and returns it. Let’s see how it works:

String s3 = new String("hello world");
s3 = s3.intern();

String s1 = "hello world";
String s2 = "hello world";
		
System.out.println(s1 == s2);
System.out.println(s1 == s3);

This time around, the result is true and true.

As mentioned earlier, intern() is declared native. This is because the string pool is handled outside of the Java code, and it has a C++ implementation (string instances are stored in an object of type StringTable). As such, there is really no way to access it directly from Java code.
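Literals and new String(...) are not the only ways strings come to life; many are built at runtime. The sketch below (InternDemo and its build() helper are hypothetical names) shows that a string concatenated from compile-time constants ends up being the pooled literal, a string concatenated at runtime is a fresh object, and intern() maps the latter back to the pooled instance.

```java
// Hypothetical demo: pooled literals versus strings built at runtime.
class InternDemo {

    // The argument is not a compile-time constant, so the
    // concatenation happens at runtime and creates a new String.
    static String build(String prefix) {
        return prefix + " world";
    }

    public static void main(String[] args) {
        String literal = "hello world";
        String folded = "hello" + " world"; // constant expression, folded by the compiler
        String runtime = build("hello");

        System.out.println(literal == folded);           // true
        System.out.println(literal == runtime);          // false
        System.out.println(literal == runtime.intern()); // true
        System.out.println(literal.equals(runtime));     // true
    }
}
```

As with the wrapper classes, equals() is the right tool for comparing string contents; the identity checks here only serve to reveal what the pool is doing.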

Final thoughts

A few thoughts as a recap…

The Character class maintains a cache of the first 128 characters. Byte, Short, Integer and Long maintain a cache for all the numbers in the byte range. For Integer, the upper bound can be set to a value higher than the predefined one (127) using -XX:AutoBoxCacheMax; the lower bound cannot be modified. It does not make sense to set an unreasonably high upper bound.

Strings created as literals end up in the string constant pool. Strings created using the new keyword can be put into the same cache by calling the intern() method. The string cache leaves no trace in the Java sources; everything is handled inside the JVM.

No matter what data type we are talking about, pooled values only come into play when explicit object creation is avoided and instantiation is left to Java.


Author: tamasgyorfi

Senior software engineer, certified enterprise architect and certified Scrum master. Feel free to connect on Twitter: @tamasgyorfi
