“And to avoid the tedious repetition of these words: is equal to: I will set as I do often in work use, a pair of parallels, or Gemowe lines of one length, thus: =, because no 2 things, can be more equal.”
— Robert Recorde, The Whetstone of Witte (1557)
public class Equality {
    public Equality() {
        Integer a1 = 100, a2 = 100;
        Integer b1 = 200, b2 = 200;
        if (a1 == a2) System.out.println("a1 == a2");
        if (b1 == b2) System.out.println("b1 == b2");
    }

    public static void main(String[] args) { new Equality(); }
}
How does the code work?
You might have seen a similar snippet before, usually followed by the good advice to always check objects for equality using equals() instead of ==. But what really happens inside the code?
% java Equality
a1 == a2
Looks like for some reason Java doesn’t want to acknowledge that 200 is equal to 200, even though it doesn’t have any problem with 100 being equal to 100. Now what happens when we change Integers to ints?
int a1 = 100, a2 = 100;
int b1 = 200, b2 = 200;
if (a1 == a2) System.out.println("a1 == a2");
if (b1 == b2) System.out.println("b1 == b2");
Surprisingly, everything works fine this time:
% java Equality2
a1 == a2
b1 == b2
Following the good advice, let’s change the code to use equals().
Integer a1 = 100, a2 = 100;
Integer b1 = 200, b2 = 200;
if (a1.equals(a2)) System.out.println("a1 equals a2");
if (b1.equals(b2)) System.out.println("b1 equals b2");
The result is also correct this time:
% java Equality3
a1 equals a2
b1 equals b2
So what happens under the hood that makes 200 not equal to 200? Let’s take a closer look at our Integers.
Integer a1 = 100, a2 = 100;
Integer b1 = 200, b2 = 200;
for (Integer i : new Integer[] { a1, a2, b1, b2 })
    System.out.printf("%d -> hash %d, id hash %d%n",
        i, i.hashCode(), System.identityHashCode(i));
This piece of code gives us some more insight into the Integers:
% java Equality4
100 -> hash 100, id hash 1265094477
100 -> hash 100, id hash 1265094477
200 -> hash 200, id hash 2125039532
200 -> hash 200, id hash 312714112
The first observation is that the Integers hash to themselves, but that’s pretty boring. A more interesting realization is that a1 and a2 point to the same object, while b1 and b2 are distinct.
Why is it so?
The Integer Cache and autoboxing
Introduced in Java 5, the Integer cache’s main goals are to improve Integer object performance and to reduce the memory footprint. The idea behind the mechanism is to cache a small number of Integers internally and reuse them.
Autoboxing and auto-unboxing, concepts also introduced in Java 5, are automatic conversions between the primitive types and their corresponding object wrappers. Let’s have a quick look at how these work:
Integer a1 = 100;
With autoboxing, the compiler actually replaces that line of code with:
Integer a1 = Integer.valueOf(100);
Auto-unboxing works in a similar way:
Integer a1 = new Integer(100);
int p = a1;
// the line above actually does this:
// int p = a1.intValue();
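The same unboxing rule explains why the int version of the snippet behaved: when one operand of == is a primitive, the boxed operand is unboxed and the two values are compared, not the references. Here is a minimal sketch of that; the class and method names are made up for illustration.

```java
public class MixedComparison {
    // Compares a boxed Integer with a primitive int using ==.
    // The compiler unboxes the Integer (boxed.intValue()), so values are compared.
    public static boolean boxedVsPrimitive(Integer boxed, int primitive) {
        return boxed == primitive;
    }

    // Compares two boxed Integers by value, regardless of object identity.
    public static boolean byValue(Integer x, Integer y) {
        return x.intValue() == y.intValue();
    }

    public static void main(String[] args) {
        Integer b1 = 200; // compiled as Integer.valueOf(200)
        int b2 = 200;
        System.out.println(boxedVsPrimitive(b1, b2)); // prints "true"
        System.out.println(byValue(200, 200));        // prints "true"
    }
}
```

Both checks succeed for 200 because neither of them compares object references.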
We’re just one step away from solving the mystery. Let’s have a look at Integer.valueOf() now:
Returns an Integer instance representing the specified int value. If a new Integer instance is not required, this method should generally be used in preference to the constructor Integer(int), as this method is likely to yield significantly better space and time performance by caching frequently requested values. This method will always cache values in the range -128 to 127, inclusive, and may cache other values outside of this range.
Indeed, looking under the hood we can see that Integer has a private nested class, IntegerCache, that stores Integer instances in an array, by default those with values from -128 to 127. It is used by valueOf(int) to avoid creating new objects unnecessarily. We also see that the upper bound is configurable with the -XX:AutoBoxCacheMax=n option.
Knowing all this, let’s go back to the code. We will print the hash codes of the objects, then use == for checking equality:
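To make the mechanism concrete, here is a simplified model of such a cache. It is a sketch, not the JDK source: Boxed is a hypothetical stand-in for Integer, and TinyCacheSketch, LOW, HIGH and CACHE are names invented for this example. The bounds mirror the default range described above.

```java
public class TinyCacheSketch {
    // Hypothetical stand-in for a boxed integer; this is not java.lang.Integer.
    static final class Boxed {
        final int value;
        Boxed(int value) { this.value = value; }
    }

    static final int LOW = -128;
    static final int HIGH = 127;
    static final Boxed[] CACHE = new Boxed[HIGH - LOW + 1];

    static {
        // Pre-create one shared instance per value in [LOW, HIGH].
        for (int i = 0; i < CACHE.length; i++) {
            CACHE[i] = new Boxed(LOW + i);
        }
    }

    // Returns the shared instance for cached values, a fresh object otherwise.
    static Boxed valueOf(int i) {
        if (i >= LOW && i <= HIGH) {
            return CACHE[i - LOW]; // e.g. the value 4 lives at index 4 + 128
        }
        return new Boxed(i);
    }

    public static void main(String[] args) {
        System.out.println(valueOf(100) == valueOf(100)); // prints "true"
        System.out.println(valueOf(200) == valueOf(200)); // prints "false"
    }
}
```

In this model, two requests for 100 return the same array element, while each request for 200 allocates a new object, which is the pattern seen in the identity hashes earlier.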
Integer a1 = 100, a2 = 100;
Integer b1 = 200, b2 = 200;
for (Integer i : new Integer[] { a1, a2, b1, b2 })
    System.out.printf("%d -> hash %d, id hash %d%n",
        i, i.hashCode(), System.identityHashCode(i));
if (a1 == a2) System.out.println("a1 == a2");
if (b1 == b2) System.out.println("b1 == b2");
Time to run the code again, passing the appropriate parameter to the VM:
% java -XX:AutoBoxCacheMax=200 Equality5
100 -> hash 100, id hash 1252169911
100 -> hash 100, id hash 1252169911
200 -> hash 200, id hash 2101973421
200 -> hash 200, id hash 2101973421
a1 == a2
b1 == b2
As expected, because we bumped the upper bound of the integer cache to 200, both b1 and b2 are now served from the cache, so the code produces the expected results.
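Since only the range -128 to 127 is guaranteed to be cached, code should not rely on == for boxed values at all. The sketch below probes the documented boundaries and shows a value-based comparison with java.util.Objects.equals; the class name CacheBoundary and the helper sameObject are made up for illustration.

```java
import java.util.Objects;

public class CacheBoundary {
    // True if two separate boxing calls for the same value return one object.
    public static boolean sameObject(int value) {
        Integer x = Integer.valueOf(value);
        Integer y = Integer.valueOf(value);
        return x == y; // deliberate reference comparison
    }

    public static void main(String[] args) {
        System.out.println(sameObject(-128)); // prints "true" (guaranteed range)
        System.out.println(sameObject(127));  // prints "true" (guaranteed range)
        System.out.println(sameObject(128));  // depends on the cache's upper bound

        Integer b1 = 200, b2 = 200;
        // Value comparison that does not depend on caching.
        System.out.println(Objects.equals(b1, b2)); // prints "true"
    }
}
```

The result for 128 is intentionally left unstated, because it changes with the VM's cache settings.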
The fun part: 2 + 2 = 5
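By adding reflection to the mix, we can access and modify the integer cache from the code, making Java do unexpected things (on recent JDKs that strongly encapsulate internals, this needs the --add-opens java.base/java.lang=ALL-UNNAMED option):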
import java.lang.reflect.Field;

public class IntegerCacheFun {
    public static void main(String[] args) throws Exception {
        // Grab the private array that backs Integer.valueOf().
        Class<?> cls = Class.forName("java.lang.Integer$IntegerCache");
        Field fld = cls.getDeclaredField("cache");
        fld.setAccessible(true);
        Integer[] cache = (Integer[]) fld.get(cls);
        // The cache starts at -128, so the slot for 4 is at index 4 + 128.
        cache[4 + 128] = 5;
        Integer result = 2 + 2; // boxed through the tampered cache
        System.out.print("2 + 2 = ");
        System.out.println(result);
    }
}
After making such a modification to the integer cache, Java will claim that:
% java IntegerCacheFun
2 + 2 = 5
An exercise for the reader
Which other Java classes can be abused in such a way to produce wrong results?