46

I have some slides from IBM titled "From Java Code to Java Heap: Understanding the Memory Usage of Your Application", which say that when we use String instead of char[]:

Maximum overhead would be 24:1 for a single character!

But I am not able to understand what overhead is being referred to here. Can anybody please help?

Source: [image of the slide]

11
  • 2
    Can you please add a reference to the source too? Commented Nov 20, 2013 at 12:25
  • I have some slides from IBM named "From Java Code to Java Heap: Understanding the Memory Usage of Your Application"; I don't have the URL. Commented Nov 20, 2013 at 12:26
  • 3
    It's good to add this info to the question instead of the vague "somewhere". :) Commented Nov 20, 2013 at 12:29
  • for memory performance Commented Nov 20, 2013 at 12:29
  • 2
    ibm.com/developerworks/library/j-codetoheap Commented Nov 20, 2013 at 12:31

4 Answers

38

This figure relates to 32-bit JDK 6.

JDK 6

In the pre-Java-7 world, strings were implemented as a pointer to a region of a char[] array:

// "8 (4)" reads "8 bytes for x64, 4 bytes for x32"

class String{      //8 (4) house keeping + 8 (4) class pointer
    char[] buf;    //12 (8) bytes + 2 bytes per char -> 24 (16) aligned
    int offset;    //4 bytes                     -> three int
    int length;    //4 bytes                     -> fields align to
    int hash;      //4 bytes                     -> 16 (12) bytes
}

So I counted:

36 bytes per new String("a") for JDK 6 x32  <-- the overhead from the article
56 bytes per new String("a") for JDK 6 x64.


JDK 7

For comparison, in JDK 7+ (since Java 7 Update 6) String is a class that holds only a char[] buffer and a hash field.

class String{      //8 (4) + 8 (4) bytes             -> 16 (8)  aligned
    char[] buf;    //12 (8) bytes + 2 bytes per char -> 24 (16) aligned
    int hash;      //4 bytes                         -> 8  (4)  aligned
}

So it's:

28 bytes per String for JDK 7 x32 
48 bytes per String for JDK 7 x64.

UPDATE

For the 3.75:1 ratio, see @Andrey's explanation below. This ratio approaches 1 as the length of the string grows.


14 Comments

I see what's happening now. Perhaps you should show this in the answer a little bit. I got confused, so I'm sure others may too. You're showing the size of a String, but not of a char[1]. Both are needed to show the ratio.
@Darkhogg It's been dead and gone since Java 7 Update 6.
@Darkhogg There was something on the mailing lists; the point is it caused more damage than good.
@Darkhogg Yes, tough luck, it hurts some use cases. On the other hand, it is more transparent and predictable and more space-efficient for small strings, which means for 99% of all strings used in Java programs. The net effect is probably less heap usage.
The rationale for the change is briefly described at mail.openjdk.java.net/pipermail/core-libs-dev/2012-May/…
9

In the JVM, a character variable is stored in a single 16-bit memory allocation, and changes to that Java variable overwrite that same memory location. This makes creating or updating character variables very fast and memory-cheap, but increases the JVM's overhead compared to the static allocation used in Strings.

The JVM stores Java Strings in a variable-size memory space (essentially, an array), which is exactly the same size (plus 1, for the string termination character) as the string when the String object is created or first assigned a value. Thus, an object with initial value "HELP!" would be allocated 96 bits of storage (6 characters, each 16 bits in size). This value is considered immutable, allowing the JVM to inline references to that variable, which makes static string assignments very fast, very compact, and very efficient from the JVM's point of view.

Reference

4 Comments

I don't really think the JVM needs the terminating char though
@ratchetfreak Note that if you have the null terminator you can easily, under the hood of the JVM, use some C library's functions to operate on the strings. At least, this was one reason why Python implements strings with a string length field and null terminator. Might be the same reason for Java. In general sometimes it's convenient to have some redundancy.
That's not much of a reference. char[] doesn't store the zero terminator. Python is another story, it's much more C-oriented.
@MarkoTopolnik it may be that when you allocate a char[n] the jvm will allocate an array with an extra spot for the null terminator, but that is an implementation detail
3

I'll try explaining the numbers referenced in the source article.

The article describes object metadata typically consisting of: class, flags and lock.

The class and lock are stored in the object header and take 8 bytes on a 32-bit VM. However, I haven't found any information about JVM implementations that keep flags in the object header. It may be that this is stored somewhere externally (e.g. by the garbage collector, to count references to the object).

So let's assume that the article talks about some x32 AbstractJVM which uses 12 bytes of memory to store meta information about the object.

Then for char[] we have:

  • 12 bytes of meta information (8 bytes on x32 JDK 6, 16 bytes on x64 JDK)
  • 4 bytes for array size
  • 2 bytes for each character stored
  • 2 bytes of alignment if the number of characters is odd (on x64 JDK: 2 * (4 - (length + 2) % 4))

For java.lang.String we have:

  • 12 bytes of meta information (8 bytes on x32 JDK6, 16 bytes on x64 JDK6)
  • 16 bytes for String fields (in JDK 6; 8 bytes in JDK 7)
  • memory needed to store char[] as described above

So, let's count how much memory is needed to store "MyString" as a String object:

12 + 16 + (12 + 4 + 2 * "MyString".length + 2 * ("MyString".length % 2)) = 60 bytes.

On the other hand, we know that to store only the data (without information about the data type, length or anything else) we need:

2 * "MyString".length = 16 bytes

Overhead is 60 / 16 = 3.75

Similarly, for a single-character string we get the 'maximum overhead':

12 + 16 + (12 + 4 + 2 * "a".length + 2 * ("a".length % 2)) = 48 bytes
2 * "a".length = 2 bytes
48 / 2 = 24

Following the article authors' logic, the true maximum overhead, infinity, is reached when we store an empty string :).
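The arithmetic above can be written as a small Java sketch. It only encodes the assumed model from this answer (a hypothetical 32-bit JVM with 12 bytes of object metadata and the JDK 6 String layout); it does not measure a real JVM, and the class, constant and method names are made up for illustration.

```java
public class StringOverhead {
    // Assumed sizes taken from the model in this answer, not measured values.
    static final int OBJECT_META = 12;    // class, flags and lock
    static final int STRING_FIELDS = 16;  // JDK 6: char[] reference, offset, count, hash
    static final int ARRAY_LENGTH = 4;    // slot holding the array size
    static final int BYTES_PER_CHAR = 2;

    // Bytes used by a char[] of the given length, with 2 bytes of padding for odd lengths.
    static int charArrayBytes(int length) {
        int padding = BYTES_PER_CHAR * (length % 2);
        return OBJECT_META + ARRAY_LENGTH + BYTES_PER_CHAR * length + padding;
    }

    // Bytes used by a String object plus its backing char[].
    static int stringBytes(int length) {
        return OBJECT_META + STRING_FIELDS + charArrayBytes(length);
    }

    // Total bytes divided by raw character data bytes; Infinity for length 0.
    static double overheadRatio(int length) {
        return (double) stringBytes(length) / (BYTES_PER_CHAR * length);
    }

    public static void main(String[] args) {
        for (int length : new int[] {1, 8, 100, 10_000}) {
            System.out.printf("length=%d bytes=%d ratio=%.3f%n",
                    length, stringBytes(length), overheadRatio(length));
        }
    }
}
```

Under these assumptions, lengths 1 and 8 reproduce the 24:1 and 3.75:1 figures, and the ratio drifts toward 1 as the length grows because the fixed 44 bytes of bookkeeping are spread over more characters.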


1

I read this in an old Stack Overflow answer, which I am not able to find now. In Oracle's JDK, a String has four instance-level fields:

A character array
An integral offset
An integral character count
An integral hash value

That means that each String introduces an extra object reference (the String itself), and three integers in addition to the character array itself. (The offset and character count are there to allow sharing of the character array among String instances produced through the String#substring() methods, a design choice that some other Java library implementers have eschewed.) Beyond the extra storage cost, there's also one more level of access indirection, not to mention the bounds checking with which the String guards its character array.

If you can get away with allocating and consuming just the basic character array, there's space to be saved there. It's certainly not idiomatic to do so in Java though; judicious comments would be warranted to justify the choice, preferably with mention of evidence from having profiled the difference.
