Java benchmarking - why is the second loop faster?

Question

I'm curious about this.

I wanted to check which function was faster, so I wrote a small piece of code and executed it many times.

public static void main(String[] args) {

    long ts;
    String c = "sgfrt34tdfg34";

    ts = System.currentTimeMillis();
    for (int k = 0; k < 10000000; k++) {
        c.getBytes();
    }
    System.out.println("t1->" + (System.currentTimeMillis() - ts));

    ts = System.currentTimeMillis();
    for (int i = 0; i < 10000000; i++) {
        Bytes.toBytes(c);
    }
    System.out.println("t2->" + (System.currentTimeMillis() - ts));

}

The "second" loop is faster, so I thought that the Bytes class from Hadoop was faster than the method from the String class. Then I swapped the order of the loops, and c.getBytes() became the faster one. I ran it many times, and my conclusion was that, for reasons I don't understand, something happens in my VM after the first loop executes that makes the second loop faster.

Can you please add details like the JDK version, OS and sample benchmark numbers?
– Santosh, Commented Dec 18, 2013 at 13:31
There is a major mismatch between your title and your explanation: which loop is faster?
– Kiwy, Commented Dec 18, 2013 at 14:37
@guille. I just encountered the same behavior, but I noticed it when I started in debug mode and not in release. Did you run your test in debug?
– fralbo, Commented Jun 22, 2018 at 8:03
See Aleksey Shipilev's great answer to a duplicate question for an in-depth analysis of the performance of code like this.
– Lii, Commented Jun 30, 2018 at 15:52

Tim B · Accepted Answer · 2013-12-19 15:22:36Z

63

This is a classic Java benchmarking issue. HotSpot, the JIT, etc. will compile your code as you use it, so it gets faster during the run.

Run the loop at least 3000 times (10000 on a server or 64-bit JVM) first, and only then do your measurements.
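A minimal sketch of that warm-up-then-measure idea, assuming the same test string as in the question. The class name, the constants and the `sink` field are hypothetical additions for illustration; `sink` only exists so the results are used and the calls cannot be optimised away.

```java
public class WarmupThenMeasure {
    // Assumed values for illustration: the warm-up count only needs to
    // exceed the compile thresholds discussed above.
    static final int WARMUP_RUNS = 20_000;
    static final int MEASURED_RUNS = 1_000_000;

    // Hypothetical sink: consuming each result keeps the JIT from
    // discarding the getBytes() calls as dead code.
    static long sink;

    // Times `iterations` calls of getBytes() and returns elapsed nanoseconds.
    static long timeGetBytes(String s, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += s.getBytes().length;
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        String c = "sgfrt34tdfg34";
        timeGetBytes(c, WARMUP_RUNS);                 // warm-up pass, timing ignored
        long nanos = timeGetBytes(c, MEASURED_RUNS);  // measured pass
        System.out.printf("getBytes(): %d ns per call on average%n",
                nanos / MEASURED_RUNS);
    }
}
```

Because the warm-up and the measurement go through the same method, the measured pass should mostly run already-compiled code.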

edited Dec 19, 2013 at 15:22

answered Dec 18, 2013 at 10:45


2 Comments

gyorgyabraham Over a year ago

@Santosh The option you are looking for is -XX:CompileThreshold. Never set this to 0 or 1. It is the number of method calls after which a method becomes eligible for JIT compilation. By default it is 3000, and the -server option overrides it to 10000.

gyorgyabraham Over a year ago

From the Oracle docs it seems that the default client value is 1500; maybe 3000 was revised in some newer version :) Also, AFAIK, when you use a 64-bit JRE, the server option is the implicit default.
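Rather than guessing the default, you can ask the running JVM. The following is a small sketch, assuming a HotSpot-based JVM (the `com.sun.management` bean is HotSpot-specific, and the class name here is made up for illustration). Note that, per the Oracle `java` documentation, CompileThreshold is ignored when tiered compilation is enabled, so on newer JVMs the printed number may not be the one that actually governs compilation.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class ShowCompileThreshold {
    // Returns the current value of -XX:CompileThreshold as reported by the JVM.
    // Assumes HotSpot; other JVMs may not provide this bean or this option.
    static String compileThreshold() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption("CompileThreshold").getValue();
    }

    public static void main(String[] args) {
        System.out.println("CompileThreshold = " + compileThreshold());
    }
}
```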

Sergey Kalinichenko · Accepted Answer · 2013-12-18 10:57:42Z

You know there's something wrong, because Bytes.toBytes calls c.getBytes internally:

public static byte[] toBytes(String s) {
    try {
        return s.getBytes(HConstants.UTF8_ENCODING);
    } catch (UnsupportedEncodingException e) {
        LOG.error("UTF-8 not supported?", e);
        return null;
    }
}

The source is taken from here. This tells you that the call cannot possibly be faster than the direct call - at the very best (i.e. if it gets inlined) it would have the same timing. Generally, though, you'd expect it to be a little slower, because of the small overhead in calling a function.

This is the classic problem with micro-benchmarking in interpreted, garbage-collected environments with components that run at arbitrary times, such as garbage collectors. On top of that, there are hardware optimizations, such as caching, that skew the picture. As a result, the best way to see what is going on is often to look at the source.

Bytes.toBytes() calls c.getBytes("UTF-8") internally, i.e. the overload that takes the encoding as input. String's getBytes() has a bit more overhead than getBytes("UTF-8") and is hence slower. In short, Bytes.toBytes() is faster.
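To make the distinction in this thread concrete: the two loops in the question do not call the same overload. A rough sketch follows, in which `toBytes` is a hypothetical stand-in for Hadoop's Bytes.toBytes(String) (not the real class) and uses `StandardCharsets.UTF_8` instead of the charset name to avoid the checked exception.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class GetBytesOverloads {
    // Hypothetical stand-in for Hadoop's Bytes.toBytes(String):
    // always encodes with an explicit UTF-8 charset.
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Compares the explicit-UTF-8 result with getBytes(), which uses the
    // platform default charset. The result depends on that default and on
    // whether the string contains non-ASCII characters.
    static boolean sameAsDefaultCharset(String s) {
        return Arrays.equals(toBytes(s), s.getBytes());
    }

    public static void main(String[] args) {
        String c = "sgfrt34tdfg34";
        System.out.println("same bytes as default charset: " + sameAsDefaultCharset(c));
    }
}
```

So any timing difference between the two loops mixes call overhead with whatever the default-charset lookup costs on the JVM in question.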

Peter Lawrey · Accepted Answer · 2013-12-18 11:11:05Z

The "second" loop is faster, so,

When you execute a method at least 10000 times, it triggers the whole method to be compiled. This means that your second loop can be

- faster, as it is already compiled the first time you run it.
- slower, because when it was optimised the JIT didn't have good information/counters on how the code is executed.

The best solution is to place each loop in a separate method, so that one loop doesn't affect how the other is optimised, AND to run this a few times, ignoring the first run.

e.g.

for(int i = 0; i < 3; i++) {
    long time1 = doTest1();  // timed using System.nanoTime();
    long time2 = doTest2();
    System.out.printf("Test1 took %,d on average, Test2 took %,d on average%n",
        time1/RUNS, time2/RUNS);
}
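The snippet above leaves out doTest1(), doTest2() and RUNS. One possible way to fill them in is sketched below. The method bodies, the RUNS value and the `sink` field are assumptions for illustration, and doTest2() uses getBytes(StandardCharsets.UTF_8) as a stand-in for Bytes.toBytes(c) so the example has no Hadoop dependency.

```java
import java.nio.charset.StandardCharsets;

public class SeparateMethodsBenchmark {
    // Assumed iteration count; the question used 10,000,000.
    static final int RUNS = 1_000_000;
    static final String INPUT = "sgfrt34tdfg34";

    // Hypothetical sink so the results are used and the loops are not removed.
    static long sink;

    // Each loop lives in its own method so it is compiled independently.
    static long doTest1() {
        long start = System.nanoTime();
        for (int i = 0; i < RUNS; i++) {
            sink += INPUT.getBytes().length;
        }
        return System.nanoTime() - start;
    }

    static long doTest2() {
        long start = System.nanoTime();
        for (int i = 0; i < RUNS; i++) {
            // Stand-in for Bytes.toBytes(INPUT)
            sink += INPUT.getBytes(StandardCharsets.UTF_8).length;
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Run several rounds and disregard the first one when reading the output.
        for (int i = 0; i < 3; i++) {
            long time1 = doTest1();
            long time2 = doTest2();
            System.out.printf("Test1 took %,d ns on average, Test2 took %,d ns on average%n",
                    time1 / RUNS, time2 / RUNS);
        }
    }
}
```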

This doesn't address the fact that the second loop is always faster, regardless of which code is in it.
@OrangeDog Hmmm, that is exactly what I address and how to fix it. I don't know how to make it any clearer.

Boann · Accepted Answer · 2013-12-18 10:47:08Z

6

Most likely, the code was still compiling or not yet compiled at the time the first loop ran.

Wrap the entire method in an outer loop so you can run the benchmarks a few times, and you should see more stable results.

Read: Dynamic compilation and performance measurement.

answered Dec 18, 2013 at 10:47


jcklie · Accepted Answer · 2013-12-18 10:47:18Z

5

It might simply be that your calls to getBytes() allocate so many objects that the JVM garbage collector kicks in and cleans up the unreferenced ones (taking out the trash).
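One way to check this hypothesis is to count collections before and after a loop using the standard `java.lang.management` beans. This is only a sketch: the class name, helper methods and `sink` field are made up for illustration, and how many collections you see depends on heap settings and the collector in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcActivityCheck {
    // Hypothetical sink so the allocations in the loop are not optimised away.
    static long sink;

    // Sums the collection counts reported by all garbage collectors.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if the count is undefined
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Runs the getBytes() loop and returns how many collections happened meanwhile.
    static long collectionsDuring(int iterations) {
        long before = totalCollections();
        for (int i = 0; i < iterations; i++) {
            sink += "sgfrt34tdfg34".getBytes().length;
        }
        return totalCollections() - before;
    }

    public static void main(String[] args) {
        System.out.println("GC runs during loop: " + collectionsDuring(10_000_000));
    }
}
```

If both loops trigger a similar number of collections, garbage collection alone is unlikely to explain why the second loop is faster.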

answered Dec 18, 2013 at 10:47


Santosh · Accepted Answer · 2013-12-18 13:49:53Z

A few more observations

As pointed out by @dasblinkenlight above, Hadoop's Bytes.toBytes(c) internally calls String.getBytes("UTF-8").

The variant of String.getBytes() that takes a character set as input is faster than the one that does not. So, for a given string, getBytes("UTF-8") would be faster than getBytes(). I have tested this on my machine (Windows 8, JDK 7): run the two loops, one with getBytes("UTF-8") and the other with getBytes(), in sequence with an equal number of iterations.

    long ts;
    String c = "sgfrt34tdfg34";

    ts = System.currentTimeMillis();
    for (int k = 0; k < 10000000; k++) {
        c.getBytes("UTF-8");
    }
    System.out.println("t1->" + (System.currentTimeMillis() - ts));

    ts = System.currentTimeMillis();
    for (int i = 0; i < 10000000; i++) { 
        c.getBytes();
    }
    System.out.println("t2->" + (System.currentTimeMillis() - ts));

This gives:

t1->1970
t2->2541

and the results are the same even if you change the order in which the loops execute. To discount any JIT optimizations, I would suggest running the tests in separate methods to confirm this (as suggested by @Peter Lawrey above).

So, Bytes.toBytes(c) should always be faster than String.getBytes().

Thank you for all your comments, I already know a little bit more.
