
java.util.Random vs MersenneTwisterFast

I am using the Random class in my application. However, I feel the results it produces are "not really" random, or at least not random enough to convince me. I then found a Q&A that analyzes the Random class and shows that its output can even be brute-forced if the seed is never changed and the default seed is used.
In PHP there is a function called mt_rand that uses the Mersenne Twister random number generator. I want an equivalent in Java, so I found a class called MersenneTwisterFast. However, before using it in my application, I would like to verify their very attractive assertion:

MersenneTwisterFast is not a subclass of java.util.Random. It has the same public methods as Random does, however, and it is algorithmically identical to MersenneTwister. MersenneTwisterFast has hard-code inlined all of its methods directly, and made all of them final (well, the ones of consequence anyway). Further, these methods are not synchronized, so the same MersenneTwisterFast instance cannot be shared by multiple threads. But all this helps MersenneTwisterFast achieve well over twice the speed of MersenneTwister. java.util.Random is about 1/3 slower than MersenneTwisterFast.


According to this assertion, if I do not misunderstand it, MersenneTwisterFast is about 2 times faster than MersenneTwister and about 3 times faster than Random. So I benchmarked two methods, nextFloat() and nextInt(), with 1 million calls each.
This is the JMH code I use:
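The exact listing is not reproduced here; roughly, each benchmark method constructed a generator, re-seeded it, and drew one million numbers inside the measured method. A sketch of that shape (class and constant names are just placeholders):

import java.util.Random;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.infra.Blackhole;

@BenchmarkMode(Mode.AverageTime)
public class RandomLoopBenchmark {

    private static final int CALLS = 1_000_000;

    // The loop, the constructor call, and the re-seeding all sit inside the
    // measured method (the replies below explain why that skews the result).
    @Benchmark
    public void javaUtilRandom(Blackhole bh) {
        for (int i = 0; i < CALLS; i++) {
            Random random = new Random();
            random.setSeed(System.currentTimeMillis());
            bh.consume(random.nextInt());
        }
    }

    // The MersenneTwisterFast variant looked the same, with that class
    // substituted for java.util.Random.
}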

And this is the result:

As far as I understand it, with the benchmark mode I chose (AverageTime), a lower score means better performance, and vice versa.
I tried to compare the source code of MersenneTwisterFast and Random, but that is quite difficult because the source on classpath.org and the source in OpenJDK are different. In any case, I am really wondering why MersenneTwisterFast is 50 times slower than Random in this benchmark.
You didn't write your tests properly. Did you work your way through the JMH samples?

Don't put loops into microbenchmarks unless it is the loop itself that you want to benchmark. You're also creating a new object and calling its constructor in each test. You're not really benchmarking the random number generator, you're benchmarking its initialization code.

You want to benchmark nextInt() and nextFloat(), so put only those two methods in the tests.

Tan Quang wrote:I feel the results it produces are "not really" random, or at least not random enough to convince me.


I'm not sure what you mean by this. How would you get that feeling? And what does feeling have to do with programming?

I'm also not sure how you hope to get rid of this feeling by using a Mersenne twister. Implementations vary wildly, and there are many variants that produce PRNGs with poor quality randomness, or with certain pitfalls in their usage.

As I said in pretty much every other thread you made, you won't know if the performance of these generators will be consequential until you profile your application to figure out the hot execution paths. There is no point in trying to make your PRNG 3 times faster if it is not called that often. It's very likely that you've already spent more time trying to find a replacement for java.util.Random than you will ever save in reducing execution cycles.

Seriously, just use java.util.Random for most applications, and if it's not unpredictable enough, use java.security.SecureRandom.

Stephan van Hulst wrote:You didn't write your tests properly. Did you work your way through the JMH samples?


I checked all the JMH samples on GitHub; there did not seem to be a sample about Random.

Don't put loops into microbenchmarks unless it is the loop itself that you want to benchmark. You're also creating a new object and calling its constructor in each test. You're not really benchmarking the random number generator, you're benchmarking its initialization code.


Creating a new instance resets the seed value. According to the Random source code on classpath.org, both Random and MersenneTwisterFast set their seed from System.currentTimeMillis() when they are initialized.
I had even tried it before:

That is, initialize once, then re-set the seed before each random number is generated. But in the result I got, Random and MersenneTwisterFast had the same performance.

You want to benchmark nextInt() and nextFloat(), so put only those two methods in the tests.


Oh, I'm just choosing the most commonly used methods in my code. I will try other methods when I have time.

I'm not sure what you mean by this. How would you get that feeling? And what does feeling have to do with programming?

I'm also not sure how you hope to get rid of this feeling by using a Mersenne twister. Implementations vary wildly, and there are many variants that produce PRNGs with poor quality randomness, or with certain pitfalls in their usage.

As I said in pretty much every other thread you made, you won't know if the performance of these generators will be consequential until you profile your application to figure out the hot execution paths. There is no point in trying to make your PRNG 3 times faster if it is not called that often. It's very likely that you've already spent more time trying to find a replacement for java.util.Random than you will ever save in reducing execution cycles.



This is a table I generated using Random. Row 1 holds the initial values (call them A). The next row is the initial value plus an added value (call it B) plus a random value (call it C). Like this:

I've run the code again a few times; however, the next 9 rows of the column (circled in red) always have the same values as in the picture if I initialize Random only once. That's why I feel it's "not really" random.

Seriously, just use java.util.Random for most applications, and if it's not unpredictable enough, use java.security.SecureRandom.


Of course, I still use SecureRandom in sensitive cases, such as when creating OTPs, recovery codes, and so on.

Tan Quang wrote:I checked all the JMH samples on GitHub; there did not seem to be a sample about Random.


I see that most of your code follows the examples set in the samples, except you missed one important rule from sample 11: Don't put loops in your benchmark.

Creating a new instance resets the seed value.

...

That is, initialize once, then re-set the seed before each random number is generated.


It's fine to initialize a new instance before every iteration, but that initialization must happen outside of the benchmark. Otherwise, you're including the initialization time in your results, which could be orders of magnitude greater than the time needed to generate a single random number.
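For example, a minimal sketch with JMH (names here are only illustrative): put the generator in a state object and create it in a @Setup(Level.Invocation) method, so the setup runs before each invocation but is not included in the measured time.

import java.util.Random;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class FreshInstanceBenchmark {

    Random random;

    // Runs before every invocation of the benchmark method, but is NOT part
    // of the measured time. (JMH's own documentation warns that
    // Level.Invocation should only be used when the measured method is not
    // trivially small.)
    @Setup(Level.Invocation)
    public void newGenerator() {
        random = new Random();
    }

    @Benchmark
    public int nextInt() {
        return random.nextInt();
    }
}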

Always keep in mind what exactly it is you are benchmarking. Right now you're benchmarking the very first number generated by a generator. Will the generator have the same performance characteristics when generating the millionth number? You won't know using your current code. I'm not sure where I've read it, or even whether I've read it correctly, but I seem to recall that Mersenne twisters can sometimes fall into a hole and suddenly do worse for a range of numbers. I don't think it really matters though, because I cannot imagine that the performance of any particular kind of random number generator will really affect your application all that much.

You want to benchmark nextInt() and nextFloat(), so put only those two methods in the tests.


Oh, I'm just choosing the most commonly used methods in my code. I will try other methods when I have time.


You misunderstood me. I wasn't telling you which methods to test, I was telling you that if you choose a method to test (for example, nextInt()), then include only that method call in your benchmark, and no loops or setup code or constructor calls.

I've run the code again a few times; however, the next 9 rows of the column (circled in red) always have the same values as in the picture if I initialize Random only once. That's why I feel it's "not really" random.


Please share the code you're using to generate the table. I have a feeling that you're doing something wrong.

Stephan van Hulst wrote:I see that most of your code follows the examples set in the samples, except you missed one important rule from sample 11: Don't put loops in your benchmark.


It seems that either you or I misunderstood what JMHSample_11_Loops is trying to communicate.
As far as I understand it, that whole sample reproduces Caliper-style loop benchmark code, and it is trying to point out the bad habit that Caliper taught everyone (see from line 48).
Specifically, as I understand it, what "Caliper taught everyone" is:
  • Create a method that runs a loop, taking the number of repetitions as a parameter (here it is reps).
  • Benchmark by calling that method with an increasing number of repetitions (lines 90 to 124).

  • After that, they draw a conclusion (on line 129):

    You might notice the larger the repetitions count, the lower the "perceived" cost of the operation being measured. Up to the point we do each addition with 1/20 ns, well beyond what hardware can actually do.


Of course, they also explain why the larger the repetition count, the lower the "perceived" cost of the measured operation (in lines 53 and 54). They even name the "bad" benchmarks measureWrong_.
I have seen a lot of benchmark code containing loops; what it has in common is that the loop is written directly inside the benchmark method instead of in a separate loop-running method that gets called (like "the bad thing Caliper taught everyone").

It's fine to initialize a new instance before every iteration, but that initialization must happen outside of the benchmark. Otherwise, you're including the initialization time in your results, which could be orders of magnitude greater than the time needed to generate a single random number.

Always keep in mind what exactly it is you are benchmarking. Right now you're benchmarking the very first number generated by a generator. Will the generator have the same performance characteristics when generating the millionth number? You won't know using your current code. I'm not sure where I've read it, or even whether I've read it correctly, but I seem to recall that Mersenne twisters can sometimes fall into a hole and suddenly do worse for a range of numbers. I don't think it really matters though, because I cannot imagine that the performance of any particular kind of random number generator will really affect your application all that much.

You misunderstood me. I wasn't telling you which methods to test, I was telling you that if you choose a method to test (for example, nextInt()), then include only that method call in your benchmark, and no loops or setup code or constructor calls.


Well, I figured out what caused this slowness. It's because the "seeds" of Random and MersenneTwister(Fast) are different. Random stores its seed in a single long field (in the classpath.org source) or an AtomicLong (in the OpenJDK source), while MersenneTwister(Fast) uses an integer array (with default length N = 624). This makes Random's setSeed method much faster than MersenneTwister(Fast)'s.
I tried the code above with setSeed removed. Unexpectedly, MersenneTwisterFast was twice as fast as Random (at least in my testing; I hope you can test it yourself).
Interestingly, MersenneTwisterFast has a nextFloat method that can return values in [0.0 – 1.0]. However, I never got those two boundary values when I changed the seed via setSeed or re-initialized MersenneTwisterFast after each random number was generated (as in the performance measurement code I posted).
You could say that Random is "initialize once, use once" (if Random's source code really does always use a default seed when initializing), while MersenneTwisterFast is the opposite, "initialize once, use many times" (or, if you prefer, "pay a performance penalty", as in my code).
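To illustrate the difference, here is a rough paraphrase of the two seeding routines (not the exact sources, but the constants are the well-known ones from the respective algorithms):

public class SeedCostSketch {

    // java.util.Random: setSeed() only scrambles a single 48-bit long.
    static long scrambleJavaUtilRandomSeed(long seed) {
        return (seed ^ 0x5DEECE66DL) & ((1L << 48) - 1);
    }

    // Mersenne Twister (MT19937): setSeed() has to fill a 624-element state
    // array, one element at a time.
    static int[] initMersenneTwisterState(int seed) {
        int[] mt = new int[624];
        mt[0] = seed;
        for (int i = 1; i < mt.length; i++) {
            mt[i] = 1812433253 * (mt[i - 1] ^ (mt[i - 1] >>> 30)) + i;
        }
        return mt;
    }
}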

Please share the code you're using to generate the table. I have a feeling that you're doing something wrong.


The table itself is made in Excel; only the data is generated in Java and pasted into it.

Tan Quang wrote:As far as I understand it, that whole sample reproduces Caliper-style loop benchmark code, and it is trying to point out the bad habit that Caliper taught everyone (see from line 48).


Exactly. So why are you doing the bad thing that Caliper taught, and not the good thing that JMH wants you to do?

I have seen a lot of benchmark code containing loops; what it has in common is that the loop is written directly inside the benchmark method instead of in a separate loop-running method that gets called (like "the bad thing Caliper taught everyone").


You mean, you've seen a lot of benchmark code specifically written for JMH that contains loops? In that case, most of those benchmarks are likely written incorrectly. There are some legitimate reasons to use loops inside a benchmark, but they're rare. In your specific case, using loops is very wrong. Just write it the way the JMH samples tell you to, which is simply like this:
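A minimal sketch of that shape (the generator lives in the state object and is created once; the benchmark body contains nothing but the call being measured):

import java.util.Random;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class NextIntBenchmark {

    // Created (and seeded) once per thread, outside the measurement.
    Random random = new Random();

    @Benchmark
    public int nextInt() {
        return random.nextInt();
    }
}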

Well, I figured out what caused this slowness. It's because the "seeds" of Random and MersenneTwister(Fast) are different. Random stores its seed in a single long field (in the classpath.org source) or an AtomicLong (in the OpenJDK source), while MersenneTwister(Fast) uses an integer array (with default length N = 624). This makes Random's setSeed method much faster than MersenneTwister(Fast)'s.
I tried the code above with setSeed removed. Unexpectedly, MersenneTwisterFast was twice as fast as Random (at least in my testing; I hope you can test it yourself).


Why is this unexpected? I've been telling you this the entire time. The time needed to initialize the random number generator dwarfs the execution time of generating a single random number with nextInt() or nextFloat(). Setting the seed should NOT be considered part of the benchmark, because according to your first post you set out to test the assertion that MersenneTwisterFast is about three times faster than Random, which refers to generating a random number, not to initializing the seed.

You could say that Random is "initialize once, use once" (if Random's source code really does always use a default seed when initializing), while MersenneTwisterFast is the opposite, "initialize once, use many times" (or, if you prefer, "pay a performance penalty", as in my code).


What are you talking about? ALL PRNGs operate on the basis of initializing the seed once and then generating random numbers as often as you need. Why would you ever create a new Random instance for every number that you want to generate?

The table itself is made in Excel; only the data is generated in Java and pasted into it.


So obviously, I'm interested in the code that is used to generate the data.

Stephan van Hulst wrote:You mean, you've seen a lot of benchmark code specifically written for JMH that contains loops? In that case, most of those benchmarks are likely written incorrectly. There are some legitimate reasons to use loops inside a benchmark, but they're rare. In your specific case, using loops is very wrong. Just write it the way the JMH samples tell you to, which is simply like this:


If you only generate a single random number, can the difference in performance even be seen?

Why is this unexpected? I've been telling you this the entire time. The time needed to initialize the random number generator dwarfs the execution time of generating a single random number with nextInt() or nextFloat(). Setting the seed should NOT be considered part of the benchmark, because according to your first post you set out to test the assertion that MersenneTwisterFast is about three times faster than Random, which refers to generating a random number, not to initializing the seed.


For me, regularly changing the seed is essential to avoid someone being able to determine the next result (as in the Q&A analysis above). I want to benchmark the case where the seed is changed after every random number, rather than benchmark random number generation on its own.
In short, you are right that I should not include the seeding in the benchmark, at least given that the seed handling of the two is so different.

What are you talking about? ALL PRNGs operate on the basis of initializing the seed once and then generating random numbers as often as you need. Why would you ever create a new Random instance for every number that you want to generate?


The cost of creating a new Random is not too different from calling setSeed. Furthermore, I plan to use System.identityHashCode as part of the seed, and initializing a new Random will also give a new identity hash code.

So obviously, I'm interested in the code that is used to generate the data.


Oh, I changed the code quite a bit to complete the table. But I remember that I used Random, and after generating each random number I called setSeed(System.currentTimeMillis()), which is pretty much the same as the benchmark code above.

Tan Quang wrote:If you only generate a single random number, can the difference in performance even be seen?


I'm certain that it would make a big difference.

For me, regularly changing the seed is essential to avoid someone being able to determine the next result (as in the Q&A analysis above).


Then you're using the wrong tool. If you want high quality randomness, you should be using neither Random nor MersenneTwisterFast, because both are highly deterministic, as all PRNGs are. Changing the seed constantly in the way that you specified will also not help you as much as you seem to think, because those methods are also more deterministic than you might realize.

Once again, if you want high quality randomness, use SecureRandom. It essentially does what you are now trying to do manually, only much better.
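For example, a minimal sketch:

import java.security.SecureRandom;

public class SecureRandomSketch {
    public static void main(String[] args) {
        // Created once; SecureRandom gathers its own seed material from the
        // platform's entropy source, so there is no need to re-seed it yourself.
        SecureRandom rng = new SecureRandom();

        for (int i = 0; i < 5; i++) {
            System.out.println(rng.nextInt(100)); // unpredictable values in [0, 100)
        }
    }
}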

The premise of all of this is that you NEED such a high level of randomness though. I'm not convinced. You have given no solid argument, except for a table of data for which you refuse to show how you've generated it.

I want to benchmark the case where the seed is changed after every random number, rather than benchmark random number generation on its own.


Then don't tell us that you're testing the assertion that MersenneTwisterFast is three times as fast as Random, because that's not what you're testing.

Also know that you're not using PRNGs the way they're supposed to be used, and that you're drawing false conclusions from that.

Furthermore, I plan to use System.identityHashCode as part of the seed, and initializing a new Random will also give a new identity hash code.


That method is not a good source of entropy. If you REALLY care about high quality randomness, use neither identityHashCode() nor currentTimeMillis(). Instead, use SecureRandom, which collects entropy until it has enough to generate a very random number.

But once again, I'm not convinced that you actually need anything other than a Random, instantiated only once and then reused as much as you need.

Oh, I changed the code quite a bit to complete the table. But I remember that I used Random, and after generating each random number I called setSeed(System.currentTimeMillis()), which is pretty much the same as the benchmark code above.


Show us the code. I want to generate the data for myself. Until then, I will consider the table you based this topic on as faulty.

Stephan van Hulst wrote:That method is not a good source of entropy. If you REALLY care about high quality randomness, use neither identityHashCode() nor currentTimeMillis(). Instead, use SecureRandom, which collects entropy until it has enough to generate a very random number.


Well, I said it would only be part of the seed because I would use System.currentTimeMillis() or System.nanoTime() along with it.

Show us the code. I want to generate the data for myself. Until then, I will consider the table you based this topic on as faulty.


This is essentially the same code I used when creating the data for the table. Although it is not exactly the same, it can still reproduce part of the situation I encountered.

This is the result when I compile and run it on onlinegdb.com.
Note: I do not use MersenneTwisterFast here.
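In outline it did something like the following reconstruction (not the exact listing; the initial and added values are placeholders, and the re-seeding line is the one the next reply suggests removing):

import java.util.Random;

public class TableDataSketch {
    public static void main(String[] args) {
        Random random = new Random();
        long value = 1000;                  // placeholder initial value ("A")

        for (int row = 0; row < 10; row++) {
            // Re-seeding before every draw. In a tight loop currentTimeMillis()
            // barely changes and identityHashCode(random) is constant, so the
            // seed (and therefore the draw) hardly varies between iterations.
            random.setSeed(System.currentTimeMillis() + System.identityHashCode(random));

            long added = 50;                // placeholder added value ("B")
            int c = random.nextInt(100);    // random value ("C")
            value = value + added + c;
            System.out.println(value);
        }
    }
}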
It is as I expected. The way you generate your data is wrong, because you didn't use the PRNG properly.

Because you're resetting the seed on every call to the PRNG, it no longer functions as a PRNG but rather as a glorified hash function. While that is already a problem, it might still have worked if your seed had a high level of entropy. It doesn't. You used currentTimeMillis() and identityHashCode(), which offer almost no entropy at all, which I warned you against.

But explain something to me: If you generated a high entropy seed before every PRNG call, what is the point of using the PRNG anyway? You already have a random number, so just use it instead of seeding the PRNG with it.

In summary:

  • The test data is wrong because the PRNG is used wrongly.
  • A small set of data points is used to draw conclusions about the inner workings of the PRNG.
  • Somehow a question about the correctness of a PRNG turned into a question about the performance of a PRNG.
  • Measurements regarding the performance of the PRNG are also wrong, because the benchmarks measure the wrong thing.

To convince yourself, remove line 5 from the code that generates your test data. You will see that when you stop resetting the seed and just rely on the seed that Random sets for itself when its constructor runs, you will no longer discover a pattern in your test data.

I reiterate what I've already said in this topic multiple times: You don't need anything other than java.util.Random, initialized exactly once, relying only on the default seed.

Also, do yourself a favor and replace the statement on line 18 with this one:

Stephan van Hulst wrote:It is as I expected. The way you generate your data is wrong, because you didn't use the PRNG properly.

To convince yourself, remove line 5 from the code that generates your test data. You will see that when you stop resetting the seed and just rely on the seed that Random sets for itself when its constructor runs, you will no longer discover a pattern in your test data.


Oh, of course I tried removing setSeed earlier. With setSeed, the last digit is 5; without setSeed, the last digit is 0.

Somehow a question about the correctness of a PRNG turned into a question about the performance of a PRNG.


As you say, I should not use loops, but testing Random against MersenneTwisterFast with only a single call does not seem entirely fair either. I think I should test this way:

Oh, I only think so; I haven't actually tried it yet!

Tan Quang wrote:With setSeed, the last digit is 5


Not true. The last digit is random and depends on the seed used, but since your seed has very low entropy, the last digit will often repeat itself.

without setSeed, the last digit is 0.


Not true. If you're basing your observation on a single execution, that is not statistically significant. You should also note that with the calculations you are performing after the number has been generated, the value that the last digit can take will not be distributed evenly.

I think I should test this way


You have misunderstood the meaning of the @OperationsPerInvocation annotation. Remove it.
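For reference, @OperationsPerInvocation only tells JMH how many operations a single call of the benchmark method performs, so it can divide the measured time accordingly. It is meant for the rare case where a batch loop really is the thing being measured, e.g. (a sketch with placeholder names):

import java.util.Random;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.OperationsPerInvocation;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Thread)
public class BatchedNextIntBenchmark {

    private static final int CALLS = 1_000_000;

    Random random = new Random();

    // Only valid because the method body really performs CALLS operations;
    // JMH divides the measured time by that count to report a per-call score.
    @Benchmark
    @OperationsPerInvocation(CALLS)
    public void nextIntBatch(Blackhole bh) {
        for (int i = 0; i < CALLS; i++) {
            bh.consume(random.nextInt());
        }
    }
}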
