I just tried a test example on codility. The task was: " ... given an array A of N integers, returns the smallest positive integer (greater than 0) that does not occur in A.".
Plus:
- N is an integer within the range [1..100,000]; 
- each element of array A is an integer within the range [−1,000,000..1,000,000]. 
my firts attempt was a typical Java 8 solution:
public int solution(int[] A) {
     Set<Integer> l = Arrays
            .stream(A)
            .boxed()
            .filter(i -> i > 0)
            .collect(Collectors.toSet());
    return IntStream
            .iterate(1, a -> a + 1)
            .filter(i -> !l.contains(i))
            .findFirst()
            .getAsInt();
}
All correct, but the tests for intermediate size test arrays, ran into a timeout.
Second try (plain old java):
public int solution(int[] A) {
    boolean[] B = new boolean[1000001];
    for (int a : A) {
        if (a > 0) {
            B[a] = true;
        }
    }
    for (int i = 1; i <= 1000000; i++) {
        if (!B[i]) {
            return i;
        }
    }
    return 1;
}
This version was incedibly much faster, especially for short arrays A.
Questions:
- Am I missing something with my first attempt?
- Are there tweaking options?
- Is the test on codility premature, and in reallity the difference is expected to disappear, almost entirely?
- Does anyone have a better Java 8 solution?
Test results first version:
▶ example1 first example test ✔OK 1. 0.108 s OK
▶ example2 second example test ✔OK 1. 0.104 s OK
▶ example3 third example test ✔OK 1. 0.104 s OK
▶ extreme_single a single element ✔OK 1. 0.100 s OK 2. 0.104 s OK 3. 0.104 s OK 4. 0.100 s OK
▶ simple simple test ✔OK 1. 0.100 s OK 2. 0.104 s OK 3. 0.100 s OK
▶ extreme_min_max_value minimal and maximal values ✔OK 1. 0.100 s OK 2. 0.104 s OK
▶ positive_only shuffled sequence of 0...100 and then 102...200 ✔OK 1. 0.100 s OK 2. 0.104 s OK
▶ negative_only shuffled sequence -100 ... -1 ✔OK 1. 0.100 s OK
▶ medium chaotic sequences length=10005 (with minus) ✘TIMEOUT ERROR running time: 0.136 sec., time limit: 0.100 sec. 1. 0.136 s TIMEOUT ERROR, running time: 0.136 sec., time limit: 0.100 sec. 2. 0.128 s TIMEOUT ERROR, running time: 0.128 sec., time limit: 0.100 sec. 3. 0.144 s TIMEOUT ERROR, running time: 0.144 sec., time limit: 0.128 sec.
▶ large_1 chaotic + sequence 1, 2, ..., 40000 (without minus) ✔OK 1. 0.588 s OK
▶ large_2 shuffled sequence 1, 2, ..., 100000 (without minus) ✔OK 1. 0.748 s OK 2. 0.660 s OK
▶ large_3 chaotic + many -1, 1, 2, 3 (with minus) ✔OK 1. 0.444 s OK
Test results second version:
▶ example1 first example test ✔OK 1. 0.004 s OK
▶ example2 second example test ✔OK 1. 0.004 s OK
▶ example3 third example test ✔OK 1. 0.004 s OK
▶ extreme_single a single element ✔OK 1. 0.004 s OK 2. 0.008 s OK 3. 0.004 s OK 4. 0.008 s OK
▶ simple simple test ✔OK 1. 0.004 s OK 2. 0.004 s OK 3. 0.008 s OK
▶ extreme_min_max_value minimal and maximal values ✔OK 1. 0.008 s OK 2. 0.004 s OK
▶ positive_only shuffled sequence of 0...100 and then 102...200 ✔OK 1. 0.008 s OK 2. 0.004 s OK
▶ negative_only shuffled sequence -100 ... -1 ✔OK 1. 0.008 s OK
▶ medium chaotic sequences length=10005 (with minus) ✔OK 1. 0.024 s OK 2. 0.024 s OK 3. 0.032 s OK
▶ large_1 chaotic + sequence 1, 2, ..., 40000 (without minus) ✔OK 1. 0.220 s OK
▶ large_2 shuffled sequence 1, 2, ..., 100000 (without minus) ✔OK 1. 0.244 s OK 2. 0.244 s OK
▶ large_3 chaotic + many -1, 1, 2, 3 (with minus) ✔OK 1. 0.172 s OK
Bottom line: Especially the tests with small sized arrays are two orders of magnitude faster with just plain java. For large arrays its 'only' a factor of 3.
EDIT:
Accoring to the comments I just tried to get deeper into the problem and tried:
public int solution(int[] A) {
boolean[] B = new boolean[1000001];
for (int a : A) {
    if (a>0){
        B[a] = true;
    }
}
 return IntStream
        .iterate(1, a -> a + 1)
        .filter(i -> !B[i])
        .findFirst()
        .getAsInt();
}
Result:
▶ example1 first example test ✔OK 1. 0.096 s OK
▶ example2 second example test ✔OK 1. 0.096 s OK
▶ example3 third example test ✔OK 1. 0.096 s OK collapse allCorrectness tests
▶ extreme_single a single element ✔OK 1. 0.096 s OK 2. 0.096 s OK 3. 0.096 s OK 4. 0.096 s OK
▶ simple simple test ✔OK 1. 0.100 s OK 2. 0.096 s OK 3. 0.100 s OK
▶ extreme_min_max_value minimal and maximal values ✔OK 1. 0.096 s OK 2. 0.100 s OK
▶ positive_only shuffled sequence of 0...100 and then 102...200 ✔OK 1. 0.096 s OK 2. 0.096 s OK
▶ negative_only shuffled sequence -100 ... -1 ✔OK 1. 0.096 s OK collapse allPerformance tests
▶ medium chaotic sequences length=10005 (with minus) ✘TIMEOUT ERROR running time: 0.116 sec., time limit: 0.112 sec. 1. 0.116 s TIMEOUT ERROR, running time: 0.116 sec., time limit: 0.112 sec. 2. 0.116 s TIMEOUT ERROR, running time: 0.116 sec., time limit: 0.100 sec. 3. 0.124 s OK
▶ large_1 chaotic + sequence 1, 2, ..., 40000 (without minus) ✔OK 1. 0.340 s OK
▶ large_2 shuffled sequence 1, 2, ..., 100000 (without minus) ✔OK 1. 0.408 s OK 2. 0.372 s OK
▶ large_3 chaotic + many -1, 1, 2, 3 (with minus) ✔OK 1. 0.272 s OK
Conclusion:
- For small sized test arrays it is almost equally slow like the first version, thus here the IntStream seems to be the bottle neck.
- For large test arrays the speed seems to be intemediate. Thus I'm not really sure what it should tell me.
Edit 2:
I actually just found an article describing the issue partly: https://jaxenter.com/java-performance-tutorial-how-fast-are-the-java-8-streams-118830.html . Therein, the author claims the difference between streams and for loop running over arrays is due to the fact that streams are quite new. However the article is dated 4 years ago.
