Java foreach or not

There are a lot of opinions regarding the for each construct and some of them are plain wrong. I am usually not bothered by wrong opinions, but this is a simple issue which can be clarified. I was triggered by two articles on Medium which appears first when you search google for “java foreach performance” (Which is Faster For Loop or Foreach in Java and For-each loop vs “forEach” method in Java 11. Which one is faster?). This is a bit of a rant, but the main purpose is to put things in place.

Language Specification

Since we are talking about a language feature the first reference should be Java Language Specification, the part of interest can be found here: 14.14.2. The enhanced for statement.

I strongly encourage you to read the whole section, but here is a brief summary of it:

The type of the expression must be Iterable or an array type (§10.1), or a compile-time error occurs.
If we have an Iterable, the code emulates a for loop which uses hasNext() and next(), the type of element is the generic type or raw type if none can be infered
If we have an array than a classical for loop with index is produced

Here we could also quote Joshua Bloch in his book Effective Java:

The for-each loop, introduced in release 1.5, gets rid of the clutter and the opportunity for error by hiding the iterator or index variable completely. The resulting idiom applies equally to collections and arrays:
// The preferred idiom for iterating over collections and arrays
for (Element e : elements) {
    doSomething(e);
}
When you see the colon (:), read it as “in.” Thus, the loop above reads as “for each element e in elements.” Note that there is no performance penalty for using the for-each loop, even for arrays. In fact, it may offer a slight performance advantage over an ordinary for loop in some circumstances, as it computes the limit of the array index only once. While you can do this by hand (Item 45), programmers don’t always do so.

The note on the eventual performance gain is not actual. That is because just-in-time compiler, which, as usual, does a wonderful job, optimizes the for each loop code. End of story. We could stop here for what is worth. Still, some misconceptions popped up out of nowhere and we should debunk them. In this matter, I have to point out that it does not help much to name for each loop an enhanced for.

Some well-known facts

Microbenchmarking is hard

A microbenchmark is a benchmark designed to assess the performance of a small or even tiny piece of functionality. Good examples are math functions, sorting numbers, and, as it is our case, iterating over the elements of a collection or array.

The microbenchmarks are easy to implement, and hard to get it right. The main reason is that to have a good grasp of performance one has to eliminate all sources of bias, constant folding, dead code elimination, add warm up iterations and multiple tests, ensure environment and context diversity. Its a deep topic in itself. The de facto standard for Java micro benchmarking is JMH (Java Microbenchmark Harness).

It is meaningless and totally misleading to perform only one measurement. If you don’t create conditions to eliminate outside factors, at least do as many measurements as possible to control external factors. Even when multiple measurements are realized a single number like the mean is not enough, you have to provide also information about the deviations from that central behavior (standard deviations or whatever you feel is informative).

Foreach loop is a syntax sugar

Because foreach loop is a syntax sugar, it will have the same effect as when you would write the generated code. No more, no less. We don’t count here wrong or different scenarios, like calling iterator’s hasNext() method multiple times or iterating over array from end to beginning. Still, the following conclusion was presented in one of the linked articles:

The for loop method is faster when using ArrayList because the for-each is implemented by the iterator, and it needs to perform concurrent modification verification.

Plain wrong in multiple ways. First of all the internal iterator loop calls only hasNext() and next(). There is no other call, thus no modifications or updates. The test was also performed on an ArrayList, which is not a thread safe construct, thus no concurrency work required. Nobody should expect anything else from foreach loop other than what could you expect if you would have to put there the iteration or usual for loop. As a consequence, any difference in performance should be explained by external factors, and there are plenty to consider.

Don’t compare apples and oranges

In one of the cited articles the author compares a foreach loop and a indexed for loop over LinkedList. Perhaps this comparison could be useful if one wants to illustrate what happens when you don’t consider the specific characteristics of used data structures. Here, however, it is used as an argument in favor of foreach loop.

When using LinkedList, the for-each is much faster than the for-loop, because LinkedList is implemented by using a two-way linked list. Each addressing needs to start from the header node. If we need to traverse LinkedList, we need to avoid using the for-loop.

I’ll give you another example: binary search in a sorted array using foreach loops. Immediately goes into O(nlogn). Could that be used as an argument against foreach loops?! Sadly, it’s a good example of ignorance.

JMH Microbenchmark

I have put us a small JMH benchmark to illustrate the similar performance. The code is as simple as possible and tests the behavior of foreach and indexed for on arrays and ArrayList.

@BenchmarkMode( {Mode.Throughput})
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class ForEachOrNot {

    @State(Scope.Benchmark)
    public static class MyState {
        @Param( {"100", "1000", "10000", "100000", "1000000"})
        private int len;
        private int[] array;
        private ArrayList<Integer> arrayList;

        @Setup(Level.Invocation)
        public void setup() {
            array = IntArrays.newSeq(0, len);
            arrayList = new ArrayList<>();
            for (int i = 0; i < len; i++) {
                arrayList.add(i);
            }
        }
    }

    @Benchmark
    public void arrayFori(MyState s, Blackhole bh) {
        long sum = 0;
        for (int i = 0; i < s.len; i++) {
            sum += s.array[i];
        }
        bh.consume(sum);
    }

    @Benchmark
    public void arrayForeach(MyState s, Blackhole bh) {
        long sum = 0;
        for (int i : s.array) {
            sum += i;
        }
        bh.consume(sum);
    }

    @Benchmark
    public void arrayListFori(MyState s, Blackhole bh) {
        long sum = 0;
        Iterator<Integer> it = s.arrayList.iterator();
        while (it.hasNext()) {
            sum += it.next();
        }
        bh.consume(sum);
    }

    @Benchmark
    public void arrayListForeach(MyState s, Blackhole bh) {
        long sum = 0;
        for (int i : s.arrayList) {
            sum += i;
        }
        bh.consume(sum);
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(ForEachOrNot.class.getSimpleName())
                .warmupTime(TimeValue.seconds(1))
                .warmupIterations(3)
                .measurementTime(TimeValue.seconds(1))
                .measurementIterations(10)
                .forks(1)
                .resultFormat(ResultFormatType.CSV)
                .result(Utils.resultPath(ForEachOrNot.class))
                .build();

        new Runner(opt).run();
    }
}

Below are the results for iterating over arrays, and the second part displays results iterating over ArrayList. Notice that the iteration over arrays uses normal indexed for, and iteration over ArrayList uses the Iterator construct. Those are correspondent code replacements at runtime, without considering the possible optimizations realized by just in time compiler.

Benchmark                        (len)   Mode  Cnt    Score   Error   Units
ForEachOrNot.arrayForeach          100  thrpt   10  866.155 ± 4.783  ops/ms CI (99.9%): [861.373, 870.938]
ForEachOrNot.arrayForeach         1000  thrpt   10  698.727 ± 4.411  ops/ms CI (99.9%): [694.316, 703.138]
ForEachOrNot.arrayForeach        10000  thrpt   10  245.627 ± 3.759  ops/ms CI (99.9%): [241.868, 249.386]
ForEachOrNot.arrayForeach       100000  thrpt   10   35.395 ± 0.750  ops/ms CI (99.9%): [34.644, 36.145]
ForEachOrNot.arrayForeach      1000000  thrpt   10    2.262 ± 0.171  ops/ms CI (99.9%): [2.091, 2.432]
ForEachOrNot.arrayFori             100  thrpt   10  835.308 ± 3.905  ops/ms CI (99.9%): [831.403, 839.214]
ForEachOrNot.arrayFori            1000  thrpt   10  679.780 ± 4.150  ops/ms CI (99.9%): [675.630, 683.930]
ForEachOrNot.arrayFori           10000  thrpt   10  248.016 ± 5.347  ops/ms CI (99.9%): [242.670, 253.363]
ForEachOrNot.arrayFori          100000  thrpt   10   34.897 ± 0.592  ops/ms CI (99.9%): [34.305, 35.489]
ForEachOrNot.arrayFori         1000000  thrpt   10    2.185 ± 0.242  ops/ms CI (99.9%): [1.943, 2.427]

Benchmark                        (len)   Mode  Cnt    Score    Error   Units
ForEachOrNot.arrayListForeach      100  thrpt   10  840.314 ± 14.525  ops/ms CI (99.9%): [832.429, 842.212]
ForEachOrNot.arrayListForeach     1000  thrpt   10  502.087 ±  7.025  ops/ms CI (99.9%): [494.619, 506.100]
ForEachOrNot.arrayListForeach    10000  thrpt   10   68.679 ±  4.786  ops/ms CI (99.9%): [70.711, 72.764]
ForEachOrNot.arrayListForeach   100000  thrpt   10    6.008 ±  1.150  ops/ms CI (99.9%): [6.588, 6.694]
ForEachOrNot.arrayListForeach  1000000  thrpt   10    0.517 ±  0.105  ops/ms CI (99.9%): [0.693, 0.792]
ForEachOrNot.arrayListFori         100  thrpt   10  841.349 ± 25.444  ops/ms CI (99.9%): [833.514, 840.201]
ForEachOrNot.arrayListFori        1000  thrpt   10  502.576 ±  2.866  ops/ms CI (99.9%): [496.411, 506.693]
ForEachOrNot.arrayListFori       10000  thrpt   10   71.935 ±  0.876  ops/ms CI (99.9%): [71.255, 72.984]
ForEachOrNot.arrayListFori      100000  thrpt   10    6.586 ±  0.081  ops/ms CI (99.9%): [6.349, 6.751]
ForEachOrNot.arrayListFori     1000000  thrpt   10    0.723 ±  0.081  ops/ms CI (99.9%): [0.743, 0.805]

Notice the score which displays throughputs (number of operations per millisecond). The numbers are mixed and similar if one considers the standard errors. As expected.

Personal opinion on for each in Java

The for each syntax sugar construct is a good thing, since it shortens some code and cuts possibilities of making mistakes. I think it was not a wise decision to name this construct enhanced for loop. While I agree it is an enhancement in terms of code shortening, it is not a faster way to iterate, and there is some opportunity to be considered faster by various people when this is not the case.

The behavior of foreach differs if one uses the construct over iterable collections versus arrays. As it was pointed out, when iterating over arrays it uses a normal indexed loop, since this is the natural construct. I don’t consider mixing those behaviors under the same syntax as a good thing. Any change to how one wants to iterate over an array reverts back the syntax to the old syntax. And the old syntax for arrays is very different. I don’t like this and I consider it an inconsistency while reading code. This is why I find annoying when IDEs struggles to suggest for each construct over arrays. Of course, it is a minor issue, and overall the advantages could prevail.

For each loop is a shorter syntactical sugar and can be used whenever you have opportunity, but don’t expect any performance improvements.