There are a lot of opinions regarding the for each construct and some of them are plain wrong. I am usually not bothered by wrong opinions, but this is a simple issue which can be clarified. I was triggered by two articles on Medium which appears first when you search google for “java foreach performance” (Which is Faster For Loop or Foreach in Java and For-each loop vs “forEach” method in Java 11. Which one is faster?). This is a bit of a rant, but the main purpose is to put things in place.
Language Specification
Since we are talking about a language feature the first reference should be Java Language Specification, the part of interest can be found here: 14.14.2. The enhanced for
statement.
I strongly encourage you to read the whole section, but here is a brief summary of it:
- The type of the expression must be Iterable or an array type (§10.1), or a compile-time error occurs.
- If we have an Iterable, the code emulates a for loop which uses hasNext() and next(), the type of element is the generic type or raw type if none can be infered
- If we have an array than a classical for loop with index is produced
Here we could also quote Joshua Bloch in his book Effective Java:
The for-each loop, introduced in release 1.5, gets rid of the clutter and the opportunity for error by hiding the iterator or index variable completely. The resulting idiom applies equally to collections and arrays:
// The preferred idiom for iterating over collections and arrays for (Element e : elements) { doSomething(e); }When you see the colon (:), read it as “in.” Thus, the loop above reads as “for each element e in elements.” Note that there is no performance penalty for using the for-each loop, even for arrays. In fact, it may offer a slight performance advantage over an ordinary for loop in some circumstances, as it computes the limit of the array index only once. While you can do this by hand (Item 45), programmers don’t always do so.
The note on the eventual performance gain is not actual. That is because just-in-time compiler, which, as usual, does a wonderful job, optimizes the for each loop code. End of story. We could stop here for what is worth. Still, some misconceptions popped up out of nowhere and we should debunk them. In this matter, I have to point out that it does not help much to name for each loop an enhanced for.
Some well-known facts
Microbenchmarking is hard
A microbenchmark is a benchmark designed to assess the performance of a small or even tiny piece of functionality. Good examples are math functions, sorting numbers, and, as it is our case, iterating over the elements of a collection or array.
The microbenchmarks are easy to implement, and hard to get it right. The main reason is that to have a good grasp of performance one has to eliminate all sources of bias, constant folding, dead code elimination, add warm up iterations and multiple tests, ensure environment and context diversity. Its a deep topic in itself. The de facto standard for Java micro benchmarking is JMH (Java Microbenchmark Harness).
It is meaningless and totally misleading to perform only one measurement. If you don’t create conditions to eliminate outside factors, at least do as many measurements as possible to control external factors. Even when multiple measurements are realized a single number like the mean is not enough, you have to provide also information about the deviations from that central behavior (standard deviations or whatever you feel is informative).
Foreach loop is a syntax sugar
Because foreach loop is a syntax sugar, it will have the same effect as when you would write the generated code. No more, no less. We don’t count here wrong or different scenarios, like calling iterator’s hasNext() method multiple times or iterating over array from end to beginning. Still, the following conclusion was presented in one of the linked articles:
The for loop method is faster when using ArrayList because the for-each is implemented by the iterator, and it needs to perform concurrent modification verification.
Plain wrong in multiple ways. First of all the internal iterator loop calls only hasNext() and next(). There is no other call, thus no modifications or updates. The test was also performed on an ArrayList, which is not a thread safe construct, thus no concurrency work required. Nobody should expect anything else from foreach loop other than what could you expect if you would have to put there the iteration or usual for loop. As a consequence, any difference in performance should be explained by external factors, and there are plenty to consider.
Don’t compare apples and oranges
In one of the cited articles the author compares a foreach loop and a indexed for loop over LinkedList. Perhaps this comparison could be useful if one wants to illustrate what happens when you don’t consider the specific characteristics of used data structures. Here, however, it is used as an argument in favor of foreach loop.
When using LinkedList, the for-each is much faster than the for-loop, because LinkedList is implemented by using a two-way linked list. Each addressing needs to start from the header node. If we need to traverse LinkedList, we need to avoid using the for-loop.
I’ll give you another example: binary search in a sorted array using foreach loops. Immediately goes into O(nlogn). Could that be used as an argument against foreach loops?! Sadly, it’s a good example of ignorance.
JMH Microbenchmark
I have put us a small JMH benchmark to illustrate the similar performance. The code is as simple as possible and tests the behavior of foreach and indexed for on arrays and ArrayList.
@BenchmarkMode( {Mode.Throughput}) @OutputTimeUnit(TimeUnit.MILLISECONDS) public class ForEachOrNot { @State(Scope.Benchmark) public static class MyState { @Param( {"100", "1000", "10000", "100000", "1000000"}) private int len; private int[] array; private ArrayList<Integer> arrayList; @Setup(Level.Invocation) public void setup() { array = IntArrays.newSeq(0, len); arrayList = new ArrayList<>(); for (int i = 0; i < len; i++) { arrayList.add(i); } } } @Benchmark public void arrayFori(MyState s, Blackhole bh) { long sum = 0; for (int i = 0; i < s.len; i++) { sum += s.array[i]; } bh.consume(sum); } @Benchmark public void arrayForeach(MyState s, Blackhole bh) { long sum = 0; for (int i : s.array) { sum += i; } bh.consume(sum); } @Benchmark public void arrayListFori(MyState s, Blackhole bh) { long sum = 0; Iterator<Integer> it = s.arrayList.iterator(); while (it.hasNext()) { sum += it.next(); } bh.consume(sum); } @Benchmark public void arrayListForeach(MyState s, Blackhole bh) { long sum = 0; for (int i : s.arrayList) { sum += i; } bh.consume(sum); } public static void main(String[] args) throws RunnerException { Options opt = new OptionsBuilder() .include(ForEachOrNot.class.getSimpleName()) .warmupTime(TimeValue.seconds(1)) .warmupIterations(3) .measurementTime(TimeValue.seconds(1)) .measurementIterations(10) .forks(1) .resultFormat(ResultFormatType.CSV) .result(Utils.resultPath(ForEachOrNot.class)) .build(); new Runner(opt).run(); } }
Below are the results for iterating over arrays, and the second part displays results iterating over ArrayList. Notice that the iteration over arrays uses normal indexed for, and iteration over ArrayList uses the Iterator construct. Those are correspondent code replacements at runtime, without considering the possible optimizations realized by just in time compiler.
Benchmark (len) Mode Cnt Score Error Units ForEachOrNot.arrayForeach 100 thrpt 10 866.155 ± 4.783 ops/ms CI (99.9%): [861.373, 870.938] ForEachOrNot.arrayForeach 1000 thrpt 10 698.727 ± 4.411 ops/ms CI (99.9%): [694.316, 703.138] ForEachOrNot.arrayForeach 10000 thrpt 10 245.627 ± 3.759 ops/ms CI (99.9%): [241.868, 249.386] ForEachOrNot.arrayForeach 100000 thrpt 10 35.395 ± 0.750 ops/ms CI (99.9%): [34.644, 36.145] ForEachOrNot.arrayForeach 1000000 thrpt 10 2.262 ± 0.171 ops/ms CI (99.9%): [2.091, 2.432] ForEachOrNot.arrayFori 100 thrpt 10 835.308 ± 3.905 ops/ms CI (99.9%): [831.403, 839.214] ForEachOrNot.arrayFori 1000 thrpt 10 679.780 ± 4.150 ops/ms CI (99.9%): [675.630, 683.930] ForEachOrNot.arrayFori 10000 thrpt 10 248.016 ± 5.347 ops/ms CI (99.9%): [242.670, 253.363] ForEachOrNot.arrayFori 100000 thrpt 10 34.897 ± 0.592 ops/ms CI (99.9%): [34.305, 35.489] ForEachOrNot.arrayFori 1000000 thrpt 10 2.185 ± 0.242 ops/ms CI (99.9%): [1.943, 2.427]
Benchmark (len) Mode Cnt Score Error Units ForEachOrNot.arrayListForeach 100 thrpt 10 840.314 ± 14.525 ops/ms CI (99.9%): [832.429, 842.212] ForEachOrNot.arrayListForeach 1000 thrpt 10 502.087 ± 7.025 ops/ms CI (99.9%): [494.619, 506.100] ForEachOrNot.arrayListForeach 10000 thrpt 10 68.679 ± 4.786 ops/ms CI (99.9%): [70.711, 72.764] ForEachOrNot.arrayListForeach 100000 thrpt 10 6.008 ± 1.150 ops/ms CI (99.9%): [6.588, 6.694] ForEachOrNot.arrayListForeach 1000000 thrpt 10 0.517 ± 0.105 ops/ms CI (99.9%): [0.693, 0.792] ForEachOrNot.arrayListFori 100 thrpt 10 841.349 ± 25.444 ops/ms CI (99.9%): [833.514, 840.201] ForEachOrNot.arrayListFori 1000 thrpt 10 502.576 ± 2.866 ops/ms CI (99.9%): [496.411, 506.693] ForEachOrNot.arrayListFori 10000 thrpt 10 71.935 ± 0.876 ops/ms CI (99.9%): [71.255, 72.984] ForEachOrNot.arrayListFori 100000 thrpt 10 6.586 ± 0.081 ops/ms CI (99.9%): [6.349, 6.751] ForEachOrNot.arrayListFori 1000000 thrpt 10 0.723 ± 0.081 ops/ms CI (99.9%): [0.743, 0.805]
Notice the score which displays throughputs (number of operations per millisecond). The numbers are mixed and similar if one considers the standard errors. As expected.
Personal opinion on for each in Java
The for each syntax sugar construct is a good thing, since it shortens some code and cuts possibilities of making mistakes. I think it was not a wise decision to name this construct enhanced for loop. While I agree it is an enhancement in terms of code shortening, it is not a faster way to iterate, and there is some opportunity to be considered faster by various people when this is not the case.
The behavior of foreach differs if one uses the construct over iterable collections versus arrays. As it was pointed out, when iterating over arrays it uses a normal indexed loop, since this is the natural construct. I don’t consider mixing those behaviors under the same syntax as a good thing. Any change to how one wants to iterate over an array reverts back the syntax to the old syntax. And the old syntax for arrays is very different. I don’t like this and I consider it an inconsistency while reading code. This is why I find annoying when IDEs struggles to suggest for each construct over arrays. Of course, it is a minor issue, and overall the advantages could prevail.
For each loop is a shorter syntactical sugar and can be used whenever you have opportunity, but don’t expect any performance improvements.
Leave a Reply