Ok, I have received about a dozen emails in regard to my last post An in depth look at for loops basically telling me that “I am wrong because Richter says so”. These emails have used two points of reference by Jeffrey Richter. The first is the CLR internals presentation available as a web cast in particular around minute 12 during the foreach discussion where some points are made about for and foreach. The second reference I have been getting a lot of is the pages around 305 in CLR via C# 2.
To Jeffrey Richter if you happen to read this, let me start by saying I have the highest respect for you, I have thoroughly enjoyed nearly everything you have ever written; it is not only factual but a pleasure to read. I would personally not know most of the information I do today if it were not for books like yours. The most beneficial parts of your content are not these hidden gems but the teaching of the process to obtain them yourself. You taught me how to fish :)
That said let me address some of these. I will start with CLR via C#.
"The second thing to notice about the code above is that the JIT compiler knows that the for loop is accessing array elements 0 through Length – 1. So the JIT compiler produces code that, at runtime, tests that all array accesses will be within the array’s valid range. Specifically the JIT compiler produces code to check if ((Length – 1) <= a.GetUpperBound(0)). This check occurs just before the loop. If the check is good, the JIT compiler will not generate code inside the loop to verify that each access is within the valid range. This allows array access within loops to be very fast." Page 306 paragraph 2 CLR via C#
I fear that Jeffrey Richter has been caught by the same confusion as Brad Abrams. The code in question is
|
static int RichtersCode() {
Int32[] a = new Int32[5];
int total = 0;
for(Int32 index=0;index<a.Length;index++) {
total += a[index];
}
return total;
} |
|
static int RichtersCode() {
Int32[] a = new Int32[5];
00000000 push esi
00000001 mov edx,5
00000006 mov ecx,7915982Ah
0000000b call FFB21F70
00000010 mov ecx,eax
int total = 0;
00000012 xor esi,esi
for(Int32 index=0;index<a.Length;index++) {
00000014 xor edx,edx
00000016 mov eax,dword ptr [ecx+4]
00000019 test eax,eax
0000001b jle 00000028
total += a[index];
0000001d add esi,dword ptr [ecx+edx*4+8]
for(Int32 index=0;index<a.Length;index++) {
00000021 add edx,1
00000024 cmp eax,edx
00000026 jg 0000001D
}
return total;
00000028 mov eax,esi
0000002a pop esi
0000002b ret |
|
Listing 1: Jeffrey Richter’s code |
By looking at this we should quickly see that it is in fact not calling GetUpperBound(0) prior to the loop. It has inlined the Length property and it never does any comparison. The JIT is realizing that since the array was created locally (on the stack??? and it knows the length that it was created with) that there is no need to make any comparisons. This cannot possibly throw an exception as written in native code. This optimization is in fact much better than the bounds hoisting we were hoping for, I believe this is being done for Constrained Execution Regions but I had already planned on discussing this theory in more depth for another post, you’ll just have to wait J
We can see that this is the same “confusing” optimization we looked at in the article because if we create the array outside of the function the optimization will not occur as can be seen here.
|
static Int32[] b = new Int32[5];
static int RichtersCode() {
int total = 0;
for(Int32 index=0;index<b.Length;index++) {
total += b[index];
}
return total;
} |
|
static int RichtersCode() {
int total = 0;
00000000 push edi
00000001 push esi
00000002 xor ecx,ecx
for(Int32 index=0;index<b.Length;index++) {
00000004 xor edx,edx
00000006 mov esi,dword ptr ds:[022B1EC8h]
0000000c mov eax,dword ptr [esi+4]
0000000f test eax,eax
00000011 jle 00000025
00000013 mov edi,dword ptr [esi+4]
total += b[index];
00000016 cmp edx,edi
00000018 jae 0000002A
0000001a add ecx,dword ptr [esi+edx*4+8]
for(Int32 index=0;index<b.Length;index++) {
0000001e add edx,1
00000021 cmp eax,edx
00000023 jg 00000016
}
return total;
00000025 mov eax,ecx
00000027 pop esi
00000028 pop edi
00000029 ret
0000002a call 792B44A9
0000002f int 3 |
|
Listing 2: Jeffrey Richter’s code modified to use external array |
The specific lines that show the bounds are still there are right before the assignment (in the blue area)
00000016 cmp edx,edi
00000018 jae 0000002A
This is comparing the current counter against edx (which is where we have our length). It will then jump to throw the exception.
I feel I have done a fairly good job illustrating that there is in fact no array bounds hoisting occurring; we are looking at the second optimization discussed in the article. My guess is that we have errata in the book, a late change for CER, or a JIT bug in that it does not hoist at other times (I actually think it may be a combination of the three J).
The second bit that I have been receiving emails about deals with foreach vs for and comments that were made in the CLR internals presentation (which let me add is an amazing presentation and I highly recommend it). There are actually two bits in the presentation that have been brought up, the first is that foreach on a string (or other array operation) is much slower than using a for loop. The second is that for and an index is a better usage overall than a foreach.
For the first case we have to remember that foreach does in fact have special handling in both VB.NET and C# for dealing with arrays. They will in fact generate identical code to a normal for (but strings are a special special case). Let’s take a look at the code in question.
|
static void TestForeach() {
string test = "thisisatestingstring";
foreach (char c in test) {
Console.WriteLine(c);
}
}
static void TestForLoop() {
string test = "thisisatestingstring";
for(int i=0;i<test.Length;i++) {
Console.WriteLine(test );
}
}
|
|
Listing 3: Two methods to iterate through strings |
When we disassemble this to IL we receive the following
They are in fact nearly identical. There is no performance gain to be reached here by using for. It is very important to remember that foreach has dual definitions. Admittedly I always say that looking at IL is bad so here is the disassembled native code.
|
static void TestForeach() {
string test = "thisisatestingstring";
00000000 push edi
00000001 push esi
00000002 mov esi,dword ptr ds:[02314AF8h]
foreach (char c in test) {
00000008 xor edx,edx
0000000a mov edi,dword ptr [esi+8]
0000000d test edi,edi
0000000f jle 00000024
00000011 movzx ecx,word ptr [esi+edx*2+0Ch]
dummy2 = c;
00000016 mov word ptr ds:[00912FE8h],cx
0000001d add edx,1
foreach (char c in test) {
00000020 cmp edi,edx
00000022 jg 00000011
00000024 pop esi
}
}
00000025 pop edi
00000026 ret
static void TestForLoop() {
string test = "thisisatestingstring";
00000000 push edi
00000001 push esi
00000002 mov esi,dword ptr ds:[02314AF8h]
for(int i=0;i<test.Length;i++) {
00000008 xor ecx,ecx
0000000a mov edi,dword ptr [esi+8]
0000000d test edi,edi
0000000f jle 00000024
dummy2 = test ;
00000011 movzx eax,word ptr [esi+ecx*2+0Ch]
00000016 mov word ptr ds:[00912FE8h],ax
for(int i=0;i<test.Length;i++) {
0000001d add ecx,1
00000020 cmp edi,ecx
00000022 jg 00000011
00000024 pop esi
}
}
00000025 pop edi
00000026 ret |
|
Listing 6: Disassembled code |
The second bit from the presentation that people have emailed me about is some comments made about for being better than foreach in general. I believe many people are taking these comments out of context as Jeffrey does in fact qualify these comments in the context of collections, where they do hold true as the enumerator simply does an indexed lookup on the collection. I gave what I thought was a good example of where it would not be faster, a linked list, in my original post that I will go into a bit more detail with.
Let’s consider for a minute how a linked list works. I store in my class a pointer (reference) to the first item in my list, if you want the 5th item I will iterate through my list until I hit the 5th node and return it to you, in this process I touch five nodes of the list. The same holds true for nine, thirty four, or seven thousand; we can therefore say that an indexed lookup into our linked list is O(n) (Big O n), we can actually make a stronger statements but this will suffice. If we were to create an enumerator for our linked list, its MoveNext() function would simply return CurrentNode.Next this operation occurs in O(1). We can therefore say that our n O(1) operations will be much faster than n O(n) operations .. we could in fact even summarize this by saying that the enumerator will produce n touches of nodes where as the for with indexes will produce ∑n touches of nodes. This is not that big of a deal on small lists but on a big list (say 100 items) the difference would be 100 touches with the enumerator vs 4950 touches with the index.
This holds true with other objects as well although not usually to this scale, a tree for instance would be much heavier to hit with indexes as opposed to an enumeration. Most items such as this will not offer an index based lookup so it’s not that big of a deal ..
Hope this clears up any confusion.