CodeBetter.Com
CodeBetter.Com
RSS 2.0 via Feedburner
           Do you Twitter? Follow us @CodeBetter

Greg Young [MVP]


Howdy and Viewing Unmanaged Code in VS.NET

 

An obligatory bit about me (skip this paragraph to get to real information) ... I am an independent consultant from Atlanta Georgia. I live here with my girlfriend Laura and our two dogs Machiavelli (a Neapolitan Mastiff) and Margeaux (our soon to be robotic Labrador). Our previous home was in New Orleans which is still our favorite travel destination. I like to post about a wide variety of items (DDD, Agile Development, Performance, Mono / Cross Platform Development, AOP, ice cream) I do my best to keep them in categories but I will not promise that they are always well categorized.

One of the things I like to post about (performance) involves some ramp up on the part of the reader. I figured I would start off by walking people through some basic unmanaged debugging using Visual Studio .NET so that it is a bit easier to follow later examples, and it allows me to just link to it in the future :D. I personally also use windbg (even though it has an interface which is an acquired taste) but VS.NET is quickly becoming a fully featured debugging tool and is much more accessible for simple optimization questions.

When we are dealing with code with a need to be highly performant looking at IL is often not good enough as the JIT optimizer will optimize our code much further than we could ever hope to on our own without things like unsafe code which also makes our code unverifiable. If you use a tool such as reflector or ildasm you can quickly see that the code generated in release and debug mode are almost identical. This brings to mind the question of why release mode is so much faster; it is the JIT optimizations that are applied. We cannot see these JIT optimizations by looking at the code in a managed way so we will look at the native code generated from our code.

I want to make it clear that I am not suggesting people do this all of the time. I cannot stress enough to be very wary of Premature Optimization. Make your code work first; your aim should be clarity not speed as it is not worth the effort … profile your code once done and re-visit sections that need to be sped up. You will generally see that <10% of your code is using up 70+% of your execution time, go back and optimize that 10%. Your focus should also generally be on measured speed (not just looking at generated code). Also keep in mind that the choice of data structures within your application is far more important than lower level optimizations.

That said knowing what happens under the covers is also fun! Let’s get on to setting up Visual Studio and walk through a simple example. To start with we need some code to expriment with.

        static void Main(string[] args) {

            for (int i = 0; i < 10; i++) {

                Console.WriteLine("Hello World!");

            }

        }

 

Now we need to get unmanaged debugging setup in Visual Studio. In order to do this we go to the properties of our project then to the debug tab. There is a checkbox there to “Enable unmanaged code debugging”, select it.

It is important to note that this option only applies to the current configuration that we are running in. We should therefore set it for both of our configurations. There is some additional work that we need to in order to deal with release unmanaged debugging but we will get to that later.

Insert a break point on the beginning of the for loop and go to the debug menu and select start (or press f5 J)

You will hit a breakpoint as usual... your screen should look something like the display in Figure 2 <insert picture>. If you do not have a call stack window, bring it up by going to the debug menu -> windows -> call stack (or by hitting ctrl + d then c)

Once we have our call stack up we can right click on our function and select “go to disassembly” by doing this we can get the following code listing for our code.

        static void Main(string[] args) {

00000000  push        ebp 

00000001  mov         ebp,esp

00000003  push        edi 

00000004  push        esi 

00000005  push        ebx 

00000006  sub         esp,38h

00000009  xor         eax,eax

0000000b  mov         dword ptr [ebp-10h],eax

0000000e  xor         eax,eax

00000010  mov         dword ptr [ebp-1Ch],eax

00000013  mov         dword ptr [ebp-3Ch],ecx

00000016  cmp         dword ptr ds:[00912DC8h],0

0000001d  je          00000024

0000001f  call        792B228E

00000024  xor         esi,esi

00000026  xor         edi,edi

00000028  nop             

            for (int i = 0; i < 10; i++) {

00000029  xor         esi,esi

0000002b  nop             

0000002c  jmp         0000003D

0000002e  nop             

                Console.WriteLine("Hello World!");

0000002f  mov         ecx,dword ptr ds:[022B303Ch]

00000035  call        785D9074

0000003a  nop                 

            }

0000003b  nop             

            for (int i = 0; i < 10; i++) {

0000003c  inc         esi 

0000003d  cmp         esi,0Ah

00000040  setl        al  

00000043  movzx       eax,al

00000046  mov         edi,eax

00000048  test        edi,edi

0000004a  jne         0000002E

        }

0000004c  nop             

0000004d  lea         esp,[ebp-0Ch]

00000050  pop         ebx 

00000051  pop         esi 

00000052  pop         edi 

00000053  pop      

 

What we are viewing is the actual native code being generated by the JIT mapped up to our code that we wrote.  We can quite easily see how our for loop works at the native level, if you have never seen one before this probably looks quite odd but it is in fact rather normal. Let’s take a close look at what is going on here.

00000029  xor         esi,esi

0000002b  nop             

0000002c  jmp         0000003D

 

This bit of code initializes our counter in ESI, which is an index register that can be used for indexing arrays. You will notice here that an age old assembly “trick” is being used to set our counter to 0, instead of loading the value 0 into the register we XOR it against itself (anything XOR’ed to itself is 0). The next line Nop meaning (no operation) does just that. The code then tells it to immediately jump to 3D ..  Sometimes jumps like this make us go through our code in a non-top down fashion (just like gotos in higher level languages such as C, VB, or C#). If we go to the other half of the for loop we can continue analyzing our code.

0000003c  inc         esi 


The first instruction in the tail bit of code (3C) increments our counter which is held in the ESI register (You can view the velues in your registers using the "registers" window in the debug windows or by hitting ctrl+d then r). In our first iteration however we will skip this line, the jump to 3D as we do not want to increment our index on the first iteration.

0000003d  cmp         esi,0Ah

00000040  setl        al  

00000043  movzx       eax,al

00000046  mov         edi,eax

00000048  test        edi,edi

0000004a  jne         0000002E


3D (the place the previous code jumped to and the immediately following lines) represent the actual comparison back to our stop value 10 (0A) and if we have not reached that value the last line will jump to 4A (which is the inner body of our loop).

0000002f  mov         ecx,dword ptr ds:[022B303Ch]

00000035  call        785D9074

 

The first line here will load our constant string from memory into the ECX register (which is a common purpose register), you will notice that ECX is always used for passing the first parameter to a method (in the case of instance methods ECX always contains “this”), followed by EDX for the second parameter then a series of “push” statements to store others on the machine’s stack.

The next line actually calls the function. I will at a later point go through how to look this function up but right now we have no issues figuring it out as we can see from our source that it is Console.WriteLine. The code will then fall back through to our increment discussed earlier and back through our verification code (which will then jump back into our inner loop code).

Needless to say, we can see significant waste happening in our trivial example. A perfect example of such waste can be seen here

00000009  xor         eax,eax

0000000b  mov         dword ptr [ebp-10h],eax

0000000e  xor         eax,eax

00000010  mov         dword ptr [ebp-1Ch],eax

 

Where we set EAX to 0 twice in a row... This is due to the fact that we are currently running in debug mode and not allowing optimizations to occur. In other words this code has been JIT’ed without allowing the JIT to do some intelligent things to make our code go faster.

There are a few issues with viewing optimized code. The first issue is the fact that the debugger by default will disable JIT optimizations (believe me I myself have gone through many a long hour late at night not realizing I was looking at un-optimized code). The second issue has to deal with “Just My Code” and the optimized code is therefore skipped. I had originally found the method of getting around this in Vance Morrison's Post on the subject. (Thanks Vance I had been bothered by this for a while and ended up just looking at raw assembler without source using the other method listed).

To get around these issues we should follow the instructions given

1)      Go to Tools -> Options -> Debugging -> General 

2)      Make certain that box labeled  ‘Suppress JIT optimization on module load’ is UNCHECKED.

3)      Also make certain the box labled ‘Enable Just My Code’ is also UNCHECKED. 

Vance also suggests going to the advanced build screen to insure that your release dll is built with pdb only, but this defaults on my machine. We would then simply run our code the same way we ran it before.

Another way to view our code with JIT is to build in release mode, (add a Console.Read() at the end) . Start the executable without the debugger, then attach the visual studio debugger (select break all from the debug menu or hit ctrl+alt+break to break).  

Using either method we can now view our JIT optimized code. Our code should come out something like the following

        static void Main(string[] args) {

            for (int i = 0; i < 10; i++) {

00000000  push        esi 

00000001  xor         esi,esi

                Console.WriteLine("Hello World!");

00000003  cmp         dword ptr ds:[02271084h],0

0000000a  jne         00000016

0000000c  mov         ecx,1

00000011  call        786FC654

00000016  mov         ecx,dword ptr ds:[02271084h]

0000001c  mov         edx,dword ptr ds:[0227307Ch]

00000022  mov         eax,dword ptr [ecx]

00000024  call        dword ptr [eax+000000D8h]

            for (int i = 0; i < 10; i++) {

0000002a  add         esi,1

0000002d  cmp         esi,0Ah

00000030  jl          00000003

00000032  pop         esi 

            }

        }

00000033  ret  

 

Wow this is a lot less code than what we had the first time through. JIT optimizations have really done a great job on this code. This difference is why it is so important to look at the actual disassembly at times as opposed to the IL as the JIT will often times recognize patterns in the IL that it can optimize.

An astute reader will notice that there is in fact more code in the inner part of our loop. This can be scary at first, but in this case the optimizer has inlined the Console.WriteLine method for us, saving us quite a bit of code. I will get more into inlining in a later post but it is a huge optimization that occurs.

We have at this point made our way through to looking at non-optimized and optimized native code in the debugger.  I think this is a good place to start. Over the next few posts we will build off of this knowledge to look more in depth at some common optimizations that occur with the JIT, we will also get into some other tools and look how they can assist in our quest for better code. Hope to see you there.

 

 

Some Resources:

http://en.wikipedia.org/wiki/X86
http://www.codeguru.com/csharp/.net/net_general/il/article.php/c4635/ IL tutorial
http://burks.brighton.ac.uk/burks/language/asm/asmtut/asm1.htm ASM tutorial


Published Jun 09 2006, 05:28 PM by Greg
Filed under:

Comments

Brendan Tompkins said:

Fascinating stuff and what a great way to start your tenure here at CodeBetter!  Glad to have you aboard.

BTW, I'm a sortof NO native (Dad's whole family lives there) ... I always thought of naming my blog "Lagniappe"  :)

-Brendan
# June 9, 2006 8:20 PM

Greg said:

Glad to be here Brendan. Nice one on the lagniappe, I had considerred it as well but there goes that idea :)

My gf is out of town this weekend = hopefully extra time to write ..

Greg
# June 9, 2006 8:55 PM

Sam Gentile said:

Excellent start - you're making me proud -)
# June 10, 2006 9:46 AM

Jason Haley said:

# June 10, 2006 1:07 PM

Greg Young [MVP] said:

I promise this is the last ground laying post before I move on to something of consequence but I wanted...
# June 10, 2006 6:52 PM

Greg Young [MVP] said:

This article discusses looping performance at the lowest level (native). It dispels a few myths that have been circulating the blogosphere and makes some general performance recommendations.
# June 11, 2006 11:42 PM

Greg Young [MVP] said:

Something as simple as a method call can often be complex in the managed world. This is the first part in a series looking at how method calls work; it covers a basic method call including the JIT process.
# July 20, 2006 1:58 AM

PrideRoCK said:

.net b/s 结构程序性能优化索引 如何最大限度提高.NET的性能 如何最大限度提高.NET的性能(续) 客户端调用服务器端方法——ASP.NET AJAX(Atlas)、Anthem.NET和Ajax.NET Professional实现之小小比较 net b/s 数据库Microsoft SqlServer 2005 优化索引 SQL Server性能调优入门(图文版) 怎么捅破那层纸? 怎么捅破那层纸?(续) Sql Server Performance Tuning(捅破窗户纸续2)

# July 25, 2007 4:15 AM

二毛五 said:

(译者:这篇文章作者是一位美国的MVP,这是他的系列文章

# August 8, 2007 6:34 AM

Xt Idt said:

(译者:这篇文章作者是一位美国的MVP,这是他的系列文章

# August 20, 2007 2:02 AM

Leave a Comment

(required)  
(optional)
(required)  

Enter the numbers above:
Add
Check out Devlicio.us!