CodeBetter.Com

Patrick Smacchia [MVP C#]

December 2007 - Posts

  • Code Size is - not - the Enemy

     
    There has recently been some discussion, spread over several posts, about the claim that code size is the enemy.

    I would like to add my 2 cents. Code size is not the enemy; the enemy is everything that prevents you from adding new features at a sustainable pace. The top 2 culprits are:

    • A bad overall code structure
    • Lack of automatic tests

    Both prevent you from evaluating the impact of a change in the code. Consequently, you cannot know whether a change here added a bug there unless you manually retest the entire application, which is what people end up doing. I am talking here about side effects.

    A proper code structure limits the propagation of side effects. Controlling dependencies and thinking about how you componentize your code base lets you separate concerns and, consequently, contain side effects. The tool NDepend is all about this: making sure that your code structure is clean.

     

    A good battery of automated tests checks whether side effects resulting from a change break the correctness of the code.

     

    If I had to choose a third culprit, it would definitely be copy-pasted/cloned code, which Frans described well in his post.

     

     

     

    On measuring LOC

    I would also like to add my 2 cents on how the number of Lines Of Code (LOC) is measured. I previously wrote in this post about how you should count LOC, using the logical LOC metric. Here are some measures quoted from the other posts:

    • Castle: 386,754
    • NHibernate: 245,749
    • Boo: 212,425 
    • Rhino Tools: 142,679
    • LLBLgen : > 300K

    Logical LOC is the only valid context-free size metric, meaning that it does not depend on the language, the coding style, or the amount of comments and documentation. You can expect the corresponding logical LOC to be 5 to 10 times smaller than the numbers above. NDepend measures 53K logical LOC for the NDepend code base, and I consider it a big and challenging project.

     

    I recently consulted for a 4M LOC project. I have heard things like:

    • Our code base is so big that it cannot take less than several hours to compile on powerful servers.
    • The .NET Framework code base is a small project compared to ours. (I estimate that the .NET Framework has around 1M logical LOC if you include WPF, WCF and so on. I get this value because it measures around 6M IL instructions.)

     
    NDepend measured 700K logical LOC for the ‘4M LOC’ project. Saying that you are developing a giant code base is always good for your ego, and a way to get more credit, more budget and more excuses when something goes wrong. This is unfortunate because, as in every other engineering profession, software development needs professional metrics. And even if LOC is not the right metric for quality, complexity or productivity, it is a good metric to estimate development cost and to compare project sizes (as I explained in this post).
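    As a quick sanity check, the figures quoted above can be put side by side. The short Python sketch below only redoes the arithmetic with the numbers from this post; the variable names are mine.

```python
# Figures quoted in the post above.
physical_loc = 4_000_000   # the '4M LOC' project, as counted by its team
logical_loc = 700_000      # logical LOC measured by NDepend for the same project

ratio = physical_loc / logical_loc
print(f"physical / logical LOC ratio: {ratio:.1f}")

# The .NET Framework estimate: about 6M IL instructions for about 1M logical LOC.
il_instructions = 6_000_000
framework_logical_loc = 1_000_000
il_per_logical_loc = il_instructions / framework_logical_loc
print(f"IL instructions per logical LOC: {il_per_logical_loc:.0f}")
```

    The ratio for the consulted project comes out between 5 and 6, which sits inside the 5 to 10 range mentioned earlier.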

     

     

    The diseconomy of scale phenomenon

    The phenomenon of big things being proportionally harder to maintain than small things is known as diseconomy of scale. It explains why it can take a year to add a tiny feature to a large project such as Vista (70M LOC). The maintenance cost curve is simply not linear in the code base size; it tends to be polynomial or even exponential. Fortunately, things are much better if you have well-structured and well-tested code. I say fortunately because every piece of software needs new features to survive, and new features mean more code. I couldn't agree more with what Ayende wrote:

     

    Features means code, no way around it. If you state that code size is your problem, you also state that you cannot meet the features that the customer will eventually want. 

     

     

  • How to avoid regression bugs while adding new features

    I recently published an article on SearchWinDevelopment. It explains how we (the NDepend team) use NDepend to analyze the NDepend code in order to avoid regression bugs while adding new features.

     

    More specifically, we use the ability to compare 2 versions of our code base. We then focus our efforts on reviewing and testing the code that has been modified or added since the last release. In my experience, the bulk of problems come from fresh code.
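    The idea of comparing two versions can be sketched in a few lines of Python. This is only an illustration, not how NDepend does it: it assumes each release is summarized as a hypothetical mapping from a method name to a hash of its body, and the method names are made up.

```python
def diff_code_bases(old, new):
    """Compare two snapshots mapping a method name to a hash of its body.

    Returns (added, modified, removed) as sorted lists of method names.
    """
    added = sorted(name for name in new if name not in old)
    removed = sorted(name for name in old if name not in new)
    modified = sorted(
        name for name in new if name in old and new[name] != old[name]
    )
    return added, modified, removed


# Hypothetical snapshots of two releases.
release_1 = {"Parser.Parse": "a1", "Report.Build": "b7", "Cache.Get": "c3"}
release_2 = {"Parser.Parse": "a1", "Report.Build": "b9", "Cache.Put": "d4"}

added, modified, removed = diff_code_bases(release_1, release_2)

# Fresh code is what deserves review and testing first.
to_review = added + modified
print(to_review)
```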

     

    Of course, the preferred way to avoid regression bugs is to have a high automated test coverage ratio on the new code, but it is rarely possible to reach 100% coverage, especially on applications with a complex UI.

  • Hints on how to componentize existing code


    Representing the structure of a code base with a DSM (Dependency Structure Matrix) is a great means to perform all kinds of useful tasks, such as determining the layering of the code base or pinpointing component dependency cycles. NDepend supports a DSM view with many options, such as:

    • a facility to highlight cycles,
    • generation of a 'Boxes and Arrows' diagram,
    • customizing the dependency weight (how many methods/types/namespaces of an assembly are using how many methods/types/namespaces of another assembly),
    • an indirect/transitive dependency mode (for example, you can see that A is indirectly using C if A is using B and B is using C).

    We provide here a 4-minute screencast that explains all these facilities step by step.
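    To make the notion of a weighted dependency matrix concrete, here is a minimal Python sketch. It assumes the dependencies are available as hypothetical (user, used) pairs, one pair per member-level usage, and it is not NDepend's implementation. Rows are users and columns are used elements; that orientation is an arbitrary choice for this sketch.

```python
from collections import Counter


def build_dsm(usages):
    """Build a dependency structure matrix from (user, used) pairs.

    Each pair stands for one member of `user` using one member of `used`;
    the weight of a cell is the number of such pairs.
    """
    elements = sorted({name for pair in usages for name in pair})
    weights = Counter(usages)
    matrix = [
        [weights[(user, used)] for used in elements] for user in elements
    ]
    return elements, matrix


# A uses B through two members, B uses C through one member.
usages = [("A", "B"), ("A", "B"), ("B", "C")]
elements, matrix = build_dsm(usages)
for name, row in zip(elements, matrix):
    print(name, row)
```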

    There is one use of the DSM that we did not anticipate while developing it: it can give hints on how to componentize existing code. For example, below is the DSM for the 97 assemblies of the DotNetNuke 3.1 code base (DotNetNuke is an OSS framework for creating web applications). This square matrix has the interesting property of showing a square in its upper-left corner. Such a square in a DSM means that the elements involved in it are highly cohesive.
     

     

     

    Below is a zoomed snapshot of the square. We can now see that the 19 assemblies involved in the square (index 19 to 37 on the picture) are named following the pattern DotNetNuke.Modules.Store.*. We now understand that the developers of the DotNetNuke project decided to spread the code of the component DotNetNuke.Modules.Store across 19 assemblies. Personally, I prefer having a few big assemblies rather than numerous small ones, and I argued for this point of view in this article (at the beginning, section .NET Components). Maybe in this particular example it makes sense to split the DotNetNuke.Modules.Store code into several assemblies.

     

     

    Interestingly enough, the algorithm we use to order the code elements in the matrix headers has been able to pinpoint an intention buried in the code structure. This algorithm can give several results for the same set of code elements, and you can browse all these results with the Triangularize button. Each time you press this button, a new order is computed that tries to form squares. Below is a snapshot of another DSM, still for the assemblies of DotNetNuke. The square we see now is still made of the DotNetNuke.Modules.Store.* assemblies.

     

     

    I had the chance to test this algorithm while consulting. My mission was to give hints on how to componentize a giant code base made of more than a million lines of C# code. The client kindly allowed me to talk about this experience. The audited project is made of 549 assemblies. As a consequence, it takes several hours to compile on big 16GB multi-core servers. I think that there is a lot of room for improvement, as I explain in this post on how to benefit from the C# compiler's awesome performance. The first thing to do is to merge the code into fewer assemblies. Here the DSM is absolutely necessary, because the code base is so big and there are so many assemblies that it is humanly impossible to partition them by hand. Below is the DSM made of the 549 assemblies. It clearly pinpoints 6 squares that will certainly be turned into 6 assemblies. Applying the algorithm several times with the Triangularize button shows several smaller squares.

     

     

    Another common problem is how to componentize a namespace that contains hundreds of classes. When you are building a framework, namespaces are a good way to partition the public surface you want to present to your clients. When you are building an application, namespaces are a good way to partition your classes into a hierarchy of components. The .NET Framework comes with a super namespace: System.Windows.Forms. It is made of 1,509 types (1,093 are public). 688 types are nested types (i.e., types declared inside another type). This significantly reduces the number of types we might want to componentize, to 1,509 - 688 = 821 types. Below is the DSM of these 821 types, and we can see several small squares and one giant square in the middle.

     

     

    If we zoom in, we can see that the small squares have red borders. NDepend's DSM uses red borders to highlight a dependency cycle. If you position the mouse on the upper-left corner of a cycle, the info view lists the elements involved in the cycle. Having a cycle between a set of classes is an indication of strong cohesiveness. This information is a hint to create a component. In the picture below we see that the 16 types involved in the cycle are those related to the DataGrid control. Similarly, the second picture below lists 7 types related to the WebBrowser control.

     

     

     

    And the picture below shows that the giant square is actually made of 298 types related to View controls, such as DataGridView, ListView or TreeView.
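    The groups of mutually dependent types discussed above correspond to what graph theory calls strongly connected components. The Python sketch below uses Tarjan's classic algorithm to find them; the post does not say which algorithm NDepend uses, and the type names in the example are made up.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm: group nodes that can all reach each other."""
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []

    def visit(node):
        index[node] = lowlink[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for neighbour in graph.get(node, ()):
            if neighbour not in index:
                visit(neighbour)
                lowlink[node] = min(lowlink[node], lowlink[neighbour])
            elif neighbour in on_stack:
                lowlink[node] = min(lowlink[node], index[neighbour])
        if lowlink[node] == index[node]:
            # `node` is the root of a component: pop its members.
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(sorted(component))

    for node in graph:
        if node not in index:
            visit(node)
    return components


# Hypothetical "type -> types it uses" graph.
graph = {
    "Grid": ["GridRow"],
    "GridRow": ["GridCell"],
    "GridCell": ["Grid"],
    "Button": ["Control"],
    "Control": [],
}
# Components with more than one member are dependency cycles,
# hence candidates for being grouped into one component.
cycles = [c for c in strongly_connected_components(graph) if len(c) > 1]
print(cycles)
```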

     


  • The Visual Studio Look and Feel

    In a previous post this summer I mentioned one of the best pieces of advice I got this year (from Bob Powell and Fabrice Marguerie):

    When you target .NET developers, the closer your UI is to the Visual Studio UI, the better.

    Here is another great piece of advice I got this year, from Scott Hanselman while he was playing with VisualNDepend:

    I want to live in there (i.e., in VisualNDepend), I don't want any additional UI.

    Following this advice, we are now happy to release NDepend v2.6. It comes with project management and an analysis life cycle similar to those of Visual Studio. Concretely, we added 3 panels to VisualNDepend:

    • Start Page
    • Error List
    • Project Properties

    On a side note, this new version also comes with a Visual Studio add-in compatible with VS2008 and supports the analysis of .NET 3.0 and .NET 3.5 assemblies.

     

    Start Page

    The Start Page represents a powerful and widely accepted way to show at a glance common beginning tasks, including,

    • Create/Open a project,
    • Analyze a set of .NET assemblies,
    • Compare 2 sets of assemblies.

    For beginners, the Start Page comes with links to the Getting Started documentation. It is also a convenient place from which to install or uninstall the Visual Studio and Reflector add-ins. Finally, the Start Page can tell the user whether she is running the latest available version.

     

     

    My friend Sebastien Andreo often says:

    Make the simple things simple and the hard things possible.

    Following this tenet, there is a right-click menu on Recent Projects to choose between loading the results of the most recent analysis, the result of an analysis previously done, or just the project properties. By default, left-clicking a recent project opens its most recent analysis or the project itself if the most recent analysis is not available.


     

    Of course, the Start Page panel remains active and available at any time, whether the user is sifting through the result of an analysis, running an analysis, editing project properties...

     

    Error List

    Besides all the Code Query Language possibilities for analyzing a code base, NDepend is able to provide a lot of useful information about the health of your build process, including:

    • Assembly versioning issues such as:
      • AssemblyA references AssemblyB v2.1 but only AssemblyB v2.0 is available.
      • AssemblyA references 2 versions of AssemblyB (which is not necessarily a bad thing, but it's still useful to be aware of such a situation).
    • Assembly conflicts such as:
      • The name of my assembly's main module file is different from the logical name of my assembly.
      • Several different assemblies with the same name can be found (different = different hash code and/or different version).
    • PDB file issues such as:
      • Missing PDB files.
      • PDB files and source code files not in sync.
      • PDB files and assemblies not in sync.

    Until now, this valuable information, emitted during analysis, was somewhat buried in a section of the report. It is now available in an Error List panel. The warnings are visible at any time, both during analysis and when exploring the results of an analysis. There is also a convenient facility to enable or disable warnings.
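    The two assembly versioning checks listed above are easy to picture in code. The Python sketch below is a simplified illustration under my own assumptions (references given as hypothetical triples, versions as plain strings); it says nothing about how NDepend implements these warnings.

```python
def find_version_issues(references, available):
    """Report versioning warnings.

    references: (assembly, referenced assembly, version) triples.
    available:  referenced assembly name -> set of versions found on disk.
    """
    warnings = []
    versions_by_pair = {}
    for user, used, version in references:
        found = available.get(used, set())
        if version not in found:
            listed = ", ".join(sorted(found)) or "none"
            warnings.append(
                f"{user} references {used} v{version} but available: {listed}"
            )
        versions_by_pair.setdefault((user, used), set()).add(version)
    for (user, used), versions in sorted(versions_by_pair.items()):
        if len(versions) > 1:
            warnings.append(
                f"{user} references {len(versions)} versions of {used}"
            )
    return warnings


# Hypothetical input mirroring the two situations described above.
references = [
    ("AssemblyA", "AssemblyB", "2.1"),
    ("AssemblyA", "AssemblyB", "2.0"),
]
available = {"AssemblyB": {"2.0"}}
warnings = find_version_issues(references, available)
for warning in warnings:
    print(warning)
```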

     

     

    Project Properties

    Finally, we discarded the NDepend.Project.exe UI. The new Project Properties panel makes it easier to choose the .NET assemblies to analyze. Also, the list of Tiers Assemblies (i.e., the assemblies that we don't develop but that are used by the assemblies of our application) is now automatically inferred from the list of application assemblies.
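    Inferring the tier assemblies boils down to a set difference: everything the application assemblies reference, minus the application assemblies themselves. Here is a minimal Python sketch of that idea, with a made-up application and a hypothetical `references` mapping; it is not NDepend's code.

```python
def infer_tier_assemblies(application, references):
    """Return the assemblies referenced by the application but not part of it.

    application: names of the application assemblies.
    references:  assembly name -> names of the assemblies it references.
    """
    referenced = set()
    for assembly in application:
        referenced.update(references.get(assembly, ()))
    return sorted(referenced - set(application))


# Hypothetical application made of two assemblies.
application = ["MyApp", "MyApp.Core"]
references = {
    "MyApp": ["MyApp.Core", "System.Windows.Forms"],
    "MyApp.Core": ["mscorlib", "System.Xml"],
}
tiers = infer_tier_assemblies(application, references)
print(tiers)
```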

     

     

     

    Personally, I like to use this Code to Analyze panel to have a thorough view of the set of my assemblies, with information such as:

    • Which assemblies are strong-named (sn column)?
    • Which assemblies come with a pdb file (pdb column)?
    • Size, Platform and target Runtime of my assemblies.
    • How many assemblies are referenced by a given assembly (+ the list of referenced assemblies in a tool tip).
    • What is the list of Tiers assemblies used by my application?

    Of course, you already know about this information thanks to Windows Explorer, Reflector, Ildasm, Visual Studio... The cool thing here is that the Code to Analyze panel centralizes all this.

     

     

    Still following the Make the simple things simple and the hard things possible tenet, the editor of the folders that contain your assemblies is now hidden by default. It becomes visible thanks to an extender button. The list of folders is populated automatically when you select your application assemblies. This Folders tab is useful if you want to tweak the existing set of folders or if you want to switch to Relative path mode. In Relative path mode, folders are relative to the NDepend project file location, which makes it possible to use the same project on different machines. Also, for a given folder, the Folders tab shows at a glance which assembly is an Application one (green), a Tier one (blue) or not used (white), and also which .exe and .dll files are not assemblies.
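    The benefit of Relative path mode can be illustrated with a small Python sketch. It uses the standard `ntpath` module so that Windows path rules apply on any platform; the folder layout and the project file name are hypothetical, and this is not a description of NDepend's own code.

```python
import ntpath  # Windows path rules, usable on any platform


def to_relative(folder, project_file):
    """Express `folder` relative to the directory of the project file."""
    project_dir = ntpath.dirname(project_file)
    return ntpath.relpath(folder, project_dir)


def to_absolute(relative_folder, project_file):
    """Resolve a stored relative folder against a project file location."""
    project_dir = ntpath.dirname(project_file)
    return ntpath.normpath(ntpath.join(project_dir, relative_folder))


# On the first machine the project lives under C:\Work\App.
stored = to_relative(r"C:\Work\App\bin\Debug", r"C:\Work\App\MyApp.xml")
print(stored)

# On a build server the same project file lives elsewhere,
# yet the stored relative folder still resolves correctly.
resolved = to_absolute(stored, r"D:\Build\App\MyApp.xml")
print(resolved)
```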

     

    In the tabs of the Project Properties panel, and more or less everywhere in the VisualNDepend UI, we are now using an Office 2007 invention whose name I don't know. I would call it an Info tooltip. It consists of a tiny information icon that displays a tooltip. I find it more convenient than a classic tooltip on a control because:

    • With a classic tooltip, a user discovering a form doesn't know which controls have a tooltip and which don't.
    • The tooltip information is displayed immediately when the mouse hovers over the Info tooltip; there is no classic half-second wait.

    We use the Info tooltip as a complement to the classic documentation. We all know that users barely read manuals, and my opinion is that the Info tooltip is a good way to access documentation just in time. For example, here we assume that a user who builds reports with her own XSL sheet might want to know where to find information about building a custom XSL sheet.

     

     


  • Deconstructing Software

    In my previous post Keep your code structure clean I explained how to build Code Query Language (CQL) constraints that help avoid design erosion by preventing mistakes. This is the preventive approach, but what if you have to improve an existing design? I believe that any code base contains many design mistakes that would make architects blush. The reason is a lack of tools to help humans deal with tangled code. This is why I take the opportunity here to explain how to deconstruct the design of your code base with NDepend.

     

     

    Getting a particular transitive closure

    Let's show which code of the NUnit framework can potentially use the class System.AppDomain. We need to know which methods are using AppDomain objects. But we also need to list the methods that are using these methods, then the methods that are using those methods, and so on. Basically, we want the transitive closure of AppDomain users. Transitive closure is a powerful means to deconstruct software because it shows how the code is really layered (not how it should be layered). Getting a transitive closure by reading the source code is practically impossible, so tooling is needed. As far as I know, NDepend is the only tool that deals with transitive closure (even when considering tools in the Java sphere).

     

    The following screenshot shows how to get the transitive closure of NUnit namespaces that are using the class System.AppDomain.

     

    Actually, we could have chosen the transitive closure made of methods, types or assemblies. We chose namespaces for clarity, since there are 198 methods involved in the transitive closure made of methods.

    A CQL query was generated for us, and here is the result. There is a metric DepthOfIsUsing "System.AppDomain". The namespaces with the value 1 are the ones that are directly using the class AppDomain. Those with the value 2 are the ones that are using the namespaces with a value of 1, and so on. The namespace System has a value of 0 because it contains the class AppDomain.
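    The depth values described above can be obtained with a breadth-first search that walks the usage graph backwards from the target. The Python sketch below illustrates the idea on a small made-up graph; the function name mirrors the metric name for readability only, and this is not NDepend's implementation.

```python
from collections import deque


def depth_of_is_using(uses, target):
    """Map each direct or indirect user of `target` to its usage depth.

    uses: element -> elements it uses directly.
    The target itself gets depth 0, its direct users depth 1, and so on.
    """
    # Reverse the graph: element -> elements that use it directly.
    users_of = {}
    for user, used_list in uses.items():
        for used in used_list:
            users_of.setdefault(used, set()).add(user)

    depth = {target: 0}
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for user in sorted(users_of.get(current, ())):
            if user not in depth:
                depth[user] = depth[current] + 1
                queue.append(user)
    return depth


# Hypothetical namespaces; "Docs" never reaches "Target".
uses = {
    "Runner": ["Core"],
    "Core": ["Util"],
    "Util": ["Target"],
    "Gui": ["Runner", "Util"],
    "Docs": [],
}
depth = depth_of_is_using(uses, "Target")
print(depth)
```

    Elements that never reach the target, such as "Docs" here, are simply absent from the result, which is the transitive closure of the target's users.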

     

    The good news is that, thanks to the NDepend add-ins, you can get such a transitive closure directly from Visual Studio or even Reflector:

     

     

     

    Visualizing a particular transitive closure

    NDepend comes with many original facilities to help users have an intuitive understanding of the code design. The best way to visualize a transitive closure is the good old Boxes and Arrows diagram.

     

    To get such a Boxes and Arrows diagram with NDepend, you need to play with the Dependency Structure Matrix. As shown below, the CQL Query Result panel comes with 4 buttons to copy the code elements involved in a CQL query to the matrix headers:

     

     

     

    Here is the resulting matrix:

     

     

    And finally, to get the Boxes and Arrows diagram from the matrix you just need to click this button:

     

    As we just saw, getting a Boxes and Arrows diagram from some CQL results requires this extra Matrix step. We designed NDepend this way because the Matrix comes with many facilities to deal with transitive closures.

     

     

     

    Browsing transitive closure

    Below is the internal design of the VisualNDepend assembly. A blue cell means that the corresponding namespace in the horizontal header is directly using the corresponding namespace in the vertical header. This diagram immediately tells us which namespaces are low-level and which ones are high-level. It also tells us that the design is well layered and that there are no dependency cycles (more on dependency cycles below).

     

    With the matrix, we can see transitive closures by switching to the indirect dependencies mode, as shown below. Now a blue cell with a weight of X indicates that the user namespace is using the used namespace with a depth of X. Rows that contain a lot of blue cells indicate low-level namespaces that are used directly or indirectly by almost all other namespaces.

     

    For example, the namespace TreeCodePanel is using the namespace Helpers with a depth of 6. Here, 6 is the length of a shortest path from TreeCodePanel to Helpers. The picture below shows how you can visualize one of the shortest paths...

     

     ...and here is the shortest path:
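    Finding one such shortest path is a textbook breadth-first search. The Python sketch below shows the idea on a small hypothetical graph of namespaces (it is not the real VisualNDepend structure, and not NDepend's code); the depth shown in a matrix cell would be the number of arrows on the returned path.

```python
from collections import deque


def shortest_path(uses, start, goal):
    """Return one shortest usage path from `start` to `goal`, or None.

    uses: element -> elements it uses directly.
    """
    parent = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk back through the parents to rebuild the path.
            path = []
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for used in uses.get(current, ()):
            if used not in parent:
                parent[used] = current
                queue.append(used)
    return None


# Hypothetical namespaces and usages.
uses = {
    "Panel": ["Model", "Widgets"],
    "Widgets": ["Drawing"],
    "Model": ["Helpers"],
    "Drawing": ["Helpers"],
}
path = shortest_path(uses, "Panel", "Helpers")
print(path, "depth:", len(path) - 1)
```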

     

     

    Transitive closures and dependencies cycles

    When the design contains dependency cycles, the matrix contains black cells. For example, the matrix below shows that almost all namespaces of mscorlib are involved in a dependency cycle with almost all other namespaces of mscorlib.

     

    Here too, to make more sense of the data, you might prefer visualizing the cycle with a Boxes and Arrows diagram. You can generate such a diagram as shown in the following screenshot:

    ...and here is the resulting diagram. Notice that the value of a black cell indicates the length of a shortest cycle that contains both code elements involved. Here the length of a shortest cycle that contains both System.Runtime.Remoting.Data and System.IO.IsolatedStorage is 6.

     

    My personal opinion is that dependency cycles are the worst kind of mistake a design can contain. When there are cycles, you can no longer develop your software in a divide-and-conquer way. In other words, cycles break componentization. I wrote an article about this topic that explains my position and also how you can use NDepend to remove cycles.

     

    Actually, NDepend, and especially the CQL language, has some other features that help in dealing with dependencies and closures, which I will explain in other posts.

     

    I would like to note that we worked hard to optimize the NDepend code that deals with transitive closures of large and complex code bases. For example, to browse the dependency closures of all classes of System and all classes of System.Threading (282 x 70 classes with thousands of long dependency cycles), NDepend won't take more than 2 seconds to compute the matrix.

     

     

     

     


     

     

     

     
