CodeBetter.Com
CodeBetter.Com
RSS 2.0 via Feedburner
           Do you Twitter? Follow us @CodeBetter

Patrick Smacchia [MVP C#]

October 2008 - Posts

  • Comparing Hash Table Implementations Performances

     
    One of my preferred programming paradigm is hash table. I consider almost like magic being able to test if a collection contains a particular object, independently from the size of the collection (hence the term constant time often used to characterize hash tables). Personally, I use hash tables intensively and it is the cornerstone of all VisualNDepend and Code Query Language (CQL) implementation performances. Performances are here critical since the goal of NDepend is to let .NET developers knows anything about their code bases in real-time, even on millions lines of code application that spawns on multiple Visual Studio solutions (anything about their code bases include architecture/dependencies/layering facts, 82 code metric values, changes between 2 versions, state mutability facts, encapsulation facts...).

     

    I won’t go through all the details of hashtable here (load factor/collision/bucket/hash computation/hash on immutable state/caveat etc…). Jeff Atwood did it very well in his post Hashtables, Pigeonholes, and Birthday . Frankly, if you are not sure about hash table concepts or if you don’t use them on a daily basis, go read this post and understand why you are about to use hash tables extensively from now. Also, if you don’t believe me that hash tables are so central for good programming, then believe Microsoft engineers that provided the method Objet.GetHashCode() meaning that the hash concept applies for all .NET objects of all time.

     

    I was curious about comparing the performances of the different implementations of hash table .NET developers. My motivation was to see if it was worth switching from the .NET framework hash table implementations to some other ones.

     

    Basically there are the 2 classes in the .NET framework System.Collections.Generic.Dictionary<K,V> , System.Collections.Generic.HashSet<K> (I don't count the non-generic System.Collection.HashTable). There are also the implementations available in the 3 libraries C5 (by IT University of Copenhagen), PowerCollections  (by Wintellect) and Mono (by Novell). With the exception that PowerCollections doesn’t seem to provide its own Hash Dictionary.

     

    Find below the detailed results. The conclusion is that the .NET Framework implementations are a bit more cheaper than the Mono ones, while the C5 and PowerCollections implementations are behind. Basically .NET implementations are between 1.4 to 1.7 more performant on insertion than the Mono ones. Both are equivalent when it comes to test containment in HashSet<T> and the Mono implementation of Dictionary<K,V>.TryGetValue(...) is a tiny bit better than the .NET one. One has to consider that the big motivation for using hash tables is when there are a lot of containment tests/retrieval, since these operations are done in a constant time. Thus, the Mono flaw on insertion is not that serious and this is a good news for all of us that don't have access to the class System.Collections.Generic.HashSet<K> because we are still running on .NET v2.0. We can just use the Mono HashSet<T> implementation.

     

    Concerning memory , if you look at results you'll see that Mono collection seems up to  2 times cheaper that .NET ones but this results are biased. Indeed, some arrays objects are maintained and I cannot distinguish who reference what. I tried to do the memory tests separately but internally some arrays are maintained by the CLR ans it is still not possible to infer a clear result.

     

    Here is the C# source file used to do the tests. Notice that I prefixed Mono collections with Mono. to avoid collision with .NET Framework collections. All hash table tested have some string generic parameters, and we also did the test with int32 values, to get also a glimpse at how it behaves with value type. If you have feedback on these testing methodology please put them in comment. I know that the biggest concerns is the fact that a single insertion in an hash table can sometime takes a lot of time because internally it sometime triggers a re-arrangement of keys proportional to the number of elements. The only way to shorten these unwanted effects was to do tests for several values.

     

    The results for both performance and memory profiling were obtained with the great JetBrain dotTrace profiler. Concerning memory profiling tool I have to say that usually I use the Scitech .NET Memory Profiler which has some awesome real-time profiling capabilities without almost any overhead. I choosed dotTrace here because the result presentation is nicer and consistent with the performance result I wanted to present.

    Concerning performance profiling Red-Gate recently released .NET Performance Profiler v4.1 that has some great visualization features but honestly, I haven’t found the time yet to dig in it and see if I could prefer it over my good friend dotTrace.


     

     

     

    Hash Table: Performance Profiling Results

     

    Performance for 1M strings.

     

     

    Performance for 100K strings.

     

     

    Performance for 10K strings.

     

     

    Performance for 1000 int32.


     

    Performance for 10K int32.

     

     

     

    Hash Table: Memory Profiling

     

    Memory for 100 elements

     

     

    Memory for 1000 elements

     

     

    Memory for 10M elements

     

     

    Memory for 100M elements

     

     

    Memory for 1000M elements

     

     

     


     

  • Controlling the usage of libraries

     
    Recently, Scott Hanselman has proposed a survey to know which part of the .NET Framework real-world developers are using. The result will be interesting mostly for Microsoft engineers in charge of the .NET Framework. However, it might be interesting for a team to know which tier code its code base really uses. You certainly know coarsely if your code base is using or not, libraries such as ADO.NET, WPF, Windows Form, ASP.NET but also NHibernate, NUnit or Log4Net. But can you answer these 2 questions?

    ·         Which part of the code base uses which library?

    ·         Which types and members of the library are really used by your code base?

     

    Before rambling on why you should care about these questions and what are the perspectives offered by an accurate control of the usage of tier code, let's illustrate how NDepend can help answering such questions and more in seconds.

     

    Let’s analyze the code of the OSS (yet very professional) application Paint.NET v3.0. Assemblies of the application are in black while library/tier/framework assemblies are in blue. The Dependency Matrix shows which assemblies of the application are using which tier assemblies. For example, we can see that the library SharpZipLib is only used by the assembly Paint.NET.

     

     

     

    Things get really interesting when we unfold tier assemblies. Indeed, during analyzes, NDepend only focus on code elements of tier assemblies that are really used by the application. Here the Dependency Matrix unfold possibility is a powerful tool to dig in who is using what. For example, the following screenshot shows that 8 members of the class List<T> are used by 7 methods of the assembly Paint.DotNet.Data.

     

     

     

    One can easily browse this coupling with the Dependency Matrix…

     

     

     

    …but also, with the Dependency Graph which is clearly more appropriate here:

     

     

     

     

    Diff in the usage of tier code

     

    NDepend allows comparing 2 different builds of an application, to know which code has been refactored, added or removed. While a code base is evolving, the way it uses libraries is evolving also. Some libraries are discarded, some tier classes that wasn’t used are now used… While comparing 2 builds, NDepend also compare the tier code usage. Hence, while unfolding a library code in the class browser, we can easily pinpoint which tier code elements that weren’t used are now used (in bold), which tier code elements are not used anymore (striked) and which tier assemblies/namespaces/classes contains some usage change (underlined). The following screenshot illustrate the usage diff of collections between Paint.NET v2.72 and Paint.NET v3.0.

     

     

     

    We can see at a glance for example that between these 2 releases, developers of Paint.Net thought that using the interface IDictionary<TKey,TValue> was a good idea.

     

    If you prefer, you can harness the 3 dedicated Code Query language conditions IsUsedRecently / IsNotUsedAnymore / IsUsedDifferently to query the diff in the usage of tier code:

     

    SELECT TYPES FROM NAMESPACES "System.Collections.Generic" WHERE IsUsedRecently / IsNotUsedAnymore / IsUsedDifferently 

     

    Constraining the usage of tier code

    While this is cool to control the usage of tier code with such accuracy, I would like to explain how this possibility is useful for developers.

     

    One of the direct application is to forbid some incorrect usage of a library. For example, as explained here, while running with .NET v2.0 runtime, the ultra-popular method String.IsNullOrEmpty(String) can raise a NullReferenceException. I don’t really know if this bug has been fixed but in fact I don’t care, because we wrote the following CQL rule to make sure that we don’t use this method (we use our own implementation actually):

     

    // <Name>String.IsNullOrEmpty() is bugged and must not be used</Name>

    WARN IF Count > 0 IN SELECT METHODS

      WHERE IsDirectlyUsing "OPTIONAL:System.String.IsNullOrEmpty(String)"

     

    Notice the OPTIONAL: prefix. Without it, this query wouldn’t compile. Indeed, as the method IsNullOrEmpty() is not used by our code base, the analysis doesn’t reference it. Without the OPTIONAL: prefix the CQL compiler would emit a ‘Cannot find method named …’ error.

     

     

    Let's dig in another real-world example. We are relying on the DevExpress DXperience Windows Form library to enhance the usability of our product. Amongst plenty of cool stuff, this library lets control the skinning of controls.

     


     

    This is only possible if you use the DXperience buttons (and other controls) instead of the classical Windows Form button. To prevent a developer to use a classical Windows Form button, and then to guarantee a coherent skinning of our UI, we wrote the following rule:

     

    // <Name>All buttons should be of type DevExpress.XtraEditors.SimpleButton and not System.Windows.Forms.Button</Name>

    WARN IF Count > 0 IN SELECT FIELDS FROM ASSEMBLIES WHERE

      IsOfType "OPTIONAL:System.Windows.Forms.Button"

     

     

    The possibilities to control how your code uses libraries are endless. We provided by default a set of around 30 rules (some borrowed from FxCop, some others from our experience) about the usage of the .NET Framework. We expect to release more default rules in the future, also relative to popular framework usage such as NHibernate or Log4Net. What is really cool is that each time a developer detects a particular library usage issue, he can very easily define some custom rules about it. The rules can be named appropriately and contain further comment detailing the issue. This way its co-worker will be aware of this issue and won't redo the same error in the future. Hence such set CQL rules helps capitalizing on the experience acquired the hard-way on a library. Btw, don’t hesitate to report us the coolest rules you found on any popular library (.NET framework included).

     

     

    Separation of Concerns and usage of tier code

    I would like to mention that a few months ago, I explored in this blog post, Dependencies and Concerns, an interesting application of controlling the usage of tier code. Basically the idea is, tell me what you are using and I can tell you with what you are concerned about. Hence, the post explains how you can write some rules, to forbid for example using some ADO.NET stuff at the same time as using UI stuff. In other words, you get a mean to write rules to check that concerns of you application are properly separated (like, the UI code shouldn’t use directly the DB code).

     

    Another interesting field to explore is to make sure that special concerns such as threading, error handling, logging, users' authentication, security permissions... are strictly used within some bounded regions of your code.

    WARN IF Count > 0 IN SELECT METHODS OUT OF NAMESPACES
    "Product.UI.ThreadingHelper",
    "Product.DB.ThreadingHelper" WHERE
    IsDirectlyUsing
    "System.Threading"

     

     

  • Listen to your users

     

    While listening the users to improve the usability of your product seems to be a controversial issue, I am leaning toward the Listen to user camp. My opinion is biased by the fact that I have the exceptional chance to share the same job and same concerns with users of the product I am responsible for. As all of us in the NDepend team, our users are developers and architects with a solid interest in software quality and maintainability.

     

    I recently got a nice occurrence of the Listen to your users tenet. It is about the Structural Dependency Matrix, one of the most polished feature of NDepend. It has been released 2 years ago and since then, we are continuously improving it. We believe that the matrix is a unique tool to detect code structural flaws, to understand and explore the real architecture of a code base and to forecast the cost and feasibility of large refactoring. The matrix works hand-in-hand with the graph view, the CQL language and most other features of NDepend. It comes with dozens of options categorized with the Make the simple things simple and hard things possible idea in mind. It has been profiled to the point where every action execute in real time, even on hundreds of thousand’ cells matrix representing inter-dependencies of millions of lines of code. We dog-food it daily to improve the structure of the NDepend code itself. Despite all this work and expertise, a user came to us recently with a very useful option that we never thought about! The ice on the cake is that everything was here to let us implement and test it readily.

     

    The idea consists in letting the user remove empty rows/columns. It is especially useful when the dependency matrix is sparse. Here is a matrix representing the coupling between the 113 types of the namespace System.Drawing and the 205 methods of the class System.String. This matrix is sparse and not very handy to browse the who uses who in this coupling.

     

     

     

    Here is the same coupling where empty rows and columns have been removed. The sparse (113x205) matrix became a dense (27x29) matrix. It is now much easier to gather useful information about the System.Drawing to System.String.

     

     

     

     

    I believe that such a dense matrix represents the optimal way to browse complex coupling. The following coupling graph that shows the exact same information is partially unreadable:

     


     

     

    The questions are why we didn’t think of this feature before and why no user asked for it so far? The answer is that actually this feature already existed more or less, through the possibility of opening a dependency within the matrix by keeping only involved code elements. Here is an illustration of this possibility:

     

     

    It seems that everyone got used to this keep only involved code element feature that hided the need for removing empty rows or columns in a bit more complex scenario. Hopefully a fresher user view opened our eyes on the usefulness of this feature.

     

    Concerning the Listen to users tenets, its veracity certainly depends on your users qualification. In the past, I have been team leader for a software PC station that tracked expensive remote assets such as luxury cars and loaded trucks. Users of the program were security guards that knew more about guns than about computers. Obviously, at that time I didn’t think that it was a good idea to listen to users to improve the usability of the PC station. But observing them using it was certainly a good idea.

     

     

  • Some Unexpected Code Dependency Issues

    While supporting NDepend, we got some interesting user feedback about the understanding on how dependencies are inferred. As shown below, sometime, the Dependency Matrix displays some cells weighted with 0. The first reaction is, what the hell is a dependency of weight 0?

     

    On this screenshot, the Weight on Cells option is set to: Direct: # members. It means that if N members of a type/namespace/assembly are used by M methods of another type/namespace/assembly, then the corresponding blue cell will have a weight of N and its symmetric green cell a weight of M.

    If you look carefully at the Information Panel while pointing the 0's cell with the mouse, NDepend tells that there are no members of NamespaceUser4 using any members of NamespaceUsed, but still, there is a dependency between the 2 namespaces. This situation is unexpected! The explanation is that a type of the namespace NamespaceUser4 is using a type of the namespace NamespaceUsed without using any members. This is actually possible in many situations exposed in the code excerpt below, by passing an argument, declaring a field, implementing an interface, closing a generic type but also by tagging with an attribute, catching an exception...

    If we switch the option Weight on Cells to Direct: # types, there is no more 0's cell and we can see that some types are indeed using some others. Actually, when Weight on Cells is set to something else than Direct: # members/methods/fields it is not possible to have 0's cells because  the way the Common Type System is made, it is not possible to have a dependency not involving some types.

     

     

     

    Another Issue with const values

    Sometime, NDepend won't report something that user considers as a dependency between 2 code elements. This case can happen when there are some enumerations values or some const fields. For example::

    This screenshot shows that NDepend didn't infer a dependency between the namespaces!


    It is actually not a bug but a consequence of a C#/VB.NET compiler optimization. I quote the master Jeffrey Richer from his excellent book CLR via C# page 177:

    (...)When code refers to a constant symbol, compilers look up the symbol in the metadata of the assembly that defines the constant, extract the constant's value, and embed the value in the emitted IL code. (...) 

    It means that by looking at the IL code, one cannot infer the dependency between a method and a constant used. Moreover, it is worth mentioning that enumeration's values are compiled as constant. Althought NDepend parses source code to get some metrics about comments for example, the bulk of information gathered from the code base comes from the IL. This behavior has many appealing advantages, the analysis is not impacted by the way code is formatted neither by special language syntax peculiarities, data obtained from source written in different languages can be compared, one can check that a tier library code base abides by dozens of quality tenets even without having access to source files, NDepend can collaborate with IL focused tools like Reflector but can still collaborates with source code focused tools like Visual Studio...

     

    Concerning this const compiler optimization, I would like to precise that it can be a source of major problems. Let's quote Jeffrey again:

    (...)These constraints also mean that constants don't have a good cross-assembly versioning story, so you should use them only when you know that the value of a symbol will never change(...)

    What Jeffrey means is that if assembly A is using some constants of assembly B, if constant changes in B and B is recompiled, if A is not recompiled it will still contains the old constant values.

     

    Finally, this const compiler optimization has been a source of complexity while developing the NDepend assembly comparison feature. Imagine that 1000 methods are using values of an enumeration (which is a real-world case). If for some reasons values of the enumeration are changing, do you want to be advised that 1000 methods have changed? Indeed, the IL code of the methods have been updated with the new values but still, the C# or VB.NET source code of these methods haven't changed. Hopefully we made our algorithms a bit smart and NDepend can detect this case and not report it to the user.
     

  • Geek Nostalgia

    Scott Hanselman just wrote an excellent post with a photo when he was a child geek.

    I cannot resist following him Smile
    This 1985 photo gives a good idea of the little nerdy I was. Second row from the bottom, 4th from the right.



    I was ten and I was playing with my first computer, an Exeltel (a french computer)! At that time my main programming activity was to tweak strings and redefine the character set, all this in real-time, to do the games my computer didn't have, like the so cool Sega Space Harrier. Without the 2x68000 the arcade machine had, I needed a solid imagination to enjoy playing with my own version. The same year, our school got a dozens of Thomson Mo5 and it was fun teaching the teacher how it worked. A few years later, the next step was to code 3D demos for Commodore Amiga. Everything was programmed with 68000 assembly and had the constraint to display 50 frames per seconds.

    Everyday my mom was angry telling me computers won't help going anywhere in life Big Smile Hopefully my friends and parents made me discover also the pleasure of socialization and outdoor activities.

    Since then, my only break in programming was when I did a few years of advanced mathematics classes. I really love maths, especially theory of numbers, but didn't imagine what to do with it except teaching. I didn't realize all the potential of being a financial trader. So I chosed to quit maths and go back to software. And since recently, I think my choice of being a programmer instead of being a trader, was not a too bad choice. Especially that for more than 20 years every programming day is playtime Smile.


     

  • Comparing Silverlight and the .NET Framework

    A recent NDepend v2.10.2 feature is the analysis of Silverlight application. Silverlight recently went from beta to 2.0 Release Candidate 0 . In this blog post I will first compare Silverlight 2.0 RC0 and .NET Framework v3.5 SP1 assemblies. I will then compare Silverlight 2.0 RC0 and Silverlight 2.0 beta assemblies. To do so, I’ll use the Build Comparison feature of NDepend.

     

     

    Silverlight 2.0 RC0 vs. .NET Framework v3.5 SP1

     

    Let’s notice first that only the following assemblies can be compared: mscorlib, System, System.Core, System.Net, System.Runtime.Serialization, System.ServiceModel, System.ServiceModel.Web, System.Xml. While the .NET Framework has numerous other assemblies not supported by Silverlight, Silverlight comes with only 2 assemblies not present in the .NET Framework: System.Windows.Browser and System.Windows. Another detail: the assembly System.dll had been renamed (I think by mistake) system.dll in Silverlight?!

     

    SELECT TYPES WHERE IsPublic AND WasAdded

    Silverlight has 44 new public types. 41 of them are in the assembly System.Net, and the public class System.Xml.XmlXapResolver and the 2 enumerations System.Xml.DtdProcessing, System.Xml.NamespaceHandling are in the assembly System.Xml. Interestingly enough, it is easy to see that these System.Net types are in the assembly System of the regular .NET Framework. So the decision has been to move these types.

     

    SELECT METHODS WHERE WasRemoved

    By visualizing the .NET Framework methods removed in Silverlight with the NDepend metric/treemap view, it is really obvious that Silverlight is a mini-mini-.NET Framework (methods removed are in blue).

     

     


    For the concerned assemblies, 8 129 types on 9 989 have been removed. It is even more impressing in terms of methods, 72 515 methods on 90 574 have been removed. In terms of IL instructions, 1 607 974 IL instructions on 2 046 294 have been removed, around 78.5%!. It is interesting to list the 100 namespaces that have been completely discarded in Silverlight.

    SELECT NAMESPACES WHERE WasRemoved ORDER BY NbTypes DESC, NbILInstructions DESC

     

    Concerning dependencies, Microsoft did the work of removing many dependencies to .NET Framework assemblies not part of the Silverlight release. This can be shown with the following graph, where .NET Framework 3.5 SP1small assemblies in yellow are considered by NDepend as tier assemblies, because they are indeed not part of the list of assemblies chosen.

     

    Concerning Silverlight internal dependencies, the following Dependency Matrix tells us that hopefully the Silverlight assemblies are layered, which is not the case of the regular .NET Framework. For example in the regular .NET Framework mscorlib.dll and System.dll are mutually dependent. Notice that we infer from the following matrix that there are no cycles between assemblies of Silverlight, because the matrix is triangularized. Moreover, cells with a red tick represent dependencies between assemblies that have been changed (basically all of them). Cells with a red tick and a plus/minus, dependency represent dependencies between assemblies that have been created/removed especially for Silverlight.



    Here is the same set of Silverlight internal dependencies represented with a graph. Here, we infer from the graph that there are no cycles between assemblies of Silverlight, because the graph is perfectly layered from top to bottom (i.e there are no rows that goes from bottom to top).

     


    Silverlight 2.0 RC0 vs. Silverlight 2.0 beta

    The delta between Silverlight beta and RC0 is significant.

     

    SELECT TYPES WHERE IsPublic AND WasAdded

    73 public types have been added (a lot of controls).

     

    SELECT TYPES WHERE IsPublic AND WasRemoved

    11 public types have been removed.

     

    By looking at the metric view for methods added or refactored, it is pretty clear that a lot of work has been done on controls in System.Window (methods added or refactored are in blue):

    SELECT METHODS WHERE CodeWasChanged OR WasAdded


     

    Analyzing your Silverlight application with NDepend

     

    To make possible Silverlight assemblies analysis, we needed to tweak a bit how NDepend resolves .NET Framework assemblies. Indeed, not only NDepend analyzes the code of your application, but also the code used by your application, what we call Tiers Code. And the .NET Framework code (mscorlib, System.Core…) is by-design tier code for all .NET applications since all .NET Assemblies references mscorlib.dll (except mscorlib.dll itself Smile).

     

    As shown below, we added the possibility to choose the .NET Framework targeted. There is no magic behind this, but just the modification of the list of folders that contain the .NET Framework assemblies.


     

    As you can see on the screenshot above, NDepend references Silverlight beta assemblies version 2.0.30523.8. Silverlight RC0 has the version number 2.0.30923.0 and you then need to update the version number as shown below. Of course the next version of NDepend will be updated.

     


     

More Posts

Our Sponsors