CodeBetter.Com

Patrick Smacchia [MVP C#]

January 2008 - Posts

  • Dependencies and Concerns

    Jim Bolla, a contributor to the NHibernate project, made a surprising remark on his blog while describing what NDepend has to say about the NHibernate code:

    (with NDepend) as an example, you can do stuff like this..

    WARN IF Count > 0 IN SELECT METHODS WHERE IsDirectlyUsing "System.Web" AND IsDirectlyUsing "System.Data.SqlClient"

    And you will get a result set of any methods in your project that are doing web stuff and SQL stuff in the same method! COOL! Now you know which n00b programmers you need to go slap mentor. In other words, you can define NDepend CQL queries to find methods that violate the Separation of Concerns principle.

     

    The cool part is that using CQL constraints to enforce the Separation of Concerns principle is something that we (the NDepend team) didn’t think about. Basically, tell me what you use and I can tell you what you are concerned about. This is a great idea! CQL comes with the direct dependency conditions IsDirectlyUsing and IsDirectlyUsedBy. For example:

    // Which methods are directly calling a particular method
    SELECT METHODS WHERE IsDirectlyUsing "MyNamespace.MyType.MyMethod()"

     

    These conditions work on Assemblies/Namespaces/Types/Methods/Fields (ANTMF), and you can mix domains, for example:

    // Which assemblies are directly using a particular method?
    SELECT ASSEMBLIES WHERE IsDirectlyUsing "MyNamespace.MyType.MyMethod()"

     

    // Which types are directly used by a particular assembly?
    SELECT TYPES WHERE IsDirectlyUsedBy "MyAssembly"


    // Which namespaces are directly using a particular field?
    SELECT NAMESPACES WHERE IsDirectlyUsing "MyNamespace.MyType.m_Field"

     

    You can combine these conditions with all other CQL facilities, for example:

    SELECT TYPES FROM ASSEMBLIES "MyAssemblyA" WHERE IsDirectlyUsing "MyAssemblyB"

     

    Or:

    SELECT METHODS OUT OF NAMESPACES "MyNamespace" WHERE IsDirectlyUsing "MyType"

     

    As Jim pointed out, you can enforce some separation of concerns by mixing several direct dependency conditions in the same constraint. You can also make sure that an API is used as it should be. Often, doing one thing implies doing another. For example, this constraint warns if a method deletes a file but does not use the class IOException:

    WARN IF Count > 0 IN SELECT METHODS WHERE
    IsDirectlyUsing "System.IO.File.Delete(String)" AND !IsDirectlyUsing "System.IO.IOException"

     
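    As an illustration of the file deletion constraint, here is a minimal C# sketch of a method that references both File.Delete(String) and IOException, so it should not be matched. FileCleanup and TryDelete are hypothetical names, and whether a catch clause counts as a direct use of IOException is an assumption about how NDepend reads the IL:

```csharp
using System;
using System.IO;

static class FileCleanup {
   // Hypothetical helper: deletes a file and reports whether it succeeded.
   // The method uses both System.IO.File.Delete(String) and
   // System.IO.IOException, which is what the constraint asks for.
   public static bool TryDelete(string path) {
      try {
         File.Delete(path);
         return true;
      }
      catch (IOException) {
         // For example, the file is still in use by another process.
         return false;
      }
   }
}
```

    A method that called File.Delete(path) with no reference to IOException would be reported by the constraint.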

    You can also readily write any kind of constraint to force the use of interfaces instead of classes:

    WARN IF Count > 0 IN SELECT METHODS WHERE
    IsDirectlyUsing "MyNamespace.MyType" AND !IsDirectlyUsing "MyNamespace.IMyInterface"

     

    The following constraint warns when a method obtains a Graphics from an image and doesn’t call the Dispose() method:

    WARN IF Count > 0 IN SELECT METHODS WHERE
    IsDirectlyUsing "System.Drawing.Graphics.FromImage(Image)" AND !IsDirectlyUsing "System.IDisposable.Dispose()"

     

     

    If you are developing a framework, you can build a set of such constraints that your clients should apply to make sure that they are using your framework correctly.

     

    Another use of multiple IsDirectlyUsing conditions is to guess where an API should be used. For example, this query matches methods that change some state and do not use any synchronization API:

    SELECT METHODS WHERE !(
       IsDirectlyUsing "System.Threading.Mutex" OR
       IsDirectlyUsing "System.Threading.Interlocked" OR
       IsDirectlyUsing "System.Threading.Monitor" OR
       IsDirectlyUsing "System.Threading.ReaderWriterLock") AND
    (ChangesObjectState OR ChangesTypeState) AND
    !(IsConstructor OR IsClassConstructor)
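
    To make the query above concrete, here is a small C# sketch. HitCounter and its members are hypothetical names. Assuming the conditions behave as described, the first method changes object state without touching any of the listed synchronization types and would be matched, while the second one goes through Interlocked and would not:

```csharp
using System.Threading;

class HitCounter {
   int m_Hits;

   // Changes object state with no synchronization API:
   // a candidate for the query.
   public void UnsafeIncrement() { m_Hits++; }

   // Changes object state through System.Threading.Interlocked:
   // excluded by the query.
   public void SafeIncrement() { Interlocked.Increment(ref m_Hits); }

   public int Hits { get { return m_Hits; } }
}
```

    A match is only a hint: a method may be safe because its callers hold a lock, which this kind of query cannot see.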

     

    Another idea is to make sure that the layer represented by the code in the namespace MyNamespace is not entangled with any other namespace (i.e. has no bi-directional dependency with any other namespace):

    WARN IF Count > 0 IN SELECT NAMESPACES WHERE
    IsDirectlyUsing "MyNamespace" AND IsDirectlyUsedBy "MyNamespace"

     

    I will certainly post more on these. Thanks to its flexibility, CQL offers a wide range of possibilities to explore!

  • Avoid API breaking changes

     

    If you are developing a framework, the last thing you want when releasing a new version of your product is to break your clients' code because of an API change. For example, you want to make sure that all public methods from previous versions are still present in the next version, unless you tagged some of them with System.ObsoleteAttribute.
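
    As a reminder of how System.ObsoleteAttribute is typically used for this, here is a minimal C# sketch. PriceFormatter and both Format overloads are hypothetical names invented for the illustration. The old public method stays in place, so existing client code still compiles, and the compiler warns callers to migrate:

```csharp
using System;
using System.Globalization;

public static class PriceFormatter {
   // Old entry point: kept public for compatibility, but flagged
   // so that callers get a compiler warning.
   [Obsolete("Use Format(decimal, string) instead.")]
   public static string Format(decimal price) {
      return Format(price, "USD");
   }

   // New entry point introduced in the next version.
   public static string Format(decimal price, string currency) {
      return price.ToString("0.00", CultureInfo.InvariantCulture) + " " + currency;
   }
}
```

    Removing the old overload only after it has been obsolete for a release gives clients time to adapt.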

    There is one major company in the .NET sphere that publishes a big framework and has this need: Microsoft. To make sure that the .NET framework public API doesn’t contain breaking changes, Microsoft developed the tool LibCheck, which automatically detects API breaking changes.

    The tool NDepend can also be used to detect API breaking changes thanks to some astute CQL rules. The following CQL rule warns if a public method of the previous version is not public anymore or has been removed in the new version.

    WARN IF Count > 0 IN SELECT METHODS WHERE
    IsInOlderBuild AND IsPublic AND (VisibilityWasChanged OR WasRemoved)

     

    The condition IsInOlderBuild needs a little explanation. When comparing 2 versions of a code base, NDepend keeps both code base structures in memory, the older one and the newer one. This is why the comparison is immediate: everything is in memory. In this context, NDepend has to choose which structure the query runs against. The condition IsInOlderBuild forces NDepend to run the query against the older build. Strictly speaking, we don’t need IsInOlderBuild here, because when a query contains the condition WasRemoved, NDepend automatically runs it against the older build. However, I find that adding IsInOlderBuild makes the query easier to read and understand.

    The query can be read: warn if there are methods that used to be public and whose visibility was changed or that have been removed. Interestingly enough, if the visibility used to be public and has changed, the method is no longer public, hence the API breaking issue. This is also a good example of mixing orthogonal CQL features (here comparison and visibility) in the same query to obtain smarter queries. I will blog more on this in the future; this will be an important direction for the product in 2008.


    Finally, we could also add the CQL condition !IsObsolete to make sure that the query doesn’t match obsolete methods that have been removed.

    WARN IF Count > 0 IN SELECT METHODS WHERE
    IsInOlderBuild AND IsPublic AND !IsObsolete AND (VisibilityWasChanged OR WasRemoved)


    In the same spirit, it is easy to write the following CQL rule that detects public types of the previous version that are not public anymore or that have been removed in the new version:

    WARN IF Count > 0 IN SELECT TYPES WHERE
    IsInOlderBuild AND IsPublic AND (VisibilityWasChanged OR WasRemoved)
     

    Notice that in this previous post I explained how to configure an NDepend project to define the previous version of your code base to compare with during an analysis. Basically you can choose between:

    • Compare with the last analysis available.
    • Compare with the analysis made N days ago.
    • Compare with the particular analysis made on MM/DD/YYYY hh:mm 

    Of course, you can still use the VisualNDepend GUI to compare any 2 previous analyses.


  • Immutable types: understand their benefits and use them


    There is a powerful and simple concept in programming that I think is really underused: immutability.

    Basically, an object is immutable if its state doesn’t change once the object has been created. Consequently, a class is immutable if its instances are immutable.

    There is one killer argument for using immutable objects: it dramatically simplifies concurrent programming. Think about it: why is writing correct multithreaded programs such a hard task? Because it is hard to synchronize thread accesses to resources (objects or other OS things). Why is it hard to synchronize these accesses? Because it is hard to guarantee that there won’t be race conditions between the multiple write accesses and read accesses done by multiple threads on multiple objects. What if there are no more write accesses? In other words, what if the state of the objects that the threads are accessing doesn’t change? There is no more need for synchronization!

    Of course I am simplifying here, so let's dig a bit deeper.

     

    A famous immutable class

    There is one famous immutable class: System.String. When you think that you are modifying a string, you actually create a new string object. Often, we forget about it and we would like to write …

    string str = "foofoo";

    str.Replace("foo", "FOO");

     

    …where we need to write instead:

    str = str.Replace("foo", "FOO");

    Of course, doing so comes at the cost of creating multiple string objects in memory when doing intensive string computation. In this case you should use the System.Text.StringBuilder class, which provides a safe way to work with mutable strings.

    Actually, string objects are not that immutable, and as far as I know there are at least 2 ways to break string immutability: with pointers, as shown by this code example, and with some advanced System.Reflection usage.

    So why did the .NET engineers decide that strings should be immutable? Because programmers will never get a race condition because of a corrupted string. Also because strings are well suited to be keys in hashtables (i.e. System.Collections.Generic.Dictionary<K,V>). Hashtables are an almost magic way to dramatically enhance the performance of your code. (I say magic because under the hood hashtables rely on properties of prime numbers, and prime numbers are magic!) The objects on which the hash values are computed must be immutable to make sure that the hash values stay constant over time. Indeed, a hash value is computed from the state of the object (or possibly from a sub-state of the object, in which case only this sub-state must be immutable).
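
    To see why hashtable keys should be immutable, here is a minimal C# sketch of what goes wrong otherwise. MutableKey and MutableKeyDemo are hypothetical types written for the illustration. The key's hash value depends on a field that is changed while the key sits in the dictionary:

```csharp
using System.Collections.Generic;

// Hypothetical mutable key: its hash value depends on a field that can change.
class MutableKey {
   public int Id;
   public MutableKey(int id) { Id = id; }
   public override int GetHashCode() { return Id; }
   public override bool Equals(object obj) {
      MutableKey other = obj as MutableKey;
      return other != null && other.Id == Id;
   }
}

static class MutableKeyDemo {
   // Returns true if the entry can still be found after the key was mutated.
   public static bool FoundAfterMutation() {
      Dictionary<MutableKey, string> dico = new Dictionary<MutableKey, string>();
      MutableKey key = new MutableKey(1);
      dico.Add(key, "value");
      key.Id = 2;  // the hash value changes while the key is stored in the table
      return dico.ContainsKey(key);
   }
}
```

    The entry was stored under the old hash value, so looking it up with the mutated key fails even though the very same object is used. An immutable key such as a string cannot get into this situation.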

    Another cool thing about string immutability is that even though System.String is a class, string objects are compared by value, like value types. This is possible because we can consider that the identity of an immutable object is its state. For example:

    string str1 = "foofoo";

    string strFoo = "foo";

    string str2 = strFoo + strFoo;

    // Even though str1 and str2 reference 2 different objects

    // the following assertion is true.

    Debug.Assert(str1 == str2);

     

    Purity vs. Side-effect

     
    So we now know at least 3 great benefits of immutable objects:

    • They simplify multithreaded programming.
    • They can be used as hashtable keys.
    • They simplify state comparison.

    We can now be more general and say that the primary benefit of immutable types comes from the fact that they eliminate side-effects. I couldn’t say it better than Wes Dyer so I quote him:

    We all know that generally it is not a good idea to use global variables.  This is basically the extreme of exposing side-effects (the global scope). Many of the programmers who don't use global variables don't realize that the same principles apply to fields, properties, parameters, and variables on a more limited scale: don't mutate them unless you have a good reason.(…)

    One way to increase the reliability of a unit is to eliminate the side-effects. This makes composing and integrating units together much easier and more robust.  Since they are side-effect free, they always work the same no matter the environment.  This is called referential transparency.

     

    Immutable classes in C#

    C# supports immutability thanks to 2 keywords: const and readonly. They are used by the C# compiler to ensure that the state of a field won’t be changed once an object is created. Why 2 keywords? Because the readonly keyword allows state modification within constructor(s) while the const keyword doesn’t. For example:

    class Article {
       Article(string name,int price) {
          m_Name = name; // <- Compilation error
          m_Price = price;
       }
       const string m_Name = "Ballon";
       readonly int m_Price;
    }

    At this point you might wonder: what if my object references another object through a read-only field? Can the state of the referenced object change? The answer is yes; this is the classic shallow vs. deep distinction. Eric Lippert from the C# team has a nice post describing these different kinds of immutability. Eric also wrote a great series of posts on how to code efficient immutable collections, which might sound paradoxical since collections are known as super-mutable objects. Check out his posts since November 2007! Eric also wrote:

    Immutable data structures are the way of the future in C#.

    So stay tuned!
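
    The shallow vs. deep distinction mentioned above can be sketched in a few lines of C#. Basket is a hypothetical class written for the illustration. The readonly keyword protects the reference held by the field, not the state of the object it points to:

```csharp
using System.Collections.Generic;

class Basket {
   // readonly protects the reference, not the list it refers to.
   readonly List<string> m_Items = new List<string>();

   public int Count { get { return m_Items.Count; } }

   public void Add(string item) {
      // m_Items = new List<string>();  // <- Compilation error: the field is readonly
      m_Items.Add(item);                // compiles: the referenced list is mutable
   }
}
```

    So Basket is only shallowly immutable at best: its field never changes, but its observable state (Count) does.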

     

    Immutable closures in C#2

    The functional coding style is better suited to immutable state than the imperative one. Wesner Moise often praises on his blog the use of functional closures as a means to achieve immutability. For example, here is a simple program that defines immutable objects. These objects are affine transformers that take an integer x and compute a*x+b. Thus, the state of an immutable affine transformer object consists of the a and b parameters. Here is the code:

    using System.Diagnostics;
    class Program {
       delegate int DelegateType(int x);

       static DelegateType MakeAffine(int a, int b) {
          return delegate(int x) { return a * x + b; };
       }

       static void Main() {
          DelegateType affine1 = MakeAffine(2, 1);
          DelegateType affine2 = MakeAffine(3, 4);
          Debug.Assert(affine1(5) == 11);  // 2*5+1 == 11
          Debug.Assert(affine2(6) == 22);  // 3*6+4 == 22
       }
    }

    If you are not acquainted with closures, this code might surprise you. Behind your back, the C# compiler has created an immutable class to represent these affine transformer objects! I wrote an article about closures in C# that explains all this.
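
    To give an idea of what happens behind the scenes, here is a hand-written approximation of the class the compiler creates for the closure. This is only a sketch: AffineClosure and AffineFactory are hypothetical names, and the real generated class has a compiler-chosen name and stores the captured variables in plain public fields that are simply never written again. The readonly fields here state that intent explicitly:

```csharp
// Hand-written stand-in for the compiler-generated closure class.
class AffineClosure {
   readonly int m_A;   // captured variable a
   readonly int m_B;   // captured variable b

   public AffineClosure(int a, int b) { m_A = a; m_B = b; }

   // Body of the anonymous method.
   public int Invoke(int x) { return m_A * x + m_B; }
}

static class AffineFactory {
   public delegate int DelegateType(int x);

   // Same contract as MakeAffine, without the anonymous method syntax.
   public static DelegateType MakeAffine(int a, int b) {
      return new DelegateType(new AffineClosure(a, b).Invoke);
   }
}
```

    Each call to MakeAffine creates one such object, and the returned delegate is bound to its Invoke method.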

     

    Immutable anonymous types in C#3 and VB9

    C#3 comes with the interesting anonymous types feature. Anonymous types built by the C#3 compiler are immutable: all fields are private and all properties are read-only (it is always instructive to check for yourself with Reflector).

    var affine = new { A = 3, B = 4 };

    affine.A = 3; // <- Compilation error

    In this post, Tim Ng from the VB.NET team wrote:

    The motivating factor for driving the immutable anonymous types was because the LINQ APIs used hash tables internally and returning projections of anonymous types that could be modified was a dangerous situation.

    Interestingly enough, the VB team decided that their anonymous types wouldn’t be immutable by default, which means that you can write things like this:

    Dim affine = New With {.A = 3, .B = 4}

    affine.A = 5

    However, the VB team added the possibility to specify that a certain property would be read-only thanks to the Key keyword:

    Dim affine = New With {Key .A = 3, .B = 4}

    affine.A = 5    ' <- Compilation error

    About the motivation behind having mutable anonymous types in VB, Tim Ng wrote:

    …but for Visual Basic, we decided that because we have the ability to late bind on top of anonymous types, making them immutable is unexpected.

     

    Immutable support in NDepend and CQL

    As we saw, immutability is a feature that can be enforced at compile-time. In other words, it can be enforced by static analysis tools. Thus, the Code Query Language (CQL) that comes with the static analysis tool NDepend has an IsImmutable condition that applies to types. Finding out which types of your code base are immutable is as easy as writing this CQL query:

    SELECT TYPES WHERE IsImmutable

     

    To constrain a particular type MyNamespace.Foo to be immutable, you can write this CQL constraint:

    WARN IF Count != 1 IN

    SELECT TYPES WHERE IsImmutable AND FullNameIs "MyNamespace.Foo"

     

    To constrain all the types used by the class MyNamespace.Foo to be immutable:

    WARN IF Count > 0 IN

    SELECT TYPES WHERE IsUsedBy "MyNamespace.Foo" AND !IsImmutable

     

    To constrain all the types declared in the namespace MyNamespace to be immutable:

    WARN IF Count > 0 IN

    SELECT TYPES FROM NAMESPACES "MyNamespace" WHERE !IsImmutable

     

    To constrain all the types tagged with an attribute MyNamespace.MyImmutableAttribute to be immutable:

    WARN IF