CodeBetter.Com
CodeBetter.Com
RSS 2.0 via Feedburner
           Do you Twitter? Follow us @CodeBetter

Patrick Smacchia [MVP C#]

Comparing Hash Table Implementations Performances

 
One of my preferred programming paradigm is hash table. I consider almost like magic being able to test if a collection contains a particular object, independently from the size of the collection (hence the term constant time often used to characterize hash tables). Personally, I use hash tables intensively and it is the cornerstone of all VisualNDepend and Code Query Language (CQL) implementation performances. Performances are here critical since the goal of NDepend is to let .NET developers knows anything about their code bases in real-time, even on millions lines of code application that spawns on multiple Visual Studio solutions (anything about their code bases include architecture/dependencies/layering facts, 82 code metric values, changes between 2 versions, state mutability facts, encapsulation facts...).

 

I won’t go through all the details of hashtable here (load factor/collision/bucket/hash computation/hash on immutable state/caveat etc…). Jeff Atwood did it very well in his post Hashtables, Pigeonholes, and Birthday . Frankly, if you are not sure about hash table concepts or if you don’t use them on a daily basis, go read this post and understand why you are about to use hash tables extensively from now. Also, if you don’t believe me that hash tables are so central for good programming, then believe Microsoft engineers that provided the method Objet.GetHashCode() meaning that the hash concept applies for all .NET objects of all time.

 

I was curious about comparing the performances of the different implementations of hash table .NET developers. My motivation was to see if it was worth switching from the .NET framework hash table implementations to some other ones.

 

Basically there are the 2 classes in the .NET framework System.Collections.Generic.Dictionary<K,V> , System.Collections.Generic.HashSet<K> (I don't count the non-generic System.Collection.HashTable). There are also the implementations available in the 3 libraries C5 (by IT University of Copenhagen), PowerCollections  (by Wintellect) and Mono (by Novell). With the exception that PowerCollections doesn’t seem to provide its own Hash Dictionary.

 

Find below the detailed results. The conclusion is that the .NET Framework implementations are a bit more cheaper than the Mono ones, while the C5 and PowerCollections implementations are behind. Basically .NET implementations are between 1.4 to 1.7 more performant on insertion than the Mono ones. Both are equivalent when it comes to test containment in HashSet<T> and the Mono implementation of Dictionary<K,V>.TryGetValue(...) is a tiny bit better than the .NET one. One has to consider that the big motivation for using hash tables is when there are a lot of containment tests/retrieval, since these operations are done in a constant time. Thus, the Mono flaw on insertion is not that serious and this is a good news for all of us that don't have access to the class System.Collections.Generic.HashSet<K> because we are still running on .NET v2.0. We can just use the Mono HashSet<T> implementation.

 

Concerning memory , if you look at results you'll see that Mono collection seems up to  2 times cheaper that .NET ones but this results are biased. Indeed, some arrays objects are maintained and I cannot distinguish who reference what. I tried to do the memory tests separately but internally some arrays are maintained by the CLR ans it is still not possible to infer a clear result.

 

Here is the C# source file used to do the tests. Notice that I prefixed Mono collections with Mono. to avoid collision with .NET Framework collections. All hash table tested have some string generic parameters, and we also did the test with int32 values, to get also a glimpse at how it behaves with value type. If you have feedback on these testing methodology please put them in comment. I know that the biggest concerns is the fact that a single insertion in an hash table can sometime takes a lot of time because internally it sometime triggers a re-arrangement of keys proportional to the number of elements. The only way to shorten these unwanted effects was to do tests for several values.

 

The results for both performance and memory profiling were obtained with the great JetBrain dotTrace profiler. Concerning memory profiling tool I have to say that usually I use the Scitech .NET Memory Profiler which has some awesome real-time profiling capabilities without almost any overhead. I choosed dotTrace here because the result presentation is nicer and consistent with the performance result I wanted to present.

Concerning performance profiling Red-Gate recently released .NET Performance Profiler v4.1 that has some great visualization features but honestly, I haven’t found the time yet to dig in it and see if I could prefer it over my good friend dotTrace.


 

 

 

Hash Table: Performance Profiling Results

 

Performance for 1M strings.

 

 

Performance for 100K strings.

 

 

Performance for 10K strings.

 

 

Performance for 1000 int32.


 

Performance for 10K int32.

 

 

 

Hash Table: Memory Profiling

 

Memory for 100 elements

 

 

Memory for 1000 elements

 

 

Memory for 10M elements

 

 

Memory for 100M elements

 

 

Memory for 1000M elements

 

 

 


 



Comments

Reflective Perspective - Chris Alcock » The Morning Brew #212 said:

Pingback from  Reflective Perspective - Chris Alcock  &raquo; The Morning Brew #212

# October 30, 2008 4:31 AM

gOODiDEA said:

Web网页栅格系统研究(1):960的秘密-(2):蛋糕的切法-(3):粒度问题12StepsToFasterWebPagesWithVisualRoundTrip...

# October 31, 2008 12:57 AM

Community Blogs said:

Web 网页栅格系统研究(1):960的秘密 - (2):蛋糕的切法 - (3):粒度问题 12 Steps To Faster Web Pages With Visual Round Trip Analyzer

# October 31, 2008 12:59 AM

Patrick Smacchia [MVP C#] said:

I am glad to announce that we just released a new version of NDepend where the analysis phase duration

# December 1, 2008 4:32 AM

Community Blogs said:

I am glad to announce that we just released a new version of NDepend where the analysis phase duration

# December 1, 2008 4:55 AM

Lessons learned from a real-world focus on performance - taccato! trend tracker, cool hunting, new business ideas said:

Pingback from  Lessons learned from a real-world focus on performance - taccato! trend tracker, cool hunting, new business ideas

# December 2, 2008 10:16 AM

Lessons learned from a real-world focus on performance - taccato! trend tracker, cool hunting, new business ideas said:

Pingback from  Lessons learned from a real-world focus on performance - taccato! trend tracker, cool hunting, new business ideas

# December 3, 2008 11:03 AM

Lessons learned from a real-world focus on performance - taccato! trend tracker, cool hunting, new business ideas said:

Pingback from  Lessons learned from a real-world focus on performance - taccato! trend tracker, cool hunting, new business ideas

# December 4, 2008 11:09 AM

Leave a Comment

(required)  
(optional)
(required)  

Enter the numbers above:
Add

About Patrick Smacchia

Patrick Smacchia is a Visual C# MVP involved in software development for over 15 years. After graduating in mathematics and computer science, he has worked on software in a variety of fields including stock exchange, airline ticket reservation system as well as a satellite base station at Alcatel. He's currently a software consultant and trainer on .NET technologies as well as the lead developer of the tool NDepend which provides numerous metrics and caveats on any compiled .NET application. He is the author of Practical .NET2 and C#2, a .NET book conceived from real world experience with 647 compilable code listings. Check out Devlicio.us!

Our Sponsors

Proudly Partnered With