CodeBetter.Com

Greg Young [MVP]


The File Carnie

 

Well if I were working in a carnival and this thing stepped up to the booth to have me guess its weight, it would have won the coveted stuffed animal for its dollar. 

This is really obvious, though it might be more of an optimization than you think. It caught me by surprise, anyway: I was expecting a 7.5-15% gain from it, which is why I had put it off as a "minor" optimization to do when I had a few free minutes.

 

If you have a guess at the size of the file you would like to write, size it beforehand and truncate afterwards if need be ... it's pretty much free.

using System;
using System.Diagnostics;
using System.IO;

namespace BigFileWriterTest
{
    class Program
    {
        static FileStream CreateAndOpenFile(string _Name)
        {
            FileStream ret = File.Open(_Name, FileMode.Create);
            return ret;
        }

        // Writes Megs megabytes in 1 KB chunks and times each phase.
        // When _InitSize is true the file is sized to its final length first.
        static void TestFile(string _Name, bool _InitSize, int Megs)
        {
            Stopwatch sw = new Stopwatch();
            byte[] data = new byte[1024]; // 1 KB buffer
            for (int i = 0; i < data.Length; i++)
            {
                data[i] = (byte) (i % 255);
            }
            sw.Start();
            using (FileStream fs = CreateAndOpenFile(_Name))
            {
                sw.Stop();
                Console.WriteLine("Open: " + sw.Elapsed);
                sw.Reset();
                sw.Start();
                if (_InitSize)
                {
                    // Built up in a long so 1024 * 1024 * Megs cannot overflow an int.
                    long length = 1024;
                    length = length * 1024;
                    length = length * Megs;
                    fs.SetLength(length);
                }
                sw.Stop();
                Console.WriteLine("Length Increase: " + sw.Elapsed);
                sw.Reset();
                sw.Start();
                // Megs * 1024 writes of 1 KB each = Megs megabytes in total.
                for (int i = 0; i < (Megs * 1024); i++)
                {
                    fs.Write(data, 0, data.Length);
                    fs.Flush();
                }
                sw.Stop();
                Console.WriteLine("Write: " + sw.Elapsed);
                fs.Close();
            }
            File.Delete(_Name);
        }

        static void Main(string[] args)
        {
            string Filename = "C:\\shitbird.tmp";
            TestFile(Filename, false, 1000); // grow the file as it is written
            TestFile(Filename, true, 1000);  // presize the file, then write
        }
    }
}

Output: 

 

Open: 00:00:00.0013021
Length Increase: 00:00:00.0000013
Write: 00:01:17.4923097
Open: 00:00:00.0013376
Length Increase: 00:00:00.0220189
Write: 00:00:13.8646638
Press any key to continue . . .

WOW!

 77
---- = almost 6 times faster?!
 13

Go try it on yours and show your boss how you made that big dump file 5x faster; it should at least get you a beer.
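
The test above sizes the file to exactly what it is about to write. When the size is only a guess, the "truncate after" half of the advice might look roughly like the sketch below. PresizedWriter and WriteWithEstimate are hypothetical names made up for illustration, not part of the test program, and this assumes the whole payload is already in memory.

```csharp
using System;
using System.IO;

static class PresizedWriter
{
    // Hypothetical helper (not from the test program above): reserve
    // estimatedBytes up front, write the payload, then trim the file
    // down to what was actually written. Returns the final length.
    public static long WriteWithEstimate(string path, byte[] payload, long estimatedBytes)
    {
        using (FileStream fs = File.Open(path, FileMode.Create))
        {
            // Presize to the guess. If the guess turns out too small,
            // the stream simply grows past it while writing.
            fs.SetLength(estimatedBytes);

            fs.Write(payload, 0, payload.Length);

            // Position now marks the end of the real data, so cut off
            // whatever is left of the over-estimate.
            fs.SetLength(fs.Position);
            return fs.Length;
        }
    }
}
```

Calling PresizedWriter.WriteWithEstimate(path, payload, 4096) with a 100-byte payload should leave a 100-byte file: the 4096-byte guess is only there while the write is in progress.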

p.s. no comments on my temp file names :) they are really easy to find/delete.



Comments

JD Conley said:

Wow! Who woulda thought. This actually just came in handy. We happen to be parsing / transforming files measured in the 100gb range.... I didn't measure it but seat-of-the-pants says it is waay faster at writing now with the SetLength.
# August 17, 2007 2:46 PM

Peter Ritchie said:

Yeah, that's an old Win32 trick: essentially seek to a position in the file of the length you want and write a byte or call SetEndOfFile [2].  .NET uses SetFilePointer [1] to do the seek.

[1] msdn2.microsoft.com/.../aa365541.aspx

[2] msdn2.microsoft.com/.../aa365531.aspx

# August 17, 2007 3:00 PM
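
For comparison, the "seek and write a byte" variant Peter describes can be approximated in managed code without SetLength. This is only a sketch: SeekExtend and ExtendBySeek are hypothetical names, and it makes no claim about what the framework does internally.

```csharp
using System.IO;

static class SeekExtend
{
    // Hypothetical helper: grow a file to `length` bytes by seeking to
    // the last byte of the desired size and writing a single zero there.
    public static void ExtendBySeek(FileStream fs, long length)
    {
        // Nothing to do if the file is already at least that long;
        // this also avoids overwriting a byte of existing data.
        if (length <= 0 || fs.Length >= length) return;

        long original = fs.Position;
        fs.Seek(length - 1, SeekOrigin.Begin); // seeking past the end is allowed
        fs.WriteByte(0);                       // the write extends the file
        fs.Seek(original, SeekOrigin.Begin);   // restore the caller's position
    }
}
```

In the test program from the post, a call like this could stand in for fs.SetLength(length) to compare the two approaches on the same machine.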

Greg said:

Yeah Peter, it's well known; I just never would have guessed the huge amount of difference ... I was expecting the block allocation cost to be <10% of the operation... boy was I wrong :)

The other great thing about this is you know you won't run out of disk during your operation :) 

# August 17, 2007 3:07 PM

Peter Ritchie said:

Essentially with the seek method you avoid having to write all those bytes and the system just gives you the space (and whatever data is contained therein, I guess). There's all sorts of disk-space-eating programs that make use of this to quickly simulate an empty/low-space disk... I think it's pretty constant too, if you open a file and seek to 2GB and call SetEndOfFile, it will be about the same amount of time as if you had done it with 1k.
# August 17, 2007 4:19 PM

Peter Ritchie said:

BTW, it's a handy way to reserve space; create a huge file and seek around using SetEndOfFile to "free" up some disk space for your application for some other file. It avoids race conditions too: open a file, seek to the amount of space you know you need, call SetEndOfFile, seek back to the beginning and start writing. You know you'll never get an out-of-space error while writing. Whereas if you check for space before you start writing, you have a race condition where something else could have used up the space...which is what I think is its intention and reason for the speed.
# August 17, 2007 4:21 PM
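
A minimal sketch of that reserve-first pattern in C#, using the same SetLength call as the post. SpaceReservation and TryReserve are hypothetical names, and the sketch assumes the file system really allocates the space when the length is set, which may not hold for sparse or compressed files.

```csharp
using System;
using System.IO;

static class SpaceReservation
{
    // Hypothetical helper: try to claim requiredBytes before any data is
    // written. Returns an open stream positioned at 0, or null if the
    // file system refuses the reservation (for example, a full disk).
    public static FileStream TryReserve(string path, long requiredBytes)
    {
        FileStream fs = File.Open(path, FileMode.Create);
        try
        {
            // Fails here, before any data is written, rather than
            // halfway through a long write.
            fs.SetLength(requiredBytes);
            return fs;
        }
        catch (IOException)
        {
            // Clean up the empty file so nothing is left behind.
            fs.Dispose();
            File.Delete(path);
            return null;
        }
    }
}
```

The caller owns the returned stream and should wrap it in a using block, checking for null first.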

Justin said:

I wrote a similar but much more detailed series of blog posts about this last year, wherein I compare various ways of implementing this across multiple languages and machines (VB.NET, C++.NET, 3 different C++, and 2 different Java implementations). http://justin-michel.spaces.live.com/blog/cns!AE9441BAE91063CC!139.entry
# August 18, 2007 5:23 PM
