Tom’s Blog

January 2, 2010

Using Extension Methods and the Win32 API to Efficiently Enumerate the File System

Filed under: .NET,C#,P/Invoke,Uncategorized — Tom Shelton @ 11:52 am

I’ve seen the question come up a couple of times lately on various forums: how do you get only the first file from a directory that contains a large number of files? This is problematic because, as of .NET 3.5, the System.IO methods responsible for enumerating the file system (System.IO.Directory.GetFiles, for example) do a complete enumeration before returning. That means if you just want the first 5 files, GetFiles will still enumerate the entire directory before returning. Further, wouldn’t it be nice to have methods that let you run LINQ queries over file attributes and the like? If you’re paying attention to .NET 4.0, you will notice that the Directory and DirectoryInfo classes now have methods that provide just these options: System.IO.Directory.EnumerateFiles, etc. These methods return IEnumerable<> rather than a fixed array. Basically, they enumerate and return each value one at a time.
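
To make the difference concrete, here is a minimal sketch that does not touch the file system at all. The FakeDirectory class and its members are hypothetical stand-ins invented for this illustration (they are not part of System.IO); they only count how many entries each style has to produce before the caller gets its first result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a directory listing; not part of System.IO.
public static class FakeDirectory
{
    // Number of entries produced so far, so the two styles can be compared.
    public static int EntriesVisited;

    // Eager, GetFiles-style: the whole array is built before returning.
    public static string[] GetNames ( int count )
    {
        var names = new List<string> ();
        for ( int i = 0; i < count; i++ )
        {
            EntriesVisited++;
            names.Add ( "file" + i + ".txt" );
        }
        return names.ToArray ();
    }

    // Deferred, EnumerateFiles-style: one entry is produced per MoveNext call.
    public static IEnumerable<string> EnumerateNames ( int count )
    {
        for ( int i = 0; i < count; i++ )
        {
            EntriesVisited++;
            yield return "file" + i + ".txt";
        }
    }
}
```

Calling FakeDirectory.GetNames ( 100000 ).First () visits all 100000 entries before First ever runs, while FakeDirectory.EnumerateNames ( 100000 ).First () stops after producing a single entry.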

The good news is that with C# 3.0 we don’t actually have to wait until .NET 4.0 to have these methods. By using a little P/Invoke magic and C# 3.0’s extension method capability, we can write code like this today:

    using System;
    using System.IO;
    using System.Linq;
    using FireAnt.IO;

    class Program
    {
        static void Main ( string[] args )
        {
            string path = Environment.GetFolderPath ( Environment.SpecialFolder.MyDocuments );
            DirectoryInfo root = new DirectoryInfo ( path );

            // First () stops the enumeration after the first file is found
            Console.WriteLine ( root.EnumerateFiles ().First ().Name );
        }
    }

Or even better, we can do stuff like this:

    using System;
    using System.IO;
    using System.Linq;
    using FireAnt.IO;

    class Program
    {
        static void Main ( string[] args )
        {
            string path = Environment.GetFolderPath ( Environment.SpecialFolder.MyDocuments );
            DirectoryInfo root = new DirectoryInfo ( path );

            // get all doc files created before Nov. 1st 2009
            // (compare with <, since CompareTo only promises a negative value, not exactly -1)
            var fileQuery = from fileInfo in root.EnumerateFiles ( "*.doc" )
                            where fileInfo.CreationTime.Date < new DateTime ( 2009, 11, 1 )
                            select fileInfo;

            foreach ( var fileInfo in fileQuery )
                Console.WriteLine ( fileInfo.Name );
        }
    }

To handle the actual enumeration of the files we will use the Windows API. The functions we are interested in are FindFirstFile, FindNextFile, and FindClose. Here are the native Win32 declarations and classes used by the extension methods:

    using System;
    using System.Collections.Generic;
    using System.Text;
    using System.IO;
    using System.Runtime.InteropServices;
    using Microsoft.Win32.SafeHandles;

    namespace FireAnt.IO
    {
        internal static class NativeWin32
        {
            public const int MAX_PATH = 260;

            /// <summary>
            /// Win32 FILETIME structure.  The win32 documentation says this:
            /// "Contains a 64-bit value representing the number of 100-nanosecond intervals since January 1, 1601 (UTC)."
            /// </summary>
            /// <see cref="http://msdn.microsoft.com/en-us/library/ms724284%28VS.85%29.aspx"/>
            [StructLayout ( LayoutKind.Sequential )]
            public struct FILETIME
            {
                public uint dwLowDateTime;
                public uint dwHighDateTime;
            }

            /// <summary>
            /// The Win32 find data structure.  The documentation says:
            /// "Contains information about the file that is found by the FindFirstFile, FindFirstFileEx, or FindNextFile function."
            /// </summary>
            /// <see cref="http://msdn.microsoft.com/en-us/library/aa365740%28VS.85%29.aspx"/>
            [StructLayout ( LayoutKind.Sequential, CharSet=CharSet.Auto )]
            public struct WIN32_FIND_DATA
            {
                public FileAttributes dwFileAttributes;
                public FILETIME ftCreationTime;
                public FILETIME ftLastAccessTime;
                public FILETIME ftLastWriteTime;
                public uint nFileSizeHigh;
                public uint nFileSizeLow;
                public uint dwReserved0;
                public uint dwReserved1;

                [MarshalAs ( UnmanagedType.ByValTStr, SizeConst=MAX_PATH )]
                public string cFileName;

                [MarshalAs ( UnmanagedType.ByValTStr, SizeConst=14 )]
                public string cAlternateFileName;
            }

            /// <summary>
            /// Searches a directory for a file or subdirectory with a name that matches a specific name (or partial name if wildcards are used).
            /// </summary>
            /// <param name="lpFileName">The directory or path, and the file name, which can include wildcard characters, for example, an asterisk (*) or a question mark (?).</param>
            /// <param name="lpFindData">A pointer to the WIN32_FIND_DATA structure that receives information about a found file or directory.</param>
            /// <returns>
            /// If the function succeeds, the return value is a search handle used in a subsequent call to FindNextFile or FindClose, and the lpFindData parameter contains information about the first file or directory found.
            /// If the function fails or fails to locate files from the search string in the lpFileName parameter, the return value is INVALID_HANDLE_VALUE and the contents of lpFindData are indeterminate.
            /// </returns>
            /// <see cref="http://msdn.microsoft.com/en-us/library/aa364418%28VS.85%29.aspx"/>
            [DllImport ( "kernel32", CharSet=CharSet.Auto, SetLastError=true )]
            public static extern SafeSearchHandle FindFirstFile ( string lpFileName, out WIN32_FIND_DATA lpFindData );

            /// <summary>
            /// Continues a file search from a previous call to the FindFirstFile or FindFirstFileEx function.
            /// </summary>
            /// <param name="hFindFile">The search handle returned by a previous call to the FindFirstFile or FindFirstFileEx function.</param>
            /// <param name="lpFindData">A pointer to the WIN32_FIND_DATA structure that receives information about the found file or subdirectory.
            /// The structure can be used in subsequent calls to FindNextFile to indicate from which file to continue the search.
            /// </param>
            /// <returns>
            /// If the function succeeds, the return value is nonzero and the lpFindData parameter contains information about the next file or directory found.
            /// If the function fails, the return value is zero and the contents of lpFindData are indeterminate.
            /// </returns>
            /// <see cref="http://msdn.microsoft.com/en-us/library/aa364428%28VS.85%29.aspx"/>
            [DllImport ( "kernel32", CharSet=CharSet.Auto, SetLastError=true )]
            public static extern bool FindNextFile ( SafeSearchHandle hFindFile, out WIN32_FIND_DATA lpFindData );

            /// <summary>
            /// Closes a file search handle opened by the FindFirstFile, FindFirstFileEx, or FindFirstStreamW function.
            /// </summary>
            /// <param name="hFindFile">The file search handle.</param>
            /// <returns>
            /// If the function succeeds, the return value is nonzero.
            /// If the function fails, the return value is zero.
            /// </returns>
            /// <see cref="http://msdn.microsoft.com/en-us/library/aa364413%28VS.85%29.aspx"/>
            [DllImport ( "kernel32", SetLastError=true )]
            public static extern bool FindClose ( IntPtr hFindFile );

            /// <summary>
            /// Class to encapsulate a search handle returned from FindFirstFile.  Using a wrapper
            /// like this ensures that the handle is properly cleaned up with FindClose.
            /// </summary>
            public class SafeSearchHandle : SafeHandleZeroOrMinusOneIsInvalid
            {
                // true = this wrapper owns the handle and must release it
                public SafeSearchHandle () : base ( true ) { }

                protected override bool ReleaseHandle ()
                {
                    return NativeWin32.FindClose ( base.handle );
                }
            }
        }
    }
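
The WIN32_FIND_DATA structure above already carries the timestamps and size of each entry as pairs of 32-bit values. As a minimal sketch of how those pairs could be turned into ordinary .NET values, the helper below takes the raw halves as parameters so that it stands on its own. The FindDataConversions class and its method names are assumptions made for this illustration and are not part of the article's source code.

```csharp
using System;

// Hypothetical helper; the class and method names are illustrative only.
public static class FindDataConversions
{
    // Joins the two halves of a FILETIME (100-nanosecond intervals since
    // January 1, 1601 UTC) and converts the result to a UTC DateTime.
    // DateTime.FromFileTimeUtc throws ArgumentOutOfRangeException for
    // values outside the range it supports.
    public static DateTime ToDateTimeUtc ( uint dwLowDateTime, uint dwHighDateTime )
    {
        long fileTime = ( (long)dwHighDateTime << 32 ) | dwLowDateTime;
        return DateTime.FromFileTimeUtc ( fileTime );
    }

    // Joins nFileSizeHigh and nFileSizeLow into a single 64-bit size in bytes.
    public static long ToFileSize ( uint nFileSizeHigh, uint nFileSizeLow )
    {
        return ( (long)nFileSizeHigh << 32 ) | nFileSizeLow;
    }
}
```

With helpers like these, a filter on creation time or size could in principle be evaluated directly against the find data, for example FindDataConversions.ToDateTimeUtc ( findData.ftCreationTime.dwLowDateTime, findData.ftCreationTime.dwHighDateTime ).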

And here are the actual extension methods:

    using System;
    using System.Collections.Generic;
    using System.Text;
    using System.IO;

    namespace FireAnt.IO
    {
        /// <summary>
        /// Static class to contain extension methods
        /// </summary>
        public static class FileSystemExtensions
        {

            public static IEnumerable<DirectoryInfo> EnumerateDirectories ( this DirectoryInfo target )
            {
                return EnumerateDirectories ( target, "*" );
            }

            public static IEnumerable<DirectoryInfo> EnumerateDirectories ( this DirectoryInfo target, string searchPattern )
            {
                string searchPath = Path.Combine ( target.FullName, searchPattern );
                NativeWin32.WIN32_FIND_DATA findData;

                // the using block closes the search handle even if the caller stops enumerating early
                using ( NativeWin32.SafeSearchHandle hFindFile = NativeWin32.FindFirstFile ( searchPath, out findData ) )
                {
                    if ( !hFindFile.IsInvalid )
                    {
                        do
                        {
                            // keep directories only, skipping the "." and ".." entries
                            if ( ( findData.dwFileAttributes & FileAttributes.Directory ) != 0 && findData.cFileName != "." && findData.cFileName != ".." )
                            {
                                yield return new DirectoryInfo ( Path.Combine ( target.FullName, findData.cFileName ) );
                            }
                        } while ( NativeWin32.FindNextFile ( hFindFile, out findData ) );
                    }
                }

            }

            public static IEnumerable<FileInfo> EnumerateFiles ( this DirectoryInfo target )
            {
                return EnumerateFiles ( target, "*" );
            }

            public static IEnumerable<FileInfo> EnumerateFiles ( this DirectoryInfo target, string searchPattern )
            {
                string searchPath = Path.Combine ( target.FullName, searchPattern );
                NativeWin32.WIN32_FIND_DATA findData;

                // the using block closes the search handle even if the caller stops enumerating early
                using ( NativeWin32.SafeSearchHandle hFindFile = NativeWin32.FindFirstFile ( searchPath, out findData ) )
                {
                    if ( !hFindFile.IsInvalid )
                    {
                        do
                        {
                            // keep anything that is not a directory
                            if ( ( findData.dwFileAttributes & FileAttributes.Directory ) == 0 && findData.cFileName != "." && findData.cFileName != ".." )
                            {
                                yield return new FileInfo ( Path.Combine ( target.FullName, findData.cFileName ) );
                            }
                        } while ( NativeWin32.FindNextFile ( hFindFile, out findData ) );
                    }
                }

            }
        }
    }
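
One thing the loops above do not do is distinguish "no more files" from a real failure such as access denied: in both cases the enumeration simply ends. Since the native declarations set SetLastError=true, the error code is available from Marshal.GetLastWin32Error () immediately after the failing call. The following is a minimal sketch of how that code might be classified. The FindErrors class and the ThrowIfUnexpected method are hypothetical names introduced here; the two constants are the documented Win32 error codes that FindFirstFile and FindNextFile report when nothing (more) matches.

```csharp
using System;
using System.ComponentModel;

// Hypothetical helper; not part of the article's source code.
public static class FindErrors
{
    // Win32 system error codes.
    public const int ERROR_SUCCESS = 0;
    public const int ERROR_FILE_NOT_FOUND = 2;   // FindFirstFile: nothing matched
    public const int ERROR_NO_MORE_FILES = 18;   // FindNextFile: end of listing

    // Intended to be called with the value of Marshal.GetLastWin32Error (),
    // read right after FindFirstFile returns an invalid handle or
    // FindNextFile returns false.
    public static void ThrowIfUnexpected ( int lastError )
    {
        if ( lastError == ERROR_SUCCESS ||
             lastError == ERROR_FILE_NOT_FOUND ||
             lastError == ERROR_NO_MORE_FILES )
        {
            return; // normal end of enumeration
        }

        // Anything else (access denied, path not found, ...) is surfaced to the caller.
        throw new Win32Exception ( lastError );
    }
}
```

Whether an enumeration should throw or silently stop on such errors is a design choice; this sketch only shows one way to tell the cases apart.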

With a bit more work, the full range of file search operations could be added to these extension methods, for instance by supporting the SearchOption overloads. I hope this demo illustrates the power of extension methods and C# iterators (a feature introduced in C# 2.0).
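
As a rough idea of what a SearchOption overload might look like, here is a minimal sketch layered on top of the single-directory methods. The class name RecursiveEnumerationSketch and the method name EnumerateFilesRecursive are assumptions made for this illustration. On .NET 3.5 with the FireAnt.IO namespace imported, the calls to EnumerateFiles and EnumerateDirectories would bind to the extension methods above; on .NET 4.0 and later they bind to the built-in DirectoryInfo methods of the same names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical extension class; the names are illustrative, not from the article.
public static class RecursiveEnumerationSketch
{
    // Lazily yields matching files in target and, when AllDirectories is
    // requested, in every subdirectory below it.
    // Not handled in this sketch: access-denied errors and directory
    // junctions or links that point back up the tree.
    public static IEnumerable<FileInfo> EnumerateFilesRecursive (
        this DirectoryInfo target, string searchPattern, SearchOption searchOption )
    {
        foreach ( FileInfo file in target.EnumerateFiles ( searchPattern ) )
            yield return file;

        if ( searchOption != SearchOption.AllDirectories )
            yield break;

        // Subdirectories are listed with "*" so that the file pattern
        // does not filter out directories that should be descended into.
        foreach ( DirectoryInfo child in target.EnumerateDirectories () )
            foreach ( FileInfo file in child.EnumerateFilesRecursive ( searchPattern, searchOption ) )
                yield return file;
    }
}
```

Because every level is itself an iterator, a call such as root.EnumerateFilesRecursive ( "*.doc", SearchOption.AllDirectories ) followed by First () still stops as soon as one match is found.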

Article Source Code.
