How to detect the text encoding of a file
Posted by Sergio on 2010-01-26
Today I needed a way to identify ANSI (Windows-1252) and UTF-8 files in a directory filled with files of these two types. I was surprised not to find a simple way of doing this via a property or method somewhere under the System.IO namespace.
Not that it's that hard to identify the encoding programmatically, but it's always better when you don't need to write the method yourself. Anyway, here's what I came up with. It detects UTF-8 by checking for the encoding signature (the byte order mark EF BB BF) at the beginning of the file, so a UTF-8 file saved without that signature will not be recognized.
The code below is specific to UTF-8, but it shouldn't be too hard to extend it to other encodings that carry a signature, such as UTF-16 and UTF-32.
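using System.IO;
using System.Linq;   // SequenceEqual
using System.Text;

// Returns true if the file starts with the UTF-8 signature (EF BB BF).
public static bool IsUtf8(string fname)
{
    // OpenRead requests read access only, so read-only files work too.
    using (var f = File.OpenRead(fname))
    {
        byte[] preamble = Encoding.UTF8.GetPreamble();
        var sig = new byte[preamble.Length];
        int read = f.Read(sig, 0, sig.Length);
        // A file shorter than the signature cannot carry it.
        return read == sig.Length && sig.SequenceEqual(preamble);
    }
}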
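Extending the idea to more encodings mostly means checking a table of signatures instead of just one. The sketch below is one way to do that, not a definitive implementation; the class and method names (BomSniffer, DetectFromBom) are hypothetical. It relies only on GetPreamble() from the standard UTF8Encoding, UnicodeEncoding and UTF32Encoding classes, and it checks longer signatures first because the UTF-32 LE signature (FF FE 00 00) begins with the UTF-16 LE one (FF FE).

```csharp
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper: maps a file's leading bytes to an encoding by signature.
public static class BomSniffer
{
    // Ordered longest signature first so UTF-32 LE wins over UTF-16 LE.
    private static readonly Encoding[] Candidates =
    {
        new UTF32Encoding(false, true),   // UTF-32 LE: FF FE 00 00
        new UTF32Encoding(true, true),    // UTF-32 BE: 00 00 FE FF
        new UTF8Encoding(true),           // UTF-8:     EF BB BF
        new UnicodeEncoding(false, true), // UTF-16 LE: FF FE
        new UnicodeEncoding(true, true),  // UTF-16 BE: FE FF
    };

    // Returns the encoding whose signature starts 'head', or null if none matches.
    public static Encoding DetectFromBom(byte[] head)
    {
        foreach (var enc in Candidates)
        {
            byte[] bom = enc.GetPreamble();
            if (head.Length >= bom.Length && head.Take(bom.Length).SequenceEqual(bom))
                return enc;
        }
        return null; // no signature: could be ANSI, or UTF-8 without a BOM
    }

    // Reads at most the first 4 bytes of the file and checks them.
    public static Encoding DetectFromBom(string fname)
    {
        using (var f = File.OpenRead(fname))
        {
            var head = new byte[4];
            int n = f.Read(head, 0, head.Length); // fewer than 4 if the file is shorter
            return DetectFromBom(head.Take(n).ToArray());
        }
    }
}
```

A null result only means "no signature found". Signatures alone also can't separate a UTF-16 LE file whose first character is U+0000 from UTF-32 LE, which is rarely a problem for ordinary text files.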
Maybe I just looked in the wrong places. Does anyone know a simpler way in the framework to accomplish this?
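One framework class that appears to cover the signature case is StreamReader: when constructed with detectEncodingFromByteOrderMarks set to true, it inspects the first bytes it reads and updates its CurrentEncoding property. The sketch below assumes the caller passes a fallback encoding (the parameter name fallback and the helper name DetectWithStreamReader are hypothetical) and returns whatever CurrentEncoding reports after a single Peek(). Like IsUtf8, it only sees signatures, so a UTF-8 file without a BOM comes back as the fallback.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper built on StreamReader's own BOM detection.
public static class ReaderSniffer
{
    // If no signature is present, CurrentEncoding stays at 'fallback'.
    // Note: disposing the reader also closes the stream passed in.
    public static Encoding DetectWithStreamReader(Stream s, Encoding fallback)
    {
        using (var reader = new StreamReader(s, fallback, detectEncodingFromByteOrderMarks: true))
        {
            reader.Peek(); // forces the first read, which triggers detection
            return reader.CurrentEncoding;
        }
    }
}
```

For a file, pass File.OpenRead(fname) as the stream.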