This article is quoted from http://www.sparkhound.com/blog/detect-image-file-types-through-byte-arrays
Have you ever needed to know the file type of an image? Did you know that many image file types, when read as a byte array, always begin with the same sequence of bytes? Instead of checking for an image type by looking at the extension in the file name, it’s more reliable to look at the bytes that make up the file and compare them to the known byte sequences for each image type.
I was working on a project where I needed to download images from a URL and store them on a file system. The downloaded images would change daily and could be any number of file types. In fact, I didn’t even know if the file I was downloading was actually going to be an image. I only needed images, but it was possible that the web call could send back something else, like a web page or a different file type altogether. I only wanted to save the file if it was an image and if the image type was one of the common ones that I expected (.png, .jpeg, .bmp, etc.). After looking online for a way to detect an image file type, I came across a very informative post on Stack Overflow where someone else was trying to do the same thing. The answer that made the most sense to me pointed out that a lot of image types have a specific set of bytes at the beginning of the file to denote which type of file it is.
For example, in C#:
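// Requires: using System.Text; (for Encoding)
var bmp = Encoding.ASCII.GetBytes("BM"); // BMP
var gif = Encoding.ASCII.GetBytes("GIF"); // GIF
var png = new byte[] { 137, 80, 78, 71 }; // PNG (0x89 followed by "PNG")
var tiff = new byte[] { 73, 73, 42 }; // TIFF, little-endian ("II*")
var tiff2 = new byte[] { 77, 77, 42 }; // TIFF, big-endian ("MM*")
var jpeg = new byte[] { 255, 216, 255, 224 }; // JPEG (JFIF, APP0 marker 0xFFE0)
var jpeg2 = new byte[] { 255, 216, 255, 225 }; // JPEG (Exif, APP1 marker 0xFFE1), e.g. from Canon cameras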
The post referenced pretty much all of the image file types that I wanted to use. Anything jpeg, png, gif, or bmp was exactly what I was expecting to see. This allowed me to pass in a stream from the URL, compare its first few bytes to each of these byte arrays, and detect the file type. It also enabled me to download and save the files locally with the correct extension. In the post, KevanTTT’s answer was based on another post’s answer, but KevanTTT modified the solution to use a stream rather than only byte arrays. It isn’t a huge change, but I thought I should credit both posts. Once you know that the initial bytes are the same for every file of a given image type, it becomes easy to work with any number of image file types.
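To make the comparison concrete, here is a minimal sketch of the stream-based check, assuming the signatures listed earlier. It is not the code from the Stack Overflow answers; the names `DetectedFormat`, `ImageSniffer`, and `Detect` are made up for illustration.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical result type for this sketch.
public enum DetectedFormat { Unknown, Bmp, Gif, Png, Tiff, Jpeg }

public static class ImageSniffer
{
    // The same leading byte sequences as in the listing above.
    private static readonly (byte[] Signature, DetectedFormat Format)[] Signatures =
    {
        (Encoding.ASCII.GetBytes("BM"), DetectedFormat.Bmp),
        (Encoding.ASCII.GetBytes("GIF"), DetectedFormat.Gif),
        (new byte[] { 137, 80, 78, 71 }, DetectedFormat.Png),
        (new byte[] { 73, 73, 42 }, DetectedFormat.Tiff),
        (new byte[] { 77, 77, 42 }, DetectedFormat.Tiff),
        (new byte[] { 255, 216, 255, 224 }, DetectedFormat.Jpeg),
        (new byte[] { 255, 216, 255, 225 }, DetectedFormat.Jpeg),
    };

    public static DetectedFormat Detect(Stream stream)
    {
        // The longest signature is 4 bytes, so that is all we need to read.
        var header = new byte[4];
        int read = 0;

        // Stream.Read may return fewer bytes than requested,
        // so keep reading until the buffer is full or the stream ends.
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) break;
            read += n;
        }

        foreach (var (signature, format) in Signatures)
        {
            // Compare only as many leading bytes as the signature has.
            if (read >= signature.Length &&
                header.Take(signature.Length).SequenceEqual(signature))
            {
                return format;
            }
        }

        return DetectedFormat.Unknown;
    }
}
```

Anything that comes back as `DetectedFormat.Unknown`, such as a web page, can simply be skipped. Note that `Detect` consumes the first bytes of the stream, so with a non-seekable network stream you would probably want to copy the download into a `MemoryStream` first and rewind it before saving.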
This is not life-changing information, but it makes sense: file types have to be identified in some way, and probably by more than just an extension at the end of a file name. I’m happy that I came across it and I hope it benefits you in the future!
If you enjoy this topic or enjoy talking about development of any kind you should check out our available positions at Sparkhound. Sparkhound is full of people with aligned interests and motivation to provide the best possible solution to any scenario. There are plenty of great minds to lean on and we like to have fun too! Feel free to contact me at sam.north@sparkhound.com and/or contact Sparkhound for any further discussions, questions, or feedback. Woot!
Sam