File Headers and Magic Numbers

Hey there's some opsec in here

Apr 01, 2023

Hello all. Today’s topic will be on the magic known as file headers. If you’re a paid subscriber, you’ve probably read a handful of posts that rely on file headers, but I haven’t really explained them yet. Believe it or not, this is one of the few things that I learned in college that I’ve been able to apply at work. How wonderful.

Before You Read

Although not required, reading these posts will give you a glimpse into how PE file headers are parsed with the Windows API:

IAT Hooks

Have you ever wondered how your computer magically knows how to run a file?

“Because of its file extension”

Shut up Windows.

This is kind of true, but let’s look at Linux first. There are a bunch of files on Linux that don’t have file extensions. Just look at your /bin folder. So, how does Linux know how to run each file? How does it know the difference between a shell file you saved as “setup” and a PNG file that is saved as “homework”?

The answer is magic numbers.

But, what are they?

Magic numbers are the first few identifying bytes in a file. They are also referred to as file signatures. For example, a PE’s (.exe/.dll/.sys) magic number is 4D 5A. If you don’t believe me, open any .exe in either Notepad or a hex editor. The first 2 characters will be “MZ” (the ASCII representation of the bytes).
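Here’s a minimal sketch of how a signature check could look in Python. The table and the function names (`SIGNATURES`, `identify`, `identify_file`) are made up for illustration, and the table only holds a handful of well-known signatures, nowhere near what a real database covers.

```python
# A tiny, hand-picked signature table (illustrative, far from complete).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\x7fELF", "ELF executable"),
    (b"\xff\xd8", "JPEG image"),
    (b"MZ", "PE executable (.exe/.dll/.sys)"),
    (b"#!", "script with a shebang line"),
]


def identify(data: bytes) -> str:
    """Return a description based only on the leading bytes of `data`."""
    for magic, description in SIGNATURES:
        if data.startswith(magic):
            return description
    return "unknown"


def identify_file(path: str) -> str:
    """Read just enough of the file to compare against the table."""
    with open(path, "rb") as f:
        return identify(f.read(16))
```

Notice that the file name never enters the picture: a PNG saved as “homework” still starts with the PNG signature.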

File signatures are simple to retrieve, and there are databases of common signatures. I prefer this one. But, if you’re running Linux or WSL, you can just run the file command to retrieve information on a file. The command even gives you some extra information like if debug symbols are present or if the program is packed.

Let’s look at another signature. How about JPEG?

A JPEG uses FF D8 as its file signature, but it also has a trailer, meaning it has bytes that signify the end of the file. These are FF D9. This could be useful for steganography, where you can stick some data after the FF D9 bytes in the file, and the image can still be opened normally.
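To make the trailer idea concrete, here’s a naive sketch. The helper names (`append_payload`, `extract_payload`) are hypothetical, and the extraction assumes the hidden data itself doesn’t contain FF D9, since it simply looks for the last occurrence of the trailer. A real tool would walk the JPEG segments instead.

```python
JPEG_SOI = b"\xff\xd8"  # start of image (the file signature)
JPEG_EOI = b"\xff\xd9"  # end of image (the trailer)


def append_payload(jpeg: bytes, payload: bytes) -> bytes:
    """Stick extra bytes after the trailer; viewers stop reading at FF D9."""
    if not jpeg.startswith(JPEG_SOI) or not jpeg.endswith(JPEG_EOI):
        raise ValueError("expected data that starts with FF D8 and ends with FF D9")
    return jpeg + payload


def extract_payload(data: bytes) -> bytes:
    """Return whatever follows the last FF D9 (naive: assumes the payload
    itself contains no FF D9 bytes)."""
    end = data.rfind(JPEG_EOI)
    if end == -1:
        raise ValueError("no FF D9 trailer found")
    return data[end + len(JPEG_EOI):]
```

Running `extract_payload` on an untouched JPEG that ends at its trailer returns empty bytes, which is a quick way to check whether anything is riding along after the image.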

There are also different types of JPEG files, and each of them has its own “sub” signature.

FF D8 FF E0 - Standard JPEG
FF D8 FF E1 - Standard JPEG with Exif metadata
FF D8 FF E2 - Canon Camera Image File Format (CIFF)
FF D8 FF E8 - Still Picture Interchange File Format (SPIFF)

There are also a few other sub-sub-signatures, but let’s not worry too much about those.
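If you’re curious where those “sub” signatures come from: the bytes after FF D8 are the marker of the first segment in the file (FF E0 is APP0, FF E1 is APP1, and so on), followed by a 2-byte big-endian length and usually a NUL-terminated identifier such as “JFIF” or “Exif”. Here’s a rough sketch that reads it; `first_segment` is a name I made up, and it only looks at the very first segment.

```python
import struct


def first_segment(data: bytes):
    """Return (marker_byte, identifier) for the first segment after FF D8."""
    if len(data) < 6 or data[:3] != b"\xff\xd8\xff":
        raise ValueError("not a JPEG, or too short to hold a segment")
    marker = data[3]  # 0xE0 = APP0, 0xE1 = APP1, ...
    # The length is big-endian and counts its own two bytes.
    (length,) = struct.unpack(">H", data[4:6])
    body = data[6:4 + length]
    # Identifiers are ASCII strings terminated by a NUL byte.
    identifier = body.split(b"\x00", 1)[0].decode("ascii", "replace")
    return marker, identifier
```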

A keyword here is metadata. Metadata is just “meta data”: data about the data. Basically, metadata is a set of bytes within the file (usually in the header) that describes the file, and what it describes depends on what kind of file it is.

Here’s an example of Exif metadata. This is a saved screenshot of AC Syndicate I happened to have on my laptop. I also used ExifTool to get this information.

There’s a lot of stuff here. We see, of course, the file’s name, location, size, modify/creation date, permissions, etc., which could absolutely be from the OS. But we also see X/Y resolution, image width and height, encoding, color components, and megapixels. That definitely wouldn’t be provided by the computer, so it must be found in the image’s metadata.

In Windows, you can also extract this information by right clicking the file, selecting Properties, and clicking the Details tab.

Depending on the type of camera used, there can be a lot of different metadata, including the camera make and model, focal length, and even geolocation.

Here’s that Assassin’s Creed JPG again, only it’s slightly different.

I really hope Substack doesn’t convert this to a .webp. If it does, go ahead and pretend that this worked exactly like I said it would.

Assuming that’s still a JPG, go ahead and download it and open it up in the Windows Photos app. Then press Alt+Enter to view the image information (or run ExifTool if you aren’t running Windows).

This is the picture information tab, and there appears to be a map here. You can click on the map and it will open up a full scale map with the geolocation of the JPG.

Turns out this screenshot was taken in Nouakchott, Mauritania. What an interesting location. (I set this myself by the way).
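For reference, Exif doesn’t store the location as a simple decimal number. Latitude and longitude are each stored as three rationals (degrees, minutes, seconds), plus a reference letter (N/S or E/W). Here’s a small sketch of the conversion a map viewer has to do; the function name `dms_to_decimal` and the sample coordinates in the usage note are made up for illustration and are not the values embedded in the image above.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref: str) -> float:
    """Convert Exif-style GPS rationals to signed decimal degrees.

    `dms` is three (numerator, denominator) pairs: degrees, minutes, seconds.
    `ref` is the reference letter: "N", "S", "E" or "W".
    """
    degrees, minutes, seconds = (Fraction(n, d) for n, d in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # South and west are negative in decimal notation.
    if ref in ("S", "W"):
        value = -value
    return float(value)
```

For example, `dms_to_decimal(((18, 1), (30, 1), (0, 1)), "N")` gives 18.5, and the same idea with a "W" reference gives a negative longitude.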

This brings me to the opsec part of this post.

Your phone can geotag any photos that you take.

Now that I have successfully scared you, you most likely have nothing to worry about. From what I can tell, this is disabled by default on my phone (I use a Samsung). At least, I think it is. I might’ve disabled it years ago without realizing. Anyways, I can’t find geolocation data on any of the photos on my phone, even ones that weren’t taken with it. So, most likely, you’re OK. But, you should double check that setting just to make sure, especially if you want to be anonymous like the Bow Tied Crawfish.

Let’s shift gears a bit and focus more on metadata and file headers. If you reverse engineer and hack stuff, you’ll find yourself parsing file headers (mostly executable ones) quite frequently. Windows provides a decent number of structures and APIs for parsing PE headers, so we will briefly explore those.

This is copyrighted I think so uh https://www.researchgate.net/figure/A-general-layout-of-PE-file-depicting-members-of-the-PE-Header-and-PE-Optional-Header_fig1_322350142

Remember MZ from earlier? Yea, we’re looking at that thing again. A great resource on it is this page (great website in general).

The TLDR is:

The DOS header is 64 bytes, and it is followed by the DOS stub, a bit of executable code that goes “hey this isn’t a DOS program”.
After the DOS header and stub is the PE or NT header, which can be found at base address + dos::e_lfanew
Within the NT header is the PE signature (another one), the file header, and the optional header
In the file header lies the type of machine (x86/x64/ARM/etc.), number of sections in the program (.text/.rdata/.pdata/etc.), time stamp of creation, pointer to the symbol table and how many symbols are in it, size of the optional header, and various bitflag characteristics of the file.
In the optional header lies a lot of stuff. The main things that are really important are the base of the image, entry point address, section and file alignment, size of the image, size of the headers, file checksum, subsystem, all 16 data directories, and more characteristic bitflags. And more!
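The walk described above can be sketched in a few lines of Python with `struct`, reading raw bytes instead of using the winnt.h structures. Treat this as a rough illustration: `parse_pe_headers` and the dictionary keys are names I made up, it only pulls out a few fields, and it reads a file on disk rather than a loaded image (so “base address” is just offset 0).

```python
import struct


def parse_pe_headers(data: bytes) -> dict:
    """Pull a few fields out of the DOS, file, and optional headers."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew is the last field of the 64-byte DOS header, at offset 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The 20-byte file header sits right after the 4-byte PE signature.
    (machine, num_sections, timestamp, symtab_ptr, num_symbols,
     opt_header_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, e_lfanew + 4)
    # The optional header follows; its magic says PE32 or PE32+.
    opt = e_lfanew + 24
    (magic,) = struct.unpack_from("<H", data, opt)
    (entry_point,) = struct.unpack_from("<I", data, opt + 16)
    if magic == 0x20B:    # PE32+ (64-bit): 8-byte ImageBase at offset 24
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)
    elif magic == 0x10B:  # PE32 (32-bit): 4-byte ImageBase at offset 28
        (image_base,) = struct.unpack_from("<I", data, opt + 28)
    else:
        raise ValueError("unexpected optional header magic")
    return {
        "machine": machine,
        "number_of_sections": num_sections,
        "timestamp": timestamp,
        "size_of_optional_header": opt_header_size,
        "characteristics": characteristics,
        "entry_point": entry_point,
        "image_base": image_base,
    }
```

You could try it on any executable with something like `parse_pe_headers(open(path, "rb").read())`, where `path` points at a .exe or .dll you have lying around.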

All of this information is shoved into the header for you to retrieve! In C++, winnt.h (which is automatically included by Windows.h) contains just about all of the structures, defines, and macros for parsing a PE header. If you want to see some example usage, IAT Hooks parses a PE header to find the import table, which is a data directory, and then parses that to find MessageBoxA.

That’s all for this post. I figured it would be a good idea to discuss this topic as it’s pretty relevant in both forensics and reverse engineering. Anyways, have a great weekend.

Go!

-BowTiedCrawfish.

Shellfish Systems and Security
