Hi. Today we will be discussing some techniques to shrink the footprint you create when writing programs. Plus, we’ll be nuking the header.
Okay, maybe we won’t completely remove it, but we will strip as much information out of the header as possible. Sounds fun, right?
Before You Read
It is recommended to read these related posts before reading this one, so that you don’t get lost during the discussion of the concepts:
Not written by me, but this post by Pavel Yosifovich is a great example of features that will be deployed in this post.
What
Hopefully, you already know what a PE header is. If you don’t, read this post. Read it? Good. Now we can start.
The PE header holds a lot of information. Reverse engineers, malware analysts, threat intelligence officers, etc. can use it to get a glimpse of what the program might do. This was heavily implied in the Triage post. But what if the PE header didn’t provide any useful information? That would stink. (Un)fortunately, you can strip a lot of information out of the PE header. This information includes:
Imports (IAT)
Exports (EAT)
Debugging information
Exception handling information
Compile time stamps
… And more
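To make the list above concrete, here is a minimal, portable sketch of how a triage tool might check which of those fields are still populated. It is not taken from any real tool: `HeaderSummary`, `Summarize`, and `MakeDemoImage` are hypothetical names, the offsets assume a 64-bit (PE32+) image, and the "image" is a synthetic byte buffer rather than a real executable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Image = std::vector<std::uint8_t>;

// Little-endian reads that return 0 when the offset is out of range.
std::uint16_t ReadU16(const Image& img, std::size_t off) {
    if (off > img.size() || img.size() - off < 2) return 0;
    return static_cast<std::uint16_t>(img[off] | (img[off + 1] << 8));
}
std::uint32_t ReadU32(const Image& img, std::size_t off) {
    if (off > img.size() || img.size() - off < 4) return 0;
    return static_cast<std::uint32_t>(img[off]) |
           (static_cast<std::uint32_t>(img[off + 1]) << 8) |
           (static_cast<std::uint32_t>(img[off + 2]) << 16) |
           (static_cast<std::uint32_t>(img[off + 3]) << 24);
}
void WriteU32(Image& img, std::size_t off, std::uint32_t v) {
    assert(off + 4 <= img.size());
    for (int i = 0; i < 4; ++i) img[off + i] = static_cast<std::uint8_t>(v >> (8 * i));
}

// Hypothetical summary of the fields an analyst would look at first.
struct HeaderSummary {
    bool valid = false;
    std::uint32_t timestamp = 0;  // COFF TimeDateStamp
    bool hasExports = false, hasImports = false;
    bool hasExceptions = false, hasDebug = false;
};

HeaderSummary Summarize(const Image& img) {
    HeaderSummary s;
    if (img.size() < 0x40 || img[0] != 'M' || img[1] != 'Z') return s;
    const std::size_t nt = ReadU32(img, 0x3C);      // e_lfanew
    if (ReadU32(img, nt) != 0x00004550) return s;   // "PE\0\0"
    const std::size_t opt = nt + 24;                // 4-byte signature + 20-byte COFF header
    if (ReadU16(img, opt) != 0x20B) return s;       // this sketch only handles PE32+
    s.timestamp = ReadU32(img, nt + 8);
    const std::size_t dirs = opt + 112;             // data directories (PE32+ layout)
    auto present = [&](int index) { return ReadU32(img, dirs + index * 8) != 0; };
    s.hasExports = present(0);     // IMAGE_DIRECTORY_ENTRY_EXPORT
    s.hasImports = present(1);     // IMAGE_DIRECTORY_ENTRY_IMPORT
    s.hasExceptions = present(3);  // IMAGE_DIRECTORY_ENTRY_EXCEPTION
    s.hasDebug = present(6);       // IMAGE_DIRECTORY_ENTRY_DEBUG
    s.valid = true;
    return s;
}

// Builds a synthetic header for experimenting; not a loadable executable.
Image MakeDemoImage(std::uint32_t timestamp, std::uint32_t importRva) {
    Image img(0x200, 0);
    img[0] = 'M'; img[1] = 'Z';
    const std::size_t nt = 0x80;
    WriteU32(img, 0x3C, static_cast<std::uint32_t>(nt));
    WriteU32(img, nt, 0x00004550);
    WriteU32(img, nt + 8, timestamp);
    img[nt + 24] = 0x0B; img[nt + 25] = 0x02;       // optional header magic 0x20B
    WriteU32(img, nt + 24 + 112 + 1 * 8, importRva);
    return img;
}
```

Calling `Summarize(MakeDemoImage(0, 0))` yields a valid header with every flag cleared and a zero timestamp, which is roughly the state the rest of this post is working toward.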
Why
Stripping out this data helps thwart analysts. If you’re a bad guy, you don’t want to leave your fingerprints everywhere, so removing some basic information is a good idea. A bonus is that you shrink the size of your executable as well, which may or may not be a good thing.
How
Part 1 - MSVC
Let’s compile a simple hello world program.
#include <iostream>

int main()
{
    std::cout << "Hello World!" << std::endl;
}
If we compile this basic program with default compilation options, we get…
A lot.
It imports 84 functions from kernel32.dll, and the application is 232 KB. That’s really big and obnoxious. We need to optimize this.
But why is it so big? The main reason is that the entire C Runtime Library (CRT) is statically linked into the program, so a tremendous amount of unused code is just sitting in it. We could link the CRT dynamically instead, but if we want to do away with imports, then that’s no good either.
But what is the CRT? The CRT is the “runtime library” that provides a lot of C/C++ features that we take for granted. These include reading from and writing to streams (stdin/stdout), reading from and writing to files, dynamic initializers, the new operator, and even the startup code that calls main()! That’s a lot.
The thing is, the CRT just wraps over the Windows API (or whatever OS it’s running on). So instead, we can just call the corresponding Windows API to perform the same functionality.
Secondly, we’re also going to use the full scale of MSVC and do this in Visual Studio. This provides us with various optimizations at the code generation and linking level to help shrink stuff we don’t want. If you’re following along, don’t forget to set the configuration to Release mode.
Let’s go back to our original program and rewrite it Windows-style.
#include <Windows.h>

int main()
{
    WriteConsoleA(GetStdHandle(STD_OUTPUT_HANDLE), "Hello World!\n", 13, NULL, NULL);
}
This shrinks us down to 11 KB (a 95% decrease) and removes some imported functions. Unfortunately, there are still a few references to the CRT, and we need to get rid of them entirely. Let’s skip ahead, nuke everything, and strip out as much space as possible.
Ensure that the project configuration is in Release mode (for a default Empty C++ Project template).
In the project Property Pages:
C/C++ → General
Debug Information Format: None
SDL Checks: No
C/C++ → Optimization
Enable Intrinsic Functions: No
Omit Frame Pointers: Yes
C/C++ → Language
Enable Run-Time Type Information: No
C/C++ → Code Generation
Enable C++ Exceptions: No
Security Check: Disable Security Check (/GS-)
Linker → Input
Ignore All Default Libraries: Yes
Linker → Advanced
Merge Sections: .rdata=.text (will explain later)
Linker → Debugging
Generate Debug Info: No
Debuggable Assembly: No (/ASSEMBLYDEBUG:DISABLE) (this only applies to managed code, so it is probably unnecessary)
Linker → Optimization
Link Time Code Generation: Use Link Time Code Generation (/LTCG)
That should be most of them. Try compiling again and you get an error.
1>LINK : error LNK2001: unresolved external symbol mainCRTStartup
What this means is that the real entry point of the program is not main() but mainCRTStartup, the CRT routine that actually calls main(). Since the CRT is no longer linked in, that routine doesn’t exist anymore, so rename your main function to mainCRTStartup and let it serve as the entry point directly.
Compile again and it will work. We are down to 2 imported functions, 1 imported library, and a 4 KB program. We need to fully erase the imports, though, so what do we do now? WriteConsoleA and GetStdHandle are exported by kernel32, so we need to find their NTDLL equivalents. On top of that, we can’t just import NTDLL; we need to walk the PEB and resolve the functions manually.
This entails way too much code, but this is necessary to completely remove any and every import.
#include <Windows.h>
#include <winternl.h>
// New definition because winternl doesn't provide enough
struct R_RTL_USER_PROCESS_PARAMETERS
{
    ULONG MaximumLength;
    ULONG Length;
    ULONG Flags;
    ULONG DebugFlags;
    HANDLE ConsoleHandle;
    ULONG ConsoleFlags;
    HANDLE StandardInput;
    HANDLE StandardOutput;
    HANDLE StandardError;
    BYTE CurrentDirectory[0x18]; // CURDIR (UNICODE_STRING + HANDLE); this size is for x64
    UNICODE_STRING DllPath;
    UNICODE_STRING ImagePathName;
    UNICODE_STRING CommandLine;
    PWSTR Environment;
    ULONG StartingX;
    ULONG StartingY;
    ULONG CountX;
    ULONG CountY;
    ULONG CountCharsX;
    ULONG CountCharsY;
    ULONG FillAttribute;
    ULONG WindowFlags;
    ULONG ShowWindowFlags;
    UNICODE_STRING WindowTitle;
    UNICODE_STRING DesktopInfo;
    UNICODE_STRING ShellInfo;
    UNICODE_STRING RuntimeData;
    BYTE CurrentDirectories[0x300]; // 32 drive-letter entries of 0x18 bytes each on x64
    SIZE_T EnvironmentSize;
    SIZE_T EnvironmentVersion;
};

// Minimal strcmp replacement, since the CRT is no longer available
constexpr int StrCmp(const char *str1, const char *str2)
{
    while (*str1 && *str2)
    {
        if (*str1 != *str2)
            return *str1 - *str2;
        str1++;
        str2++;
    }
    return *str1 - *str2;
}

// Look up an exported function by name in a loaded module's export table.
// Forwarded exports are not handled.
void* ParseEAT(void* dllbase, const char *func)
{
    PIMAGE_DOS_HEADER dos = (PIMAGE_DOS_HEADER)dllbase;
    if (dos->e_magic != IMAGE_DOS_SIGNATURE)
        return nullptr;
    PIMAGE_NT_HEADERS nt = (PIMAGE_NT_HEADERS)((BYTE*)dos + dos->e_lfanew);
    if (nt->Signature != IMAGE_NT_SIGNATURE)
        return nullptr;
    DWORD exportrva = nt->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress;
    // No export table
    if (exportrva == 0)
        return nullptr;
    PIMAGE_EXPORT_DIRECTORY exports = (PIMAGE_EXPORT_DIRECTORY)((BYTE*)dos + exportrva);
    DWORD* names = (DWORD*)((BYTE*)dos + exports->AddressOfNames);
    USHORT* ordinals = (USHORT*)((BYTE*)dos + exports->AddressOfNameOrdinals);
    DWORD* functions = (DWORD*)((BYTE*)dos + exports->AddressOfFunctions);
    for (DWORD i = 0; i < exports->NumberOfNames; i++)
    {
        // names[i] pairs with ordinals[i], which indexes into functions[]
        if (!StrCmp((const char*)((BYTE*)dos + names[i]), func))
            return (void*)((BYTE*)dos + functions[ordinals[i]]);
    }
    return nullptr;
}

// Walk the loader's module list in the PEB and return the first export named func
void* GetPEBFunction(const char *func)
{
    PPEB peb = NtCurrentTeb()->ProcessEnvironmentBlock;
    PPEB_LDR_DATA ldr = peb->Ldr;
    for (PLIST_ENTRY entry = ldr->InMemoryOrderModuleList.Flink; entry != &ldr->InMemoryOrderModuleList; entry = entry->Flink)
    {
        PLDR_DATA_TABLE_ENTRY mod = CONTAINING_RECORD(entry, LDR_DATA_TABLE_ENTRY, InMemoryOrderLinks);
        if (mod->DllBase == nullptr)
            continue;
        void* address = ParseEAT(mod->DllBase, func);
        if (address != nullptr)
            return address;
    }
    return nullptr;
}

int mainCRTStartup(PPEB peb)
{
    struct R_RTL_USER_PROCESS_PARAMETERS *params = (struct R_RTL_USER_PROCESS_PARAMETERS *)peb->ProcessParameters;
    // Named hStdOut rather than stdout, which is a macro if <stdio.h> ever gets included
    HANDLE hStdOut = params->StandardOutput;
    using tNtWriteFile = NTSTATUS(NTAPI *)(HANDLE, HANDLE, PVOID, PVOID, PVOID, PVOID, ULONG, PLARGE_INTEGER, PVOID);
    // Character-by-character initializers keep the strings out of a plain string dump
    const char ntwritefile[] = {'N', 't', 'W', 'r', 'i', 't', 'e', 'F', 'i', 'l', 'e', '\0'};
    tNtWriteFile NtWriteFile = (tNtWriteFile)GetPEBFunction(ntwritefile);
    if (NtWriteFile == nullptr)
        return 1;
    IO_STATUS_BLOCK iosb;
    const char hello[] = {'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', '!', '\n'};
    NtWriteFile(hStdOut, NULL, NULL, NULL, &iosb, (PVOID)hello, sizeof(hello), NULL, NULL);
    return 0;
}
To write to the console using only NTDLL, you get the stdout handle from the PEB’s ProcessParameters structure (the PEB is conveniently passed as the parameter to mainCRTStartup, in place of argc and argv), then call NtWriteFile on that handle. The only annoying part is that you have to walk the PEB to find that function. I also stack-ified those strings to hide them from any string dumping programs.
Note that NTDLL is always loaded into every user-mode process, so it is guaranteed to be in the loader’s module list by the time our program runs (unless you’re running Windows XP).
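The name → ordinal → address indirection that ParseEAT relies on is easy to get wrong, so here is a rough, portable sketch that exercises the same lookup against a synthetic in-memory "module" instead of a real DLL. Everything in it is illustrative: `FindExportRva` and `MakeDemoModule` are hypothetical names, the offsets assume the PE32+ layout, and the export names and RVAs are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using Image = std::vector<std::uint8_t>;

// Little-endian helpers; reads return 0 when out of range.
std::uint32_t Rd32(const Image& m, std::size_t off) {
    if (off > m.size() || m.size() - off < 4) return 0;
    std::uint32_t v = 0;
    for (int i = 3; i >= 0; --i) v = (v << 8) | m[off + i];
    return v;
}
std::uint16_t Rd16(const Image& m, std::size_t off) {
    if (off > m.size() || m.size() - off < 2) return 0;
    return static_cast<std::uint16_t>(m[off] | (m[off + 1] << 8));
}
void Wr32(Image& m, std::size_t off, std::uint32_t v) {
    assert(off + 4 <= m.size());
    for (int i = 0; i < 4; ++i) m[off + i] = static_cast<std::uint8_t>(v >> (8 * i));
}

// Returns the RVA of the named export, or 0 if it is not found.
// The image is treated as already mapped, so an RVA is just a buffer offset.
std::uint32_t FindExportRva(const Image& m, const char* name) {
    const std::size_t nt = Rd32(m, 0x3C);               // e_lfanew
    if (Rd32(m, nt) != 0x00004550) return 0;            // "PE\0\0"
    const std::size_t exp = Rd32(m, nt + 24 + 112);     // export directory RVA (PE32+)
    if (exp == 0) return 0;                             // no export table
    const std::uint32_t count = Rd32(m, exp + 24);      // NumberOfNames
    const std::size_t funcs = Rd32(m, exp + 28);        // AddressOfFunctions
    const std::size_t names = Rd32(m, exp + 32);        // AddressOfNames
    const std::size_t ords  = Rd32(m, exp + 36);        // AddressOfNameOrdinals
    const std::size_t len = std::strlen(name);
    for (std::uint32_t i = 0; i < count; ++i) {
        const std::size_t nameRva = Rd32(m, names + 4 * i);
        if (nameRva > m.size() || m.size() - nameRva < len + 1) continue;
        // Comparing len + 1 bytes includes the terminator, so prefixes do not match.
        if (std::memcmp(m.data() + nameRva, name, len + 1) == 0) {
            const std::uint16_t slot = Rd16(m, ords + 2 * i);
            return Rd32(m, funcs + 4 * slot);
        }
    }
    return 0;
}

// Builds a fake module with two exports whose name order differs from slot order.
Image MakeDemoModule() {
    Image m(0x400, 0);
    m[0] = 'M'; m[1] = 'Z';
    Wr32(m, 0x3C, 0x80);
    Wr32(m, 0x80, 0x00004550);
    Wr32(m, 0x80 + 24 + 112, 0x200);                     // export directory at 0x200
    Wr32(m, 0x200 + 20, 2);                              // NumberOfFunctions
    Wr32(m, 0x200 + 24, 2);                              // NumberOfNames
    Wr32(m, 0x200 + 28, 0x240);
    Wr32(m, 0x200 + 32, 0x250);
    Wr32(m, 0x200 + 36, 0x260);
    Wr32(m, 0x240, 0x1111); Wr32(m, 0x244, 0x2222);      // function slots 0 and 1
    Wr32(m, 0x250, 0x280);  Wr32(m, 0x254, 0x290);       // name RVAs
    m[0x260] = 1; m[0x262] = 0;                          // "Alpha" -> slot 1, "Beta" -> slot 0
    std::memcpy(m.data() + 0x280, "Alpha", 6);
    std::memcpy(m.data() + 0x290, "Beta", 5);
    return m;
}
```

The point of the demo module is that the i-th name does not map to the i-th function: the ordinal array sits in between, which is why ParseEAT indexes functions with ordinals[i] rather than i.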
Almost there. All that’s left is some debug information, the exception directory, and the resource directory.
If you throw the program into a disassembler, there’s still some debug information left. It comes from Profile Guided Optimization (POGO). It can be removed with an undocumented linker switch called /EMITPOGOPHASEINFO. In VS, go to Linker → Command Line and add it to the Additional Options box.
Now you no longer have any debug information. All that’s left are the .rsrc and .pdata sections.
The /MERGE switch allows sections to be merged into each other. I did this with .rdata, which is almost always generated if you have any sort of strings or global data. The .rsrc (resource) section cannot be merged for who knows what reason, and the .pdata (exception handling) section could be merged, but I have a better plan with it.
There is one more switch that can shrink the program further, though it is slightly unstable because it affects how memory gets dereferenced. In Linker → Advanced there’s an option for SectionAlignment. By default, sections are aligned to 0x1000 bytes (the size of a memory page), meaning that each section is guaranteed to start at a memory address that ends in 0x000. Page alignment lets each section get its own memory protection, and aligned data is generally faster to access. You are required to set this option to a power of 2, but in my opinion, you shouldn’t need to go smaller than 16: 16 bytes keeps xmmwords (16-byte values) properly aligned so they can be dereferenced safely. When I set this to 16, the size of my program changed from 2500 bytes to 1500 bytes. Getting smaller and smaller.
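To see why a smaller alignment saves space, here is a small sketch of the rounding involved. The section sizes are hypothetical numbers picked for illustration, not measurements from my executable, and `AlignUp` and `PaddedTotal` are names I made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr bool IsPowerOfTwo(std::uint32_t x) {
    return x != 0 && (x & (x - 1)) == 0;
}

// Round value up to the next multiple of alignment (alignment must be a power of 2).
constexpr std::uint32_t AlignUp(std::uint32_t value, std::uint32_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Total space taken when every section is padded out to the alignment.
std::uint32_t PaddedTotal(const std::vector<std::uint32_t>& sectionSizes,
                          std::uint32_t alignment) {
    assert(IsPowerOfTwo(alignment));
    std::uint32_t total = 0;
    for (std::uint32_t size : sectionSizes)
        total += AlignUp(size, alignment);
    return total;
}
```

With two hypothetical sections of 0x1A3 and 0x48 bytes, `PaddedTotal` gives 0x2000 at the default 0x1000 alignment but only 0x200 at an alignment of 16, since each section is now padded by at most 15 bytes instead of up to 4095.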
The .rsrc section needs to go, clearly. The only thing that’s in it is some random XML. This (apparently) is the manifest file. If you go to Linker → Manifest File and set Generate Manifest to No, there will no longer be a .rsrc section. My executable is now 1024 bytes in size. Almost there.
As far as I know, you can’t do anything else with the compiler options to remove the .pdata section, so you’ll have to parse and obliterate the bytes yourself post-compilation.
Part 2 - Post Compile
… will be written in part 2 of this post
Sorry :D. This post got much larger than I thought it would be. I’ll try and get it written for next weekend. Until then…
Go!
-BowTiedCrawfish