Hello all. We are gonna take yet another swing at function hooking. This one is a particular favorite of mine, because it is so difficult to detect yet is so sweet when you get it to work. Today’s topic is mid-function (midfunc) hooks.
Before You Read
It is recommended to read these related posts before reading this one, so that you don’t get lost during the discussion of the concepts:
What
Midfunc hooks are function hooks placed in the middle of a function, hence the name. They take a bit more finesse and skill to pull off because you have to dance around instructions that use relative offsets, such as RIP-relative memory accesses and relative JMPs/CALLs (unless you want to recalculate those offsets in your hook). You also have to ensure that your hook fits inside the function, since an x64 midfunc hook takes roughly 13 to 16 bytes.
You should hopefully know JMP instructions. A relative JMP sets the instruction pointer (IP) to a location relative to the next instruction. A midfunc hook is essentially a JMP, but there’s one slight issue in x64: a relative JMP stores a signed 32-bit displacement, so it can only reach ±2 gigabytes in either direction. So if the code you want to jump to (your hook or trampoline) is more than 2 gigabytes away from the function you’re hooking, a plain JMP won’t reach it.
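To make the ±2 GB limit concrete, here is a minimal sketch of the arithmetic, assuming C++17 (the helper names `Rel32To` and `RelocateRel32` are hypothetical). A rel32 JMP/CALL stores the target minus the address of the *next* instruction, so copying such an instruction somewhere else means recomputing its displacement, and that only works if the new value still fits in 32 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Computes the rel32 displacement for a JMP/CALL whose next instruction
// starts at nextIp and which should land on target. Returns std::nullopt
// when the target is outside the signed 32-bit (+-2 GB) range.
std::optional<int32_t> Rel32To(uint64_t nextIp, uint64_t target)
{
    const int64_t disp = static_cast<int64_t>(target - nextIp);
    if (disp < INT32_MIN || disp > INT32_MAX)
        return std::nullopt;
    return static_cast<int32_t>(disp);
}

// Re-targets an existing rel32 displacement after its instruction has been
// copied to a new address: the absolute destination must stay the same.
std::optional<int32_t> RelocateRel32(int32_t oldDisp, uint64_t oldNextIp, uint64_t newNextIp)
{
    const uint64_t absoluteTarget = oldNextIp + static_cast<int64_t>(oldDisp);
    return Rel32To(newNextIp, absoluteTarget);
}
```

When `Rel32To` returns `std::nullopt`, you need one of the two far-jump styles below instead of a plain JMP.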
Therefore, 64-bit midfunc hooks come in 2 styles that can reach any address:
Jump to register
This one is pretty straightforward. You just MOV your address into a register and then JMP to it. The main caveat is that the register you pick might be in use by the function at that point, and overwriting it would cause some seriously undefined behavior. If this is a problem, then you can use…
Ret jump
Slightly more impressive. You PUSH RAX to save its current value on the stack, MOV your target address into RAX, then exchange RAX with [RSP] (the slot you just pushed). Now RAX holds its original value again and [RSP] holds the target address. Finally, RET pops [RSP] into the IP, which also puts RSP back where it was, so every register is restored. The only difference is that your IP is somewhere else entirely.
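Style 1 is 13 bytes long when using R10 (10 bytes for MOV R10, imm64 plus 3 for JMP R10), and style 2 is 16 (PUSH RAX, MOV RAX, imm64, XCHG [RSP], RAX, RET). If you’re really hurting for bytes, you can opt for style 1. It is also the simplest to implement… and understand (lol).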
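Here is a minimal sketch of both stubs as raw bytes. The builder names `BuildRegisterJump` and `BuildRetJump` are hypothetical; the encodings are the standard x64 ones, and `memcpy` writes the address in host byte order, which assumes a little-endian (x64) host.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Style 1: mov r10, imm64 ; jmp r10  (13 bytes, clobbers r10)
std::vector<uint8_t> BuildRegisterJump(uint64_t dest)
{
    std::vector<uint8_t> code{ 0x49, 0xBA };             // mov r10, imm64
    code.resize(code.size() + 8);
    std::memcpy(code.data() + 2, &dest, sizeof(dest));   // 8-byte immediate
    code.insert(code.end(), { 0x41, 0xFF, 0xE2 });       // jmp r10
    return code;
}

// Style 2: push rax ; mov rax, imm64 ; xchg [rsp], rax ; ret  (16 bytes, no register clobbered)
std::vector<uint8_t> BuildRetJump(uint64_t dest)
{
    std::vector<uint8_t> code{ 0x50, 0x48, 0xB8 };       // push rax ; mov rax, imm64
    code.resize(code.size() + 8);
    std::memcpy(code.data() + 3, &dest, sizeof(dest));
    code.insert(code.end(), { 0x48, 0x87, 0x04, 0x24,    // xchg [rsp], rax
                              0xC3 });                   // ret
    return code;
}
```

The 3-byte gap between the two is exactly the price of not trashing a register.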
A drawback of midfunc hooks is that you don’t have direct access to the function’s parameters. This is unfortunate, but you do have access to all of the registers, so you can deduce variables and other state to meddle with from your hook function. Remember that MSVC doesn’t support inline assembly for x64 targets, so unless you want to write everything in a separate ASM file, you’ll have to be a bit fancy.
Why
We’re going to start right where we left off in Manual Map Injection. If you read it (you did, didn’t you?), you might have seen that I left a hint on the topic of today’s post.
Remember this screenshot?
This adds 10 to the score. Let’s make it add 1,000,000 points instead of just 10.
Should be easy enough: just change that 0xA to 0xF4240.
But wait, can we do that? The instruction is ADD EAX, 0xA, which is just 3 bytes long (83 C0 0A), and only 1 of those bytes holds the value to be added. We can’t fit 1,000,000 into 1 byte: the immediate is a signed byte that gets sign-extended, so the most we can add is 127, which is hardly enough for our liking.
We could hook the whole function and recreate its logic, but that is both loud and obnoxious to write.
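To see why we can’t just patch the byte in place, here is a small sketch (the helper name `EncodeAddEax` is hypothetical) that encodes ADD EAX, imm using the shortest form. Anything outside -128..127 needs the imm32 form, which is 5 bytes, 2 more than the instruction we have, so writing it in place would stomp on the next instruction.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encodes ADD EAX, imm using the shortest available form:
//   83 C0 ib -> imm8, sign-extended to 32 bits (3 bytes)
//   05 id    -> imm32, little-endian (5 bytes)
std::vector<uint8_t> EncodeAddEax(int32_t value)
{
    if (value >= -128 && value <= 127)
        return { 0x83, 0xC0, static_cast<uint8_t>(value) };
    const uint32_t u = static_cast<uint32_t>(value);
    return { 0x05,
             static_cast<uint8_t>(u), static_cast<uint8_t>(u >> 8),
             static_cast<uint8_t>(u >> 16), static_cast<uint8_t>(u >> 24) };
}
```

`EncodeAddEax(10)` gives the 83 C0 0A from the screenshot, while `EncodeAddEax(1000000)` gives 05 40 42 0F 00.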
This is a prime candidate for a midfunc hook.
This is pretty much the main reason you would want to deploy a midfunc hook. Other reasons include manipulating the arguments to checksum-protected functions right before they’re called, manipulating math that uses obfuscated values, and stack/register reconnaissance (which is really what debuggers are for).
On an unrelated note, midfunc hooks are super hard to detect. That’s what makes them so fun! There are plenty of heuristics that detect basic trampolines at function entry points, but the main ways to catch a midfunc hook are to checksum the function’s bytes, or to scan for suspicious executable allocations (e.g. from VirtualAlloc) if the midhook uses one.
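For a feel of what the checksum approach looks like from the defender’s side, here is a hypothetical integrity check (not how any particular anti-cheat does it): hash a function’s code bytes once, then re-hash later and compare. FNV-1a is used only because it is short; real checks might use CRC32 or a cryptographic hash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 64-bit FNV-1a hash over a byte range, e.g. a function's code bytes.
uint64_t Fnv1a64(const uint8_t* data, size_t size)
{
    uint64_t hash = 0xcbf29ce484222325ULL;   // FNV offset basis
    for (size_t i = 0; i < size; ++i)
    {
        hash ^= data[i];
        hash *= 0x100000001b3ULL;            // FNV prime
    }
    return hash;
}

// Returns true if the bytes no longer match a hash taken earlier.
bool LooksPatched(const uint8_t* code, size_t size, uint64_t baseline)
{
    return Fnv1a64(code, size) != baseline;
}
```

Any midhook that overwrites bytes inside the hashed range changes the result, which is why the checksum is one of the few reliable tells.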
How
Now it is time to write some code. You could just JMP into a naked function, but that’s lazy, makes it awkward to get at the saved registers from C++, and MSVC doesn’t support naked functions on x64 anyway. Here’s what we’ll do.
We will:
Allocate some virtual space in the target process (the trampoline).
Replace the target bytes with JMP code to the allocated space and NOP out any extra bytes.
In the allocated space, set up a call to our hook function, saving any registers the hook could clobber.
Call the hook function.
JMP back to the original function, right after the bytes we overwrote.
Think of this like a double jump. We clearly need more space to work with, so we just allocate a trampoline in the process and use it to create more sophisticated shellcode.
Let’s start coding. Same thing as last post, we’re going to manual map inject.
We first need to store the location of the score variable, and then we can worry about the midhook.
#include <windows.h>
#include <cstring>
#include <vector>

// Defined further down in this post
PVOID InsertMidHook(PVOID target, PVOID hookfunc, size_t noppad);
void HookFunction();

PDWORD g_score{};

BOOL APIENTRY DllMain(HMODULE hModule, DWORD ul_reason_for_call, LPVOID lpReserved)
{
    switch (ul_reason_for_call)
    {
    case DLL_PROCESS_ATTACH:
        DisableThreadLibraryCalls(hModule);
        {
            HANDLE thread = CreateThread(nullptr, 0, [](LPVOID) -> DWORD
            {
                BYTE *base = (BYTE *)GetModuleHandleA(nullptr);
                // Acquire g_score
                g_score = (PDWORD)(base + 0x6370C);
                DWORD oldprotect;
                VirtualProtect(g_score, sizeof(*g_score), PAGE_EXECUTE_READWRITE, &oldprotect);
                // Insert midhook over the instructions starting at base + 0x9103
                InsertMidHook(base + 0x9103, (PVOID)HookFunction, 2);
                return 0;
            }, nullptr, 0, nullptr);
            if (thread)
                CloseHandle(thread);
        }
        break;
    }
    return TRUE;
}
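By the way, the instructions we want to write over start at offset 0x9103. We don’t want to include any relative JMP or CALL instructions in the overwritten region, because their displacements would point to the wrong place once moved, and that would be difficult to work around.
Remember that the standard MOV and JMP midhook uses 13 bytes. The instructions we’re overwriting span 15 bytes, so that leaves 2 spare, which we NOP over (hence the noppad argument of 2).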
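The padding rule is “overwrite whole instructions only”. Here is a small sketch of that bookkeeping; the function name `NopPadding` is hypothetical, and in practice the instruction lengths would come from a disassembler (the lengths {3, 6, 6} in the usage note are made up, since the post doesn’t list the real ones at 0x9103).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Given the lengths of the consecutive instructions at the hook site,
// returns how many NOPs must follow a stub of stubSize bytes so that only
// whole instructions are overwritten. Returns std::nullopt if the listed
// instructions are too short to hold the stub.
std::optional<size_t> NopPadding(const std::vector<size_t>& instrLengths, size_t stubSize)
{
    size_t covered = 0;
    for (size_t len : instrLengths)
    {
        covered += len;
        if (covered >= stubSize)
            return covered - stubSize;
    }
    return std::nullopt;
}
```

For example, `NopPadding({3, 6, 6}, 13)` covers 15 bytes and returns 2, matching the noppad used here.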
// Defined below
std::vector<BYTE> SetupShellcode(PVOID hookfunc, PVOID jmpback);

PVOID InsertMidHook(PVOID target, PVOID hookfunc, size_t noppad)
{
    // Allocate some extra space to work with (the trampoline)
    PVOID alloc = VirtualAllocEx(GetCurrentProcess(), nullptr, 0x1000, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    if (!alloc)
        return nullptr;
    std::vector<BYTE> midhook{
        // mov r10, addr
        0x49, 0xBA, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        // jmp r10
        0x41, 0xFF, 0xE2
    };
    // Set our new allocation as the immediate moved into r10
    // (memcpy avoids an unaligned pointer write)
    memcpy(&midhook[2], &alloc, sizeof(alloc));
    // NOP out any leftover bytes of the overwritten instructions
    for (size_t i = 0; i < noppad; i++)
        midhook.push_back(0x90);
    // Set protections and then copy over the new instructions
    DWORD oldprotect;
    VirtualProtectEx(GetCurrentProcess(), target, midhook.size(), PAGE_EXECUTE_READWRITE, &oldprotect);
    memcpy(target, midhook.data(), midhook.size());
    VirtualProtectEx(GetCurrentProcess(), target, midhook.size(), oldprotect, &oldprotect);
    FlushInstructionCache(GetCurrentProcess(), target, midhook.size());
    // Set up the shellcode (which jumps back to right after the patched bytes)
    // and copy it to our newly allocated buffer
    std::vector<BYTE> shellcode = SetupShellcode(hookfunc, (BYTE *)target + midhook.size());
    memcpy(alloc, shellcode.data(), shellcode.size());
    FlushInstructionCache(GetCurrentProcess(), alloc, shellcode.size());
    return alloc;
}
This allocates some virtual memory to work with. Then the bytes at the target address are overwritten with the MOV/JMP stub, which moves the freshly allocated buffer’s address into R10 and jumps to it. After that, the shellcode is set up and written to the newly allocated buffer.
“You did this backwards! You’re supposed to allocate and write to the trampoline before you overwrite the original code because a race condition could execute the original code before you finish writing to the trampoline!!11!”
Shut up.
The shellcode setup is a bit annoying because you need to save all of the registers before you call your hook function; otherwise the hook might overwrite them and cause some chaotic issues in the original function. In this case, you can be a little bit lazy and skip the XMM registers, but in a real midhook, you would want to save them all. 32-bit x86 blesses us with the PUSHAD/POPAD instructions, but alas, they don’t exist in x64.
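One more thing the register-saving shellcode has to respect: the Windows x64 calling convention requires RSP to be 16-byte aligned at a CALL, and the caller must reserve 32 bytes of shadow space for the callee. Here is a sketch of the arithmetic (the helper name `ShadowSpaceAdjust` is hypothetical). It assumes RSP was 16-byte aligned at the hook site, which is typical in the body of a non-leaf function after its prologue but not guaranteed everywhere.

```cpp
#include <cassert>
#include <cstddef>

// Returns how many bytes to subtract from RSP after pushing `pushedQwords`
// 8-byte values, so that the CALL has 32 bytes of shadow space and a
// 16-byte-aligned RSP. Assumes RSP was 16-byte aligned before the pushes.
size_t ShadowSpaceAdjust(size_t pushedQwords)
{
    size_t adjust = 32;                          // shadow (home) space
    if ((pushedQwords * 8 + adjust) % 16 != 0)
        adjust += 8;                             // pad back to a 16-byte boundary
    return adjust;
}
```

Pushing 14 general-purpose registers plus PUSHFQ is 15 qwords, so the answer is 0x28: `sub rsp, 0x28` (48 83 EC 28) before the call and `add rsp, 0x28` (48 83 C4 28) after it, placed between PUSHFQ and POPFQ so the flags they change don’t leak out.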
std::vector<BYTE> SetupShellcode(PVOID hookfunc, PVOID jmpback)
{
    // This is where the original written-over code would be executed, but since
    // we don't need to do that, we can just skip that part and rely on the
    // hook function to run the logic for us
    std::vector<BYTE> shellcode;
    // We need to push and save all of the registers, call the function, and then pop them back.
    // Note: r10 already holds the trampoline address (the midhook used it to get here),
    // so its original value is lost either way. RBP is non-volatile, so the hook preserves it.
    shellcode.push_back(0x50); // push rax
    shellcode.push_back(0x51); // push rcx
    shellcode.push_back(0x52); // push rdx
    shellcode.push_back(0x53); // push rbx
    shellcode.push_back(0x56); // push rsi
    shellcode.push_back(0x57); // push rdi
    shellcode.push_back(0x41); // push r8
    shellcode.push_back(0x50);
    shellcode.push_back(0x41); // push r9
    shellcode.push_back(0x51);
    shellcode.push_back(0x41); // push r10
    shellcode.push_back(0x52);
    shellcode.push_back(0x41); // push r11
    shellcode.push_back(0x53);
    shellcode.push_back(0x41); // push r12
    shellcode.push_back(0x54);
    shellcode.push_back(0x41); // push r13
    shellcode.push_back(0x55);
    shellcode.push_back(0x41); // push r14
    shellcode.push_back(0x56);
    shellcode.push_back(0x41); // push r15
    shellcode.push_back(0x57);
    // Save flags too
    shellcode.push_back(0x9C); // pushfq
    // No float math, so no need to save the XMM registers
    // Reserve 32 bytes of shadow space and keep RSP 16-byte aligned for the call:
    // 15 pushes = 120 bytes, + 0x28 = 160 (assumes RSP was 16-byte aligned at the hook site)
    shellcode.push_back(0x48); // sub rsp, 0x28
    shellcode.push_back(0x83);
    shellcode.push_back(0xEC);
    shellcode.push_back(0x28);
    // Call hookfunc
    shellcode.push_back(0x48); // mov rax, hookfunc
    shellcode.push_back(0xB8);
    shellcode.resize(shellcode.size() + sizeof(PVOID));
    memcpy(&shellcode[shellcode.size() - sizeof(PVOID)], &hookfunc, sizeof(PVOID));
    shellcode.push_back(0xFF); // call rax
    shellcode.push_back(0xD0);
    // Release the shadow space
    shellcode.push_back(0x48); // add rsp, 0x28
    shellcode.push_back(0x83);
    shellcode.push_back(0xC4);
    shellcode.push_back(0x28);
    // Restore flags
    shellcode.push_back(0x9D); // popfq
    // Restore registers
    shellcode.push_back(0x41); // pop r15
    shellcode.push_back(0x5F);
    shellcode.push_back(0x41); // pop r14
    shellcode.push_back(0x5E);
    shellcode.push_back(0x41); // pop r13
    shellcode.push_back(0x5D);
    shellcode.push_back(0x41); // pop r12
    shellcode.push_back(0x5C);
    shellcode.push_back(0x41); // pop r11
    shellcode.push_back(0x5B);
    shellcode.push_back(0x41); // pop r10
    shellcode.push_back(0x5A);
    shellcode.push_back(0x41); // pop r9
    shellcode.push_back(0x59);
    shellcode.push_back(0x41); // pop r8
    shellcode.push_back(0x58);
    shellcode.push_back(0x5F); // pop rdi
    shellcode.push_back(0x5E); // pop rsi
    shellcode.push_back(0x5B); // pop rbx
    shellcode.push_back(0x5A); // pop rdx
    shellcode.push_back(0x59); // pop rcx
    shellcode.push_back(0x58); // pop rax
    // Jmp to jmpback (this clobbers r10, same caveat as the midhook itself)
    shellcode.push_back(0x49); // mov r10, jmpback
    shellcode.push_back(0xBA);
    shellcode.resize(shellcode.size() + sizeof(PVOID));
    memcpy(&shellcode[shellcode.size() - sizeof(PVOID)], &jmpback, sizeof(PVOID));
    shellcode.push_back(0x41); // jmp r10
    shellcode.push_back(0xFF);
    shellcode.push_back(0xE2);
    return shellcode;
}
Last step is to make the hook function. Since the shellcode doesn’t pass anything to it, in this case it must take 0 arguments.
It simply adds 1,000,000 to the score. Seems easy enough, right?
void HookFunction()
{
    *g_score += 1000000;
}
Build the DLL, run the game, manual map inject, and…
You get to the top of the scoreboards real quick!
There’s one more thing I’d like to touch on. I figured that I had written enough code for this post. I don’t like writing posts that have a lot of code, because it’s difficult to fully explain it in the context of a Substack article, and a post that is mostly code is a lazy way to word-pad. But maybe you all like reading code-heavy posts; I don’t know.
Anyways, this example is just a basic midhook that doesn’t try to modify registers. In a more sophisticated midhook, where you need to alter registers and then let the original function run the rest of its code, you would have to modify the shellcode to allocate a block of memory, store all of the register values into it, pass the block’s address to the hook function so it can modify them, and then, after the hook function returns, load the register values from the block back into the appropriate registers. I’ll go ahead and just slap down some code for what that would look like.
struct Registers
{
    DWORD64 rax;
    DWORD64 rcx;
    DWORD64 rdx;
    DWORD64 rbx;
    DWORD64 rsi;
    DWORD64 rdi;
    DWORD64 r8;
    DWORD64 r9;
    DWORD64 r10;
    DWORD64 r11;
    DWORD64 r12;
    DWORD64 r13;
    DWORD64 r14;
    DWORD64 r15;
    DWORD64 rflags;
    // XMM registers are 128 bits wide, so a DWORD64 can't hold one.
    // M128A (from winnt.h) is a 16-byte-aligned 128-bit value.
    M128A xmm0;
    M128A xmm1;
    M128A xmm2;
    M128A xmm3;
    M128A xmm4;
    M128A xmm5;
    M128A xmm6;
    M128A xmm7;
    M128A xmm8;
    M128A xmm9;
    M128A xmm10;
    M128A xmm11;
    M128A xmm12;
    M128A xmm13;
    M128A xmm14;
    M128A xmm15;
};
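And then you could allocate one of those structs, put its address in a register (preferably RCX) in the shellcode, and throw down a bunch of mov qword ptr [rcx + offset], reg instructions, one per general-purpose register. RFLAGS has to go through the stack (PUSHFQ, then POP into a register you’ve already stored), and the XMM registers need movdqu xmmword ptr [rcx + offset], xmmN. Since the pointer is the hook function’s first parameter, it is already in RCX for a standard x64 call. After the hook function returns, do the opposite with a bunch of mov reg, qword ptr [rcx + offset] instructions. You do need to reload the pointer first: RCX is volatile in the x64 calling convention, so the hook is free to clobber it. You also have to stash the original RCX before pointing RCX at the block, and load RCX itself last, since it’s the base of every load. I don’t feel like writing out 100 lines of assembly and then debugging it.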
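As a starting point, here is a sketch of emitting those loads and stores, limited to general-purpose registers. It uses a hypothetical `GprContext` layout instead of the `Registers` struct above: the slots follow x64 register numbering (rax=0, rcx=1, … r15=15), so each offset is just `reg * 8` and always fits in a disp8. `EmitStore`, `EmitLoad`, `EmitRestoreAll` and `ExampleHook` are all hypothetical names. It assumes the original RCX was saved elsewhere before RCX was pointed at the block, and that the pointer has been reloaded into RCX after the hook returns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical GPR-only context: gpr[n] holds x64 register number n
// (rax, rcx, rdx, rbx, rsp, rbp, rsi, rdi, r8..r15), so offset = n * 8.
struct GprContext
{
    uint64_t gpr[16];
};

// mov qword ptr [rcx + disp8], reg   (REX.W 89 /r, mod=01, rm=rcx)
void EmitStore(std::vector<uint8_t>& code, unsigned reg)
{
    code.push_back(static_cast<uint8_t>(0x48 | ((reg >> 3) << 2)));  // REX.W, +R for r8..r15
    code.push_back(0x89);
    code.push_back(static_cast<uint8_t>(0x40 | ((reg & 7) << 3) | 1));
    code.push_back(static_cast<uint8_t>(reg * 8));                   // disp8 = slot offset
}

// mov reg, qword ptr [rcx + disp8]   (REX.W 8B /r)
void EmitLoad(std::vector<uint8_t>& code, unsigned reg)
{
    code.push_back(static_cast<uint8_t>(0x48 | ((reg >> 3) << 2)));
    code.push_back(0x8B);
    code.push_back(static_cast<uint8_t>(0x40 | ((reg & 7) << 3) | 1));
    code.push_back(static_cast<uint8_t>(reg * 8));
}

// Reloads every GPR except RSP from the block at [rcx].
// RCX goes last because it is the base register of every load.
std::vector<uint8_t> EmitRestoreAll()
{
    std::vector<uint8_t> code;
    for (unsigned reg = 0; reg < 16; ++reg)
        if (reg != 1 && reg != 4)   // skip rcx (base) and rsp
            EmitLoad(code, reg);
    EmitLoad(code, 1);              // rcx last
    return code;
}

// Example hook under this scheme: the block arrives as the first argument (RCX).
void ExampleHook(GprContext* ctx)
{
    ctx->gpr[0] += 1000000;   // e.g. bump whatever RAX/EAX holds at the hook site
}
```

For example, `EmitStore(code, 0)` emits 48 89 41 00 (`mov [rcx], rax`) and `EmitStore(code, 9)` emits 4C 89 49 48 (`mov [rcx+0x48], r9`). Extending it to RFLAGS and the XMM registers would need the stack round-trip and `movdqu` with a disp32 mentioned above.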
Anyways, that’s all for this post. Have a great weekend and Happy Easter 🐰🥕🍫!
Go!
-BowTiedCrawfish