Midfunc Hooks

Where did you come from, where did you go

Apr 08, 2023

Hello all. We are gonna take yet another swing at function hooking. This one is a particular favorite of mine, because it is so difficult to detect yet is so sweet when you get it to work. Today’s topic is mid-function (midfunc) hooks.

Before You Read

It is recommended to read these related posts before reading this one, so that you don’t get lost during the discussion of the concepts:

What

Midfunc hooks are function hooks that take place in the middle of a function, hence the name. They take a bit more finesse and skill to pull off because you have to dance around instructions that use relative offsets (unless you want to recalculate those in your hook). You also have to ensure that your hook fits inside the function, since a midfunc hook is roughly 13 to 16 bytes in size on x64.

You should hopefully know JMP instructions. A near JMP sets the IP to a location relative to the instruction that follows it. Midfunc hooks are very similar to a JMP instruction, but there’s one slight issue in x64: the relative JMP encodes a signed 32-bit displacement, so it can only reach about 2 gigabytes in either direction. So, if your hook code lives more than 2 gigabytes away from the function you are patching, a plain relative JMP won’t get there.
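To make the range limit concrete, here is a minimal sketch of the check you would do before trusting a relative JMP. `FitsRel32` and the addresses in the example are hypothetical names and values made up for illustration; the only fact it relies on is that `JMP rel32` is 5 bytes long (opcode E9 plus a 32-bit displacement) and that the displacement is measured from the end of the instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Size of a near relative jump: opcode E9 followed by a 32-bit displacement.
constexpr std::size_t kRelJmpSize = 5;

// Returns true when a 5-byte JMP rel32 placed at `from` can reach `to`.
// The displacement is relative to the address right after the jump.
constexpr bool FitsRel32(std::uint64_t from, std::uint64_t to)
{
    const std::int64_t disp =
        static_cast<std::int64_t>(to) - static_cast<std::int64_t>(from + kRelJmpSize);
    return disp >= std::numeric_limits<std::int32_t>::min() &&
           disp <= std::numeric_limits<std::int32_t>::max();
}

// Hypothetical addresses: one target inside the same module, one far away.
void RelJmpExamples()
{
    assert(FitsRel32(0x140009103ULL, 0x140020000ULL));      // close by: reachable
    assert(!FitsRel32(0x140009103ULL, 0x7FF600000000ULL));  // way past 2 GB: not reachable
}
```

When the check fails, you need an absolute jump, and that is where the two styles come in.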

Therefore, a 64 bit midfunc hook has 2 styles:

Jump to register
This one is pretty straightforward. You just MOV your address to a register and then jump to it. The main caveat is that maybe the register that you pick is used in the function, and you would probably cause some seriously undefined behavior. If this is a problem, then you can use…
Ret jump
Slightly more impressive. You PUSH RAX to save its current value onto the stack, move your target address to RAX, then exchange RAX with [RSP] (which is what you just pushed). RAX now has its original value back and the target address sits on top of the stack. Then you RET, which pops [RSP] into the IP, so the stack is back to what it was. The only difference is that your IP is somewhere else entirely.

Style 1 is about 13 bytes in length, and style 2 is 16. If you’re really hurting for bytes, you can opt for style 1. It is also the simplest to implement… and understand (lol).
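If you want to see where the 13 and 16 come from, here is a small sketch that assembles both stubs into byte vectors. The helper names (`AppendImm64`, `BuildRegisterJump`, `BuildRetJump`) are made up for this example, and style 1 uses R10 as the scratch register, which is an arbitrary pick.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Appends a 64-bit immediate in host byte order (little-endian on x64).
void AppendImm64(std::vector<std::uint8_t>& code, std::uint64_t value)
{
    std::uint8_t raw[sizeof(value)];
    std::memcpy(raw, &value, sizeof(value));
    code.insert(code.end(), raw, raw + sizeof(raw));
}

// Style 1: mov r10, target ; jmp r10
std::vector<std::uint8_t> BuildRegisterJump(std::uint64_t target)
{
    std::vector<std::uint8_t> code{0x49, 0xBA};         // mov r10, imm64
    AppendImm64(code, target);
    code.insert(code.end(), {0x41, 0xFF, 0xE2});        // jmp r10
    return code;
}

// Style 2: push rax ; mov rax, target ; xchg [rsp], rax ; ret
std::vector<std::uint8_t> BuildRetJump(std::uint64_t target)
{
    std::vector<std::uint8_t> code{0x50, 0x48, 0xB8};   // push rax ; mov rax, imm64
    AppendImm64(code, target);
    code.insert(code.end(), {0x48, 0x87, 0x04, 0x24});  // xchg [rsp], rax
    code.push_back(0xC3);                               // ret
    return code;
}

void StubSizeExamples()
{
    assert(BuildRegisterJump(0x1122334455667788ULL).size() == 13);
    assert(BuildRetJump(0x1122334455667788ULL).size() == 16);
}
```

The 3 extra bytes in style 2 are the price for not clobbering any register.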

A drawback of midfunc hooks is that you don’t exactly have access to the parameters of the function. This is unfortunate, but you do have access to all of the registers if you want, and can therefore deduce variables and other stuff that you can meddle with from your hook function. Remember that MSVC doesn’t give you an inline assembler when targeting x64, so unless you want to do everything in ASM, you’ll have to be a bit fancy.

Why

We’re going to start right where we left off in Manual Map Injection. If you read it (you did, didn’t you?), you might have seen that I left a hint on the topic of today’s post.

Remember this screenshot?

This adds 10 to the score. Let’s try and make it so that we get 1,000,000 points instead of just 10.

Should be easy enough, just set that 0xA to 0xF4240.

But wait, can we do that? The instruction is ADD EAX, 0xA, which is just 3 bytes long (83 C0 0A), only 1 of which holds the signed value to be added. We can’t fit 1,000,000 into just 1 byte. The maximum we can fit there is 127 because it is a sign-extended 8-bit value, which is hardly enough for our liking. The encoding that takes a full 32-bit immediate is longer than 3 bytes, so it would spill into the next instruction.
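Here is a quick sketch of that size problem. `EncodeAddEax` is a hypothetical helper that picks the shortest encoding of ADD EAX, imm the way an assembler would: the sign-extended 8-bit form (83 C0 ib) when the value fits, and the 32-bit form (05 id) otherwise.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Returns the machine code for `add eax, value`, preferring the short form.
std::vector<std::uint8_t> EncodeAddEax(std::int32_t value)
{
    if (value >= -128 && value <= 127)
        return {0x83, 0xC0, static_cast<std::uint8_t>(value)};  // add eax, imm8 (sign-extended)

    std::vector<std::uint8_t> code{0x05};                       // add eax, imm32
    std::uint8_t raw[sizeof(value)];
    std::memcpy(raw, &value, sizeof(value));                    // little-endian on x64
    code.insert(code.end(), raw, raw + sizeof(raw));
    return code;
}

void AddEaxExamples()
{
    assert(EncodeAddEax(10).size() == 3);       // 83 C0 0A, what the game has
    assert(EncodeAddEax(1000000).size() == 5);  // 05 40 42 0F 00, two bytes too long
}
```

Those two extra bytes would land on top of whatever instruction comes next, so an in-place patch is off the table.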

We could just hook the function and then recreate its logic but that is both loud and obnoxious to make.

This is a prime candidate for a midfunc hook.

This is pretty much the main reason why you would want to deploy a midfunc hook. Other reasons include manipulating the parameters of checksum protected functions right before their invocation, manipulating math with obfuscated values, and stack/register reconnaissance which is what debuggers are for.

On an unrelated note, midfunc hooks are super hard to detect. That’s what makes them so fun! There are plenty of heuristics that detect basic trampolines at the start of a function, but the only real ways to detect a midfunc hook are to checksum the function’s bytes or to look for suspicious executable allocations (VirtualAlloc and friends) if the midhook uses one.

How

Now it is time to write some code. In reality, you could just JMP into a naked function, but that’s lazy and prevents you from accessing various properties. Here’s what we’ll do.

We will:

Allocate some virtual space in the target process (the trampoline).
Replace the target bytes with JMP code to the allocated space and NOP out any extra bytes.
In the allocated space, set up a call to our hook function, saving any registers that we would want to change.
Call the hook function.
JMP back to the original function.

Think of this like a double jump. We clearly need more space to work with, so we just allocate a trampoline in the process and use it to create more sophisticated shellcode.

Let’s start coding. Same thing as last post, we’re going to manual map inject.

We first need to store the location of the score variable, and then we can worry about the midhook.

PDWORD g_score{};

BOOL APIENTRY DllMain(HMODULE hModule, DWORD ul_reason_for_call, LPVOID lpReserved)
{
	switch (ul_reason_for_call)
	{
	case DLL_PROCESS_ATTACH:
		DisableThreadLibraryCalls(hModule);

		CreateThread(0, 0, [](LPVOID param) -> DWORD
		{
			HMODULE base = GetModuleHandleA(nullptr);

			// Acquire g_score
			g_score = (PDWORD)((BYTE *)base + 0x6370C);
			DWORD oldprotect;
			VirtualProtect(g_score, sizeof(DWORD), PAGE_EXECUTE_READWRITE, &oldprotect); // the score itself is a DWORD

			// Insert midhook
			BYTE *target = (BYTE *)base + 0x9103;
			PVOID addr = InsertMidHook(target, (PVOID)HookFunction, 2);
			return 0;
		},
		0, 0, 0);
		break;
	}
	return TRUE;
}

By the way, the instructions we want to write over start at offset 0x9103 from the module base. We don’t want the overwritten range to include any JMP or CALL instructions, because their relative displacements would be difficult to work around.

Remember that the standard MOV and JMP midhook uses 13 bytes. The instructions we are overwriting span 15 bytes, so that leaves 2 spare, which we NOP over.
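The padding count falls out of simple arithmetic: you have to overwrite whole instructions, so you keep taking instructions until the stub fits and NOP whatever is left over. A minimal sketch, where `BytesToOverwrite` and `NopPadding` are hypothetical helpers and the instruction lengths in the example are made-up values that merely add up to 15:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Given the lengths of the instructions starting at the hook site, returns how
// many bytes must be overwritten so that no instruction is cut in half.
std::size_t BytesToOverwrite(const std::vector<std::size_t>& instructionLengths,
                             std::size_t stubSize)
{
    std::size_t covered = 0;
    for (std::size_t length : instructionLengths) {
        if (covered >= stubSize)
            break;
        covered += length;
    }
    assert(covered >= stubSize && "not enough instructions to hold the stub");
    return covered;
}

// Number of NOP bytes to append after the stub.
std::size_t NopPadding(const std::vector<std::size_t>& instructionLengths,
                       std::size_t stubSize)
{
    return BytesToOverwrite(instructionLengths, stubSize) - stubSize;
}

void PaddingExample()
{
    // Hypothetical instruction lengths at the hook site: 3 + 6 + 6 = 15 bytes.
    assert(NopPadding({3, 6, 6}, 13) == 2);
}
```

In practice you would read the real lengths off your disassembler rather than guess them.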

PVOID InsertMidHook(PVOID target, PVOID hookfunc, size_t noppad)
{
	// Allocate some extra space to work with
	PVOID alloc = VirtualAllocEx(GetCurrentProcess(), nullptr, 0x1000, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
	std::vector<BYTE> midhook{
		// Mov r10, addr
		0x49, 0xBA, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
		// jmp r10
		0x41, 0xFF, 0xE2
	};

	// Set our new allocation to be moved into r10
	*(PVOID*)&midhook.data()[2] = alloc;

	// Nop out any extra bytes
	for (size_t i = 0; i < noppad; i++)
		midhook.push_back(0x90);

	// Set protections and then copy over the new instructions
	DWORD oldprotect;
	VirtualProtectEx(GetCurrentProcess(), target, midhook.size(), PAGE_EXECUTE_READWRITE, &oldprotect);
	memcpy(target, midhook.data(), midhook.size());
	VirtualProtectEx(GetCurrentProcess(), target, midhook.size(), oldprotect, &oldprotect);
	// Make sure the CPU does not keep executing stale instruction bytes
	FlushInstructionCache(GetCurrentProcess(), target, midhook.size());

	// Setup the shellcode and then copy it to our newly allocated buffer
	std::vector<BYTE> shellcode = SetupShellcode(hookfunc, (BYTE*)target + midhook.size());
	memcpy(alloc, shellcode.data(), shellcode.size());

	return alloc;
}

This allocates some virtual memory to work with. Then the MOV/JMP stub is written over the target address: it moves the address of the freshly allocated buffer into R10 and jumps to it. After that, the shellcode is set up and copied into the newly allocated buffer.

“You did this backwards! You’re supposed to allocate and write to the trampoline before you overwrite the original code because a race condition could execute the original code before you finish writing to the trampoline!!11!”

Shut up.

The shellcode setup is a bit annoying because you need to save all of the registers before you call your hook function, as it might overwrite them and cause some chaotic issues with the original function. In this case, you can be a little bit lazy and skip over the XMM registers, but in a real midhook, you would want to save them all. 32-bit x86 blesses us with the pushad/popad instructions, but alas, they don’t exist in 64-bit mode.

std::vector<BYTE> SetupShellcode(PVOID hookfunc, PVOID jmpback)
{
	// This is where the original written-over code would be executed, but since 
	// we don't need to do that, we can just skip that part and rely on the
	// hook function to run the logic for us

	std::vector<BYTE> shellcode;
	// We need to push and save all of the registers, call the function, and then pop them back
	shellcode.push_back(0x50); // push rax
	shellcode.push_back(0x51); // push rcx
	shellcode.push_back(0x52); // push rdx
	shellcode.push_back(0x53); // push rbx
	shellcode.push_back(0x56); // push rsi
	shellcode.push_back(0x57); // push rdi
	shellcode.push_back(0x41); // push r8
	shellcode.push_back(0x50);
	shellcode.push_back(0x41); // push r9
	shellcode.push_back(0x51);
	shellcode.push_back(0x41); // push r10
	shellcode.push_back(0x52);
	shellcode.push_back(0x41); // push r11
	shellcode.push_back(0x53);
	shellcode.push_back(0x41); // push r12
	shellcode.push_back(0x54);
	shellcode.push_back(0x41); // push r13
	shellcode.push_back(0x55);
	shellcode.push_back(0x41); // push r14
	shellcode.push_back(0x56);
	shellcode.push_back(0x41); // push r15
	shellcode.push_back(0x57);
	// Save flags too
	shellcode.push_back(0x9C); // pushfq
	// No float math, so no need to save those registers
	
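	// Call hookfunc. The x64 ABI wants 32 bytes of shadow space and a 16-byte aligned RSP.
	// 15 pushes = 120 bytes, so 0x28 covers both (assumes RSP was aligned at the hook site).
	shellcode.push_back(0x48); // sub rsp, 0x28
	shellcode.push_back(0x83);
	shellcode.push_back(0xEC);
	shellcode.push_back(0x28);
	shellcode.push_back(0x48); // mov rax, hookfunc
	shellcode.push_back(0xB8);
	for (BYTE i = 0; i < 8; i++)
		shellcode.push_back(0x00);
	
	*(PVOID*)&shellcode.data()[shellcode.size() - sizeof(PVOID)] = hookfunc;
	shellcode.push_back(0xFF); // call rax
	shellcode.push_back(0xD0);
	shellcode.push_back(0x48); // add rsp, 0x28
	shellcode.push_back(0x83);
	shellcode.push_back(0xC4);
	shellcode.push_back(0x28);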

	// Restore flags
	shellcode.push_back(0x9D); // popfq
	// Restore registers
	shellcode.push_back(0x41); // pop r15
	shellcode.push_back(0x5F);
	shellcode.push_back(0x41); // pop r14
	shellcode.push_back(0x5E);
	shellcode.push_back(0x41); // pop r13
	shellcode.push_back(0x5D);
	shellcode.push_back(0x41); // pop r12
	shellcode.push_back(0x5C);
	shellcode.push_back(0x41); // pop r11
	shellcode.push_back(0x5B);
	shellcode.push_back(0x41); // pop r10
	shellcode.push_back(0x5A);
	shellcode.push_back(0x41); // pop r9
	shellcode.push_back(0x59);
	shellcode.push_back(0x41); // pop r8
	shellcode.push_back(0x58);
	shellcode.push_back(0x5F); // pop rdi
	shellcode.push_back(0x5E); // pop rsi
	shellcode.push_back(0x5B); // pop rbx
	shellcode.push_back(0x5A); // pop rdx
	shellcode.push_back(0x59); // pop rcx
	shellcode.push_back(0x58); // pop rax
	
	// Jmp to jmpback
	shellcode.push_back(0x49); // mov r10, jmpback
	shellcode.push_back(0xBA);
	for (BYTE i = 0; i < 8; i++)
		shellcode.push_back(0x00);
	
	*(PVOID*)&shellcode.data()[shellcode.size() - sizeof(PVOID)] = jmpback;
	shellcode.push_back(0x41); // jmp r10
	shellcode.push_back(0xFF);
	shellcode.push_back(0xE2);

	return shellcode;
}
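All of those push_back pairs follow a regular pattern: PUSH is 0x50 plus the register number, POP is 0x58 plus the register number, and R8 through R15 get a 0x41 prefix. If the wall of bytes bothers you, a sketch like the following generates the same save and restore runs from a register list. The names (`Reg`, `EmitPush`, `EmitPop`, `kSaved`, and the two builders) are made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Code = std::vector<std::uint8_t>;

// Register numbers as used by the x64 instruction encoding.
enum Reg : std::uint8_t { RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI,
                          R8, R9, R10, R11, R12, R13, R14, R15 };

void EmitPush(Code& code, Reg reg)
{
    if (reg >= R8) code.push_back(0x41);                          // REX.B selects r8-r15
    code.push_back(static_cast<std::uint8_t>(0x50 + (reg & 7)));  // push r64
}

void EmitPop(Code& code, Reg reg)
{
    if (reg >= R8) code.push_back(0x41);                          // REX.B selects r8-r15
    code.push_back(static_cast<std::uint8_t>(0x58 + (reg & 7)));  // pop r64
}

// Same registers and order as the hand-written listing (RSP and RBP are left alone).
const std::vector<Reg> kSaved{RAX, RCX, RDX, RBX, RSI, RDI,
                              R8, R9, R10, R11, R12, R13, R14, R15};

Code BuildSavePrologue()
{
    Code code;
    for (Reg reg : kSaved) EmitPush(code, reg);
    code.push_back(0x9C);  // pushfq
    return code;
}

Code BuildRestoreEpilogue()
{
    Code code{0x9D};       // popfq
    for (auto it = kSaved.rbegin(); it != kSaved.rend(); ++it) EmitPop(code, *it);
    return code;
}

void SaveRestoreExample()
{
    assert(BuildSavePrologue().size() == 23);     // 6 one-byte + 8 two-byte pushes + pushfq
    assert(BuildRestoreEpilogue().size() == 23);  // mirror image
}
```

Popping in reverse order of the pushes is the part that is easiest to get wrong by hand, and the loop takes care of it.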

Last step is to make the hook function. Note that, in this case, it must have 0 arguments.

It simply increments the score by 1,000,000. Seems easy enough, right?

void HookFunction()
{
	*g_score += 1000000;
}
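One thing worth keeping in mind: HookFunction is ordinary compiled code, so it expects the Microsoft x64 calling convention, meaning 32 bytes of shadow space above the return address and an RSP that is 16-byte aligned at the CALL. A stub that pushes 14 registers plus the flags has moved RSP by 120 bytes, so it should subtract some more before calling. Here is a sketch of the arithmetic, assuming RSP was 16-byte aligned at the hook site (usually true in a function body after the prologue, but that is an assumption, not a guarantee). `CallFrameAdjustment` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kShadowSpace = 32;     // home space for the callee's 4 register args
constexpr std::size_t kStackAlignment = 16;  // RSP alignment required at the CALL

// bytesPushed: how far the stub has moved RSP since the (assumed aligned) hook site.
// Returns the value for `sub rsp, N` before the CALL and `add rsp, N` after it.
constexpr std::size_t CallFrameAdjustment(std::size_t bytesPushed)
{
    const std::size_t total = bytesPushed + kShadowSpace;
    const std::size_t padding =
        (kStackAlignment - total % kStackAlignment) % kStackAlignment;
    return kShadowSpace + padding;
}

void AlignmentExample()
{
    assert(CallFrameAdjustment(15 * 8) == 0x28);  // 14 registers + flags
    assert(CallFrameAdjustment(16 * 8) == 0x20);  // an even number of pushes needs no padding
}
```

A hook as tiny as this one will most likely run fine without the adjustment, but a hook that calls into other functions or uses aligned SSE stores can crash or trample the saved registers.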

Build the DLL, run the game, manual map inject, and…

You get to the top of the scoreboards real quick!

There’s one more thing I’d like to touch on. I figured that I had written enough code for this post. I don’t like writing posts that have a lot of code because it’s difficult to fully explain it in the context of a Substack article. And a post that is mostly code is a lazy way to word-pad the article. But maybe you all like reading code-heavy posts, I don’t know.

Anyways, this example is just a basic midhook that doesn’t make any attempt to modify registers. A more sophisticated midhook, where you need to alter registers and then let the original function run the rest of its code, needs different shellcode. It would have to allocate a block of memory, shove all of the register values into it, and pass the address of that block to the hook function so the values can be modified. After the hook function concludes, it would load all of the register values from the block back into the appropriate registers. I’ll go ahead and just slap down some code for what that would look like.

struct Registers
{
	DWORD64 rax;
	DWORD64 rcx;
	DWORD64 rdx;
	DWORD64 rbx;
	DWORD64 rsi;
	DWORD64 rdi;
	DWORD64 r8;
	DWORD64 r9;
	DWORD64 r10;
	DWORD64 r11;
	DWORD64 r12;
	DWORD64 r13;
	DWORD64 r14;
	DWORD64 r15;
	DWORD64 rflags;
	// XMM registers are 128 bits wide, so they need M128A (winnt.h) rather than DWORD64
	M128A xmm0;
	M128A xmm1;
	M128A xmm2;
	M128A xmm3;
	M128A xmm4;
	M128A xmm5;
	M128A xmm6;
	M128A xmm7;
	M128A xmm8;
	M128A xmm9;
	M128A xmm10;
	M128A xmm11;
	M128A xmm12;
	M128A xmm13;
	M128A xmm14;
	M128A xmm15;
};

And then you could allocate one of those and stick its address into a register (preferably RCX) in the shellcode, and after that just throw down a bunch of mov qword ptr [rcx + 0xN], reg all the way through. RFLAGS and the XMM registers can’t be stored with a plain MOV: RFLAGS has to go through PUSHFQ and a POP, and the XMM registers need an SSE move. Then, pass that address as a parameter to the hook function, which would be RCX in the standard x64 calling convention, so it is already in place. After the hook function concludes, do the opposite, and do a bunch of mov reg, qword ptr [rcx + 0xN]. You do need to load the pointer into a register again first, because RCX is volatile and the hook function is free to clobber it. I don’t feel like writing out 100 lines of assembly and then debugging it.
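The C++ side of that is the easy part, so here is a sketch of it. Everything in it is hypothetical: `RegisterContext` is a portable stand-in for the struct above (with 16-byte slots for the XMM registers, since those are 128 bits wide), and `HookWithContext` is an example hook that receives the block and edits the saved RAX. The static_asserts pin down the offsets the shellcode would hard-code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One 128-bit XMM slot, aligned so that aligned SSE moves would also work.
struct Xmm
{
    alignas(16) std::uint8_t bytes[16];
};

// Hypothetical register block that the stub fills in before calling the hook.
struct RegisterContext
{
    std::uint64_t rax, rcx, rdx, rbx, rsi, rdi;
    std::uint64_t r8, r9, r10, r11, r12, r13, r14, r15;
    std::uint64_t rflags;
    Xmm xmm[16];
};

// Offsets the stub would use in its mov [rcx + N], reg instructions.
static_assert(offsetof(RegisterContext, rax) == 0x00, "rax slot");
static_assert(offsetof(RegisterContext, rcx) == 0x08, "rcx slot");
static_assert(offsetof(RegisterContext, rflags) == 0x70, "rflags slot");
static_assert(offsetof(RegisterContext, xmm) == 0x80, "xmm0 slot (padded to 16 bytes)");

// Hypothetical hook: the stub passes the block in RCX, the hook edits the saved
// RAX, and the stub loads the edited value back into RAX before jumping back.
void HookWithContext(RegisterContext* ctx)
{
    ctx->rax += 1000000;
}

void ContextExample()
{
    RegisterContext ctx{};
    ctx.rax = 10;
    HookWithContext(&ctx);
    assert(ctx.rax == 1000010);
}
```

The assembly that fills and drains the block is still on you, but at least the hook itself stays plain C++.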

Anyways, that’s all for this post. Have a great weekend and Happy Easter 🐰🥕🍫!

Go!

-BowTiedCrawfish

Shellfish Systems and Security
