Hello all. Busy weekend, so this post comes out today.
Occasionally I throw up a post that uses shellcode, but I haven’t quite explained it in detail. So, that’s what this post is about.
What
Shellcode gets its name from its original purpose: code that opens up a “shell” (a command prompt) that an attacker can use remotely. Nowadays, the term covers any raw executable code that isn’t packaged into an executable file.
Here’s an example of some x86 shellcode.
55 8B EC 8B 45 08 03 45 0C 5D C3
All this does is add the 2 parameters of the function and then return the sum. Here’s what it looks like when disassembled (I used this online disassembler).
0: 55 push ebp
1: 8b ec mov ebp,esp
3: 8b 45 08 mov eax,DWORD PTR [ebp+0x8]
6: 03 45 0c add eax,DWORD PTR [ebp+0xc]
9: 5d pop ebp
a: c3 ret
And here’s the C source that compiles down to that shellcode.
int test(int a, int b)
{
    return a + b;
}
Shellcode is just the machine code of a function (shown above as assembly). In some cases, it doesn’t even have to be an entire function. For example, in thread hijack injections, there is no prologue/epilogue because the thread’s instruction pointer (IP) is redirected straight to another location during the injection.
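A hex dump like the one above is only a human-readable spelling of the raw bytes. As a minimal sketch (the helper names hex_to_bytes and to_c_escape are made up for illustration), this is one way to turn the dump into a byte string and into the \x.. form that is convenient to paste into source code:

```python
def hex_to_bytes(dump: str) -> bytes:
    """Convert a space-separated hex dump into raw bytes."""
    return bytes.fromhex(dump)


def to_c_escape(code: bytes) -> str:
    """Render raw bytes as a \\xNN escape string for pasting into source."""
    return "".join(f"\\x{b:02X}" for b in code)


# The 11-byte x86 "add two ints" function from the example above.
add_shellcode = hex_to_bytes("55 8B EC 8B 45 08 03 45 0C 5D C3")

print(len(add_shellcode))          # 11
print(to_c_escape(add_shellcode))  # \x55\x8B\xEC\x8B\x45\x08\x03\x45\x0C\x5D\xC3
```

Nothing here executes the bytes; it only shows that "shellcode" at rest is plain data.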
Why
Shellcode is very useful for exploitation and injection. It’s difficult to trace because no module owns it, and it’s usually very small. All you need to do is figure out how to get it into a process and then execute it.
Now, you may be thinking to yourself “wow that is so much easier and much more efficient than writing an entire DLL”. You’re right. But there are a few things shellcode can’t do quite as well.
Let’s look at this function.
void MyLoadLibrary(LPCSTR szLib)
{
    LoadLibraryA(szLib);
}
Can you write some shellcode that does this? Yes, you could. But it would be a bit complicated.
Before I explain why, let’s look at the disassembly of MyLoadLibrary.
0: 55 push ebp
1: 8b ec mov ebp,esp
3: 8b 45 08 mov eax,DWORD PTR [ebp+0x8]
6: 50 push eax
7: ff 15 00 20 40 00 call DWORD PTR ds:0x402000
d: 5d pop ebp
e: c3 ret
When I compiled the function and then disassembled it, the call went through the import slot at address 0x402000, which holds the address of LoadLibraryA. That address is not a guarantee. LoadLibraryA can be anywhere, and one of the only surefire ways to figure out where it is is to parse the Import Address Table (IAT).
So, in order to create shellcode that calls LoadLibraryA, you would have to write this (assuming you are able to pass the base address of the module as a parameter):
0: 55 push ebp
1: 8b ec mov ebp,esp
3: 83 ec 08 sub esp,0x8
6: 8b 4d 08 mov ecx,DWORD PTR [ebp+0x8]
9: 33 d2 xor edx,edx
b: 53 push ebx
c: 56 push esi
d: 57 push edi
e: 8b 41 3c mov eax,DWORD PTR [ecx+0x3c]
11: 89 55 fc mov DWORD PTR [ebp-0x4],edx
14: 8b 84 08 80 00 00 00 mov eax,DWORD PTR [eax+ecx*1+0x80]
1b: 03 c1 add eax,ecx
1d: 89 45 f8 mov DWORD PTR [ebp-0x8],eax
20: 39 50 0c cmp DWORD PTR [eax+0xc],edx
23: 0f 84 0f 01 00 00 je 0x138
29: 0f 1f 80 00 00 00 00 nop DWORD PTR [eax+0x0]
30: 8b 30 mov esi,DWORD PTR [eax]
32: 8b 58 10 mov ebx,DWORD PTR [eax+0x10]
35: 03 f1 add esi,ecx
37: 03 d9 add ebx,ecx
39: 8b 06 mov eax,DWORD PTR [esi]
3b: 85 c0 test eax,eax
3d: 0f 84 dd 00 00 00 je 0x120
43: 0f 88 c3 00 00 00 js 0x10c
49: 8d 78 02 lea edi,[eax+0x2]
4c: 83 c8 ff or eax,0xffffffff
4f: 03 f9 add edi,ecx
51: 8a 0f mov cl,BYTE PTR [edi]
53: 84 c9 test cl,cl
55: 0f 84 ae 00 00 00 je 0x109
5b: 0f 1f 44 00 00 nop DWORD PTR [eax+eax*1+0x0]
60: 0f be c9 movsx ecx,cl
63: 8d 7f 01 lea edi,[edi+0x1]
66: 33 c1 xor eax,ecx
68: 8b c8 mov ecx,eax
6a: d1 e9 shr ecx,1
6c: 8b d1 mov edx,ecx
6e: 81 f2 20 83 b8 ed xor edx,0xedb88320
74: a8 01 test al,0x1
76: 0f 44 d1 cmove edx,ecx
79: 8b c2 mov eax,edx
7b: d1 e8 shr eax,1
7d: 8b c8 mov ecx,eax
7f: 81 f1 20 83 b8 ed xor ecx,0xedb88320
85: f6 c2 01 test dl,0x1
88: 0f 44 c8 cmove ecx,eax
8b: 8b c1 mov eax,ecx
8d: d1 e8 shr eax,1
8f: 8b d0 mov edx,eax
91: 81 f2 20 83 b8 ed xor edx,0xedb88320
97: f6 c1 01 test cl,0x1
9a: 0f 44 d0 cmove edx,eax
9d: 8b c2 mov eax,edx
9f: d1 e8 shr eax,1
a1: 8b c8 mov ecx,eax
a3: 81 f1 20 83 b8 ed xor ecx,0xedb88320
a9: f6 c2 01 test dl,0x1
ac: 0f 44 c8 cmove ecx,eax
af: 8b c1 mov eax,ecx
b1: d1 e8 shr eax,1
b3: 8b d0 mov edx,eax
b5: 81 f2 20 83 b8 ed xor edx,0xedb88320
bb: f6 c1 01 test cl,0x1
be: 0f 44 d0 cmove edx,eax
c1: 8b c2 mov eax,edx
c3: d1 e8 shr eax,1
c5: 8b c8 mov ecx,eax
c7: 81 f1 20 83 b8 ed xor ecx,0xedb88320
cd: f6 c2 01 test dl,0x1
d0: 0f 44 c8 cmove ecx,eax
d3: 8b c1 mov eax,ecx
d5: d1 e8 shr eax,1
d7: 8b d0 mov edx,eax
d9: 81 f2 20 83 b8 ed xor edx,0xedb88320
df: f6 c1 01 test cl,0x1
e2: 0f 44 d0 cmove edx,eax
e5: 8b ca mov ecx,edx
e7: d1 e9 shr ecx,1
e9: 8b c1 mov eax,ecx
eb: 35 20 83 b8 ed xor eax,0xedb88320
f0: f6 c2 01 test dl,0x1
f3: 0f 44 c1 cmove eax,ecx
f6: 8a 0f mov cl,BYTE PTR [edi]
f8: 84 c9 test cl,cl
fa: 0f 85 60 ff ff ff jne 0x60
100: f7 d0 not eax
102: 3d 8d bd c1 3f cmp eax,0x3fc1bd8d
107: 74 38 je 0x141
109: 8b 4d 08 mov ecx,DWORD PTR [ebp+0x8]
10c: 8b 46 04 mov eax,DWORD PTR [esi+0x4]
10f: 83 c6 04 add esi,0x4
112: 83 c3 04 add ebx,0x4
115: 85 c0 test eax,eax
117: 0f 85 26 ff ff ff jne 0x43
11d: 8b 55 fc mov edx,DWORD PTR [ebp-0x4]
120: 8b 5d f8 mov ebx,DWORD PTR [ebp-0x8]
123: 42 inc edx
124: 89 55 fc mov DWORD PTR [ebp-0x4],edx
127: 8d 04 92 lea eax,[edx+edx*4]
12a: 83 7c 83 0c 00 cmp DWORD PTR [ebx+eax*4+0xc],0x0
12f: 8d 04 83 lea eax,[ebx+eax*4]
132: 0f 85 f8 fe ff ff jne 0x30
138: 5f pop edi
139: 5e pop esi
13a: 33 c0 xor eax,eax
13c: 5b pop ebx
13d: 8b e5 mov esp,ebp
13f: 5d pop ebp
140: c3 ret
141: ff 75 0c push DWORD PTR [ebp+0xc]
144: 8b 03 mov eax,DWORD PTR [ebx]
146: ff d0 call eax
148: 5f pop edi
149: 5e pop esi
14a: 5b pop ebx
14b: 8b e5 mov esp,ebp
14d: 5d pop ebp
14e: c3 ret
Oh and you would have to rewrite that should you want to perform the same thing in x64.
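The first few instructions of that listing are where the 32-bit assumption is baked in. [ecx+0x3c] reads e_lfanew from the DOS header, and +0x80 is the offset of the import directory entry from the start of the NT headers in a 32-bit (PE32) image. In a 64-bit (PE32+) image the optional header is laid out differently and the same entry sits at +0x90. Here is a rough Python sketch of just that lookup, run against a fake in-memory image. The function names and the fake image are illustrative only, and the signature check is an addition that the shellcode does not perform:

```python
import struct

E_LFANEW = 0x3C                 # IMAGE_DOS_HEADER.e_lfanew
IMPORT_ENTRY_PE32 = 0x80        # import directory entry, offset from NT headers (32-bit)
IMPORT_ENTRY_PE32_PLUS = 0x90   # same entry in a 64-bit image


def import_directory_rva(image: bytes, is_64bit: bool = False) -> int:
    """Return the RVA of the import directory of a mapped PE image."""
    (nt_headers,) = struct.unpack_from("<I", image, E_LFANEW)
    # Sanity check that the shellcode in the listing skips.
    if image[nt_headers:nt_headers + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    entry = IMPORT_ENTRY_PE32_PLUS if is_64bit else IMPORT_ENTRY_PE32
    (rva,) = struct.unpack_from("<I", image, nt_headers + entry)
    return rva


def make_fake_image(import_rva: int, is_64bit: bool = False) -> bytes:
    """Build a zero-filled buffer with only the fields read above filled in."""
    nt_headers = 0x100  # arbitrary placement for the demo
    image = bytearray(0x200)
    struct.pack_into("<I", image, E_LFANEW, nt_headers)
    image[nt_headers:nt_headers + 4] = b"PE\x00\x00"
    entry = IMPORT_ENTRY_PE32_PLUS if is_64bit else IMPORT_ENTRY_PE32
    struct.pack_into("<I", image, nt_headers + entry, import_rva)
    return bytes(image)


print(hex(import_directory_rva(make_fake_image(0x2040))))              # 0x2040
print(hex(import_directory_rva(make_fake_image(0x3000, True), True)))  # 0x3000
```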
Anyways, remember IAT hooks? Well, the IAT is what is being parsed here to find LoadLibraryA. And if the program doesn’t import LoadLibraryA? Then you’ll have to parse the IAT for GetModuleHandle(A/W) and GetProcAddress instead, so you can get the address of LoadLibraryA manually (most malware does this BTW).
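One detail worth pointing out in the listing: it never compares import names as strings. The repeated 0xedb88320 constant is the reflected CRC-32 polynomial, and the loop looks like an unrolled bitwise CRC-32 over each imported name, with the result compared against a precomputed constant (0x3fc1bd8d in the listing; that value is taken as-is and not recomputed here). A small Python sketch of the same hashing scheme, checked against zlib.crc32 (the function name crc32_bitwise is made up for this example):

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial, as seen in the listing


def crc32_bitwise(data: bytes) -> int:
    """Bit-by-bit CRC-32, mirroring the shift/xor rounds in the disassembly."""
    crc = 0xFFFFFFFF                      # or eax, 0xffffffff
    for byte in data:
        crc ^= byte                       # xor eax, ecx
        for _ in range(8):                # eight unrolled rounds per byte
            if crc & 1:
                crc = (crc >> 1) ^ POLY   # shr + xor 0xedb88320
            else:
                crc >>= 1                 # cmove keeps the plain shift
    return crc ^ 0xFFFFFFFF               # not eax


name = b"LoadLibraryA"
assert crc32_bitwise(name) == zlib.crc32(name)
print(f"{crc32_bitwise(name):#010x}")
```

Hashing the names like this also means the shellcode does not need to carry the text "LoadLibraryA" around with it.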
You could also use the PEB and walk the Ldr to find kernel32.dll and thus its exported LoadLibraryA as well.
If you’re lazy (or smart?), you could find a way to pass the function pointer as a parameter to the shellcode.
Anyways, the point is that shellcode has a purpose. It is small, it is versatile, it is difficult to track, and there are libraries with tons of useful shellcode on the internet (1, 2).
How
“How do you write your shellcode, Crawfish?”
I do it in a lazy way. I have a properly set up Visual Studio project with optimizations turned off, then I write the shellcode function(s), compile them, and then stick the .exe into IDA to extract the bytes of the function I wrote. If I need to be fancier with the shellcode, then I adjust from there with online assemblers/disassemblers.
There are a couple of things to be wary of when writing your own shellcode:
Strings
Hopefully you already know that strings are just pointers to byte arrays in the data section, right? Anyways, if you wanted to use a string in some shellcode, you would have to write some code that looks like this, and then adjust it manually.
void Shellcode_MessageBoxA(decltype(&::MessageBoxA) msgboxptr)
{
    const char* msg = "Hello World!";
    const char* title = "Hi";
    msgboxptr(0, msg, title, 0);
}
Pause. What is decltype?
decltype is a C++ keyword that yields the type of an expression at compile time. In this case, decltype(&::MessageBoxA) resolves to int(WINAPI*)(HWND, LPCSTR, LPCSTR, UINT), but that would be a pain in the ass to write and also figure out, so decltype is our friend here.
Let’s assume that msgboxptr is the pointer to the function MessageBoxA.
This is what the shellcode would look like (in x64):
0: 48 89 4c 24 08 mov QWORD PTR [rsp+0x8],rcx
5: 48 83 ec 38 sub rsp,0x38
9: 48 8d 05 00 00 00 00 lea rax,[rip+????????]
10: 48 89 44 24 28 mov QWORD PTR [rsp+0x28],rax
15: 48 8d 05 00 00 00 00 lea rax,[rip+????????]
1c: 48 89 44 24 20 mov QWORD PTR [rsp+0x20],rax
21: 45 33 c9 xor r9d,r9d
24: 4c 8b 44 24 20 mov r8,QWORD PTR [rsp+0x20]
29: 48 8b 54 24 28 mov rdx,QWORD PTR [rsp+0x28]
2e: 33 c9 xor ecx,ecx
30: ff 54 24 40 call QWORD PTR [rsp+0x40]
34: 48 83 c4 38 add rsp,0x38
38: c3 ret
Why x64?
x64 assembly has RIP-relative addressing, which means that you can reference memory by its distance from RIP (more precisely, from the address of the next instruction). 32-bit x86 doesn’t have this addressing mode, so x86 shellcode has to embed the absolute address of the memory it wants, which might be difficult to know ahead of time.
TL;DR: it was easier to write in x64.
The instructions at offsets 0x9 and 0x15 reference the 2 strings that are passed to MessageBoxA. Their displacements aren’t filled in yet, as we first need to add both “Hello World!” and “Hi” to the shellcode so that they can be referenced appropriately.
“Hello World!” expands to 48 65 6C 6C 6F 20 57 6F 72 6C 64 21 00
“Hi” expands to 48 69 00
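If you would rather not hand-encode strings, a short helper does it. This sketch assumes plain ASCII text and a single NUL terminator, which is what the A-suffixed WinAPI functions expect. c_string_hex is a made-up helper name, and bytes.hex with a separator needs Python 3.8 or newer:

```python
def c_string_hex(text: str) -> str:
    """Encode text as a NUL-terminated ASCII string and format it as hex."""
    raw = text.encode("ascii") + b"\x00"
    return raw.hex(" ").upper()


print(c_string_hex("Hello World!"))  # 48 65 6C 6C 6F 20 57 6F 72 6C 64 21 00
print(c_string_hex("Hi"))            # 48 69 00
```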
Now, your shellcode will look like this:
48 89 4C 24 08 48 83 EC 38 48 8D 05 00 00 00 00 48 89 44 24 28 48 8D 05 00 00 00 00 48 89 44 24 20 45 33 C9 4C 8B 44 24 20 48 8B 54 24 28 33 C9 FF 54 24 40 48 83 C4 38 C3 48 65 6C 6C 6F 20 57 6F 72 6C 64 21 00 48 69 00
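Almost done. All that is left is to patch the two LEA instructions with the offsets of their strings.
Each offset is the distance from the end of the LEA instruction to the start of the string. This involves some math, but I just use Python.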
import pyperclip  # third-party: pip install pyperclip

helloworld = b"Hello World!\x00"
hi = b"Hi\x00"
shellcode = b"\x48\x89\x4C\x24\x08\x48\x83\xEC\x38\x48\x8D\x05\x00\x00\x00\x00\x48\x89\x44\x24\x28\x48\x8D\x05\x00\x00\x00\x00\x48\x89\x44\x24\x20\x45\x33\xC9\x4C\x8B\x44\x24\x20\x48\x8B\x54\x24\x28\x33\xC9\xFF\x54\x24\x40\x48\x83\xC4\x38\xC3"
learax = b"\x48\x8D\x05\x00\x00\x00\x00"  # lea rax,[rip+0] placeholder

# Patch the first placeholder. The displacement runs from the end of the LEA
# to the end of the current blob, which is where the string gets appended.
i = shellcode.find(learax)
if i != -1:
    shellcode = shellcode[:i+3] + (len(shellcode) - i - len(learax)).to_bytes(4, "little") + shellcode[i+len(learax):]
    shellcode += helloworld

# The first LEA no longer matches the placeholder, so this finds the second one.
i = shellcode.find(learax)
if i != -1:
    shellcode = shellcode[:i+3] + (len(shellcode) - i - len(learax)).to_bytes(4, "little") + shellcode[i+len(learax):]
    shellcode += hi

# Print the result as a \xNN string and copy it to the clipboard.
sout = "".join(f"\\x{b:02X}" for b in shellcode)
print(sout)
pyperclip.copy(sout)
Voilà, address-relative shellcode!
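A quick way to sanity-check a patched blob is to resolve each LEA the way the CPU would: take the offset of the instruction that follows the LEA and add the 32-bit displacement. The sketch below does that for this particular layout, where the displacements work out to 0x29 and 0x2A. The helper name resolve_lea_targets is made up, and the byte scan is a naive pattern match rather than a real disassembler:

```python
import struct

LEA_RAX_RIP = b"\x48\x8D\x05"  # lea rax,[rip+disp32] without the displacement
LEA_LENGTH = 7                 # 3 opcode/ModRM bytes + 4 displacement bytes

# Patched blob: code, then "Hello World!\0", then "Hi\0".
patched = (
    bytes.fromhex(
        "48 89 4C 24 08 48 83 EC 38 48 8D 05 29 00 00 00 48 89 44 24 28 "
        "48 8D 05 2A 00 00 00 48 89 44 24 20 45 33 C9 4C 8B 44 24 20 "
        "48 8B 54 24 28 33 C9 FF 54 24 40 48 83 C4 38 C3"
    )
    + b"Hello World!\x00"
    + b"Hi\x00"
)


def resolve_lea_targets(code: bytes):
    """Return (lea_offset, target_offset, string) for each lea rax,[rip+disp32]."""
    found = []
    pos = code.find(LEA_RAX_RIP)
    while pos != -1:
        (disp,) = struct.unpack_from("<i", code, pos + 3)
        target = pos + LEA_LENGTH + disp      # relative to the NEXT instruction
        end = code.index(b"\x00", target)     # strings are NUL-terminated
        found.append((pos, target, code[target:end].decode("ascii")))
        pos = code.find(LEA_RAX_RIP, pos + LEA_LENGTH)
    return found


for lea_offset, target, text in resolve_lea_targets(patched):
    print(f"lea at {lea_offset:#x} -> offset {target:#x}: {text!r}")
```

If either LEA resolved to the wrong string (or into the middle of the code), the displacement math was off.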
Function Invocations
Similar to the string problem, functions can be called via a relative address (E8 opcode) and also by absolute address (FF 15). Relative calls would be useful for calling functions within the shellcode, and absolute calls would be useful for calling into the target process. In x64, you may run into an issue that was touched on in the Emplacement Hooks post. If you’re a free subscriber, I’ll just copy and paste the summary here.
“Relative call instructions cannot reach further than 2 gigabytes in either direction from the current address. If you have an instruction at address 0x7FF780000000, you cannot perform a relative call to any function < 0x7FF700000000 or > 0x7FF7FFFFFFFF. The only way to reach functions beyond that range is to MOV the address into a register and then CALL the register (like in a virtual function!). x86 programs don’t run into this: their entire address space is 4 gigabytes (32-bit pointers), and a 32-bit relative displacement wraps around it, so every address is reachable. x64 programs can be spread much further apart, so you aren’t guaranteed to have your DLL hook function within 2 gigabytes!”
TL;DR: in x64, if your shellcode calls into the target program, you may want to load the function’s address into a register and call the register.
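To make the range rule concrete, here is a small Python sketch that picks an encoding for a call. It assumes you already know the absolute addresses of both the call site and the target. The function name encode_call is made up, and the fallback clobbers RAX, which may or may not be acceptable in your shellcode:

```python
import struct

REL32_MIN, REL32_MAX = -(2**31), 2**31 - 1


def encode_call(call_site: int, target: int) -> bytes:
    """Encode an x64 call from call_site to target.

    Uses E8 rel32 when the target is in range, otherwise falls back to
    mov rax, imm64 / call rax (which clobbers RAX).
    """
    disp = target - (call_site + 5)   # rel32 is relative to the next instruction
    if REL32_MIN <= disp <= REL32_MAX:
        return b"\xE8" + struct.pack("<i", disp)
    return b"\x48\xB8" + struct.pack("<Q", target) + b"\xFF\xD0"


print(encode_call(0x1000, 0x2000).hex(" "))                  # e8 fb 0f 00 00
print(encode_call(0x7FF780000000, 0x7FFF00000000).hex(" "))  # 48 b8 00 00 00 00 ff 7f 00 00 ff d0
```

The near form costs 5 bytes and the far form 12, so it is worth using the relative call whenever both ends live inside the same blob.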
That’s all for this post. I wanted to shed some light on the machinations of shellcode for you all.
Have a great weekend!
Go!
-BowTiedCrawfish