Road to Malware Analyst Part 2 - Advanced Disassembly

Hands on!

Jan 13, 2022

Hello again and welcome back. This post is part 2 of the Malware Analysis autist guidebook.

Today we’re going to discuss some more assembly and delve into more advanced techniques and features.

Types

As mentioned in the previous paid post, registers only hold digits and references to other digits. These references are called pointers. Guess why. It’s because they point. Go figure.

Pointers have a fixed byte size that depends on the architecture the executable was built for. In x86, pointers are 4 bytes. In x64, pointers are 8 bytes.

Let’s classify byte size into some terminology:

BYTE - 1 byte. Think characters and boolean types. Upon compilation these are converted to simple BYTE values.
WORD - 2 bytes. Short ints. Not commonly seen unless used in unions. Some C implementations have regular int sizes set to 16 bits.
DWORD (Double-Word) - 4 bytes. x86 pointers, ints, floats, sometimes longs. If reversing x86 executables, you will see these everywhere. Understanding the difference between floating point and integer storage is crucially important when interpreting decompiled code.
QWORD (Quad-Word) - 8 bytes. x64 pointers, long longs, doubles. Implementation is quite different between x86 and x64. x86 only has 32-bit general-purpose registers, so to hold a 64-bit integer, 2 registers (typically EDX:EAX) are stuck together like legos.

When decompiling functions, you’ll often see these types as casts. We will discuss disassembler features and shortcuts in a later post.

Strings

Strings are an array of characters. They are referenced in 2 ways in assembly:

Grabbing the pointer to the string itself
Grabbing the character values in the string

To understand the difference, you’ll have to know how strings are held in memory. Strings, in C/C++, are stored as a list of byte values. These byte values hold the ASCII codes which make up the letters in the string. The end of a string is represented by a null byte (‘\0’) also called the null terminator.

Say you want to print the string to the screen with a function like printf(). The caller grabs the starting position of the string (for example with LEA) and hands it to the function, which then scans and prints each character until it reaches the null terminator.

Where are strings stored?

The “Hello World!” that you’ve put in your first C program ends up in what is called the data section (on Windows, read-only data such as string literals lives in .rdata). The data section is a part of a binary executable that holds global data. This includes strings, global variables, floating point values, C++ virtual tables, and Run-Time Type Information (RTTI). Executables are made up of sections (loosely called segments): the code you write goes into the .text section, and global data is stored in sections such as .data, .rdata, .rodata, or .data.rel.ro. These are just a few of them.

If you think about it, all literal strings are globally stored. The term literal in programming means a known, compile-time value that is unchanged. Dynamic values e.g. input that is read from a user will not be stored in the data section for obvious reasons.

Strings are one of your biggest hints when reversing. They point to debug statements, output messages, and maybe even the key to the ransomware that just hit you.

If/Then

Everyone knows “if” statements. They exist in assembly, but not how one would usually think.

If/then in assembly involves a series of comparisons and interpretation of flags. A common method of comparison is the CMP statement. Let’s explore:

        mov eax, 0x10
        mov ebx, 0x20
        cmp eax, ebx
        je loc_1
        push offset my_string ; "eax != ebx"
        call printf
        add esp, 4            ; cdecl: the caller removes the pushed argument
    loc_1:
        retn

Let’s break down the code into parts. Surely you already understand the MOV statement. The CMP statement compares registers EAX and EBX. What is it comparing? Well, lots of things. Under the hood it subtracts EBX from EAX, throws the result away, and records what happened (was the result zero, negative, and so on) in the FLAGS register. The FLAGS register is a set of flags that dictate jump behavior.

In this case, the values of EAX and EBX are compared, and if they are equal, there is a jump. JE stands for “jump if equal”, which in flag terms means “jump if the Zero Flag (ZF) is set”. There are several jump statements, and they all start with J.

If EAX and EBX are equal, the instruction pointer (EIP) moves to loc_1’s position, and immediately executes RETN.

But, EAX and EBX are not equal, thus there is no jump. The next statement is immediately executed, which prints the string “eax != ebx”.

Note how the string is pushed onto the stack: PUSH OFFSET. my_string is stored in the data section, and OFFSET tells the assembler to use the address of my_string rather than the bytes stored there. That address is the start of the data section plus the distance of my_string from that start. This is difficult to put into words, so here’s a visual:

    0x1000  rdata STARTS
    ...
    0x1200  db "eax != ebx",0
    ...

Thus, OFFSET loads 0x1000 + 0x200 = 0x1200.

There are also other flags, but we won’t worry about them for now.

The “test” Statement

TEST is an instruction that performs a bitwise AND on 2 operands, sets the flags based on the result, and throws the result itself away. It is commonly used to detect whether a value is zero. Remember the XOR shortcut from the previous post? This is built on the same grounds.

    test eax, eax     ; Set Zero-Flag (ZF) to 1 if EAX is 0
    jz loc_DEADBEEF   ; Jump if ZF == 1

Structures

Structs are used everywhere. You need to be aware of what they look like and how they are used. The words struct and class will be used interchangeably, since they are represented the same way in memory.

Let’s take the lego example again. Think of structures like a long brick of legos sat atop one another. Another comparison is the Expo markers you would use as lightsabers during class. In memory, structs are values packed atop one another.

Let’s explore a basic structure (x86):

    struct MyStruct {
        int a;
        int b;
        float c;
    };

What would it look like disassembled? (IDA Pro has a C header file scanning plugin to automatically import structures)

    struct MyStruct {
        DWORD field_0;
        DWORD field_4;
        DWORD field_8;
    };

Once again, the CPU does not care what types exist in the structure nor what they are named. It only cares about the size of the value, called a field. Notice how the “field_” members have a trailing number. This number represents the offset of the field from the base address of the structure.

A compilation process:

    int main(int argc, char** argv) {
        struct MyStruct s;
        s.a = 4;
        s.b = 0x10;
    }

Which the compiler effectively treats as (decompiler-style pseudocode):

    int main(int argc, char** argv) {
        struct MyStruct var0;
        *(DWORD *)((BYTE *)&var0 + 0) = 4;
        *(DWORD *)((BYTE *)&var0 + 4) = 0x10;
    }

Which in assembly would look something like:

    lea eax, [edi]            ; EDI holds the struct's base address (field_0)
    mov dword ptr [eax], 4
    lea ebx, [edi + 4]        ; base + 4 is field_4
    mov dword ptr [ebx], 0x10

What about a more complex structure?

    struct MyStruct {
        void *pNext;
        bool bActive;
        int nCount;
        short iWidth;
        float flSize;
    };

Post-compilation:

    struct MyStruct {
        DWORD field_0;
        BYTE field_4;
        BYTE nop5;
        BYTE nop6;
        BYTE nop7;
        DWORD field_8;
        WORD field_C;
        BYTE nopE;
        BYTE nopF;
        DWORD field_10;
    };

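This looks a lot different than you would expect. By default, the compiler aligns each field to a multiple of its own size and pads the whole structure to a multiple of its largest field’s alignment. For this struct that is 4 bytes; on x64, where pointers are 8 bytes, it would be 8. The processor handles aligned values more efficiently, so aligning speeds things up quite a bit. There are compiler features that turn this padding off, such as #pragma pack (MSVC, GCC, and Clang) or __attribute__((packed)) (GCC and Clang). __declspec(align(n)) goes the other way and raises the alignment.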

Fields with sizes smaller than 4 bytes will have padding added after them so that the next field aligns. For example, a BYTE followed by another BYTE will have 2 bytes of padding if the next field is a DWORD. A BYTE followed by a WORD will have 1 byte of padding after the BYTE. A WORD followed by a BYTE will have 1 byte of padding after the BYTE if the next field is not a BYTE.

Therefore, the above struct can be optimized by reordering its fields so that the small ones sit next to each other:

    struct MyStruct {
        void *pNext;
        short iWidth;
        bool bActive;
        int nCount;
        float flSize;
    };

Which lays out as:

    struct MyStruct {
        DWORD field_0;
        WORD field_4;
        BYTE field_6;
        BYTE nop7;
        DWORD field_8;
        DWORD field_C;
    };

Float Instructions

When it comes to malware, you won’t often see much float math. Therefore, I’ll only touch on this (also because I hate Windows’ float handling). Fun fact: malware is mostly the same code, just repackaged with different obfuscation techniques.

Floating point variables are handled with what are called Streaming SIMD Extensions (SSE). SSE uses a special set of 8 registers XMM0-7. x64 adds registers 8-15. These XMM registers are 128 bits wide and were designed by Intel to maximize processing power for single precision floating point numbers. Remember when FLOPs were a big thing to brag about?

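SSE adds new math instructions strictly for floating point numbers. There are 2 types, packed and scalar (unpacked). Scalar instructions end in “SS” (scalar single-precision) e.g. MOVSS and ADDSS, and only work on the lowest 32 bits of the register. Packed instructions end in “PS” (packed single-precision) e.g. MOVPS and SUBPS, and work on all 4 floats in the register at once. You will rarely ever see packed.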

There are also SQRTSS and RSQRTSS, which compute square roots and (approximate) reciprocal square roots, 1/sqrt(x), respectively.

Quake III's fast inverse square root function, which revolutionized first-person shooter games, computes the same thing in software. SSE shipped with the Pentium III in 1999, the same year Quake III was released. On modern hardware the instructions supersede the Quake shortcut and are faster.

In some cases, floating point values will be stored in the data section for fast retrieval. There are no immediate loads in SSE, meaning a float constant cannot be baked into the instruction itself. It is faster to load the float from its address in the data section than to build it up with extra XMM math. Here’s a screenshot of an example.

A Challenge

I challenge you to figure out the passwords to these 4 challenges, c1-c4 (LINK).

Yes, they’re in x64, but you’re smart enough to figure that out right? I would recommend using Ghidra for these so that you can use its built-in decompiler. If you can figure them out with strictly assembly, then you have skillz spelled with an s.

HackTheBox also has reversing challenges, but challenges 2 and beyond will take a little pizzazz to get. They use methods that we haven’t gone over yet!

Shellfish Systems and Security
