Hello all!
Welcome back to The Malware Analyst’s Autist Guidebook. Today’s post covers triage techniques: namely, what you would immediately do if you were handed a malware sample and asked to figure out what it does.
There’s no be-all end-all method of triage, but I will discuss the method that I use should I be presented with malware in need of quick analysis.
What should you do? Well, let’s find out.
Obviously, you should already have your sandbox set up. You should have a snapshotted Windows VM with every tool you could ever need under the sun. Basically, you’re ready to take on some malware.
If you’re in need of a toolset, consult the Intro to Malware Analysis post.
The purpose of malware triaging is to establish possibilities. I’ll call these possibilities theories. Perhaps the malware is ransomware, or maybe it’s a RAT. You are attempting to categorize the malware so that you know which tools to use during later analysis. Quick inspection/triage of a program can produce theories that you can orient your analysis process around in an attempt to prove them right.
I’ve already discussed static analysis, but there is a subset of static analysis that precedes it. This is often called static property analysis or just property analysis.
This is the first thing you should do. Property analysis gives you a glimpse at what exactly the malware is doing, providing you with theories that you can explore further.
Nearly every file contains metadata. Let’s take a random camera image, for example. JPGs use what is called Exchangeable Image File Format (Exif). This is the standard that cameras use to embed metadata in image files. Metadata is literally defined as data that provides information about other data. File metadata is often contained in the front bytes of a file, called the header. For images, you can see the type of camera that took the picture and the timestamp at which the image was taken.
This information can be viewed through a specialized metadata viewer or by inspecting the file with a hex editor.
How does the OS know what a file is and what it does? How can it differentiate between .jpg images and executables?
Operating systems and analysis tools detect file types based on file signatures, also called magic numbers. These are the first few bytes of a file, which dictate what kind of file it is. Consider this website for some examples.
Windows gives a few shits about file extensions (.exe, .dll, .txt, etc.) and will try to open the file based on that. Linux, however, does not care about the extension, and you could execute a file named .jpg if it is actually a valid executable with the execute permission set.
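Here’s a rough sketch of how a signature check works, in case you want to see it without a hex editor. The signature table is a small illustrative subset that I picked, and the function names are my own, not part of any tool mentioned in this post.

```python
# Illustrative signature table: a few well-known magic numbers, not a complete list.
MAGIC_NUMBERS = {
    b"MZ": "DOS/PE executable (.exe, .dll)",
    b"\x7fELF": "ELF executable",
    b"\xff\xd8\xff": "JPEG image",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"PK\x03\x04": "ZIP archive (also .docx, .jar, .apk)",
    b"%PDF": "PDF document",
}


def identify_by_magic(data: bytes) -> str:
    """Return a best-guess file type from the leading bytes of a file."""
    for signature, description in MAGIC_NUMBERS.items():
        if data.startswith(signature):
            return description
    return "unknown"


def identify_file(path: str) -> str:
    """Read only the first 16 bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return identify_by_magic(handle.read(16))
```

If a sample named invoice.jpg comes back as a DOS/PE executable, that mismatch between extension and signature is worth writing down.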
Executable metadata presentation is no different than that of a JPEG. Windows executables are in Portable Executable (PE) format. There are programs dedicated to inspecting PE headers. I personally use PeStudio but there are others such as PEExplorer.
Let’s drop last post’s malware sample (the keylogger) into PeStudio to inspect it.
A lot will pop up. There are tabs which provide metadata, function and library imports, and strings. PeStudio even tosses the file hash into VirusTotal for some quick OSINT. File type, bit-size, compiler version, type and timestamp, and more are visible on the main page. Program metadata can be found under the header tabs.
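To show where a few of those header fields actually live, here is a minimal sketch that reads them straight out of the bytes using only the standard library. It is not how PeStudio works internally (I have no idea how it does it), and the function name and the returned keys are my own choices. It only covers the DOS header pointer and the 20-byte COFF file header.

```python
import struct
from datetime import datetime, timezone

# Only the two most common machine types; anything else is shown as hex.
MACHINE_TYPES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)"}


def parse_pe_header(data: bytes) -> dict:
    """Extract a few triage-relevant fields from the start of a PE file."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    # e_lfanew (offset 0x3C) points to the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file: missing PE signature")
    # The 20-byte COFF file header follows the signature.
    machine, sections, timestamp, _, _, _, characteristics = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4
    )
    return {
        "machine": MACHINE_TYPES.get(machine, hex(machine)),
        "sections": sections,
        "compiled_utc": datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat(),
        "is_dll": bool(characteristics & 0x2000),  # IMAGE_FILE_DLL flag
    }
```

Keep in mind that the compile timestamp is just a field in the file and can be forged, so treat it as a hint and not as evidence.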
So, what are you looking for?
You’re looking for hints, clues, anything that could point you in a behavioral direction in preparation for your behavioral analysis.
Beyond OSINT/CYBINT, there are 3 main places to find clues in property analysis. For example’s sake, let’s say we have no idea what this malware is doing.
As mentioned in previous posts, Windows malware uses Windows library functions to execute system calls. These functions are provided by libraries called Dynamic-Link Libraries (DLLs). If the malware is using a certain library, we can infer behavior from it.
Under the libraries tab, we see 3 imported libraries. PeStudio knows that ws2_32.dll utilizes socket connections and flags it for us. Therefore, we can theorize that the malware sample is trying to connect to an external host for malicious purposes. Remember, we should be trying to categorize the malware, and ws2_32 points to something external. This means that information can be leaked off of the machine, or the malware can be expected to receive commands via a C&C server or something similar.
What we don’t see are libraries that point to something like ransomware (crypt32.dll is often used but not always), registry manipulation (advapi32.dll), or HTTP networking (wininet.dll or winhttp.dll).
We’ve ruled out multiple malware types, for now, just by seeing a single library.
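The library step boils down to a lookup, so here is a small sketch of it. The table is a hypothetical one that I wrote for illustration; it is nowhere near as complete as what PeStudio ships with, and the function name is mine.

```python
# Hypothetical lookup table: DLL name -> the capability it hints at.
LIBRARY_HINTS = {
    "ws2_32.dll": "raw socket networking",
    "wininet.dll": "HTTP/FTP networking",
    "winhttp.dll": "HTTP networking",
    "advapi32.dll": "registry manipulation",
    "crypt32.dll": "cryptography (possible ransomware)",
}


def theories_from_libraries(imported: list[str]) -> dict[str, str]:
    """Map each recognized imported DLL to the capability it suggests."""
    hints = {}
    for name in imported:
        key = name.lower()  # DLL names are case-insensitive on Windows
        if key in LIBRARY_HINTS:
            hints[key] = LIBRARY_HINTS[key]
    return hints
```

Feeding it a made-up import list such as ["KERNEL32.dll", "USER32.dll", "WS2_32.dll"] returns only the ws2_32.dll entry. A missing library only weakens a theory, since a program can also load libraries at runtime instead of listing them in its import table.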
After looking at library imports, we know that there is some sort of connection. What we need to theorize next is what is being sent to/from the remote server. Odds are, this is something that is found through static/dynamic/behavioral analysis, but let’s see if there’s anything else that property analysis can tell us first.
After imports, we should check the functions list. Just because a library is imported does not mean all of its functions will be used.
Once again, PeStudio highlights suspicious functions and their libraries. We already knew about the socket connections and have now just about confirmed it. We can even narrow down more about the socket connections. There is no accept or listen function used, so we know that this malware is not listening for inbound connections. It appears to only send information in one direction to a server (with connect and send).
Another suspicious function is GetAsyncKeyState. This is the biggest red flag imaginable. Outside of games and hotkey utilities, almost no program uses this function unless it is logging your keystrokes (duh). We can infer that this program is a keylogger that sends your keystrokes to an external server, and we haven’t even disassembled it yet. This can be confirmed with static/dynamic/behavioral analysis.
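The reasoning above can be written down as a handful of rules. This is only a sketch with rules that I made up to mirror this one example; the function name and the wording of each theory are hypothetical, and the output is a list of theories to confirm, not a verdict.

```python
def theories_from_functions(imported: set[str]) -> list[str]:
    """Turn a set of imported function names into theories to confirm later."""
    names = {name.lower() for name in imported}
    theories = []
    # connect + send together suggest an outbound client connection.
    if {"connect", "send"} <= names:
        theories.append("acts as a client and sends data to a remote host")
    # listen or accept would suggest the sample waits for inbound connections.
    if {"listen", "accept"} & names:
        theories.append("listens for inbound connections")
    if "recv" in names:
        theories.append("can receive data over its socket")
    if "getasynckeystate" in names:
        theories.append("polls keyboard state (possible keylogger)")
    return theories
```

For an import set containing connect, send, and GetAsyncKeyState, this yields the client theory and the keylogger theory, which is exactly where the property analysis landed.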
Do not ever “just assume” what a malware sample does. Always confirm and document your results. Yes, malware authors can import and use library calls in functions that are never called to thwart analysts. It will only work on shitty analysts though, so don’t be one. Always, always, always confirm results. There are no assumptions in malware analysis. There are only theories that are proved through iterative analysis.
The last step you would take is to check strings for any more hints. Functions and their libraries also show up under strings, so PeStudio will once again highlight any blacklisted ones for us.
Strings are really just a big haystack. There’s so much gobbledegook in there that you may not find anything of relevance, especially if the program is packed. Strings are also often obfuscated (as we learned in the previous post). Mandiant’s FLARE FLOSS can help find those pesky strings.
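If you’re curious what a basic strings pass does under the hood, here is a minimal sketch. It only pulls plain printable ASCII and UTF-16LE runs, so it will not recover obfuscated or stack-built strings the way FLOSS tries to. The function name and the default minimum length of 4 are my own choices.

```python
import re


def extract_strings(data: bytes, min_length: int = 4) -> list[str]:
    """Return printable ASCII and UTF-16LE strings of at least min_length characters."""
    # Runs of printable ASCII bytes (space through tilde).
    ascii_pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    # Runs of printable characters each followed by a null byte (Windows wide strings).
    wide_pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_length)
    found = [m.group().decode("ascii") for m in ascii_pattern.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide_pattern.finditer(data)]
    return found
```

Checking for wide strings matters on Windows because a lot of API arguments (paths, registry keys) are stored as UTF-16, and an ASCII-only pass will skip right over them.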
From what I’ve discovered, most of the time you’ll just see registry locations which are touched by the program. Registry manipulation and persistence will be discussed in a later post.
After collecting property and metadata information, document it. Yes. Document everything. Your boss wants to know the details of your analysis process, so be very detailed. If you have practice doing writeups, you should understand this already.
You should be writing down information as you go through your analysis process. Think of it like commenting your code. Someone (or even you) will go back and see how you analyzed a certain malware sample. Sometimes, the analysis process can take a very long time, so being able to go back and reference previously analyzed material is quite important.
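One way to keep notes consistent is to dump your findings into a structured record as you go. This is just a sketch of a format that I made up; the field names are hypothetical, so adapt them to whatever your team or boss expects.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_triage_record(
    sample: bytes, libraries: list[str], functions: list[str], theories: list[str]
) -> dict:
    """Bundle property-analysis findings into one record keyed by the sample hash."""
    return {
        "sha256": hashlib.sha256(sample).hexdigest(),
        "size_bytes": len(sample),
        "recorded_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "libraries": sorted(libraries),
        "suspicious_functions": sorted(functions),
        # Every theory starts unconfirmed until later analysis proves it.
        "theories": [{"claim": claim, "status": "unconfirmed"} for claim in theories],
    }


def to_json(record: dict) -> str:
    """Serialize the record so it can be saved next to the sample."""
    return json.dumps(record, indent=2)
```

Keying the record on the SHA-256 hash means you can match your notes to the exact sample later, even if the file gets renamed.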
You should decide your next step based on your theories. For our example, I would personally go through static analysis to see if there’s any behavior I can discover without having to launch a bunch of behavioral analysis tools. Static analysis is my strong suit, so it’s something that I lean on a bit more during my initial analysis.
After your property analysis, you should have theories. The next post will discuss behavioral analysis in which you will inspect external changes such as networking attempts, registry manipulation, and process creation as you run/detonate malware. I’ll try to write something cool over the weekend that we can analyze.
Malware analysis is a complex puzzle based on theorized possibilities. It’s quite literally an art form. Everyone has their own unique style of analysis. I, for one, am quite good at interpreting disassembly and decompiled code, so I probably lean on static analysis more than the average person.
You test your theories through behavioral analysis to prove yourself right. If your theory is wrong, you simply mark it off and iterate. If you run out of theories, then you’re missing something from your property analysis and will have to go back to step 1.
Your worst case scenario (from what I’ve learned) is finding a program that is packed and obfuscated to hell and back multiple times to the point where you have to continuously dump and re-analyze the dumped program to discover its behavior. We shall discuss disgustingly written malware like this in the future.
Malware analysis is tedious, and not for the faint of heart. It will suck. You will hate it. But by God the dopamine hits are awesome when you figure out the puzzle, at least for me. Not to mention your pay grade.
YOUR HOMEWORK:
That’s right, I’m assigning you some homework.
Read this. Bonus points if you watch the videos too.
And this.
Lenny Zeltser is a malware analyst and the CISO of Axonius, a cybersecurity asset management platform. He is also the instructor for the SANS GREM certification course. He knows his shit.
Thanks for reading, as always.