Deeper Malware Binary Analysis
With Radare2

We set up the Docker container with disassemblers like radare2 in the previous post.
Separation of Concerns
I'm going to use radare2 to carve each architecture out of the file by the offset and size recorded in the fat header. We used file and objdump to determine that this was a universal ("fat") binary with both x86_64 and arm64 CPU architectures.
rabin2 -x malicious.bin
malicious.bin.fat/malicious.bin.x86_64.0 created (342624)
malicious.bin.fat/malicious.bin.arm_64.1 created (362624)
If on macOS, lipo malicious.bin -thin arm64 -output arm64.bin would have sliced out just the arm64 architecture (lipo calls it arm64, while rabin2 names the file arm_64).
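To make the "slice by offset/size from the header" idea concrete, here is a minimal Python sketch of what the extraction boils down to. It assumes a 32-bit fat header (magic 0xCAFEBABE, big-endian fields); the helper names and the tiny synthetic file are mine, not something rabin2 or lipo exposes.

```python
import struct

FAT_MAGIC = 0xCAFEBABE  # magic of a 32-bit fat header, stored big-endian
CPU_NAMES = {0x01000007: "x86_64", 0x0100000C: "arm64"}


def list_fat_slices(data: bytes):
    """Return (arch, offset, size) for every slice listed in the fat header."""
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a 32-bit fat Mach-O header")
    slices = []
    for i in range(nfat_arch):
        # Each fat_arch entry: cputype, cpusubtype, offset, size, align (20 bytes).
        cputype, _subtype, offset, size, _align = struct.unpack_from(
            ">IIIII", data, 8 + i * 20
        )
        slices.append((CPU_NAMES.get(cputype, hex(cputype)), offset, size))
    return slices


def extract_slice(data: bytes, arch: str) -> bytes:
    """Cut one architecture out of the fat file by its offset and size."""
    for name, offset, size in list_fat_slices(data):
        if name == arch:
            return data[offset:offset + size]
    raise KeyError(arch)


def demo_fat_file() -> bytes:
    """Synthetic two-slice file (hypothetical contents, not the real sample)."""
    header = struct.pack(">II", FAT_MAGIC, 2)
    entries = struct.pack(">IIIII", 0x01000007, 3, 48, 4, 0)
    entries += struct.pack(">IIIII", 0x0100000C, 0, 52, 4, 0)
    return header + entries + b"X86!" + b"ARM!"


if __name__ == "__main__":
    print(list_fat_slices(demo_fat_file()))
    print(extract_slice(demo_fat_file(), "arm64"))
```

On the real sample you would read malicious.bin from disk instead of calling demo_fat_file().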
I'm going to focus on arm64 since Apple Silicon Macs outnumber Intel-based ones now.
Set a shell variable:
ARM_64_SLICE=malicious.bin.fat/malicious.bin.arm_64.1
Now, dump the header binary information (basically the metadata) for this slice:
rabin2 -I "$ARM_64_SLICE"
arch arm
baddr 0x100000000
binsz 362624
bintype mach0
bits 64
canary true
injprot false
class MACH064
compiler clang
crypto false
endian little
havecode true
intrp /usr/lib/dyld
laddr 0x0
lang c++
linenum false
lsyms false
machine all
nx false
os macos
pic true
relocs true
sanitize false
static false
stripped false
subsys darwin
va true
We take note of:
- pic true: PIC stands for Position-Independent Code, meaning the compiled binary can be loaded and executed at any memory address without modification. With pic true, the binary is likely loaded under ASLR (Address Space Layout Randomization), so at run time the default base of 0x100000000 gets a random slide added, making it harder for us to rely on fixed memory locations. With pic false, it would load at a hardcoded base address.
- canary true: the binary was compiled with a stack canary, a security mitigation against stack-based buffer overflows. A "canary" is a known value placed on the stack right before the saved return address. When a function exits, the program checks whether this value has changed (been overwritten). If it has, the program assumes a buffer overflow has occurred and aborts immediately, preventing an attacker from hijacking the return address. That makes buffer overflows in this binary harder to exploit: simply overwriting the return address won't work, because you'd need a way to leak the canary's secret value first.
- LC_ENCRYPTION_INFO doesn't exist (consistent with crypto false), so the binary is not Apple FairPlay (DRM) encrypted, which I wouldn't expect anyway from something malicious.
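If you triage more than one sample, it helps to turn that key/value dump into something you can query. The sketch below parses text shaped like the rabin2 -I output above and spits out the notes I care about. The function names and the wording of the notes are my own, and reading crypto as "LC_ENCRYPTION_INFO says encrypted" is an assumption on my part.

```python
SAMPLE_INFO = """\
arch     arm
baddr    0x100000000
canary   true
crypto   false
nx       false
pic      true
stripped false
"""


def parse_info(text: str) -> dict:
    """Split each 'key value' line of rabin2 -I style output into a dict."""
    info = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # key, then the rest of the line
        if len(parts) == 2:
            info[parts[0]] = parts[1].strip()
    return info


def triage_notes(info: dict) -> list:
    """Translate a few flags into plain-language notes (my wording)."""
    notes = []
    if info.get("pic") == "true":
        notes.append("position-independent: expect an ASLR slide on top of baddr")
    if info.get("canary") == "true":
        notes.append("stack canary present")
    if info.get("crypto") == "false":
        # Assumption: crypto mirrors the Mach-O encryption load command.
        notes.append("not reported as encrypted")
    if info.get("stripped") == "false":
        notes.append("symbols kept: function names should be available")
    return notes


if __name__ == "__main__":
    for note in triage_notes(parse_info(SAMPLE_INFO)):
        print("-", note)
```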
We can see the linked libraries with:
rabin2 -l "$ARM_64_SLICE"
/usr/lib/libc++.1.dylib
/usr/lib/libSystem.B.dylib
The original source is C++, and Clang is the default compiler on macOS. By default, Clang links every C++ binary against libc++.1.dylib, which is Apple's build of the LLVM libc++ standard library. If the authors had used an older GCC toolchain, you would see a dependency on GNU's libstdc++ instead, which is maybe what we would see with the x86_64 slice.
The Import Table
A dynamically linked program normally reaches the OS through its imports (symbols it expects the dynamic linker to resolve). The import list is like a capabilities manifest: you learn what the malware can do before reading a single instruction.
Use -i for imports and grep for anything that is related to encryption, file permissions, and file reads:
rabin2 -i "$ARM_64_SLICE" | grep -iE 'fork|setsid|popen|system|CCCrypt|CCKeyDerivation|getpwuid|filesystem|mmap|urandom'
0 0x100043b40 NONE FUNC CCCrypt
1 0x100043b4c NONE FUNC CCKeyDerivationPBKDF
5 0x100043b64 NONE FUNC _ZNKSt3__14__fs10filesystem18directory_iterator13__dereferenceEv
6 0x100043b70 NONE FUNC _ZNKSt3__14__fs10filesystem28recursive_directory_iterator13__dereferenceEv
7 0x100043b7c NONE FUNC _ZNKSt3__14__fs10filesystem28recursive_directory_iterator5depthEv
8 0x100043b88 NONE FUNC _ZNKSt3__14__fs10filesystem4path10__filenameEv
9 0x100043b94 NONE FUNC _ZNKSt3__14__fs10filesystem4path11__extensionEv
10 0x100043ba0 NONE FUNC _ZNKSt3__14__fs10filesystem4path13__parent_pathEv
11 0x100043bac NONE FUNC _ZNKSt3__14__fs10filesystem4path16__root_directoryEv
12 0x100043bb8 NONE FUNC _ZNKSt3__14__fs10filesystem4path18lexically_relativeERKS2_
13 0x100043bc4 NONE FUNC _ZNKSt3__14__fs10filesystem4path9__compareENS_17basic_string_viewIcNS_11char_traitsIcEEEE
34 0x100043c60 NONE FUNC _ZNSt3__112system_errorC1EiRKNS_14error_categoryEPKc
35 ---------- NONE FUNC _ZNSt3__112system_errorD1Ev
52 0x100043ccc NONE FUNC _ZNSt3__115system_categoryEv
55 0x100043ce4 NONE FUNC _ZNSt3__14__fs10filesystem11__copy_fileERKNS1_4pathES4_NS1_12copy_optionsEPNS_10error_codeE
56 0x100043cf0 NONE FUNC _ZNSt3__14__fs10filesystem12__remove_allERKNS1_4pathEPNS_10error_codeE
57 0x100043cfc NONE FUNC _ZNSt3__14__fs10filesystem18__create_directoryERKNS1_4pathEPNS_10error_codeE
58 0x100043d08 NONE FUNC _ZNSt3__14__fs10filesystem18__weakly_canonicalERKNS1_4pathEPNS_10error_codeE
59 0x100043d14 NONE FUNC _ZNSt3__14__fs10filesystem18directory_iterator11__incrementEPNS_10error_codeE
60 0x100043d20 NONE FUNC _ZNSt3__14__fs10filesystem18directory_iteratorC1ERKNS1_4pathEPNS_10error_codeENS1_17directory_optionsE
61 0x100043d2c NONE FUNC _ZNSt3__14__fs10filesystem20__create_directoriesERKNS1_4pathEPNS_10error_codeE
62 0x100043d38 NONE FUNC _ZNSt3__14__fs10filesystem21__temp_directory_pathEPNS_10error_codeE
63 0x100043d44 NONE FUNC _ZNSt3__14__fs10filesystem28recursive_directory_iterator11__incrementEPNS_10error_codeE
64 0x100043d50 NONE FUNC _ZNSt3__14__fs10filesystem28recursive_directory_iteratorC1ERKNS1_4pathENS1_17directory_optionsEPNS_10error_codeE
65 0x100043d5c NONE FUNC _ZNSt3__14__fs10filesystem8__removeERKNS1_4pathEPNS_10error_codeE
66 0x100043d68 NONE FUNC _ZNSt3__14__fs10filesystem8__statusERKNS1_4pathEPNS_10error_codeE
82 ---------- NONE FUNC _ZTINSt3__112system_errorE
86 ---------- NONE FUNC _ZTINSt3__14__fs10filesystem16filesystem_errorE
122 0x100043edc NONE FUNC fork
129 0x100043f30 NONE FUNC getpwuid
136 0x100043f84 NONE FUNC mmap
140 0x100043fb4 NONE FUNC popen
142 0x100043fcc NONE FUNC setsid
144 0x100043fe4 NONE FUNC system
The unfiltered output has 148 entries.
| Import | Capability |
|---|---|
| fork, setsid | daemonize / detach into the background |
| system, popen, pclose | run shell commands (this is how it does everything operational) |
| CCCrypt, CCKeyDerivationPBKDF | Apple CommonCrypto: 3DES + PBKDF2 |
| getpwuid, getuid | resolve the current user + home directory |
| std::__1::__fs::filesystem::* | recursive file discovery / copy / staging |
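The table is really just a lookup from import names to capabilities, so it is easy to automate for the next sample. This is a rough sketch: the capability labels are my own, and matching std::filesystem by the mangled fragment 10filesystem is a shortcut rather than real demangling.

```python
CAPABILITIES = {
    "daemonize": {"fork", "setsid"},
    "shell commands": {"system", "popen", "pclose"},
    "crypto": {"CCCrypt", "CCKeyDerivationPBKDF"},
    "user lookup": {"getpwuid", "getuid"},
}

# Mangled std::filesystem symbols carry this length-prefixed component.
FILESYSTEM_MARKER = "10filesystem"


def classify_imports(imports):
    """Group import names by capability; unknown names are ignored."""
    found = {}
    for name in imports:
        for capability, names in CAPABILITIES.items():
            if name in names:
                found.setdefault(capability, []).append(name)
        if FILESYSTEM_MARKER in name:
            found.setdefault("filesystem", []).append(name)
    return found


if __name__ == "__main__":
    sample = [
        "fork",
        "setsid",
        "popen",
        "CCCrypt",
        "_ZNSt3__14__fs10filesystem12__remove_allERKNS1_4pathEPNS_10error_codeE",
        "mmap",
    ]
    for capability, names in classify_imports(sample).items():
        print(capability, "->", len(names), "import(s)")
```

In practice you would feed it the last column of rabin2 -i rather than a hand-typed list.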
We can grep for any networking calls:
rabin2 -i "$ARM_64_SLICE" | grep -iE 'socket|connect|getaddrinfo|CFNetwork|NSURL'
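We find nothing! That hints that there is either no network call at all or that it's shelling out to a tool such as curl via popen. popen starts a separate process to run a shell command, much like the subprocess module does in Python, so any networking would show up in that child's imports and not in ours.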
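For anyone who hasn't used popen, this is roughly what it looks like from the Python side. It is only an analogy: I run a harmless echo here, whereas the suspicion is that the malware hands a curl command line to the shell the same way.

```python
import subprocess


def run_via_shell(command: str) -> str:
    """Rough analogue of popen(command, "r"), reading the pipe, then pclose()."""
    proc = subprocess.Popen(
        command,
        shell=True,               # hand the string to the shell, like popen does
        stdout=subprocess.PIPE,   # keep a pipe open to read the child's output
        text=True,
    )
    output, _ = proc.communicate()  # read until EOF and wait for the child
    return output


if __name__ == "__main__":
    # The parent imports no socket functions, yet the child could be any tool.
    print(run_via_shell("echo fetched-by-a-helper-process").strip())
```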
You can also see the equivalent in Ghidra via Window -> Symbol Tree -> Imports.
Strings Re-Examined
We took a preliminary look at the printable strings using strings in the previous post. This time we take a deeper dive with rabin2 -z, which lists strings from the data sections only; in the output below those are __TEXT.__const and __TEXT.__cstring.
rabin2 -z "$ARM_64_SLICE" | head
nth paddr vaddr len size section type string
-------------------------------------------------------------------
0 0x000444a0 0x1000444a0 47 48 3.__TEXT.__const ascii NSt3__114basic_ifstreamIcNS_11char_traitsIcEEEE
1 0x000444d0 0x1000444d0 46 47 3.__TEXT.__const ascii NSt3__113basic_filebufIcNS_11char_traitsIcEEEE
2 0x000444ff 0x1000444ff 47 48 3.__TEXT.__const ascii NSt3__114basic_ofstreamIcNS_11char_traitsIcEEEE
3 0x00047cf0 0x100047cf0 6 7 5.__TEXT.__cstring ascii vector
4 0x00047cf7 0x100047cf7 12 13 5.__TEXT.__cstring ascii basic_string
5 0x00047d04 0x100047d04 12 13 5.__TEXT.__cstring ascii /dev/urandom
We can also review the output from strings for anything at least 6 characters long:
strings -n 6 "$ARM_64_SLICE" > strings_arm64.txt
Review by eye and also grep for specific things like URLs or osascript (the macOS command-line utility that executes AppleScript and JavaScript for Automation, JXA).
grep -iE 'http|https|osascript' strings_arm64.txt
I would have expected to find something making network connections; otherwise, what would be the point? The indicators might be hidden or obfuscated.
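The strings-then-grep step is simple enough to reproduce in a few lines of Python, which is handy once you want to add your own filters. This is an approximation of strings -n 6 (printable ASCII runs only), and both the indicator list and the sample bytes are my own picks for illustration.

```python
import re

INDICATORS = re.compile(r"http|osascript|curl", re.IGNORECASE)


def printable_strings(data: bytes, min_len: int = 6):
    """Approximate `strings -n min_len`: runs of printable ASCII characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


def find_indicators(data: bytes, min_len: int = 6):
    """Keep only the strings that mention one of the indicators."""
    return [s for s in printable_strings(data, min_len) if INDICATORS.search(s)]


if __name__ == "__main__":
    # Hypothetical bytes, not taken from the sample.
    blob = b"\x00\x01/dev/urandom\x00abc\x00curl -s http://example.invalid\x00"
    print(printable_strings(blob))
    print(find_indicators(blob))
```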
Main Function
We can find the main function with:
r2 -A $ARM_64_SLICE
WARN: Relocs has not been applied. Please use `-e bin.relocs.apply=true` or `-e bin.cache=true` next time
INFO: Analyze all flags starting with sym. and entry0 (aa)
INFO: Analyze imports (af@@@i)
INFO: Analyze entrypoint (af@ entry0)
INFO: Analyze symbols (af@@@s)
INFO: Analyze all functions arguments/locals (afva@@F)
INFO: Analyze function calls (aac)
INFO: Analyze len bytes of instructions for references (aar)
INFO: Check for objc references (aao)
INFO: Finding and parsing C++ vtables (avrr)
INFO: Analyzing methods (af @@ method.*)
INFO: Finding function preludes (aap)
INFO: Emulate functions to find computed references (aaef)
INFO: Function is too sparse, must be analyzed with recursive
INFO: Function is too sparse, must be analyzed with recursive
INFO: Function is too sparse, must be analyzed with recursive
INFO: Function is too sparse, must be analyzed with recursive
INFO: Function is too sparse, must be analyzed with recursive
INFO: Recovering local variables (afva@@@F)
INFO: Type matching analysis for all functions (aaft)
INFO: Propagate noreturn information (aanr)
INFO: Use -AA or aaaa to perform additional experimental analysis
INFO: Finding xrefs in noncode sections (e anal.in=io.maps.x; aav)
The -A flag runs the aaa analysis on load: it finds functions, resolves cross-references, and names imports.
[0x1000036cc]> afl
afl lists every function r2 found. We can narrow that down to main with r2's built-in grep (~):
[0x1000036cc]> afl~main
0x1000036cc 788 136704 main
Then seek to main:
s main
Then print the disassembly of the current function, which is all of main:
[0x1000036cc]> pdf
ERROR: Linear size differs too much from the bbsum, please use pdr instead
[0x1000036cc]> pdr
Do you want to print 5101 lines? (y/N)
In radare2, both commands disassemble a function, but they walk it differently. Summary, adapted from Gemini output:
pdf (Print Disassembly Function)
Method: Prints the analyzed function as one run of addresses, from its start to its end.
Strengths: Fast, with a traditional top-down layout.
Weaknesses: Breaks down when the function's basic blocks are scattered, which is what the "Linear size differs too much from the bbsum" error above is reporting.
Best Use: Cleanly compiled functions whose basic blocks sit next to each other.
pdr (Print Disassembly Recursive)
Method: Follows the function's control flow graph and prints each basic block it reaches.
Strengths: Copes with complex, non-contiguous, or obfuscated functions.
Weaknesses: The output is organized by basic block and can get very long (5101 lines for main here).
Best Use: Malware, or any function with a complex control flow graph.
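The pdf error message makes more sense once you see the two numbers it compares. Below is a small sketch with made-up basic blocks: the linear size is the distance from the lowest to the highest address the function touches, and the bbsum is the total size of its basic blocks. I don't know the exact ratio r2 uses to decide the difference is "too much", so the sketch only prints both values.

```python
def linear_size(blocks):
    """Distance from the lowest block start to the highest block end."""
    start = min(addr for addr, _size in blocks)
    end = max(addr + size for addr, size in blocks)
    return end - start


def bbsum(blocks):
    """Sum of the sizes of all basic blocks."""
    return sum(size for _addr, size in blocks)


if __name__ == "__main__":
    # Hypothetical (address, size) pairs, not taken from the sample.
    contiguous = [(0x1000, 16), (0x1010, 8)]
    scattered = [(0x1000, 16), (0x9000, 8)]
    for name, blocks in (("contiguous", contiguous), ("scattered", scattered)):
        print(name, "linear:", linear_size(blocks), "bbsum:", bbsum(blocks))
```

When the blocks are adjacent the two numbers match; when they are far apart, a linear dump would mostly print bytes that don't belong to the function.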
As expected, main is big, so we dump it to a separate assembly file with pd. pd N disassembles N instructions starting at the current seek; here N is 60000:
pd 60000 > main_pdr.asm
Daemonization
Seek to address 0x1000035b8:
[0x1000036cc]> s 0x1000035b8
[0x1000035b8]> pd 12
| :: 0x1000035b8 888e40f8 ldr x8, [x20, 8]!
| `==< 0x1000035bc a8feffb5 cbnz x8, 0x100003590
| ,==< 0x1000035c0 06000014 b 0x1000035d8
| |: ; CODE XREF from func.10000356c @ 0x1000035a0(x)
| |: 0x1000035c4 c80240f9 ldr x8, [x22]
| |: 0x1000035c8 f40316aa mov x20, x22
| |`=< 0x1000035cc 28feffb5 cbnz x8, 0x100003590
| |,=< 0x1000035d0 02000014 b 0x1000035d8
| || ; CODE XREF from func.10000356c @ 0x100003588(x)
| || 0x1000035d4 f60314aa mov x22, x20
| || ; CODE XREFS from func.10000356c @ 0x1000035b0(x), 0x1000035c0(x), 0x1000035d0(x)
| ``-> 0x1000035d8 760200f9 str x22, [x19]
| 0x1000035dc e00314aa mov x0, x20
| 0x1000035e0 fd7b42a9 ldp x29, x30, [var_20h]
| 0x1000035e4 f44f41a9 ldp x20, x19, [arg_20h]
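The listing above comes from func.10000356c and does not itself contain the call to sym.imp.fork; the fork/setsid sequence is easier to follow in the annotated breakdown below. Note that the breakdown uses x86-style mnemonics (call, test, js, jne) and the eax register, so read it as an illustration of the control flow and not as literal arm64 output. On arm64 the call is bl sym.imp.fork and the return value arrives in w0. Breaking this down: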
0x1000035b8 call sym.imp.fork ; fork()
0x1000035bd test eax, eax ; eax = return value
0x1000035bf js 0x10003fca7 ; if (<0) -> error path
0x1000035c5 mov ecx, eax
0x1000035c9 test ecx, ecx
0x1000035cb jne 0x10003fcac ; if (pid != 0) -> PARENT, branch away to exit
0x1000035d1 call sym.imp.setsid ; CHILD continues: setsid() -> new session leader
0x1000035d6 test eax, eax
0x1000035d8 js 0x10003fca7 ; error path
Basically:
- fork() splits the process into two nearly identical copies. It returns the child's PID to the parent and 0 to the child.
- test eax, eax: the result of fork() lands in the eax register (x86 convention; on arm64 it is w0). This instruction sets the CPU flags so the following branches can tell whether the result is negative, zero, or positive.
- js 0x10003fca7: "Jump if Sign". If the value in eax is negative (-1), fork() failed, and the program jumps to an error-handling routine.
- mov ecx, eax: the program copies the fork() return value from eax into another register, ecx, to preserve it.
- test ecx, ecx: it checks the return value again.
- jne 0x10003fcac: "Jump if Not Equal (to zero)".
  - In the parent process, fork() returns the child's ID (a number greater than 0). Because it is not zero, the parent takes this jump and goes to code that terminates it.
  - In the child process, fork() returns exactly 0. The child ignores this jump and "falls through" to the next line.
- call sym.imp.setsid: only the child process reaches this line. It calls setsid(), which starts a new session and cuts ties with the terminal that launched it. This prevents the child from dying if the user closes the terminal window.
- test eax, eax and js 0x10003fca7: checks whether setsid() failed (returned a negative number) and jumps to the error path if it did.
Confirm who calls fork with a cross-reference query. axt (analyze cross-references to) shows that sym.imp.fork is called from exactly one place, address 0x100003710 inside main:
[0x1000035b8]> axt sym.imp.fork
main 0x100003710 [ICOD:r--] bl sym.imp.fork
This is a daemonization technique. The program clones itself; the original (parent) exits immediately so the command line or GUI feels responsive again, and the clone (child) cuts its ties to the controlling terminal to live in the background. It is similar to launching a command with & so it doesn't hold up the terminal. popen works the other way around: the parent process stays alive, spawns a child specifically to execute a shell command, and keeps a pipe open so the two can talk to each other. With fork-based daemonization, the child process outlives the parent and gets adopted by the system's root process (PID 1, which is launchd on macOS).
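The same fork-then-setsid dance can be tried safely in Python on a POSIX system. This is a sketch of the pattern, not a reconstruction of the malware's code: unlike a real daemon, my parent stays alive and reads a short report from the child over a pipe so we can see what setsid() did. The function name is mine.

```python
import os


def fork_and_detach():
    """Fork, call setsid() in the child, and report back to the parent.

    Returns (child_pid, report) in the parent, where report is the string
    "<child pid> <child session id>" written by the child.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()  # Python raises OSError on failure instead of returning -1
    if pid != 0:
        # Parent: fork() returned the child's PID. A real daemonizer exits here.
        os.close(write_fd)
        with os.fdopen(read_fd) as reader:
            report = reader.read()
        os.waitpid(pid, 0)
        return pid, report
    # Child: fork() returned 0.
    try:
        os.close(read_fd)
        os.setsid()  # become leader of a new session, detached from the terminal
        os.write(write_fd, f"{os.getpid()} {os.getsid(0)}".encode())
    finally:
        os._exit(0)  # never fall back into the caller's code in the child


if __name__ == "__main__":
    child_pid, report = fork_and_detach()
    print("parent saw child pid:", child_pid)
    print("child reported (pid, session id):", report)
```

After setsid() the child's session ID equals its own PID, which is the "new session leader" part of the breakdown.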
TIL: in bl sym.imp.fork, bl stands for Branch with Link, which is ARM's equivalent of x86's call.
Configuration
Since we couldn't find any URLs or similar indicators, they have to be stored somewhere, such as a config file or embedded constants. Some googling suggests build_info is a common naming convention, so it's worth a shot to look for it:
[0x1000035b8]> is~build_info
78 0x0004131c 0x10004131c LOCAL FUNC 0 __ZN12build_info_tC2ER23serialized_build_info_t build_info_t::build_info_t(serialized_build_info_t&)
79 0x000439bc 0x1000439bc LOCAL FUNC 0 __ZN12build_info_tD2Ev build_info_t::~build_info_t()
116 0x0004c300 0x10004c300 LOCAL FUNC 0 _g_serialized_build_info
We see _g_serialized_build_info, which sounds promising: the build info is probably stored in some packed or serialized format (a byte array, JSON, Protocol Buffers). The other two entries are the build_info_t constructor and destructor. On a side note, the second column is the physical address (the offset inside the file) and the third column is the virtual address (where the symbol sits once loaded into memory). For this slice the two differ by the base address: 0x100000000 + 0x0004131c = 0x10004131c.
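The address arithmetic is worth checking against all three symbols. The snippet below assumes, as the is output suggests for this slice, that file offset 0 maps to the base address 0x100000000; that holds for these symbols but is not a general rule for every segment of every Mach-O.

```python
BADDR = 0x100000000  # baddr reported by rabin2 -I for this slice


def paddr_to_vaddr(paddr: int, baddr: int = BADDR) -> int:
    """Virtual address of a symbol, assuming file offset 0 maps to baddr."""
    return baddr + paddr


if __name__ == "__main__":
    # (paddr, name) pairs copied from the is~build_info output above.
    symbols = [
        (0x0004131C, "build_info_t constructor"),
        (0x000439BC, "build_info_t destructor"),
        (0x0004C300, "_g_serialized_build_info"),
    ]
    for paddr, name in symbols:
        print(f"{name}: {paddr:#010x} -> {paddr_to_vaddr(paddr):#x}")
```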
I tried to dump the data block:
ps @ sym._g_serialized_build_info
But got nothing returned. If it's not plaintext, it could be a binary blob, so we can print the first 128 bytes as an escaped string instead (px 128 would give a conventional hex dump):
[0x1000035b8]> ps 128 @ sym._g_serialized_build_info
\x8b\x84\xe2
V\x8f\xdc\xc3m\x01\x00\x00\x00\x00\x00\x00\xa9B\xb8\xc1\xff\xe4\x0f\x17\x8eq\xc5\xa3\x9cVzK\xa1J\x91\xd4>^\x0b\xc5b\xdfW\x0b\xcf\x1e\xa5:!#\x86\x179cO\xc7\xe0\x9f\x94\xe6\x06>6x\xb3\xf8"8\xda\xc4M\xca\xf9j\x8e:\x05\x0d\x14\xf0\xed5\xde\xb6\x93\x93i9\xed\x1f\xe4\x85\xd3\x1eG}\x96u\xef\xbe\x12o+\xf3$\x99\x16J\x92\x15\xba"xv\xc0Ir'\x1b\x9c\xbf\xed\xf0\x92('<x
Interesting.
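A quick way to put a number on "this doesn't look like plaintext" is Shannon entropy: text sits well below 8 bits per byte, while encrypted or compressed data gets close to it. This is a generic sketch, and the sample inputs are stand-ins of mine, not the actual blob. Keep in mind that 128 bytes is a small sample, so treat the result as a hint.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, up to 8.0 for uniform."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    plaintext = b'{"server": "example.invalid", "interval": 60}'
    random_like = os.urandom(4096)  # stand-in for encrypted data
    print("plaintext  :", round(shannon_entropy(plaintext), 2))
    print("random-like:", round(shannon_entropy(random_like), 2))
```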
Find all the cross-references:
[0x1000035b8]> axt sym._g_serialized_build_info
method.build_info_t.build_info_t_serialized_build_info_t_ 0x1000413f0 [ICOD:r--] add x22, x22, sym._g_serialized_build_info
method.build_info_t.build_info_t_serialized_build_info_t_ 0x1000413fc [DATA:r--] ldr x9, sym._g_serialized_build_info
method.build_info_t.build_info_t_serialized_build_info_t_ 0x100041418 [ICOD:r--] add x12, x22, x10
method.build_info_t.build_info_t_serialized_build_info_t_ 0x1000415e4 [ICOD:r--] add x8, x22, x8
method.build_info_t.build_info_t_serialized_build_info_t_ 0x100041628 [ICOD:r--] add x9, x22, x9
method.build_info_t.build_info_t_serialized_build_info_t_ 0x100041724 [ICOD:r--] add x8, x22, x8
method.build_info_t.build_info_t_serialized_build_info_t_ 0x100041768 [ICOD:r--] add x9, x22, x9
method.build_info_t.build_info_t_serialized_build_info_t_ 0x100041864 [ICOD:r--] add x8, x22, x8
method.build_info_t.build_info_t_serialized_build_info_t_ 0x1000418a8 [ICOD:r--] add x9, x22, x9
method.build_info_t.build_info_t_serialized_build_info_t_ 0x100041a1c [ICOD:r--] add x9, x22, x9
method.build_info_t.build_info_t_serialized_build_info_t_ 0x100041ad8 [ICOD:r--] add x9, x22, x9
method.build_info_t.build_info_t_serialized_build_info_t_ 0x100041b90 [ICOD:r--] add x9, x22, x9
method.build_info_t.build_info_t_serialized_build_info_t_ 0x100041c48 [ICOD:r--] add x9, x22, x9
method.build_info_t.build_info_t_serialized_build_info_t_ 0x100041d00 [ICOD:r--] add x11, x22, x8
That is a lot of references, and all of them sit inside the build_info_t constructor. The first line:
0x1000413f0 add x22, x22, sym._g_serialized_build_info
This add finishes loading the blob's address into x22 (on arm64 it is typically the second half of an adrp/add pair). From here on, x22 points to the first byte of the data blob (0x8b), and the function uses it as its base pointer.
Then we see the code execute a load instruction:
0x1000413fc ldr x9, sym._g_serialized_build_info
ldr stands for Load Register. With an x register as the destination, it copies 8 bytes out of the memory blob and places them into register x9.
The remaining references repeat one pattern, adding an offset register to the base in x22. Here are two of them without the radare2 annotations:
0x1000415e4 add x8, x22, x8
0x100041628 add x9, x22, x9
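In Python terms, the constructor appears to be doing "base plus offset" reads into the blob. The sketch below mimics that with struct; the blob layout (two 8-byte header fields followed by a payload) is entirely made up for illustration, since I have not worked out the real format.

```python
import struct


def load_u64(blob: bytes, offset: int = 0) -> int:
    """Emulate an 8-byte little-endian load such as ldr x9, [base + offset]."""
    return struct.unpack_from("<Q", blob, offset)[0]


def field_at(blob: bytes, offset: int, length: int) -> bytes:
    """Emulate add xN, x22, xM followed by reading `length` bytes there."""
    return blob[offset:offset + length]


if __name__ == "__main__":
    # Hypothetical layout: u64 tag, u64 payload length, then the payload.
    blob = struct.pack("<QQ", 0x1122334455667788, 5) + b"hello"
    tag = load_u64(blob, 0)
    length = load_u64(blob, 8)
    print(hex(tag), length, field_at(blob, 16, length))
```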
I've pretty much reached the limit of the reverse engineering skills I picked up from picoCTF, Hack The Box Academy, and crackmes. Let's use currently available resources/tools to dive deeper...



