Malware detection with YARA rules

What is YARA?

How do we detect if a system is infected? Typically, cybersecurity defenders use YARA rules for malware detection. YARA is the engine that runs these rules. The modern engine is YARA-X, a rewrite in Rust by the original creator, Victor M. Alvarez. We can match on the MD5, SHA-1, or SHA-256 hash of the malicious binary (or any binary in question), as well as on patterns inside the file. On MacOS, XProtect and XProtect Remediator run YARA rules to scan for malware. The rules should be under:

 /Library/Apple/System/Library/CoreServices/XProtect.bundle/Contents/Resources/XProtect.yara

We can't update these rules or run XProtect ourselves, but we can install YARA-X and run custom rules.

brew install yara-x

The rules are kind of like wanted posters. They contain three sections:

meta - The metadata describing the rule. It's not used in scanning, but it helps whoever maintains the rule: the author tells you who to contact for clarification, and the date tells you how stale the rule is.
strings - The text or hex signatures to look for.
condition - The logic that decides which combination of those strings counts as a match.

Here is a gist of the YARA rule generated with Claude's help.

https://gist.github.com/Wind010/5c67679ea23f9f95cb2fe82d2b79960d

Import

The import "macho" loads the Mach-O extension module, which allows us to write structure-aware rules targeting MacOS and iOS binaries. On Windows it would be the pe module. By itself, YARA/YARA-X searches for patterns inside a file, but it doesn't know what a MacOS or Windows binary looks like (symbols, structure, headers, etc).

Rule Name

The rule name matters much like a descriptive test name or method/function name. When you're wondering what a method actually does, or when a test fails, you want to get the details just by looking at the name, i.e. (test_name__scenario_under_test__expected_result). In a corporate environment, some poor analyst gets alerted at 2AM, and the first thing they read should be something descriptive before they crack open that laptop. Here, the name instantly tells them GITHUBCODE is running on MacOS and is stealing things. A name like CORP_123_XYZ helps nobody.

Metadata

The meta details matter the same way method/function documentation does: they're a note to your future self or the next person/AI that reads the rule file.

Strings

This section contains the actual clues to track down the binary in question. Like distinguishing features of the suspect (scar, tattoo, limp, lisp, lazy-eye, you get the point). There are two kinds of indicators: byte signatures and text strings.

The byte signatures are like tats. You can't easily remove them.

$seed_x64 = { 41 a3 c5 91 55 b3 1c 47 }
$seed_arm = { 8b 84 e2 0a 56 8f dc c3 }
$magic    = { fa de 07 11 }

Anything inside the curly braces is raw bytes in hexadecimal. These indicators are key. Pun intended. They're the malware's own secret keys. The configuration is scrambled by XORing it with a secret value. The seed is that secret value: the literal 8-byte key needed for both encryption and decryption (symmetric encryption). It's used as a keystream generator. Each byte is XORed with a byte of the current key state, then the 64-bit key is rotated right by one bit (ror64) before the next byte.

keystream_byte[i] = (key >> (8 * (i mod 8))) & 0xFF
key = ror64(key, 1)        # key evolves for every byte
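The two lines above can be sketched in Python as follows. This is a minimal sketch, not the malware's actual routine: the function name xor_stream is made up for illustration, and loading the 8 seed bytes as a little-endian 64-bit integer is an assumption that would need checking against the disassembly.

```python
MASK64 = (1 << 64) - 1


def ror64(value: int, count: int) -> int:
    """Rotate a 64-bit value right by `count` bits."""
    count %= 64
    return ((value >> count) | (value << (64 - count))) & MASK64


def xor_stream(data: bytes, seed: bytes) -> bytes:
    """XOR `data` with the rotating-key keystream described above.

    Assumption: the seed bytes are loaded as a little-endian integer.
    """
    if len(seed) != 8:
        raise ValueError("seed must be exactly 8 bytes")
    key = int.from_bytes(seed, "little")
    out = bytearray()
    for i, byte in enumerate(data):
        keystream_byte = (key >> (8 * (i % 8))) & 0xFF
        out.append(byte ^ keystream_byte)
        key = ror64(key, 1)  # key evolves for every byte
    return bytes(out)


# The x64 seed from the rule's strings section.
SEED_X64 = bytes.fromhex("41a3c59155b31c47")
```

Because the keystream depends only on the seed and the byte position, running xor_stream twice with the same seed returns the original bytes. That is the "symmetric" part.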

The $magic is a special marker that the malware stamps on the encrypted data. XOR can be used for encoding or encryption; here it's used as encryption, with the 8-byte seed as the key. Each seed encrypts and decrypts its own respective configuration.

This variant of the malware can't function without these values. The file hash could change and we'd still be able to detect them, since they're critical to the malware's underlying operation. Some other variant may have different keys though.
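To see why a byte signature outlives a hash, here is a small Python sketch. The "binary" is fake filler bytes built just for this demo; only the seed value comes from the rule.

```python
import hashlib

# The x64 seed from the rule's strings section.
SEED_X64 = bytes.fromhex("41a3c59155b31c47")


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


def contains_signature(data: bytes, signature: bytes) -> bool:
    """Plain substring search, which is roughly what a hex string match does."""
    return signature in data


# Fake stand-in for the sample: filler bytes around the seed.
original = b"fake-binary-header" + SEED_X64 + b"fake-binary-body"
# A "rebuilt" variant: one extra byte is enough to change the hash.
rebuilt = original + b"\x00"

if __name__ == "__main__":
    print(sha256_hex(original) == sha256_hex(rebuilt))
    print(contains_signature(original, SEED_X64), contains_signature(rebuilt, SEED_X64))
```

A hash-only rule would miss the rebuilt file, while the seed is still sitting there in both.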

The other indicator is text strings. The clothing/accessories that are recognizable, but changeable... like a black leather jacket (I'm looking at you Jensen).

$cfg    = "_g_serialized_build_info" ascii
$persist= "com.apple.dt.testmanagerd.runner" ascii
$plist  = "<!DOCTYPE plist PUBLIC" ascii
$cc     = "CCKeyDerivationPBKDF" ascii

The values are readable text matched as plain ASCII (the ascii modifier), as opposed to the wide modifier, which matches two-byte-per-character UTF-16 strings. These are human-readable fingerprints that we found using strings, objdump, and radare2.

_g_serialized_build_info - The internal name of its hidden config blob. Weirdly specific. Benign software basically never has this.
com.apple.dt.testmanagerd.runner - This is the malware impersonating a real Apple system service to hide its persistence. Sneaky, and a great tell. Other variants have other system services.
<!DOCTYPE plist PUBLIC - The header of the MacOS config file, evidence it builds one to survive reboots.
CCKeyDerivationPBKDF - The name of the Apple encryption function it calls. Legit binaries can call this too, though.

These text indicators are easily changed by the malware developer, and we see this with older versions and variants. They're used more as supporting evidence, not as the sole reason to convict the suspect.

Condition

The condition section, to keep with the same wanted poster/legal system analogy, would be the judge. This is where the YARA engine decides whether enough of the indicators have been tripped.

This section checks if it's even a MacOS binary:

( uint32(0) == 0xfeedfacf or uint32(0) == 0xfeedface or
uint32be(0) == 0xcafebabe or uint32be(0) == 0xbebafeca )

A thin 64-bit Mach-O starts with the bytes cf fa ed fe, which read as a little-endian uint32 give 0xfeedfacf. Ours is a 'fat'/universal binary (contains x86_64 and arm64 architectures). It starts with ca fe ba be, which read as big-endian is 0xcafebabe. We check both byte orders, to be sure. The uint32(0) means: read the 4 bytes at the very start of the file. We then compare them against the magic number for each possible version.

0xfeedfacf / 0xfeedface - Standard (thin) Mac program, 64-bit and 32-bit respectively (note it spells "feed face," kek).
0xcafebabe / 0xbebafeca - A "universal" Mac program that bundles Intel and Apple-Silicon versions together. This was what we analyzed.
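The same header check can be sketched in Python with the struct module. The function name classify_header and the label strings are made up for illustration; the magic values are the ones from the condition above.

```python
import struct
from typing import Optional

# Compared against a little-endian read, like uint32(0) in the rule.
THIN_MAGICS = {
    0xFEEDFACF: "thin 64-bit Mach-O",
    0xFEEDFACE: "thin 32-bit Mach-O",
}
# Compared against a big-endian read, like uint32be(0) in the rule.
FAT_MAGICS = {
    0xCAFEBABE: "fat/universal binary",
    0xBEBAFECA: "fat/universal binary (byte-swapped)",
}


def classify_header(header: bytes) -> Optional[str]:
    """Return a label for the first 4 bytes, or None if no magic matches."""
    if len(header) < 4:
        return None
    little = struct.unpack("<I", header[:4])[0]
    big = struct.unpack(">I", header[:4])[0]
    if little in THIN_MAGICS:
        return THIN_MAGICS[little]
    if big in FAT_MAGICS:
        return FAT_MAGICS[big]
    return None


if __name__ == "__main__":
    print(classify_header(bytes.fromhex("cffaedfe")))
    print(classify_header(bytes.fromhex("cafebabe")))
```

Anything that returns None here is the "GTFO" path: the rest of the rule never gets evaluated for that file.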

This whole section is like an if-null check that bails out early. If the file doesn't fit our suspect criteria, GTFO. The next section checks to see if this is our culprit.

( 2 of ($seed, $xkey, $magic)
    or ( 1 of ($seed, $xkey, $magic) and 2 of ($cfg, $persist, $plist, $cc) ) )

This is a conditional that handles 2 cases.

2 of ($seed, $xkey, $magic) - If I find two of the three byte-signatures, trip the alarm. Two of the malware's own private encryption keys showing up in one file is super sus.
1 of (...byte-signature...) and 2 of (...text...) - If only one byte-signature is found, I want to make sure it's corroborated with at least two of the text clues before tripping the alarm.
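The two cases boil down to a few lines of boolean logic. Here is a Python sketch of just that decision; the function name rule_fires is made up, and the Mach-O header check from earlier is left out on purpose.

```python
# Identifiers as they appear in the condition above.
BYTE_SIGS = {"$seed", "$xkey", "$magic"}
TEXT_SIGS = {"$cfg", "$persist", "$plist", "$cc"}


def rule_fires(matched: set) -> bool:
    """Mirror the condition: 2 byte sigs, or 1 byte sig plus 2 text strings."""
    byte_hits = len(BYTE_SIGS & matched)
    text_hits = len(TEXT_SIGS & matched)
    return byte_hits >= 2 or (byte_hits >= 1 and text_hits >= 2)


if __name__ == "__main__":
    print(rule_fires({"$seed", "$magic"}))          # two byte signatures
    print(rule_fires({"$magic", "$cfg", "$cc"}))    # one byte + two text
    print(rule_fires({"$cfg", "$plist", "$cc"}))    # text clues only
```

Note that text clues on their own never fire the rule, no matter how many match. At least one byte signature has to be present.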

This layered approach is what gives the rule its low false-positive rate. Hopefully, it's neither so strict that it misses a slightly modified variant nor so loose that it trips on non-malicious binaries.

We can run this rule directly with:

yr scan <path_to_rules_file> <path_to_target>

Example run using YARA-X compiled from source in the Alpine container:

./yara-x/target/release/yr scan -s githubcode3.yar GITHUBCODE
 
GITHUBCODE_macos_stealer GITHUBCODE
0x4c310:8:$seed_x64: 41 a3 c5 91 55 b3 1c 47
0xa4300:8:$seed_arm: 8b 84 e2 0a 56 8f dc c3
0x3aef8:4:$magic: fa de 07 11
0x55632:24:$cfg: _g_serialized_build_info
0xae44a:24:$cfg: _g_serialized_build_info
0x5647d:32:$persist: com.apple.dt.testmanagerd.runner
0x565fe:32:$persist: com.apple.dt.testmanagerd.runner
0xaf30e:32:$persist: com.apple.dt.testmanagerd.runner
0xaf48f:32:$persist: com.apple.dt.testmanagerd.runner
0x56281:22:$plist: <!DOCTYPE plist PUBLIC
0xaf112:22:$plist: <!DOCTYPE plist PUBLIC
0x4f764:20:$cc: CCKeyDerivationPBKDF
0x51d90:20:$cc: CCKeyDerivationPBKDF
0xa8774:20:$cc: CCKeyDerivationPBKDF
0xaacf0:20:$cc: CCKeyDerivationPBKDF

The -s flag prints each matched string. Without it, a match would print only the rule name and the file (the first line of the output above). If nothing is printed, nothing matched and you're good.
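If you want to summarize a long run, the match lines are easy to tally. This Python sketch assumes the offset:length:$identifier: data layout seen in the sample output above; that layout is inferred from this one run, not from a documented format, so treat the parser as a convenience hack.

```python
from collections import Counter


def parse_match_lines(text: str) -> list:
    """Return (offset, length, identifier, data) tuples for each match line.

    Lines that don't look like a match line (e.g. the rule name line)
    are skipped.
    """
    hits = []
    for line in text.splitlines():
        parts = line.strip().split(":", 3)
        if len(parts) != 4:
            continue
        offset_text, length_text, identifier, data = parts
        if not offset_text.startswith("0x") or not identifier.startswith("$"):
            continue
        try:
            hits.append((int(offset_text, 16), int(length_text), identifier, data.strip()))
        except ValueError:
            continue
    return hits


def count_by_identifier(text: str) -> Counter:
    """Count how many times each string identifier matched."""
    return Counter(identifier for _, _, identifier, _ in parse_match_lines(text))


# A few lines copied from the run above.
SAMPLE = """GITHUBCODE_macos_stealer GITHUBCODE
0x4c310:8:$seed_x64: 41 a3 c5 91 55 b3 1c 47
0x55632:24:$cfg: _g_serialized_build_info
0xae44a:24:$cfg: _g_serialized_build_info
0x56281:22:$plist: <!DOCTYPE plist PUBLIC
"""

if __name__ == "__main__":
    print(count_by_identifier(SAMPLE))
```

In the full output above, most strings show up twice or four times, which lines up with the binary being universal: one set of hits per architecture slice.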

There is a collection of YARA rules at https://yarahq.github.io/ and https://github.com/Neo23x0/signature-base

Nice to get some experience writing YARA rules other than on TryHackMe.

Manual Detection

So, cool, we have a rule, but now some mechanism has to run it on every binary. If a machine is already compromised, it's good to check specifically for the added plist and anything suspicious under:

/Library/LaunchAgents/com.apple.dt.testmanagerd.runner

Look closer at:

ls -la ~/Library/LaunchAgents/ /Library/LaunchAgents/ /Library/LaunchDaemons/ 2>/dev/null

Dump those existing .plists:

for p in ~/Library/LaunchAgents/*.plist /Library/Launch*/*.plist; do
    echo "== $p =="; defaults read "$p" 2>/dev/null
done

Look for any labels and program arguments pointing outside of /System or /usr. As a first pass, search for plists that mention the impersonated service names:

grep -rliE 'testmanagerd|coresymbolicationd' \
    ~/Library/LaunchAgents /Library/LaunchAgents /Library/LaunchDaemons 2>/dev/null
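The grep only catches names we already know about. For the "pointing outside of /System or /usr" part, here is a Python sketch using the standard plistlib module. The heuristic (a com.apple. label whose program lives outside those two prefixes) and the function names are my own assumptions, and legitimate Apple jobs may trip it, so treat hits as leads and not verdicts.

```python
import plistlib
from pathlib import Path
from typing import Iterator, Optional, Tuple

# Prefixes treated as "expected" locations; an assumption taken from the text above.
TRUSTED_PREFIXES = ("/System/", "/usr/")
LAUNCH_DIRS = [
    Path.home() / "Library/LaunchAgents",
    Path("/Library/LaunchAgents"),
    Path("/Library/LaunchDaemons"),
]


def program_path(job: dict) -> Optional[str]:
    """Return the executable a launchd job runs, if the plist names one."""
    if isinstance(job.get("Program"), str):
        return job["Program"]
    args = job.get("ProgramArguments") or []
    return args[0] if args and isinstance(args[0], str) else None


def is_suspicious(job: dict) -> bool:
    """Flag jobs that claim an Apple label but run from an unexpected path."""
    label = job.get("Label", "")
    program = program_path(job)
    if not isinstance(label, str) or not label.startswith("com.apple.") or program is None:
        return False
    return not program.startswith(TRUSTED_PREFIXES)


def scan(dirs=LAUNCH_DIRS) -> Iterator[Tuple[Path, str, Optional[str]]]:
    """Yield (plist path, label, program) for every flagged job."""
    for directory in dirs:
        if not directory.is_dir():
            continue
        for plist in sorted(directory.glob("*.plist")):
            try:
                with plist.open("rb") as handle:
                    job = plistlib.load(handle)
            except Exception:
                continue  # unreadable or malformed plist, skip it
            if isinstance(job, dict) and is_suspicious(job):
                yield plist, job["Label"], program_path(job)


if __name__ == "__main__":
    for path, label, program in scan():
        print(f"{path}: {label} -> {program}")
```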

See if anything is loaded right now matching the suspect .plist:

launchctl list | grep -iE 'testmanagerd|coresymbolicationd'

What to do if you find IoC

Log out of your browser sessions, rotate any secrets you have stored (.ssh keys, for example), and lock down your credit through Experian, TransUnion, and Equifax. This is free.

Ultimately, I'd wipe/re-install the machine to be super safe. It's the persistence that should worry you. If there is persistence, no amount of rotation is going to help if it just keeps stealing your secrets.

Update: The breakdown above covers V1 of the rule. The rule has since been updated after testing to make sure the arm64 slice gets detected, along with additional byte-signatures for that seed. Just because the version found in that git repository was a universal binary for MacOS doesn't mean there aren't other 'thin' versions out there. Safer to detect both explicitly.
