Approximately 8.5 million Windows computers running CrowdStrike Falcon were affected by an update pushed on Thursday night (07-18-2024). Luckily, I wasn't affected even though I was online. I suspect that the bad configuration/ruleset .sys wasn't loaded before the fixed version was pushed.
gci -Path "C:\Windows\System32\drivers\CrowdStrike" -Recurse -Filter "C-00000291-*"
I had both files under C:\Windows\System32\drivers\CrowdStrike, which I backed up to a BAK folder, and then deleted just C-00000291-00000000-00000029.sys.
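Roughly, the backup-and-delete step looked like this (a sketch; the C:\BAK destination is just a placeholder for wherever the copies should go):
# Back up both channel files, then remove only the bad one (from an elevated PowerShell prompt).
$drivers = "C:\Windows\System32\drivers\CrowdStrike"
$backup  = "C:\BAK"   # placeholder backup folder
New-Item -ItemType Directory -Path $backup -Force | Out-Null
Get-ChildItem -Path $drivers -Recurse -Filter "C-00000291-*" | Copy-Item -Destination $backup
Remove-Item -Path (Join-Path $drivers "C-00000291-00000000-00000029.sys") -Force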
If my machine had been affected and caught in a boot loop, I would have had to boot into Safe Mode with my BitLocker recovery key and delete C-00000291-00000000-00000029.sys explicitly. Alternatively, I could have removed the SSD, mounted it in another PC, and deleted the file there, provided BitLocker wasn't enabled.
I was curious as to what the differences were between C-00000291-00000000-00000029.sys and C-00000291-00000000-00000030.sys. It turns out C-00000291-00000000-00000029.sys is just full of nulls.
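That claim is easy to verify against a backed-up copy of the file (a sketch; the path is a placeholder for wherever your copy lives):
# Count the non-zero bytes in the backed-up channel file.
$bytes   = [System.IO.File]::ReadAllBytes("C:\BAK\C-00000291-00000000-00000029.sys")
$nonNull = @($bytes | Where-Object { $_ -ne 0 }).Count
"$($bytes.Length) bytes total, $nonNull non-null"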
Unfortunately, Azure virtual machines were also affected. To fix those machines, we had to create a new repair virtual machine, mount the disks belonging to the affected VMs on it, and delete the files there. Luckily, this can be done with the Azure CLI, and Microsoft has already published a run command (win-crowdstrike-fix-bootloop) that deletes the affected driver files.
Output from the Serial Log from Boot Diagnostics:
<INSTANCE CLASSNAME="BLUESCREEN">
<PROPERTY NAME="STOPCODE" TYPE="string"><VALUE>"0x7E"</VALUE></PROPERTY><machine-info>
<name>s00374vmpirpr</name>
<guid>d7c2a973-e174-4268-942c-d776535a2956</guid>
<processor-architecture>AMD64</processor-architecture>
<os-version>10.0</os-version>
<os-build-number>17763</os-build-number>
<os-product>Windows Server 2019 Datacenter</os-product>
<os-service-pack>None</os-service-pack>
</machine-info>
</INSTANCE>
</BP>
!SAC>
Your PC ran into a problem and needs to restart.
If you call a support person, give them this info:
SYSTEM_THREAD_EXCEPTION_NOT_HANDLED
csagent.sys
# Log in and select the subscription containing the affected VM
az login --use-device-code
az account set -s [YOUR_SUBSCRIPTION]
# Create a repair VM and attach a copy of the affected VM's OS disk to it
az vm repair create -n [AFFECTED_VM] -g [RESOURCE_GROUP] --verbose
# Run Microsoft's script that deletes the bad CrowdStrike channel file from the attached disk
az vm repair run -g [RESOURCE_GROUP] -n [AFFECTED_VM] --run-id win-crowdstrike-fix-bootloop --run-on-repair --verbose
# Swap the repaired OS disk back onto the affected VM and clean up the repair resources
az vm repair restore -n [AFFECTED_VM] -g [RESOURCE_GROUP] --verbose
You should be able to completely bypass the administrator credential prompt on creation by passing the following flags to az vm repair create (I didn't test these flags):
--admin-username [ADMINISTRATOR_NAME] --admin-password [ADMINISTRATOR_PASSWORD]
I was able to run these commands to fix our affected VMs after obtaining the permissions in an official manner. (Unofficially, one could also have obtained them through credential exfiltration via a CI/CD pipeline, similar to a WrongSecrets challenge...)
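For multiple affected VMs, the same three repair commands can be wrapped in a loop along these lines (a sketch; the resource group and VM names are placeholders, and it assumes the az vm repair extension is installed):
# Placeholder resource group and VM names; replace with your own.
$resourceGroup = "[RESOURCE_GROUP]"
$affectedVms   = @("[AFFECTED_VM_1]", "[AFFECTED_VM_2]")
foreach ($vm in $affectedVms) {
    az vm repair create -n $vm -g $resourceGroup --verbose
    az vm repair run -n $vm -g $resourceGroup --run-id win-crowdstrike-fix-bootloop --run-on-repair --verbose
    az vm repair restore -n $vm -g $resourceGroup --verbose
}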
Summary
We don't have many details as to why this happened. One would assume there are many quality checkpoints prior to a release, so many things may have had to go wrong for this to occur. It seems unlikely that a file full of nulls would have slipped through this release process. Could the file have been nulled out by a proxy or CDN? It also seems that these configuration files are not signed, and the loading logic could be more robust.
Edit (07-20-2024): The file of nulls is suspected to cause a null pointer dereference, as indicated in the referenced YouTube videos. It is also not a good look for George Kurtz, former McAfee CTO and current CrowdStrike CEO. Linux machines were also affected in the past; there the failure showed up as kernel panics, for example on Rocky Linux.
Update: Some machines will fix themselves if they are able to boot into the OS and reach the internet/network, since CrowdStrike has pushed out a fix, C-00000291-00000000-00000030.sys (and higher configuration versions). This is not reliable since it depends on a race condition (the fixed channel file has to come down before the bad one is loaded and crashes the machine), so a manual fix is preferable.
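To check whether a machine has already picked up the fixed content, listing the channel-291 files is enough (a sketch; any version of 00000030 or higher means the fix arrived):
# List channel-291 files; a version of 00000030 or higher indicates the fixed content.
Get-ChildItem -Path "C:\Windows\System32\drivers\CrowdStrike" -Filter "C-00000291-*" | Sort-Object Name | Select-Object Name, LastWriteTime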