
The $5.4 Billion Bug That Crashed The World



Crowdstrike is one of the world's largest cybersecurity firms, trusted by much of the Fortune 500 as well as governments and public services worldwide. But a single coding error turned them into the world's most infamous cybersecurity firm, purely because of the scale of the fallout. In July 2024, Crowdstrike pushed out an update that immediately crashed every Windows computer that received it. The issue was spotted relatively early, so only 8.5 million machines were affected, but many of those machines sat at the heart of corporate and government infrastructure, leading to massive global outages. Worst of all, the whole thing could have been avoided if Crowdstrike had followed industry-standard staging practices. This video explains the Crowdstrike incident and how one coding error led to $5.4 billion in losses.

Timestamps:
0:00 – The World Goes Dark
0:41 – Hour By Hour
8:05 – What Happened

Disclaimer:
This video is not a solicitation or personal financial advice. All investing involves risk. Please do your own research.

33 Comments

  1. Pretty much an overstatement or overreaction to say the WORLD. Afaik only multinational companies and airlines were affected in my country. Pretty ridiculous having to rely on that software

  2. While you're correct that Crowdstrike should have checked the update and respected staging, there's actually an even bigger problem: the app doesn't do any sanity checking on the data files in the updates it receives. One bad read or flipped bit in an otherwise correct update would bluescreen a perfectly healthy PC, because the app just blindly loads whatever it gets (see the validation sketch after this comment thread).

  3. Also, to highlight how bad they are: 3+ months before this, their Linux version had a similar issue with a pushed update, and then three WEEKS before, another Linux update caused problems. That should have triggered a massive rethink of QA processes for all their products.

  4. Deploying the fix prevented any further hosts from getting the bad file, and any host that was still online and hadn't processed the file yet could grab the newest update immediately and resolve itself. Basically, the fix was removing the update file.

    But their plan going forward is solid. One part is better bug handling within the code, because it is a kernel-level driver, so any bug is very bad and will crash the system immediately. They are also rolling out a canary program where customers can choose which systems get the latest updates sooner and which get them on a slower schedule (unimportant dev systems, that type of thing; see the ring sketch after this thread). Before this, every customer got the same update file at the exact same time; you could not choose when you got updates. That is both good and bad. It is good because it ensures all customers have the latest files and are protected against up-to-the-second threats, so they're not getting hit with ransomware and viruses. It is bad because it prevents issues like this from being weeded out first. They are also adding an internal deployment stage ahead of the initial customer deployment, so that they see any issues that might happen first.

    Microsoft tried to release a kernel API so that programs which need kernel-level access could still get it without being implemented inside the kernel, where things go very wrong if something bad happens. Apple does it this way today, which is why you very rarely see kernel panic issues. But when Microsoft tried to implement it years ago, the EU blocked it, theorizing that Microsoft would have more access to the kernel than competitors, could therefore release products that do more than competitors' products, and would thus become a monopoly. Even though Apple does it this way today, the EU doesn't seem to understand how APIs and kernels work. Microsoft also has a strict kernel-level driver certification program: drivers must be investigated and signed by Microsoft before deployment, and how that process could miss such bad error handling is beyond me. Hopefully Crowdstrike also implements checks on the nightly content updates that get deployed automatically. I'm a huge Crowdstrike fan and I'm really hoping they survive this, because they are an amazing company. But right now they are growing at such a rapid pace that they're just trying to keep up; they've hired almost three times the number of employees they had two years ago, that's how fast they are growing. So fingers crossed they survive. I think they handled the issue very well: they took responsibility right away, deployed a fix, and worked with Microsoft and partners to figure out how to manually repair machines and get systems back online as soon as possible. And hopefully everyone will agree that the set of steps and structures in the post-mortem they released is the right way to move forward.

  5. Microsoft will obviously shift the blame over to Crowdstrike. But it's ultimately their OS powering millions of computers, security tools, businesses, and public services. One faulty software update has consequences this dramatic, at this scale? We need backup systems put in place to make sure that nothing like this ever happens again.

  6. This is the problem with PUSH updates. It is my device, I know best when I am prepared to risk an update. It kinda ruins your day when your critical servers suddenly get wonky, or better yet just reboot autonomously …. BIG reason to consider Linux …. at least that platform gives you control over updates, like Win7 used to 🙂
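
Comment 2's point about sanity-checking update data can be made concrete. Below is a minimal C sketch of the kind of pre-parse validation the commenter describes; the file layout (magic value, field count, additive checksum) is invented for illustration and is not Crowdstrike's actual channel-file format.

```c
/* Minimal sketch of pre-parse validation for a content update.
 * The layout (magic, field count, checksum) is hypothetical and is
 * only meant to illustrate the kind of check comment 2 asks for. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXPECTED_MAGIC       0x43554446u  /* arbitrary magic value for the sketch  */
#define EXPECTED_FIELD_COUNT 21u          /* number of fields the parser will read */

struct content_header {
    uint32_t magic;        /* file-type marker                     */
    uint32_t field_count;  /* fields the payload claims to contain */
    uint32_t payload_len;  /* bytes of payload after the header    */
    uint32_t checksum;     /* additive checksum over the payload   */
};

static uint32_t additive_checksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* Returns true only if the blob is structurally sound; the caller should
 * refuse to parse it (and fall back to the last good file) otherwise. */
static bool content_file_ok(const uint8_t *blob, size_t blob_len)
{
    struct content_header hdr;

    if (blob_len < sizeof hdr)
        return false;                          /* truncated download           */
    memcpy(&hdr, blob, sizeof hdr);

    if (hdr.magic != EXPECTED_MAGIC)
        return false;                          /* wrong or corrupted file type */
    if (hdr.field_count != EXPECTED_FIELD_COUNT)
        return false;                          /* parser and data disagree     */
    if (hdr.payload_len != blob_len - sizeof hdr)
        return false;                          /* length field is lying        */
    if (additive_checksum(blob + sizeof hdr, hdr.payload_len) != hdr.checksum)
        return false;                          /* flipped bit or bad read      */

    return true;
}

int main(void)
{
    uint8_t junk[8] = {0};   /* far too short to be a valid content file */
    return content_file_ok(junk, sizeof junk) ? 1 : 0;  /* rejected, exits 0 */
}
```

The idea is simply that a malformed or truncated file is rejected before the driver dereferences a single field, so a bad update degrades protection instead of bluescreening the host.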

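The canary program described in comment 4 boils down to a staged, ring-based rollout. Here is a minimal sketch of such a gate in C; the ring names and soak times are assumptions chosen for illustration, not Crowdstrike's published deployment policy.

```c
/* Minimal sketch of a ring-based (staged) rollout gate. The ring names
 * and soak times are made up for illustration; the vendor's actual
 * deployment policy is not public in this detail. */
#include <stdbool.h>
#include <stdio.h>

typedef enum { RING_INTERNAL, RING_CANARY, RING_GENERAL } ring_t;

/* Hours an update must have been live (without a rollback) before
 * hosts in each ring are allowed to receive it. */
static const unsigned MIN_AGE_HOURS[] = {
    [RING_INTERNAL] = 0,   /* the vendor's own dogfood machines go first         */
    [RING_CANARY]   = 4,   /* customer-chosen early-adopter systems follow       */
    [RING_GENERAL]  = 24,  /* everything else waits until the update has soaked  */
};

/* Decide whether a host in `ring` should receive an update that has
 * been live for `age_hours`. */
static bool should_deploy(ring_t ring, unsigned age_hours)
{
    return age_hours >= MIN_AGE_HOURS[ring];
}

int main(void)
{
    printf("canary  after  2h: %s\n", should_deploy(RING_CANARY, 2)   ? "deploy" : "hold");
    printf("general after  2h: %s\n", should_deploy(RING_GENERAL, 2)  ? "deploy" : "hold");
    printf("general after 30h: %s\n", should_deploy(RING_GENERAL, 30) ? "deploy" : "hold");
    return 0;
}
```

In practice the promotion decision would also consult crash telemetry from the earlier rings, which is exactly the feedback loop a single simultaneous global push cannot provide.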