Key lessons from the time when the digital world stood still
In the past few weeks, we’ve seen two devastating IT outages across the globe: first the CrowdStrike “blue screen of death” outage that affected 8.5 million Windows devices, then the latest DDoS-related Microsoft crash. While the full impact remains unclear, we can expect both outages to have significant, long-term ramifications.
The initial outage is already estimated to have cost U.S. Fortune 500 companies up to $5.4 billion in damages, with those in the banking and healthcare sectors expected to be hit the hardest. In addition, the disruption sent countless organizations scrambling to restore their systems and secure their data, creating a chaotic environment ripe for exploitation. The turmoil not only exposed vulnerabilities, but also weakened cybersecurity defenses, leaving enterprises far more susceptible to cybercriminals looking to exploit a crisis.
As we grapple with the aftermath of global technology disruptions, what can IT and security leaders learn from the time when the digital world stood still?
VP Global Advisory CISO at BlackBerry Cybersecurity.
The Cracks in Our Global Digital Infrastructure
Ultimately, the outages highlighted the often-overlooked physical and logistical challenges of managing a distributed IT infrastructure. As the crisis unfolded, it became clear that resolving the issue required rebooting systems in safe mode with administrative privileges. However, this process is both nightmarish and time-consuming, particularly for large and distributed enterprises. Many organizations also struggled to access and repair remote systems, particularly those in hard-to-reach locations.
The scale of the problem is evident in the sheer volume and diversity of sectors affected by the crash, from banking and airlines to hotels and hospitals. It showed us how a single point of failure can spread across the intricate web of our digital infrastructure and impact multiple industries. At the same time, the scale of the outage highlighted the importance of skilled IT support and robust Managed Security Service Providers (MSSPs). Most importantly, we saw professionals from Microsoft, SonicWall and SentinelOne working together directly to diagnose and resolve the issue. Their combined efforts underline the immense value of industry collaboration, which remains one of the cybersecurity industry’s greatest strengths.
Key Lessons from the Global IT Outage
When a major incident occurs, there are always lessons to be learned. This outage is a critical moment for all organizations to assess their software supply chain and the operational risks to their business. This is especially true for cybersecurity software that runs deep within our software stacks, where adversaries attack, but also where one bad line of code can bring down the entire system.
As the immediate effects of the global outage subside, CIOs and CISOs must ask themselves: Do we have the right balance to deliver the disaster recovery and business continuity needed when this inevitably happens again? If that’s a tough question to answer, IT and security leaders should consider the following:
1. Improve process discipline – Strong management processes are critical, especially for security tool updates. Security leaders should implement rigorous testing protocols before deploying updates to the infrastructure. If a vendor manages this process, it is essential to ask about their remediation plans for problematic updates.
2. Implement multi-vendor strategies – While consolidation is popular, this incident highlights the importance of strategically diversifying vendors to mitigate risk. Critically examining your current setup to identify potential single points of failure should be a priority. Then consider robust Managed Detection and Response (MDR) solutions with open XDR capabilities, which are best suited to supporting a diverse IT or security stack. The alternative locks users into a single vendor and exposes them to potential vulnerabilities.
3. Strengthen endpoint security – Often, outages are caused by outdated cybersecurity practices, where complex EDR tools and heavy endpoint agents pose a significant infrastructure risk. Deploying lightweight AI on the endpoint can help avoid these types of outages, as it protects your environment without the heavy agents and frequent updates that put your operations at risk.
4. Integrate AI responsibly – While it may seem unrelated, developing clear policies for AI integration into cybersecurity operations is essential. This foresight will help prevent future large-scale problems as AI becomes more integrated into tech stacks. While AI offers a promising path forward, it is far from mature. IT and security leaders must therefore remain vigilant and adaptive, and be prepared to address the evolving vulnerabilities AI can introduce with an innovative, yet responsible approach.
5. Leverage real-time communication capabilities – Given that the outage affected some of the world’s most critical systems, networks and applications, the response required speed, accuracy and accountability. This is where a critical event management (CEM) solution can provide the real-time visibility needed for a rapid and informed response to business disruptions. At the same time, it provides a paper trail of incident communication to prove the situation was handled with accountability and compliance at the forefront.
6. Conduct regular testing to remove blind spots – Understanding your vulnerabilities and risks through regular testing is paramount, not only when implementing new software, but consistently over time. To protect against potential threat actors attempting to capitalize on IT outages, a combination of AI-driven internal and external penetration testing assessments remains vital. These will reveal how an external threat actor can compromise assets through ever-evolving tactics, techniques, and procedures. The performance and security of your systems are only as good as the least secure hardware and software components, so blind spots must be addressed as a priority to keep businesses functioning as normal.
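To make the first lesson concrete, here is a minimal sketch of one way to test an update before it reaches the whole estate: a staged rollout that applies the update to a small group of machines first and stops if too many of them fail. This is an illustration only, not a description of how any vendor ships updates. The names `Ring`, `staged_rollout`, `apply_update` and `max_failure_rate` are hypothetical, and `apply_update` stands in for whatever deployment and health-check tooling an organization actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Ring:
    """A named group of hosts that receives an update together (hypothetical)."""
    name: str
    hosts: Sequence[str]


def staged_rollout(
    rings: Sequence[Ring],
    apply_update: Callable[[str], bool],
    max_failure_rate: float = 0.01,
) -> list[str]:
    """Apply an update ring by ring, halting when a ring fails too often.

    apply_update(host) is assumed to return True if the host is healthy
    after the update and False otherwise. Returns the names of the rings
    that completed within the failure budget.
    """
    completed = []
    for ring in rings:
        if not ring.hosts:
            continue  # nothing to update in an empty ring
        # Every host in the current ring is updated before the rate is checked.
        failures = sum(1 for host in ring.hosts if not apply_update(host))
        failure_rate = failures / len(ring.hosts)
        if failure_rate > max_failure_rate:
            # Halt here so later, larger rings never receive the bad update.
            break
        completed.append(ring.name)
    return completed
```

Ordering the rings from smallest to largest (for example a handful of test machines, then one office, then everyone) means a faulty update is caught while the affected group is still small enough to repair by hand.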
These global technical outages were a stark reminder of the critical need for digital independence and robust management processes. Now, industry leaders must translate these lessons into actionable strategies, using this experience to build more resilient and adaptive cybersecurity frameworks. In this space, it’s not a matter of if the next crisis will happen, but when. The strength of the cybersecurity industry lies not only in our individual expertise, but in our collective response to challenges. By fostering collaboration, embracing strategic complexity, and continuously improving processes, future crises can be addressed with greater confidence and effectiveness.
This article was produced as part of TechRadarPro’s Expert Insights channel, where we showcase the best and brightest minds in the technology sector today. The views expressed here are those of the author and do not necessarily represent those of TechRadarPro or Future plc.