The widespread disruptions caused by the recent Microsoft and CrowdStrike outage exposed the precariousness of our hyper-connected world. This incident, affecting sectors from aviation to healthcare, underscored the systemic risks inherent in our over-reliance on a handful of critical infrastructure providers. This insight delves into the specifics, how the outage affects customer experiences, and what companies can do to safeguard their customers and stakeholders.
Microsoft and CrowdStrike Outage Explained
CrowdStrike is a cybersecurity company that offers a cloud-based platform to protect businesses from cyberattacks. Their platform focuses on safeguarding key areas like endpoints, cloud workloads, identity, and data. By using advanced technology and threat intelligence, CrowdStrike helps businesses detect, prevent, and respond to cyber threats efficiently. Their goal is to provide strong security while being easy to use and implement.
The outage resulted from a defect in a single content update that CrowdStrike issued at 04:09 UTC on 19 July 2024. The update caused the kernel driver csagent.sys to crash, sending affected Windows hosts into a blue screen of death with the stop code PAGE_FAULT_IN_NONPAGED_AREA. Mac and Linux hosts were not impacted. As noted by CrowdStrike representatives, the outage was not related to a security incident or cyberattack.

Outage Implications
The Microsoft and CrowdStrike outage had far-reaching consequences, impacting various industries in distinct ways. Let’s delve into a few specific examples:
Travel
The aviation industry was one of the most visibly affected sectors. With flight operations heavily reliant on digital systems for everything from ticketing and boarding to air traffic control, the outage caused widespread disruptions. Airlines around the world, including Hong Kong Airlines, KLM, IndiGo, Porter Airlines, and others, faced massive cancellations, delays, and operational challenges.
Passengers experienced significant inconvenience, financial losses, and disrupted travel plans that could take hours or days to resolve. Airline employees stranded away from their home bases could further complicate crew scheduling.

Financial Services
Financial institutions worldwide were impacted by the outage. In South Africa, Capitec Bank and other lenders experienced difficulties. The Philippines was significantly affected, with major banks like RCBC, Metrobank, LandBank, BDO, UnionBank, BPI, and PNB reporting online system failures. Digital payment platforms such as Maya and GCash were also affected in the Philippines. DenizBank in Turkey faced accessibility issues with its website and mobile banking app. Bradesco Bank in Brazil confirmed it was affected, as its customers were not able to log in. As of 12:00 UTC, the bank had disabled the login button.
Across financial institutions, online banking, electronic payments, and stock trading were severely hampered, leading to financial losses for both institutions and customers. The outage exposed how vulnerable the modern financial system is to IT failures. It underscored the importance of redundant systems, data backups, and disaster recovery protocols in the financial industry.
Social Services
The outage led to significant disruptions in 911 emergency services across several states, including Alaska, Arizona, Florida, Indiana, Kansas, Michigan, Minnesota, New York, Ohio, Pennsylvania, and New Hampshire. Some states experienced complete 911 outages, while others faced difficulties with 911 call centers.
Healthcare
While not as immediately visible as the impacts on aviation or finance, the healthcare industry also faced challenges. Electronic health records, patient management systems, medical employee scheduling software and medical device operations were disrupted, potentially affecting patient care.
Surgeries might have been delayed, and critical patient data could have been inaccessible. This incident emphasized the need for robust IT infrastructure and contingency plans in healthcare settings to ensure patient safety and continuity of care.
Retail
The retail industry, heavily reliant on digital platforms for sales, inventory management, and customer relationship management, was significantly impacted. Online stores experienced outages, point-of-sale systems malfunctioned, and supply chain operations were disrupted. This led to lost sales, customer dissatisfaction, and operational inefficiencies. The outage highlighted the importance of omni-channel strategies and robust IT systems in the retail industry.

Beyond Immediate Impact
Beyond the immediate impact on businesses and consumers, the outage revealed deeper vulnerabilities in our digital ecosystem. The ripple effects extended far beyond the industries discussed. Supply chains were disrupted, economic activity was slowed by incomplete transactions, and consumer confidence may have been shaken. The outage served as a reminder of the interconnectedness of our global economy and the danger posed by a single point of failure.
Customer Experience
Across all industries, the outage degraded or removed key customer experience processes. As a result, impacted customers could face frustration, inconvenience, and potential financial losses. Handled without care, an incident like this can erode trust in businesses, and it highlights the importance of effective crisis communication and customer support. Organizations should prioritize building resilient systems and providing exceptional customer service to mitigate the impact of future disruptions.
The examples above underscore the risks associated with the over-reliance on a few critical technology providers. It is imperative for industries to invest in building resilient infrastructure, diversifying technology suppliers, and developing comprehensive business resumption/disaster recovery plans to protect against future disruptions.
James Connell, Senior Global Marketing, Branding, OmniChannel and Digital Transformation Executive, said, “Reassuring your customers that their data is safe, provide proactive updates that are timely on social channels, website and email that are clear and succinctly worded.”
What Happens Next?
A major outage like the one caused by the Microsoft and CrowdStrike incident demands a swift and coordinated response to minimize customer disruption. Rapid incident response and clear communication are paramount.
First Response
Establishing a dedicated incident response team to manage the situation is crucial. This team should be responsible for disseminating timely, accurate, and transparent information to both internal and external stakeholders, including customers. By providing regular updates on the outage’s nature, cause, and resolution efforts, organizations can alleviate customer and employee anxiety and maintain trust.

Prioritizing critical systems is essential to mitigate the impact on customer experience. Identifying systems that are vital to business operations and customer interactions allows organizations to focus restoration efforts accordingly.
Implementing contingency plans for essential services ensures continued operations during the outage, reducing customer inconvenience. Companies should also leverage alternative communication channels, such as phone, email, and social media. It is crucial to maintain contact with customers and provide support where possible. Proactive outreach to high-impact customers can help mitigate issues and demonstrate care and empathy. Transparency, timeliness, and consistency are key.
Performance Metrics
To measure the effectiveness of the response, companies can employ the following performance metrics. Outage duration, including the time to restore critical systems, provides a clear indicator of the incident’s impact. This may also include the resolution time for specific issues or for service restoration. The estimated time to resolution can be used to determine the next communication touchpoint.
Assessing the number of customers affected and the severity of their experience helps gauge the gap between customer expectations and actual experience. Physical and online engagement can be deployed to manage expectations and resolve issues.
Customer satisfaction surveys, customer feedback, and social media monitoring can provide insights into customer sentiment during and after the outage. Additionally, evaluating the performance of the response team and the effectiveness of communication and collaboration is essential for identifying areas of improvement.
Tracking system availability before, during, and after the outage helps assess the overall resilience of the IT infrastructure. Finally, estimating the financial loss due to the outage provides a quantitative measure of the incident’s impact. By focusing on these strategies and metrics, organizations can enhance their ability to respond to and recover from future disruptions, safeguarding customer experience and protecting their reputation. Building trust is an ongoing endeavour!
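As a rough illustration of two of the metrics above, outage duration and system availability, the following minimal Python sketch shows one way the arithmetic could be done. The function names, the timestamps, and the 30-day reporting window are hypothetical values chosen for illustration only; they do not describe any real organization's outage.

```python
from datetime import datetime, timezone


def outage_duration_hours(start: datetime, restored: datetime) -> float:
    """Return the outage duration in hours between two timestamps."""
    return (restored - start).total_seconds() / 3600


def availability_percent(window_hours: float, downtime_hours: float) -> float:
    """Return the share of a reporting window during which the system was up."""
    return 100 * (window_hours - downtime_hours) / window_hours


# Hypothetical timestamps: a critical system goes down and is restored 12 hours later.
outage_start = datetime(2024, 1, 10, 4, 0, tzinfo=timezone.utc)
service_restored = datetime(2024, 1, 10, 16, 0, tzinfo=timezone.utc)

duration = outage_duration_hours(outage_start, service_restored)

# Hypothetical reporting window: 30 days, expressed in hours.
window_hours = 30 * 24
availability = availability_percent(window_hours, duration)

print(f"Outage duration: {duration:.1f} hours")
print(f"Availability over the window: {availability:.2f}%")
```

Comparing the availability figure for the windows before, during, and after an outage gives a simple, consistent way to track how resilience changes over time.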
Transform For The Better
Reliance on a few technology vendors can create systemic vulnerabilities. There is an opportunity for companies to collaborate and explore alternative tools for developing and deploying critical infrastructure that could report issues faster and provide more redundancy for essential services. Regularly training on and testing incident response and reporting processes will help companies manage expectations in an organized manner.
As we move forward, it is imperative that we learn from this incident. By building more resilient, diverse, and equitable digital infrastructures, we can mitigate the risks of future disruptions sooner and reduce the financial, reputational, and service implications.
How Can We Help?
Transformidy is available to help you understand trust and assess how trustworthy your company is.
Contact us or set up a complimentary 30-minute consultation for more information on our services, insights, or showcases. We look forward to hearing from you.