TL;DR: The current wave of digital instability highlights a critical paradox: we have traded independence for the speed, security, and efficiency of global cloud consolidation. Hyperscalers have democratized high-end computing, but the resulting architecture means a single error can now trigger massive, cascading blackouts across our essential infrastructure. As a result, 2026 is becoming the year when governments and enterprises aggressively adopt multi-cloud solutions, edge computing, and on-premises repatriation so that a single cloud failure does not lead to total operational paralysis.
An Unreliable Future
Over the past couple of decades, our lives have moved online, and our economy has become dependent on the digital world. Large companies drove this shift by investing heavily in user-friendly infrastructure that makes it easy to launch new businesses. Today, Amazon (AWS), Microsoft (Azure), and Google (GCP) together hold over 60% of the cloud market and are estimated to spend upwards of $200b, $140b, and $175b, respectively, on capex in 2026 to expand capacity.
There are real benefits to structuring the ecosystem this way. Market consolidation has created tremendous economies of scale that have democratized access to the most powerful computing available. Hyperscale data centers are more energy-efficient than standalone server rooms, and hyperscalers are among the largest corporate buyers of renewable energy. The system is also genuinely resilient at a global level: no matter where users are in the world, content arrives so quickly and reliably that they rarely notice, or care, where it is served from. Most importantly, whether you are a Fortune 100 company or a mom-and-pop shop, you have access to enterprise-grade security.
However, this growing reliance on a few players poses risks to global infrastructure and access. In just the last six months, we have seen several major outages that highlight the fragility of our digital ecosystem.
Recent Major Outages:
February 18, 2026: YouTube suffered a global outage affecting video recommendations and homepage loading for hundreds of thousands of users.
February 12, 2026: Supabase experienced a major outage affecting all services in the us-east-2 region for almost four hours.
November 18, 2025: A faulty database update at Cloudflare triggered a configuration error that caused a major 4-hour global outage, affecting websites and services such as X, Discord, and OpenAI.
October 29, 2025: Microsoft Azure experienced a major global outage triggered by an inadvertent configuration change in Azure Front Door, the company's edge network, resulting in widespread service disruptions.
October 20, 2025: Amazon Web Services (AWS) experienced a major 15-hour outage primarily affecting us-east-1. The incident was caused by a "latent race condition" within the DynamoDB DNS management system.
Despite these benefits, the more we rely on the internet and on being digitally connected, the more the risks of consolidation among a few tech powerhouses compound.
Why is this happening more often?
Consolidation: We have moved away from a many-to-many internet to one where a handful of companies provide the infrastructure. This concentration creates single points of failure, where one fault can cascade across huge portions of the internet.
AI: The surge in AI has put far more load on data centers. AI's compute and power demands stress these systems and can create capacity crunches.
Just today, it was reported that a 13+ hour AWS outage was caused by an internal AI tool, Kiro, which deleted and recreated an environment.
Global data center occupancy is projected to exceed 95% by late 2026. For context, anything over 90% is considered "effectively full" because operators need buffer space (Goldman Sachs).
Traditional server racks draw 7-10 kW of power, while modern AI racks (built on H100/B200 GPUs) require 30-100+ kW per rack (IEA).
Complexity: Modern cloud platforms can comprise billions of lines of code split across many interdependent "microservices," so a single bug can ripple through the entire system and cascade to millions of applications.
Macro risks: Physical disruptions (undersea cable cuts, extreme weather, etc.) and tech-sector layoffs, which shrink the teams that keep these systems running, play major roles in the stability of these services.
There are several ways to address this, and most businesses understand that, at scale, they need a multi-cloud approach to minimize the risk of a service outage.
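To make the multi-cloud argument concrete, here is a minimal Python sketch of the usual back-of-the-envelope availability math. It is illustrative only. The 99.9% figure and the function names are hypothetical, and the calculation assumes the providers fail independently of each other.

```python
# Illustrative sketch: hypothetical availability figures, not real provider SLAs.

HOURS_PER_YEAR = 365 * 24


def combined_availability(availabilities: list[float]) -> float:
    """Probability that at least one provider is up.

    Assumes provider failures are statistically independent. Real outages
    that share a dependency (DNS, CDN, networking) violate this assumption.
    """
    p_all_down = 1.0
    for a in availabilities:
        p_all_down *= 1.0 - a  # chance this particular provider is down
    return 1.0 - p_all_down


def annual_downtime_hours(availability: float) -> float:
    """Expected hours per year during which the service is unavailable."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    single = 0.999  # hypothetical: one provider at "three nines"
    dual = combined_availability([single, single])
    print(f"Single provider: {annual_downtime_hours(single):.2f} hours/year")
    print(f"Two independent providers: {annual_downtime_hours(dual) * 3600:.0f} seconds/year")
```

Under these assumptions, expected downtime falls from about 8.8 hours a year to roughly half a minute. Independence is the weak point of this math. The Cloudflare incident listed earlier shows how a shared dependency can take services down no matter which cloud they run on.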
Edge computing is another approach, in which computing power is distributed across thousands of smaller servers to reduce latency and to host services that should not depend on a large data center (for example, smart traffic lights).
To get even closer to the source, enterprises are bringing workloads on-prem at an astonishing rate. According to IDC, around 80% of enterprises expect to repatriate some compute or storage workloads from the public cloud. 37signals, for example, projected savings of $10m over five years from moving its workloads on-prem.
The risk has become significant enough that even governments now recognize the cloud as too big to fail. Laws like the EU's Digital Operational Resilience Act (DORA) require banks and other financial entities to prove they have an exit strategy and to test what happens if their cloud provider disappears or suffers a major outage.
Takeaway: The current wave of digital instability highlights a critical paradox: we have traded independence for the speed, security, and efficiency of global cloud consolidation. Hyperscalers have democratized high-end computing, but the resulting architecture means a single error can now trigger massive, cascading blackouts across our essential infrastructure. As a result, 2026 is becoming the year when governments and enterprises aggressively adopt multi-cloud solutions, edge computing, and on-premises repatriation so that a single cloud failure does not lead to total operational paralysis.
Have a great weekend,
Josh
