Complexity is a natural byproduct of a highly heterogeneous and distributed architecture. Now we better understand its impact.

Although we get different messages from cloud computing providers, we now have data that suggests public cloud outages are getting worse. The Uptime Institute recently released its 2022 Outage Analysis report, which included such findings as “high outage rates remain an issue.” Indeed, one in five organizations reported a “serious” or “severe” outage that resulted in significant financial losses, reputational damage, compliance breaches, or, in some severe cases, loss of life. The report concludes that there has been a slight upward trend in the prevalence of major outages in the past three years.

I’m usually not one to bust out the quotes, but this statement by Andy Lawrence of the Uptime Institute is worth mentioning: “The lack of improvement in overall outage rates is partly the result of the immensity of recent investment in digital infrastructure and all the associated complexity that operators face as they transition to hybrid, distributed architectures.”

Complexity is not a new challenge for IT. However, we recently created much more complexity through quick digital transformations and the wild rush to cloud and multicloud in response to the pandemic. These factors sharply increased the number and variety of systems that support businesses. Most enterprises reported that they once supported about 500 cloud services for the entire enterprise and now support about 3,000 services over a multicloud deployment.

These numbers indicate that the technology doesn’t cause the outages; it’s how the technology is used and the amount of technology in use. As the report states, nearly 40% of organizations have suffered a major outage caused by human error. Of these incidents, 85% have a root cause of staff failing to follow procedures or flaws in the processes and procedures themselves.
The root causes of complexity are well understood. There are many more moving parts to oversee in multicloud and cloud architectures and not enough money to quadruple operations staff. Cause, meet effect.

Why does this complexity happen in the first place? Much better operations tools are now available, such as AIops and cross-cloud monitoring solutions. These tools allow developers and innovators to leverage best-of-breed technologies to build and deploy business-changing technologies. Developers can deploy the optimal choices for storage systems, AI systems, compute, databases, etc., that may come from one or (more likely) many cloud providers. The result is a complex and highly heterogeneous multicloud deployment that requires staff with specialized skills to operate it effectively and limit the number of outages. Ironically, most IT organizations can’t get approval for an increased ops budget because cloud computing promised to make operations less expensive.

What’s the solution? As I’ve stated here a few times, abstraction and automation layers remove humans (and human errors) from the front and center of all operations processes. These layers also include tools for ops planning or replanning to optimize multicloud operations, which can take your operations game to the next level.

That brings us back to the original problem. Rebooting cloud and multicloud operations to incorporate abstraction and automation layers translates into more money and skills. Until enterprises reach a tipping point where the complexity costs more to manage than it does to directly address, we’ll see more outages. It’s too bad that we must do damage just to understand how to avoid doing damage. Sadly, we’ve been here many times before.