david_linthicum
Contributor

6 features your cloudops tool must have

Jun 29, 2021, 3 mins

A cloudops technology stack is easier to define than to design. Here are 6 capabilities to look for.


We’re still defining exactly what cloudops is and clarifying what technology is needed to solve its core problems.

As with any cloud computing problem, it helps to break a working cloudops solution, such as AIops, into its core components, and to define what the technology needs to do and the value it brings to the table. To this end, I picked out six capabilities a cloudops tool should offer:

Observe and gather data from any number of systems, which is needed to find patterns to analyze and act on. This has a few components, including the ability to use connectors and/or agents to communicate with the systems under management, and to get the data back reliably to some type of centralized cloudops system.
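To make the agent-to-central-system idea concrete, here is a minimal sketch, assuming a hypothetical `Agent` that buffers readings locally and a hypothetical `CentralCollector` standing in for the cloudops back end. Neither is a real product API. "Reliable" here just means readings stay buffered until a send succeeds.

```python
import time
from collections import deque
from dataclasses import dataclass


@dataclass
class Metric:
    source: str
    name: str
    value: float
    timestamp: float


class CentralCollector:
    """Hypothetical central cloudops store; real tools expose their own ingestion APIs."""

    def __init__(self):
        self.received = []

    def ingest(self, batch):
        self.received.extend(batch)


class Agent:
    """Hypothetical agent that buffers readings locally and forwards them in batches."""

    def __init__(self, source, collector, max_buffer=1000):
        self.source = source
        self.collector = collector
        # Bounded buffer: if the collector is unreachable for long, the oldest readings are dropped.
        self.buffer = deque(maxlen=max_buffer)

    def observe(self, name, value):
        self.buffer.append(Metric(self.source, name, value, time.time()))

    def flush(self, send=None, retries=3):
        """Send buffered readings; keep them buffered if every attempt fails."""
        send = send or self.collector.ingest
        batch = list(self.buffer)
        for _ in range(retries):
            try:
                send(batch)
            except ConnectionError:
                continue  # a real agent would back off before retrying
            self.buffer.clear()
            return True
        return False
```

The bounded buffer is a deliberate trade-off: it caps memory on the managed system at the cost of losing the oldest data during a long outage.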

Correlate massive amounts of system data (noise) in meaningful ways. This includes determining patterns, such as where the data is coming from, and grouping related data so it can be analyzed more deeply.
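One simple way to turn noise into groups is to bucket events by source, event type, and time window, then count them instead of keeping every duplicate. The sketch below assumes a hypothetical `Event` record and a fixed 60-second window; real correlation engines use far richer rules.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    source: str       # where the event came from, e.g. a host or volume name
    kind: str         # event type, e.g. "io_error"
    timestamp: float  # seconds since some epoch


def correlate(events, window_seconds=60.0):
    """Group events by (source, kind, time bucket) and count how many fall in each group."""
    groups = defaultdict(int)
    for e in events:
        bucket = int(e.timestamp // window_seconds)
        groups[(e.source, e.kind, bucket)] += 1
    return dict(groups)
```

Four raw events from two sources collapse into three groups, which is the kind of reduction that makes deeper analysis tractable.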

Analyze the patterns to determine problems and root causes. This is really where the AIops or general cloudops tool makes its money. It should be able to find patterns in the gathered and correlated data that indicate current issues, such as a failed networking device, or, more importantly, predict issues that are likely to occur. Proactive cloudops can help avoid a major problem, for example by identifying a cloud storage system that is kicking off I/O errors, which could indicate that a failure is imminent.
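The storage example can be sketched as two checks: a current-issue threshold and a predictive trend. This is a minimal illustration, assuming I/O error counts per fixed interval and a least-squares slope as the trend signal. It stands in for whatever models an AIops tool actually uses, and both thresholds are arbitrary.

```python
def slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def assess_volume(io_errors_per_interval, current_limit=50, slope_limit=2.0):
    """Classify a storage volume from its recent I/O error counts.

    'failing' -> the latest interval already exceeds the limit (a current issue)
    'at_risk' -> errors are climbing fast enough to suggest a coming failure
    'healthy' -> neither condition holds
    """
    if io_errors_per_interval and io_errors_per_interval[-1] >= current_limit:
        return "failing"
    if slope(io_errors_per_interval) >= slope_limit:
        return "at_risk"
    return "healthy"
```

For instance, a volume reporting 1, 3, 6, 10, 15 errors per interval is still under the failure limit but is flagged `at_risk` because its error count is rising by about 3.5 per interval.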

Share the observability findings with ops team users, and trigger automated processes that can fix the issues. It’s one thing to indicate that something is wrong; it’s another to make sure the processes and people who can fix it are notified. This is where things are improving fast, including automated ticketing systems and self-healing processes.
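Notification is largely a routing problem: each kind of finding should reach the right people and systems. Here is a minimal sketch, assuming a hypothetical `ROUTES` table whose channel names (pager, ticket queue, chat) are illustrative only; a real deployment would call its paging, ticketing, or chat integrations where this calls `send`.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    resource: str
    status: str  # e.g. "failing" or "at_risk"
    detail: str


# Hypothetical routing table: which channels hear about which status.
ROUTES = {
    "failing": ["pager:storage-oncall", "ticket:ops-queue"],
    "at_risk": ["ticket:ops-queue", "chat:#cloudops"],
}


def share(finding, routes=ROUTES, send=print):
    """Send a finding to every channel registered for its status; return the channels used."""
    channels = routes.get(finding.status, [])
    for channel in channels:
        send(f"[{channel}] {finding.resource}: {finding.status} ({finding.detail})")
    return channels
```

Passing `send` in as a function keeps the routing logic testable and lets the same table drive different delivery mechanisms.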

Respond to the problem by launching an automated fix or a collaboration to get to a fix. This means the mechanisms to fix the problem are in place. Automation is taking over here, either as part of the cloudops tool or through another orchestration layer that defines how common issues are fixed without humans having to get involved.
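A common way to express "how common issues are fixed" is a runbook that maps issue types to automated remediations, with escalation to people as the fallback. In this minimal sketch, `restart_service` and `fail_over_volume` are hypothetical placeholders, not calls to any real cloud API. The failover deliberately returns `False` to show the escalation path.

```python
from typing import Callable, Dict


def restart_service(resource: str) -> bool:
    # Placeholder: a real runbook step would call the provider's or orchestrator's API here.
    return True


def fail_over_volume(resource: str) -> bool:
    # Placeholder that simulates an automated fix that did not succeed.
    return False


# Hypothetical runbook: issue type -> automated remediation.
RUNBOOK: Dict[str, Callable[[str], bool]] = {
    "service_down": restart_service,
    "volume_io_errors": fail_over_volume,
}


def respond(issue: str, resource: str, runbook=RUNBOOK) -> str:
    """Try an automated fix; escalate to people if none exists or it fails."""
    fix = runbook.get(issue)
    if fix is None:
        return "escalated: no automated fix"
    if fix(resource):
        return "resolved automatically"
    return "escalated: automated fix failed"
```

The key design point is that automation never swallows a problem: anything it cannot fix is handed to humans.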

Inform reports and dashboards so cloudops users can see both strategic and tactical data about the effectiveness of the systems over time. Dashboards show the current health of the systems and how things are trending, which helps predict their future health. Although cloudops teams are hesitant to use these team-wide, my advice is to make sure that anyone associated with cloudops or development can see these metrics in real time and thus make good decisions to improve things.
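A dashboard row that shows both "health now" and "how things are trending" can be as simple as the latest value plus a comparison of two moving averages. The sketch below assumes evenly spaced samples and a window of three. The hypothetical `summarize` function illustrates the idea rather than describing how any particular dashboard product computes trends.

```python
def summarize(name, samples, window=3):
    """Return a dashboard row: latest value, recent moving average, and trend direction.

    The trend compares the average of the last `window` samples with the
    average of the `window` samples before them.
    """
    if not samples:
        raise ValueError("need at least one sample")
    recent = samples[-window:]
    earlier = samples[-2 * window:-window] or recent  # too little history -> "flat"
    recent_avg = sum(recent) / len(recent)
    earlier_avg = sum(earlier) / len(earlier)
    if recent_avg > earlier_avg:
        trend = "rising"
    elif recent_avg < earlier_avg:
        trend = "falling"
    else:
        trend = "flat"
    return {"metric": name, "latest": samples[-1], "avg": round(recent_avg, 2), "trend": trend}
```

For example, p95 latency samples of 100, 110, 105, 130, 140, 150 ms yield a latest value of 150, a recent average of 140.0, and a `rising` trend, which is the kind of early signal worth making visible to everyone on the team.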

Again, there is no magic solution to cloudops problems. Much of what I’m recommending may not be doable for some enterprises without more than a single AIops or other cloudops technology in place. It depends on the types of systems and clouds you’re running and on the number and types of applications and data stores.

However, addressing these six concepts is a good start that will likely get you where you need to go.


David S. Linthicum is an internationally recognized industry expert and thought leader. Dave has authored 13 books on computing, the latest of which is An Insider’s Guide to Cloud Computing. Dave’s industry experience includes tenures as CTO and CEO of several successful software companies, and upper-level management positions in Fortune 100 companies. He keynotes leading technology conferences on cloud computing, SOA, enterprise application integration, and enterprise architecture. Dave writes the Cloud Computing blog for InfoWorld. His views are his own.
