Key Criteria for AIOps v1.0

Table of Contents

  1. Summary
  2. About the Key Criteria
  3. Report Methodology
  4. Primer: AIOps
  5. Decision Criteria Analysis
  6. Conclusion

1. Summary

AIOps tools, as the name implies, leverage a range of AI capabilities to enhance IT operations, including knowledge-driven predictive analytics and natural-language processing. AIOps is a technology that can be applied to all types of CloudOps tasks. These tools are built either to automate simple tasks so that IT operators can focus on more strategic work, or to perform tasks that are beyond a human’s capabilities.

One can think of AIOps as a cross between active tools and those that can learn from being active. This is an important distinction. Tools must carry out pre-programmed, self-corrective processes, and it’s the AIOps tools’ ability to learn during these processes that creates a huge advantage. For instance, an AIOps tool might understand that performance issues could be the result of saturation caused by cyber-attacks, and that the situation should kick off security processes to mount a defense. For traditional tools, such an incident would be addressed as a simple performance issue, and not recognized as a security threat.

Moreover, and most important, AIOps tools have the ability to deal with thousands of data points and make correlations that most humans would not make. For instance, a series of data update errors may form a pattern that leads to the identification of a bad network connection, a fault that would normally take weeks to diagnose.
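
As a minimal sketch of this kind of correlation, assume each data update error is logged together with the network link it traversed. The record fields (`job`, `link`), the sample values, and the 50% threshold are hypothetical and chosen only for illustration; an actual AIOps tool would apply far richer models across many more signals.

```python
from collections import Counter

def suspect_links(errors, min_share=0.5):
    """Return the links that account for at least min_share of all errors."""
    counts = Counter(e["link"] for e in errors)
    total = sum(counts.values())
    if total == 0:
        return []
    return sorted(link for link, n in counts.items() if n / total >= min_share)

# Hypothetical error records: unrelated jobs failing over the same link.
errors = [
    {"job": "orders-sync", "link": "sw3-port12"},
    {"job": "billing-sync", "link": "sw3-port12"},
    {"job": "inventory-sync", "link": "sw1-port4"},
    {"job": "orders-sync", "link": "sw3-port12"},
]
print(suspect_links(errors))  # ['sw3-port12']
```

Grouping by a shared attribute rather than by failing job is what surfaces the common cause: no single job looks unusual, but three of the four errors crossed the same link.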

However, the world of AIOps presents a duality. On the one hand, it’s an emerging technology that for the first time mashes up operations and AI. On the other, many of the solutions in this space are traditional tools that have been updated to leverage AI. This mix of old and new, traditional players and startups, makes this space particularly interesting. Will traditional ops tools perhaps have more maturity and connections into specific systems, and therefore thrive? Or will new purpose-built tools fully leverage AI technology to enable more innovative approaches?

Conclusions reached in this report include:

  • The AIOps tools in the market today are on a spectrum with regard to use of AI. While some make use of knowledge engines systemically in the monitoring and management of cloud and non-cloud systems, most tools treat AI as an afterthought that drives little of the tool’s functionality.
  • Enterprises are typically adopting AIOps as an upgrade to existing ops tools, and are remaining brand loyal. This means that the upstarts in the AIOps space will find it difficult to break into a market where the established players are in essence selling with the same basic message: AI integrated with management and monitoring that you trust. Considering this, we may see a consolidation next year as the market focuses on a handful of players, down from the two dozen or so relevant players today.
  • There seem to be two directions in AIOps: self-healing and not self-healing. Some AIOps systems are able to heal issues with the systems they manage and/or monitor. This means that if the tool finds an issue, a process is launched to attempt to correct the problem, for instance by restarting a server or a network hub. Other solutions are more passive, alerting users about an issue but taking no automated corrective action. The trend is toward active, or self-healing, AIOps tools.
  • These tools are all about the data. They store data as they monitor systems and can determine issues that need immediate attention, such as a down storage server. Or, they can deeply analyze historical data to determine trends that may portend a failure or other potential issue. The lifeblood of any AI system is the data needed to train the AI model, and this is the opportunity presented to AIOps tools. Monitored cloud or on-premises systems spin off gigabytes of data each week, and that data can be fed into analytic systems augmented by AI.
  • Enterprises that wish to leverage these tools should be careful to understand their capabilities, and should also test the tools across both enterprise cloud and non-cloud platforms. Compatibility issues have been reported, most of them discovered only after deployment.
  • Many of these tools are moving to an “on-demand” model, meaning that they will offer cloud-based services. This is an opportunity for those that have, or will have, the majority of their systems on public clouds. However, it may not be a good model for those that still have the majority of systems on-premises.
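
The distinction drawn in the conclusions above between passive and self-healing tools can be sketched as follows. This is an illustration under stated assumptions, not any vendor's implementation: the `Issue` type, the `handle` function, and the remediation table are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Issue:
    resource: str  # e.g. a server or network hub name
    kind: str      # e.g. "unresponsive"

def handle(issue: Issue,
           remediations: Dict[str, Callable[[str], None]],
           alerts: List[str],
           self_healing: bool) -> str:
    """Always alert; attempt a corrective action only in self-healing mode."""
    alerts.append(f"{issue.kind} on {issue.resource}")
    if not self_healing:
        return "alerted"          # passive tool: a human takes it from here
    action = remediations.get(issue.kind)
    if action is None:
        return "alerted"          # no known fix, so fall back to alerting
    action(issue.resource)        # e.g. restart the server
    return "remediated"

# Hypothetical remediation: record the restart instead of performing one.
restarted: List[str] = []
remediations = {"unresponsive": restarted.append}
alerts: List[str] = []

print(handle(Issue("db-01", "unresponsive"), remediations, alerts, self_healing=False))
print(handle(Issue("db-01", "unresponsive"), remediations, alerts, self_healing=True))
```

In this sketch both modes raise the same alert; the self-healing path differs only in that it also launches a corrective process, and it degrades to passive behavior when no remediation is known for the issue.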