1. Executive Summary
In today’s rapidly evolving IT landscape, AIOps is revolutionizing how organizations manage and resolve complex IT issues. By harnessing the power of artificial intelligence, machine learning, and big data analytics, AIOps drives efficiency and innovation by automating the identification and resolution of common IT problems.
A significant advancement in AIOps is the integration of generative AI and large language models (LLMs), which serve as force multipliers. These technologies empower organizations to navigate the complexities of modern IT environments more effectively. AIOps extends its impact beyond traditional IT boundaries to encompass overall business operations by enabling businesses to query the IT organization and receive contextually relevant responses.
The journey to AIOps begins with monitoring, progresses through observability, and culminates in intelligence (see Figure 1). Monitoring has become a staple in ITOps, providing visibility into devices, applications, and infrastructure. Observability takes this a step further by consolidating data to derive meaningful insights, predict future states, and automatically remediate known issues.
Figure 1. From Monitoring to Intelligence
Intelligence represents the pinnacle of this evolution, reflecting the operational state of the entire company. It leverages comprehensive data from various departments—marketing, sales, legal, human resources, and manufacturing—to deliver on the promise of AIOps. This holistic approach allows organizations to answer critical questions about the company’s status, predict the impact of business initiatives, and adapt to planned and unplanned changes.
With the rise of cyberthreats, integrating security and compliance features within AIOps platforms has become paramount. Real-time threat detection and response, facilitated by SIEM and SOAR, are now integral to operational management, ensuring robust security measures.
The complexity of IT systems, characterized by multicloud infrastructures and microservices, necessitates advanced AIOps solutions to manage vast data volumes. However, seamless integration with existing IT tools and systems remains a challenge. Vendors that offer solutions requiring minimal customization and integration effort, as well as those that lower the expertise threshold in AI/ML and data analytics, will likely see greater adoption.
As we delve into the landscape of AIOps vendors, we will explore how they address these challenges and evaluate their capabilities in transforming IT operations.
This is our fifth year evaluating the AIOps space in the context of our Key Criteria and Radar reports. This report builds on our previous analysis and considers how the market has evolved over the last year.
This GigaOm Radar report examines 29 of the top AIOps solutions and compares offerings against the capabilities (table stakes, key features, and emerging features) and nonfunctional requirements (business criteria) outlined in the companion Key Criteria report. Together, these reports provide an overview of the market, identify leading AIOps offerings, and help decision-makers evaluate these solutions so they can make a more informed investment decision.
GIGAOM KEY CRITERIA AND RADAR REPORTS
The GigaOm Key Criteria report provides a detailed decision framework for IT and executive leadership assessing enterprise technologies. Each report defines relevant functional and nonfunctional aspects of solutions in a sector. The Key Criteria report informs the GigaOm Radar report, which provides a forward-looking assessment of vendor solutions in the sector.
2. Market Categories and Deployment Types
To help prospective customers find the best fit for their use case and business requirements, we assess how well AIOps solutions are designed to serve specific target markets and deployment models (Table 1).
For this report, we recognize the following market segments:
- Small-to-medium business (SMB): In this category, we assess solutions on their ability to meet the needs of small and medium-sized organizations. Also assessed are departmental use cases in large enterprises where ease of use and deployment are more important than extensive management functionality, data mobility, and feature set.
- Large enterprise: Here, offerings are assessed on their ability to support large and business-critical projects. Optimal solutions in this category strongly focus on flexibility, performance, data services, and features to improve security and data protection. Scalability is another big differentiator, as is the ability to deploy the same service in different environments.
- Service provider: These providers deliver on-demand, pay-per-use services to customers over the internet, including infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS).
In addition, we recognize the following deployment models:
- Cloud: The solution is hosted and managed by the vendor in the cloud, providing scalability without requiring local infrastructure.
- On-premises: These solutions are deployed on customer-owned infrastructure and managed by the customer.
- Hybrid: These solutions are meant to be installed on-premises and in the cloud, allowing organizations to build hybrid or multicloud infrastructures. Integration with a single cloud provider could be limited compared to the other options and more complex to deploy and manage. On the other hand, this approach is more flexible, and the user usually has more control over the entire stack regarding resource allocation and tuning.
Table 1. Vendor Positioning: Target Market and Deployment Model
Vendor | Target Market: SMB | Target Market: Large Enterprise | Target Market: Service Provider | Deployment Model: SaaS | Deployment Model: On-Premises | Deployment Model: Hybrid |
---|---|---|---|---|---|---|
BigPanda | ||||||
BMC | ||||||
Broadcom | ||||||
Centerity | ||||||
CloudFabrix | ||||||
Datadog | ||||||
Dell Technologies | ||||||
Digitate | ||||||
Dynatrace | ||||||
Elastic | ||||||
Evolven | ||||||
Grokstream | ||||||
HCL | ||||||
IBM | ||||||
Interlink | ||||||
ITRS | ||||||
LogicMonitor | ||||||
Logz.io | ||||||
MeshIQ | ||||||
NewRelic | ||||||
OpenText | ||||||
PagerDuty | ||||||
Riverbed | ||||||
ScienceLogic | ||||||
ServiceNow | ||||||
Splunk, a Cisco Company | ||||||
Sumo Logic | ||||||
Zenoss | ||||||
ZIF |
Table 1 components are evaluated in a binary yes/no manner and do not factor into a vendor’s designation as a Leader, Challenger, or Entrant on the Radar chart (Figure 2).
“Target market” reflects which use cases each solution is recommended for, not simply whether that group can use it. For example, if an SMB could use a solution but doing so would be cost-prohibitive, that solution would be rated “no” for SMBs.
3. Decision Criteria Comparison
All solutions included in this Radar report meet the following table stakes—capabilities widely adopted and well implemented in the sector:
- Real-time monitoring
- Predictive capabilities
- Root cause analysis
- IT operations integrations
- Security and compliance
- Authorization, access, and authentication
- Visualizations and dashboards
Tables 2, 3, and 4 summarize how each vendor in this research performs in the areas we consider differentiating and critical in this sector. The objective is to give the reader a snapshot of the technical capabilities of available solutions, define the perimeter of the relevant market space, and gauge the potential impact on the business.
- Key features differentiate solutions, highlighting the primary criteria for evaluating an AIOps solution.
- Emerging features show how well each vendor implements capabilities that are not yet mainstream but are expected to become more widespread and compelling within the next 12 to 18 months.
- Business criteria provide insight into the nonfunctional requirements that factor into a purchase decision and determine a solution’s impact on an organization.
These decision criteria are summarized below. The corresponding report, “GigaOm Key Criteria for Evaluating AIOps Solutions,” provides more detailed descriptions.
Key Features
- Data aggregation and normalization: This feature refers to a solution’s ability to ingest, aggregate, and normalize data from any source anywhere with minimal effort. The use of a low-code/no-code ingestion method is ideal. The solution should ensure compatibility with various data sources, including cloud and on-premises infrastructure, applications, and various monitoring tools.
- Advanced analytics: Advanced analytics with anomaly detection systems identify and analyze deviations from normal behavior to provide a foundation for predictive insights and proactive management. Algorithms must be continuously trained, and ML models must be fine-tuned on diverse datasets to ensure the accuracy and relevance of analytics, adapting to changes in the IT environment and operational patterns.
- Anomaly detection: Anomaly detection uses real-time processing algorithms to recognize complex patterns and subtle anomalies that may indicate emerging issues or optimization opportunities. The solution should allow the customization of detection thresholds and parameters to accommodate the particular characteristics and requirements of different IT environments and applications.
- Correlation and causality analysis: A key to the success of any AIOps solution is multidimensional correlation, which analyzes data across multiple dimensions, such as time, geography, and infrastructure layers, to accurately identify correlations and causal relationships between disparate events.
- Automated remediation: Orchestration and automation can resolve incidents automatically, whether with a native engine or integration with an external tool. Three types of automated remediation are expected: policy-based automation, which implements frameworks that allow IT teams to define rules and conditions under which remediation actions are triggered; actionable remediation, which ensures remediation actions are precise, can be implemented, and are tailored to the specific issue, minimizing the risk of unintended consequences and enabling quick resolution of problems; and remediation that includes feedback loops that monitor the outcome of automated remediation actions, allowing the system to learn from successes and failures and continuously improve its effectiveness.
- Collaboration and workflow integration: AIOps tools should seamlessly integrate with existing IT service management (ITSM), DevOps tools, and collaboration platforms (like Teams and Slack), allowing efficient workflow management and information-sharing across teams. Automated triggers should enable the initiation of workflows based on insights and anomalies detected by the AIOps platform, streamlining the response process and reducing manual intervention.
- SIEM and SOAR integration: SIEM and SOAR solutions play crucial roles in the broader IT security landscape, and their relevance to AIOps is increasingly significant. Integrating SIEM and SOAR within AIOps platforms enables organizations to leverage advanced analytics, ML, and automation not just for operational efficiency and reliability but also for detecting, analyzing, and responding to security threats in real time.
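To illustrate the multidimensional correlation described in the list above, the following minimal Python sketch groups alerts that share one dimension (a hypothetical `service` field) and occur close together in time. The field names, the 300-second window, and the grouping rule are all assumptions chosen for illustration; this is not any vendor's algorithm.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    timestamp: float  # seconds since epoch (hypothetical field)
    service: str      # infrastructure/application dimension (hypothetical field)
    region: str       # geography dimension (hypothetical field)
    message: str


def correlate(alerts, window_seconds=300):
    """Group alerts that share a service and arrive within
    window_seconds of the previous alert in the same group."""
    by_service = defaultdict(list)
    for alert in alerts:
        by_service[alert.service].append(alert)

    groups = []
    for service_alerts in by_service.values():
        service_alerts.sort(key=lambda a: a.timestamp)
        current = [service_alerts[0]]
        for alert in service_alerts[1:]:
            if alert.timestamp - current[-1].timestamp <= window_seconds:
                current.append(alert)   # same burst: extend the group
            else:
                groups.append(current)  # gap too large: start a new group
                current = [alert]
        groups.append(current)
    return groups
```

A production platform would likely correlate across more dimensions (topology, geography, change records) and learn the window rather than fix it; the sketch only shows the basic idea of collapsing related alerts into one incident candidate.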
Table 2. Key Features Comparison
Rating scale: Exceptional | Superior | Capable | Limited | Poor | Not Applicable
Emerging Features
- Edge AI: Edge AI enables real-time, localized decision-making and analytics at the network’s edge, reduces latency and bandwidth use, and improves data privacy and security. This emerging technology focuses on developing lightweight, efficient AI models and hardware capable of processing complex computations on edge devices.
- Automated causal inference: Advancements in ML algorithms for better causal analysis and the integration of explainable AI can make insights actionable and understandable for IT operators. This shift from correlation-based insights to understanding cause and effect relationships enables precise problem diagnosis and effective, proactive IT operations management.
- Predictive security posture management: Predictive security posture management (PSPM) elevates cybersecurity within IT operations by using AI to predict and prevent security threats before they occur, aligning operational resilience with security. PSPM leverages advanced ML for anomaly detection and integrates real-time threat intelligence for dynamic risk assessment and mitigation strategies.
- Business data integration: AIOps presents an opportunity to combine IT operations with business objectives by incorporating business metrics (sales, customer behavior, and more) into operational decision-making, enhancing the strategic impact of IT operations. AIOps can process and analyze IT and business data in a unified manner, supported by AI-driven insights for cross-functional optimization.
- Generative AI: Generative AI (GenAI) refers to a class of artificial intelligence models that can generate new content, such as text, images, music, or even videos, that is often indistinguishable from content created by humans. These models are typically based on deep learning architectures, such as generative adversarial networks (GANs), variational autoencoders (VAEs), and LLMs like GPT (generative pretrained transformer).
- Ghost change detection: The most common cause of incidents is a change in the hardware or software environment. Change management captures changes made within known processes; however, changes made outside of approved processes—ghost changes—contribute equally to incidents. Without ghost change detection, the business continues to be at risk of interruption.
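The ghost change idea in the list above can be sketched in a few lines of Python: compare configuration snapshots and flag any item that changed without a matching approved change record. The snapshot format, the `approved_items` set, and the function names are hypothetical and stand in for whatever a configuration database and change management system would actually provide.

```python
import hashlib
import json


def fingerprint(config: dict) -> str:
    """Return a stable hash of a configuration snapshot."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_ghost_changes(previous, current, approved_items):
    """Return sorted names of items whose configuration changed
    (or newly appeared) without an approved change record.

    previous/current map item name -> config dict (hypothetical format);
    approved_items is a set of item names covered by an approved change.
    """
    ghosts = []
    for item, config in current.items():
        old = previous.get(item)
        changed = old is None or fingerprint(old) != fingerprint(config)
        if changed and item not in approved_items:
            ghosts.append(item)
    return sorted(ghosts)
```

For example, if `web` and `db` both changed between snapshots but only `db` has an approved change record, the function reports `web` as a ghost change.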
Table 3. Emerging Features Comparison
Rating scale: Exceptional | Superior | Capable | Limited | Poor | Not Applicable
Business Criteria
- Ease of deployment: Ease of deployment refers to the simplicity and speed with which an AIOps solution can be implemented within an organization’s IT infrastructure. This includes integrating with current systems, migrating data, and configuring the solution to meet specific operational requirements without significant downtime or disruptions. A solution that is easy to deploy reduces the time and resources required to get it up and running, thereby accelerating the realization of benefits from the investment.
- Ease of use: Ease of use refers to the user-friendliness of the AIOps platform, including its interface design, intuitiveness, and the simplicity of managing its operations. It also encompasses the learning curve for IT staff and the availability of support and training resources. Easy-to-use solutions can enhance productivity, reduce the likelihood of user errors, and decrease the need for extensive training, making it easier for teams to adopt and leverage the platform effectively.
- Flexibility: This criterion refers to the capability of an AIOps solution to quickly adapt to changes in the IT environment, including scaling to accommodate growth, integrating with new technologies, and updating to include new features or address emerging challenges. Flexibility ensures that the AIOps platform can continue to meet an organization’s needs over time, supporting continuous improvement and innovation in IT operations.
- Vendor ecosystem: A vendor’s ecosystem encompasses the network of partnerships, integrations, and third-party solutions available with the AIOps platform. This includes the ability of the solution to integrate with the organization’s existing IT management tools, cloud services, and other technologies, along with training, certifications, community tools, and professional services. A robust vendor ecosystem extends the functionality of the AIOps solution, facilitating seamless data exchange and operational workflows across a diverse IT landscape, thereby enhancing the overall value and effectiveness of the platform.
- Cost: Cost management in AIOps involves evaluating the expenses associated with core features, scaling, and integrations. Prospective customers should consider feature costs, including any necessary modules or add-ons; evaluate scalability in terms of how the solution accommodates increased data and user demands without significant additional costs; and review integration, customization, and training and support costs as they affect the overall operational budget.
- Compliance management: Compliance auditing and reporting facilitate the process of audits and generate reports to demonstrate compliance with relevant regulations and standards, such as SOC 2, HIPAA, or ISO 27001. This management capability includes tools for flagging and alerting violations, creating, managing, and enforcing security policies across the organization, and ensuring consistent adherence to security best practices and regulatory requirements.
Table 4. Business Criteria Comparison
Rating scale: Exceptional | Superior | Capable | Limited | Poor | Not Applicable
4. GigaOm Radar
The GigaOm Radar plots vendor solutions across a series of concentric rings, with those set closer to the center judged to be of higher overall value. The chart characterizes each vendor on two axes—balancing Maturity versus Innovation and Feature Play versus Platform Play—while providing an arrowhead that projects each solution’s evolution over the coming 12 to 18 months.
Figure 2. GigaOm Radar for AIOps
In the Radar chart in Figure 2, vendors are positioned across quadrants based on the following:
Maturity/Feature Play
Vendors in this quadrant have a specific strength or focus in their approach to the AIOps market and the end-user organizations they service. Specific focus areas include detection, analysis, and remediation of issues captured by the AIOps tools as well as AIOps solutions tailored to drive specific value to verticals like banking or telecom organizations. Given the large challenge that organizations face in ensuring that AIOps tools can fully comprehend all data sources in an organization, this specificity can drive value to end users.
Maturity/Platform Play
As might be expected given the established nature of the AIOps market, there is a strong concentration of vendors with broad approaches and highly developed solutions with comprehensive feature sets in this quadrant. These vendors are making incremental improvements to their solutions, largely by adding sophistication to the AI models, which in turn drives the effectiveness of most key features we evaluated. Vendors in this quadrant are less focused on emerging features and are more likely to wait for newer technologies to demonstrate market value before investing in them.
Innovation/Platform Play
Vendors in this quadrant offer broad solutions with comprehensive feature sets, but there’s likely to be moderate to significant change in the solution over a typical contract lifecycle. These vendors are prioritizing addressing the emerging needs of their users and have aggressive roadmaps to prove it. Vendors in this quadrant are mostly focused on developing the emerging features we evaluated, with notable concentration in GenAI capabilities and business data integration.
Innovation/Feature Play
This quadrant houses vendors that have more unique approaches to AIOps and are targeting emerging use cases such as edge AI. These vendors are flexible and responsive to user needs: they offer solutions that are in active development and still evolving.
In reviewing solutions, it’s important to remember that there are no universal “best” or “worst” offerings; every solution has aspects that might make it a better or worse fit for specific customer requirements. Prospective customers should consider their current and future needs when comparing solutions and vendor roadmaps.
INSIDE THE GIGAOM RADAR
To create the GigaOm Radar graphic, key features, emerging features, and business criteria are scored and weighted. Key features and business criteria receive the highest weighting and have the most impact on vendor positioning on the Radar graphic. Emerging features receive a lower weighting and have a lower impact on vendor positioning on the Radar graphic. The resulting chart is a forward-looking perspective on all the vendors in this report, based on their products’ technical capabilities and roadmaps.
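As a rough illustration of how weighted category scores can combine into a single value, consider the Python sketch below. The report does not publish the actual weights or formula, so the 0.4/0.4/0.2 split, the use of category averages, and the function name are assumptions made purely to show the principle that emerging features carry less weight.

```python
def weighted_score(key_features, business_criteria, emerging_features,
                   weights=(0.4, 0.4, 0.2)):
    """Combine per-category average scores into one weighted value.

    The default weights are hypothetical: key features and business
    criteria weigh more than emerging features, as the text describes.
    """
    def mean(values):
        return sum(values) / len(values)

    w_key, w_business, w_emerging = weights
    return (w_key * mean(key_features)
            + w_business * mean(business_criteria)
            + w_emerging * mean(emerging_features))
```

Under these assumed weights, a vendor with strong key features and business criteria but no emerging features still scores well, which matches the lower impact the text assigns to emerging features.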
Note that the Radar is technology-focused, and business considerations such as vendor market share, customer share, spend, recency or longevity in the market, and so on are not considered in our evaluations. As such, these factors do not impact scoring and positioning on the Radar graphic.
For more information, please visit our Methodology.
5. Solution Insights
BigPanda: AIOps
Solution Overview
BigPanda’s AIOps solution is an event management platform designed to help organizations identify and resolve common and complex IT issues with an ITSM framework. Its Automated Incident Analysis (AIA), which is an add-on feature, is powered by GenAI. It uses an LLM that enables natural language interaction to enhance incident response, find the probable root cause of an incident, identify other abilities, and speed up the time to remediation. The AI capabilities are also used to create short and meaningful incident titles and summaries, which are shown on dashboards and can be used to display additional information.
BigPanda AIOps is also an algorithmic event correlation platform that can ingest any type of data from any source, including business alerts, using webhooks and its API. Business events and alerts may require initial tagging or other correlation data to provide meaningful insights. The native automation engine allows remediation, collaboration with external tools, and workflow optimization.
Strengths
BigPanda is data agnostic, meaning it does not require displacing existing tools. Moreover, data-agnostic solutions are very good at anomaly detection, and the ability to handle data from any vendor or business source fosters the development of overall intelligence. The built-in automation engine facilitates the automated remediation of incidents and streamlines workflow between BigPanda and external tools such as Teams, Slack, or an ITSM system. The workflow interface enables automation and workflows to be configured and modified easily.
BigPanda is also easy to deploy and use. It’s a flexible solution with a strong vendor ecosystem. Administrators and users see a simple and consistent interface requiring little training. Flexibility is enhanced by the easy integration with any data set and the ability to run BigPanda as a SaaS application. BigPanda does a good job of compliance management but can’t send alerts on compliance violations.
Challenges
BigPanda has lower scores in SIEM and SOAR integration: though it can integrate with SIEM and SOAR solutions, it provides no direct support for them. Moreover, it has not yet incorporated some of the emerging technologies we noted above, including edge AI and predictive security posture management, which have yet to solidify in the industry. At the same time, BigPanda’s integration of business data and GenAI is keeping pace with the marketplace, and its GenAI capabilities might prove to exceed the market average if a more concrete definition of that average were available.
Detection of ghost changes is also lacking. Changes to infrastructure and topology can be seen, but configuration and code changes can’t. Therefore, the solution can’t see the system and software parameters or any configuration changes.
Purchase Considerations
BigPanda is deployed as a SaaS application. It comes with regional support for North America and Europe and provides full GDPR compliance.
BigPanda is a strong player in data-agnostic AIOps, where the focus is on pulling alerts, events, and other information to provide an intelligent overview of the enterprise. If observability tooling already exists, BigPanda should be a consideration. The enterprise needs only to send data to BigPanda without changing how development teams operate. Additional tagging helps, but most observability tools already provide these for use by DevOps and others. BigPanda makes intelligence about the enterprise more accessible.
BigPanda is a general-purpose AIOps solution applicable to multiple use cases such as network operations centers, ITSM incident problem management, and observability. The SaaS implementation is well suited to organizations of any size. The target market for BigPanda includes SMBs, ISVs, large enterprises, and service providers.
Radar Chart Overview
BigPanda is positioned in the Maturity/Platform Play quadrant. It is a domain- and data-agnostic tool that does not require tool displacement, and it has broad use case applicability. BigPanda prioritizes stability and continuity over breakneck advancement. It has a higher aggregate score in our decision criteria, making it a Leader in this report.
BMC: Helix IT Operations Management Suite (BMC Helix ITOM)
Solution Overview
BMC Helix IT Operations Management Suite is an integrated AIOps and observability solution with ML/AI-powered discovery, monitoring, optimization, automation, and native remediation capabilities. BMC Helix ITOM Suite offers embedded causal, predictive, and generative AI. In addition to AIOps, the built-in observability tooling enables the direct generation of metrics, events, traces, topology, and log data, and the solution also has an open framework for ingesting metrics, events, topology, and logs from other sources and storing them in the platform data lake.
BMC Helix ITOM offers continuous discovery for infrastructure assets. Many integrations use preconfigured connectors and automated data reconciliation from third-party monitoring and ITSM tools. Its LLM is trained on customer data, and there are domain-specific and tenant-specific models that can use customer metadata.
BMC Helix ITOM is deployable as a data-agnostic tool, ingesting metrics, events, logs, traces, and alerts from other systems as well as collecting them natively. It can also serve as a data-centric solution, with particular strength in its AI implementation and native remediation engine.
Strengths
BMC Helix ITOM scored high on most of our decision criteria. Notably, it scored a 5/5 in advanced analytics, with strong capabilities demonstrated in root cause and predictive analytics and the ability to support multidimensional analysis.
The solution also scored a 5/5 in automated remediation functionality, with policy-based automation and actionable remediation driven by AI/ML algorithms, and an AI/ML-driven feedback loop for improving existing automation or implementing additional automation capabilities.
The platform scored a 4/5 for anomaly detection, providing both advanced anomaly detection (univariate) and multivariate anomaly detection, capabilities that can strongly mitigate the issues of setting thresholds in unique and complex IT environments.
BMC uses AI to generate automated causality—cause-and-effect relationships—which can lead to insights before an anomaly is detected.
Business criteria with better-than-average ratings include deployment ease, ease of use, flexibility, and vendor ecosystem.
Startup and administration information is easily available. Day-to-day usage capabilities include low-code customization of most features and a good workflow manager. Together, they give BMC Helix ITOM a better-than-average ease of use rating.
Flexibility is strong due to the ability to configure the system in a manner that best suits the enterprise. Additionally, integration with third-party data is handled well with preconfigured integrations and a low-code interface for adding additional sources.
BMC has an excellent ecosystem with training, documentation, partners, community, certifications, and professional services.
Challenges
BMC Helix ITOM does not support SIEM or SOAR, though it can integrate data from either. Therefore, it also scores poorly on predictive security posture management. In addition, BMC does not have a direct method to see ghost changes. Changes to infrastructure and topology can be seen, but configuration and parameter changes cannot. And while the company has the standard alphabet of compliance certifications, it can’t send alerts on compliance violations.
Purchase Considerations
The BMC Helix ITOM suite is available as a SaaS solution and as a self-managed on-premises deployment running on containers. Both use the same agent technology: BMC Helix Monitor Agent, a preconfigured containerized collector, and both deployments have feature parity. SaaS support includes many regions: U.S. FedRAMP/DoD, Americas, Canada, Europe, Asia, UK, South Africa, United Arab Emirates (UAE), and Saudi Arabia.
BMC Helix ITOM can also use BMC’s observability tooling and work as a data-centric solution, allowing customers to displace or allow existing tools to feed the solution. Note that existing tools may not expose all assets, while BMC products may consume more resources but provide better analytics and automation capabilities.
The solution’s cost model is asset-based, so deployment choices depend on which assets will be managed by the platform.
BMC HelixGPT can analyze past incidents and recommend the most likely resolution steps. It also provides summarization and explainability for clusters of events.
BMC Helix ITOM is a general-purpose AIOps solution and its primary use cases center around IT operations. However, additional functionality is built on the platform’s generative AI foundation, and its strong AI capabilities allow AI-assisted troubleshooting, remediation, continuous integration, and discovery.
Radar Chart Overview
BMC is positioned in the Maturity/Platform Play quadrant. It is an Outperformer due to its rapid rate of development related to our assessed features over the last 12 months, with a strong roadmap for the next 12 to 18 months. It is a domain/data-agnostic general-purpose AIOps solution. BMC prioritizes stability and continuity over breakneck advancement; it offers broad functionality and use case support. It has a higher aggregate score in our decision criteria, making it a Leader in this report.
Broadcom: DX Operational Intelligence
Solution Overview
DX Operational Intelligence is an integrated AIOps and observability platform offering full-stack application monitoring in a Broadcom-hosted SaaS application. The solution can also be deployed in any public cloud using a Helm-chart-based deployment. A self-managed solution is deployed using a containerized architecture like OpenShift or Kubernetes. DX Operational Intelligence supports intelligent automation through manual and policy-based automation triggers. It also recommends using Automic Automation workflows, which are based on heuristic analysis. Broadcom leverages existing monitoring and provides its own, allowing a transition from an existing solution to a full Broadcom solution over time. Thus, tool displacement can be gradual, or Broadcom can replace all observability tools outright. Scalable out-of-the-box connectors and the third-party integration framework (RESTMon) normalize the ingested data.
Unified dashboarding and reporting dedicated to business and IT personas and technologies leverage entire datasets across all data types. The configurable dashboards integrate analytics from the AIOps data lake comprising performance metrics, topology/relationships, logs, traces, alarms, and events.
Broadcom has particular strength in automation. As a data-centric platform player, it does not mandate tool displacement, and migration to the Broadcom solution can be done gradually.
Strengths
DX Operational Intelligence has several strong features. The solution includes advanced analytics with machine learning algorithms and data pipelines. The data pipelines are optimized for near real-time analysis of alarm correlation and root cause analysis using topology, anomaly detection, predictive capacity analytics, and automation recommendations.
Anomaly detection is available out of the box. The solution supports seasonality in the data and predicts normal behavior for the next eight hours. Anomaly alarms can be used for automated workflow actions like notifications, ticketing, and automation.
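The idea of a seasonal baseline that predicts normal behavior a few hours ahead can be illustrated with a minimal sketch. This is not Broadcom's algorithm; the per-slot mean and standard deviation, the function names, the threshold `k`, and the sample data are all assumptions chosen for illustration.

```python
from statistics import mean, stdev

def seasonal_baseline(history, period):
    """Return a (mean, stdev) pair for each slot of a repeating cycle."""
    slots = [[] for _ in range(period)]
    for i, value in enumerate(history):
        slots[i % period].append(value)
    return [(mean(s), stdev(s)) for s in slots]

def is_anomaly(value, slot, baseline, k=3.0):
    """Flag a value more than k standard deviations from its slot mean."""
    mu, sigma = baseline[slot]
    return abs(value - mu) > k * max(sigma, 1e-9)

# Hypothetical hourly metric: three days of a repeating 24-hour pattern.
day = [10, 10, 10, 10, 12, 15, 30, 50, 70, 80, 85, 90,
       90, 85, 80, 75, 70, 60, 45, 30, 20, 15, 12, 10]
history = day + [v + 1 for v in day] + [v - 1 for v in day]
baseline = seasonal_baseline(history, period=24)

# Expected "normal" values for the next eight hours of the cycle.
next_slot = len(history) % 24
forecast = [baseline[(next_slot + h) % 24][0] for h in range(8)]
```

A value observed in a given hour would then be compared against that hour's baseline with `is_anomaly`, and a positive result could raise the kind of anomaly alarm described above.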
Broadcom excels in automation, including device attribute-based policies that can be defined to trigger automated alarm actions like assign, clear/close, escalate via email/ticket, hide/unhide, an automation workflow to collect additional triage information, or a remediation automation workflow.
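As a rough illustration of attribute-based policies, the sketch below matches alarms against device attributes and a severity floor, then collects the actions to run. The class names, attribute keys, severity levels, and action names are hypothetical and do not reflect Broadcom's actual policy model or API.

```python
from dataclasses import dataclass, field

SEVERITY_RANK = {"info": 0, "minor": 1, "major": 2, "critical": 3}

@dataclass
class Alarm:
    device: dict                      # device attributes, e.g. {"role": "core-router"}
    severity: str
    actions_taken: list = field(default_factory=list)

@dataclass
class Policy:
    name: str
    match: dict                       # attribute name -> required value
    min_severity: str
    actions: list                     # action names applied in order

def applies(policy, alarm):
    """A policy applies when every attribute matches and severity is high enough."""
    attrs_match = all(alarm.device.get(k) == v for k, v in policy.match.items())
    return attrs_match and SEVERITY_RANK[alarm.severity] >= SEVERITY_RANK[policy.min_severity]

def evaluate(policies, alarm):
    """Record the actions of every matching policy on the alarm."""
    for policy in policies:
        if applies(policy, alarm):
            alarm.actions_taken.extend(policy.actions)
    return alarm.actions_taken

policies = [
    Policy("lab-noise", {"env": "lab"}, "info", ["hide"]),
    Policy("core-outage", {"role": "core-router"}, "major",
           ["escalate_ticket", "run_triage_workflow"]),
]
prod_alarm = Alarm(device={"role": "core-router", "env": "prod"}, severity="critical")
lab_alarm = Alarm(device={"role": "switch", "env": "lab"}, severity="minor")
prod_actions = evaluate(policies, prod_alarm)
lab_actions = evaluate(policies, lab_alarm)
```

In this sketch the action names are only recorded; a real platform would dispatch them to ticketing, notification, or automation systems.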
Communications and workflow are better than most. Broadcom provides seamless integration with ITSM solutions, notification tools, DevOps tools, and collaboration platforms like Slack, Microsoft Teams, or Zoom. Automated workflows are supported for ticketing, notifications, and alarm actions.
Broadcom stands out in ease of deployment, usage, and solution flexibility. Installation can be almost anywhere (Broadcom cloud, public cloud, or on-premises) and managed by Broadcom or the enterprise. Data collection requires a simple agent (collector) deployment. Integrations are plentiful and often preconfigured.
Challenges
Broadcom has one feature challenge: no support for SIEM or SOAR. However, some SIEM- and SOAR-specific capabilities (viewing security logs in the context of a problem, defining and saving searches, alarm notifications, and metrics) are supported through logs for triage purposes.
GenAI is currently lacking but is listed on the roadmap for 2024. Change detection is limited to change management and other process-related revisions. Changes to infrastructure and topology can be seen, but configuration and code changes cannot.
Purchase Considerations
Costs and licensing are typical for AIOps solutions. The solution is offered with subscription licenses and usage-based pricing. Both models establish a baseline number of licenses forecasted to be consumed based on the sizing of the environment.
Broadcom offers admins tools to contain costs. These include license calculations and usage reports. The reports also show the baseline license versus actual used license count validations.
Deployment is SaaS, managed by Broadcom, and the solution is also deployable as a self-managed system on other public clouds or on-premises. Training and support are very good as part of the Broadcom support ecosystem.
Broadcom’s AIOps solution is targeted at organizations of any size, and it’s a good all-around AIOps platform. It is especially suitable for use cases where strong automation and workflow are important.
Radar Chart Overview
Broadcom is positioned in the Maturity/Platform Play quadrant. It is a data-centric, general-purpose solution. Broadcom prioritizes stability and continuity over breakneck advancement. It offers broad functionality and use case support. It has a higher aggregate score in our decision criteria, making it a Leader in this report.
Centerity: AIOps
Solution Overview
Centerity CSM² specializes in collecting and analyzing data from large and complex distributed edge environments, followed by extensive auto-remediation capabilities, asset management, and an exclusive real-time business service dashboard. The solution displaces existing edge monitoring and observability tools but can ingest data from other sources for a complete AIOps solution.
Strengths
Centerity CSM² has strengths in edge AI, and in correlation and causality analysis, thanks to the edge computing architecture used by Centerity. Edge devices may have multiple monitoring points, and the ability to correlate these is a plus for Centerity.
Causality analysis in edge environments, such as retail, can be tricky as the edge stack typically holds many devices and layers of technology. Such environments can make it difficult to determine cause and effect or the root cause of an incident. Centerity does well handling this difficult situation, with the ability to auto-remediate most incidents with no human intervention.
Complex edge environments are the key focus of the Centerity offering. Though this impacts deployment ease, no other vendor concentrates on AIOps at the edge. Centerity is often used by integrators in retail, hospitality, and healthcare, where the number of devices at the edge can be substantial. The solution is easy to use, with good visualization and customization of dashboards. Flexibility is strong, which is expected for an edge-focused solution where the deployment environment is often complex and geographically dispersed.
Challenges
Centerity can ingest data from multiple sources but does not have a native SIEM or SOAR component. It does support many security devices and can integrate with SIEM and SOAR systems.
Outside of edge AI, Centerity’s support for the emerging features we assessed was limited or absent. While it does not provide automated causal inference, it does provide capabilities that support the identification of root causes of issues through anomaly detection and event management automation. Similarly, predictive security posture is an emerging feature that is not yet provided. Both features are important in edge environments where the public can access physical devices.
Changes to infrastructure and topology can be seen, but configuration and code changes cannot, leading to a lower score for the ghost change criterion. Available information does not indicate the ability to observe the configuration of edge devices (IoT or otherwise). When Centerity adds this feature, it will significantly improve the product. Management of compliance policies is limited to viewing logs and audit files.
The support ecosystem lacks easily available documentation that allows users to investigate the solution’s suitability without engaging sales or professional services. The same is true for the licensing model; no information is publicly available.
Purchase Considerations
The configuration and deployment of an edge AIOps solution are not trivial; however, Centerity and its global partners provide professional services to support deployment. Training is available as part of deployment options, as are ongoing workshops and round tables.
The platform can be delivered as an on-premises, cloud, or hybrid deployment. Licensing model information is unavailable; however, it is likely to use both device and consumption-based cost models, as is common for AIOps solutions.
Use cases for Centerity CSM² include retail and hospitality, financial services, healthcare, public safety, defense, manufacturing, and energy and utilities. These are situations where multiple devices are deployed at the edge for consumption by the public or the enterprise. The company is unique in its support of edge AI and the use cases related to pushing compute platforms and AI/ML to the edge.
Radar Chart Overview
Centerity is positioned in the Innovation/Feature Play quadrant as it’s focused on specific functionality and use cases. The vendor is flexible and responsive to the market, and it’s likely that the solution will look and feel different year over year. It has a lower aggregate score in our decision criteria, making it an Entrant in this report.
CloudFabrix: cfxCloud Data-Centric AIOps Platform
Solution Overview
CloudFabrix cfxDimensions is a general-purpose domain-agnostic AIOps solution able to ingest data from multiple cross-domains, including applications, servers, VMs, networks, storage, infrastructure, and cloud. It is a single integrated platform that supports three deployment modes: self-managed on-premises, public or private cloud, and CloudFabrix-managed SaaS. CloudFabrix is known for its Robotic Data Automation Fabric (RDAF). The RDAF DataOps Solution Pack automates repetitive data preparation and integration activities, including metadata discovery, data quality analysis, data ingestion, data filtering/cleaning/transformation, data shaping/aggregation, data encryption/decryption, and data tracing.
CloudFabrix includes the ability to build utility AIOps/observability services that run alongside the core platform. The platform manages all observability data, routing it to a low-cost observability lake, log stores, and metrics stores for out-of-the-box integrated services. The solution also provides a data lake for persisting all the observability data within the platform. Data can also be forwarded to CloudFabrix’s composable dashboards. The solution facilitates AIOps, asset intelligence (FinOps), and log intelligence with support for vertical solutions in telecommunications, service assurance, and SAP observability. CloudFabrix offers a data fabric for GenAI, providing AutoML automation data enrichment and contextualization, conversational queries, AI dashboards, and AI for intent-driven automation.
Low-code bots from RDAF are prebuilt for certain use cases, making it unnecessary to add or maintain any code in the field. The bots can be assembled in a pipeline to deliver use-case-specific logic. The solution can integrate and ingest any observation and operational data types, including ITSM, OpenTelemetry, GNMI, SNMP traps, and MELT, and it can build entity models represented with a graph database.
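The notion of assembling prebuilt bots into a pipeline can be sketched as simple function composition. This is a generic illustration, not RDAF's actual bot syntax or API; the bot names, record fields, and sample data are hypothetical.

```python
from functools import reduce

def filter_severity(min_level):
    """Bot factory: keep only records at or above a severity level."""
    def bot(records):
        return [r for r in records if r["severity"] >= min_level]
    return bot

def enrich_with_site(site_by_host):
    """Bot factory: add a 'site' field looked up from the host name."""
    def bot(records):
        return [{**r, "site": site_by_host.get(r["host"], "unknown")} for r in records]
    return bot

def count_by(key):
    """Bot factory: aggregate records into counts per value of `key`."""
    def bot(records):
        counts = {}
        for r in records:
            counts[r[key]] = counts.get(r[key], 0) + 1
        return counts
    return bot

def pipeline(*bots):
    """Chain bots so that each one consumes the previous bot's output."""
    return lambda data: reduce(lambda acc, bot: bot(acc), bots, data)

events = [
    {"host": "web-1", "severity": 3},
    {"host": "web-2", "severity": 1},
    {"host": "db-1", "severity": 4},
]
run = pipeline(
    filter_severity(2),
    enrich_with_site({"web-1": "nyc", "db-1": "nyc"}),
    count_by("site"),
)
summary = run(events)
```

Because each stage is a small reusable unit, use-case-specific logic comes from the order and parameters of the stages rather than from new code.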
Strengths
CloudFabrix scored high across most decision criteria assessed, including in advanced analytics due to its AutoML framework, which enables the vendor to provide a wide range of algorithms to end users. The platform has a built-in scheduler for continuous model training and tuning. Contextual analysis using local and remote LLMs supports correlation and causality using both RDAF and GenAI algorithms. CloudFabrix provides intent-based automation, where conversational queries can trigger automation with Camunda, Ansible, or other orchestration tools. RDAF enhances collaboration and workflow and can provide bidirectional data exchange with third-party tools, and it can be used to create easy-to-manage data pipelines for data aggregation and normalization. CloudFabrix’s RDAF and AI capabilities enable the company to score better than average on key features for AIOps solutions.
RDAF and data pipelines allow CloudFabrix to compete with other vendors in the emerging technology of automated causal inference. Pipelines inherently show cause-and-effect relationships. RDAF and over 1,000 preconfigured bots allow deployment in any supported environment, contributing to above-average flexibility.
CloudFabrix’s composable dashboards and natural language capabilities allow users to gather information or create dashboards without technical knowledge via a GenAI assistant called Macaw. It enables conversational queries mapped to intents and can generate dashboards from prompts. By using natural language as the only interface to the solution, CloudFabrix is moving closer to the target of AIOps without requiring users to have sophisticated technical skills.
Challenges
CloudFabrix supports integration with external SIEM and SOAR solutions, but bidirectional functionality is limited for integrations with legacy systems and may require customization because of the low-code architecture. Security data can be ingested, however, and the solution’s predictive capabilities will likely support predictive security posture management and security context enrichment in the future. Support for ghost changes is limited; changes to infrastructure and topology can be seen, but configuration and code changes cannot because of a lack of integrations with CI/CD pipelines. Limited support is provided for compliance management: audit trails and logs can be used for some reporting.
Purchase Considerations
Licensing options allow customers to choose between user-based, subscription (monthly or yearly), perpetual, or usage-based models, depending on requirements and deployment preferences (cloud versus on-premises). CloudFabrix’s RDAF may be compelling for enterprises with disparate integration and data translation needs. CloudFabrix does not require significant training or professional services, but time should be spent planning before deploying the solution.
CloudFabrix provides vertical support within the telecommunication industry. Support for SAP users is via an SAP Observability add-on.
Radar Chart Overview
CloudFabrix is positioned in the Maturity/Feature Play quadrant. It’s a more balanced AIOps solution supporting a variety of verticals in telecom, healthcare, automotive, financial services, and managed service providers. It has a higher aggregate score in our decision criteria, making it a Leader in this report.
Datadog: Watchdog
Solution Overview
Watchdog powers Datadog’s AIOps capabilities, comprising Datadog Watchdog, Event Management, Incident Management, Workflows, App Builder, and beta features like Bits AI. It provides the managed features and algorithms that plug into existing Datadog telemetry to automate existing operations and workflows. Watchdog is Datadog’s engine that enables analysis, anomaly detection, forecasting, outlier recognition, data correlation, and dependency mapping and root cause analysis. These capabilities can be used with existing products to augment features. Users can opt in to automated anomaly detection or forecasting or run these with a custom configuration. Watchdog obtains context (by running a query or looking at the telemetry of a given service), and it can then surface relevant insights. Datadog Watchdog handles data from all 22 Datadog offerings, supporting a self-assembly approach to monitoring, observability, and AIOps. Tool displacement is not required; however, when using Datadog modules, any existing monitoring or observability tools can be replaced. Depending on the Datadog modules selected, SIEM and out-of-the-box compliance frameworks are available.
Strengths
Datadog scored a 5/5 in data aggregation and normalization, excelling in data ingestion for cloud, on-premises, and hybrid deployments. It automatically tags and organizes data into consistent views, with real-time pipelines that enrich and normalize the data. Its advanced analytics include predictive capabilities for forecasting issues and supporting multidimensional analysis. Datadog’s AIOps uses ML for strong anomaly detection, identifying deviations and providing insights based on trends and historical data. Remediation workflows can be created in a low-code graphical environment, with automated remediation supported through integrations. The platform’s dashboards offer detailed visualizations in a low-code setting, including graphical workflow and chatbot visualization features. It also boasts strong collaboration tools, integrates with Slack and Microsoft Teams, and supports bidirectional integrations with external systems.
Datadog excels in ease of deployment, ease of use, solution flexibility, and vendor ecosystem. It deploys via SaaS, with a simple initial setup, sending data to the appropriate regional instance through agents or API integrations. Administration and daily use are straightforward, thanks to its low-code environment and graphical workflow management.
Over 20 enterprise modules enhance Datadog’s flexibility, allowing easy addition or replacement of tools and functions. Its vendor support ecosystem is excellent, with strong support for partnerships, integrations, third-party solutions, training, certifications, community tools, and professional services.
Challenges
Datadog does not feature edge AI as part of its standard AIOps solution. However, data ingestion from agents at the edge is possible, and support of IoT devices is an add-on.
Datadog’s AIOps suite provides correlation and causality analysis tools that help infer relationships between metrics and logs, but may not distinguish root-cause analysis from cause and effect. Specific compliance capabilities might require additional configuration or modules, or integration with other tools.
Purchase Considerations
Datadog is a data-agnostic solution that offers a variety of modules. While using Datadog monitoring and other observability tooling can create a more seamless environment, it is not required to take advantage of Watchdog’s AI abilities. Datadog has a wide variety of features and modules, which can make designing a solution more complicated. It is deployed as a SaaS solution.
Datadog has a tiered licensing model with a free entry point and with allocation and consumption costs depending on the module, although Watchdog and its AI features are available only in the enterprise version.
The embedded AI services provide a comprehensive array of capabilities for multiple use cases. Datadog targets organizations of any size.
Radar Chart Overview
Datadog is positioned in the Maturity/Platform Play quadrant. Datadog is a general-purpose solution that’s widely applicable across many use cases. It has a higher aggregate score in our decision criteria, making it a Leader in this report.
Dell Technologies: Dell APEX AIOps
Solution Overview
Dell’s data-centric APEX AIOps solution uses multiple AI/ML techniques to rapidly process and correlate events and telemetry across the enterprise to identify and remediate service issues. Dell APEX AIOps consists of three products: Dell APEX AIOps Incident Management (formerly the Moogsoft platform); Dell APEX AIOps Infrastructure Observability (formerly the CloudIQ product, which proactively monitors and measures the overall health of Dell infrastructure through intelligent, comprehensive, and predictive analytics); and Dell APEX AIOps Application Observability (in a partnership with IBM for the Instana product). Dell APEX AIOps can be deployed as a virtual appliance using a VM Image agent, a public cloud image on AWS, a software-only self-managed solution with an optional software collector (Linux or Windows) on-premises, and a SaaS application. The solution is targeted broadly at any enterprise or service provider. Dell Incident Management on-premises solutions continue to be supported and will stay vendor agnostic. Infrastructure Observability (formerly CloudIQ) is Dell specific.
Strengths
Dell scored a 4/5 on advanced analytics, which leverages the strong capabilities gained through the acquisition of Moogsoft and builds on the AI/ML strengths of previous incarnations of the data-agnostic solution. Dell’s offering includes data-centric observability from APEX AIOps Application Observability and Dell’s cloud management software APEX AIOps Infrastructure Observability, which also includes capacity and anomaly detection and issue prediction capabilities, along with strengths in advanced analytics and anomaly detection from Moogsoft.
Dell scored a 4/5 on SIEM and SOAR integration with the ability to ingest data from SIEM applications and integrate fully with SOAR solutions. APEX AIOps can proactively scan for the most significant cybersecurity risks for each Dell infrastructure product, as identified by the security subject matter experts from each Dell product team. They leverage AI/ML to match known Dell Security Advisories (DSAs) to customers’ infrastructure to understand which DSAs apply to the products in any given customer’s environment.
Entropy and vertex entropy, part of the advanced analytics and anomaly detection algorithms, are recalculated as the underlying data changes. No user intervention is required. The probable root cause results are produced by a semi-supervised ML neural network that can be operator-reinforced.
For capacity predictions, a seasonality algorithm provides insights about capacity status starting 90 days after implementation. For performance forecasting, the common forecasting service predicts when KPIs are expected to reach 100%.
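Predicting when a KPI will reach 100% can be illustrated with a simple trend extrapolation. The sketch below uses an ordinary least-squares line rather than Dell's seasonality algorithm or forecasting service, whose internals the report does not describe; the function names and sample utilization figures are assumptions.

```python
def linear_fit(ys):
    """Least-squares slope and intercept for y sampled at x = 0, 1, 2, ..."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def periods_until_limit(ys, limit=100.0):
    """Periods after the last sample until the fitted trend reaches `limit`.

    Returns None when the trend is flat or decreasing.
    """
    slope, intercept = linear_fit(ys)
    if slope <= 0:
        return None
    x_at_limit = (limit - intercept) / slope
    return max(0.0, x_at_limit - (len(ys) - 1))

# Hypothetical daily storage utilization in percent.
usage = [60.0, 60.5, 61.0, 61.5, 62.0, 62.5, 63.0]
remaining_days = periods_until_limit(usage)
```

A production forecaster would also need to account for seasonality and uncertainty, which is presumably why the solution waits for a substantial history before offering capacity insights.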
APEX AIOps has native support for time-series data and uses an “adaptive” algorithm to determine normal behavior baselines and detect anomalies. Events derived from metrics are correlated with other asynchronous events.
The Similar Situations function offers predictive capability, which becomes more accurate as an incident evolves. Situation Rooms, a Moogsoft signature feature, continues to enable major incident management.
Once set up and running, the solution is easy to use, with a consistent interface customizable for specific actors or teams. Using Situation Rooms and integrating the collaboration tools APEX supports provides ease of use, particularly during major incidents.
Challenges
Dell scored a 3/5 on correlation and causality analysis. While the solution possesses strong root-cause analysis as part of APEX AIOps, it is unable to automate cause-and-effect determinations. The probable root cause feature builds an operator-trained model for causality in the absence of existing models.
Dell does not directly support the detection of ghost changes in software and hardware outside of established change processes. Changes to infrastructure and topology can be seen, but configuration and code change visibility is limited. By customizing the solution, support for identifying changes at the file level can be provided, but no support for parameter or code changes is available.
The solutions conform to SOC 2, GDPR, and Cloud Security Alliance STAR level one, but no support exists for managing and alerting on compliance violations.
Purchase Considerations
Dell developed its APEX AIOps portfolio after it acquired Moogsoft. The new offering combines Dell’s AIOps Incident Management with Infrastructure Observability and Application Observability to offer observability and management with Dell infrastructure and also other public, hybrid and on-premises infrastructure in the customers’ ecosystem. The requirements for training and professional services have yet to be defined. The licensing is host-based. Documentation for the current offering is not available. Its target market includes businesses of any size.
Radar Chart Overview
Dell is positioned in the Maturity/Platform Play quadrant. It’s a data-agnostic and data-centric general-purpose solution that’s widely applicable across many use cases. It has an average aggregate score in our decision criteria, making it a Challenger in this report.
Digitate: ignio AIOps
Solution Overview
Digitate’s ignio AIOps solution is a SaaS-based, domain-centric platform that can ingest data from any data source through integration APIs or webhooks. Digitate ignio uses a closed-loop approach to combine AI-based insights with observability across IT and business layers. It includes out-of-the-box automation and self-healing actions to prevent, detect, triage, and resolve issues autonomously. The target market is service providers and enterprises of any size. Digitate ignio covers multiple aspects of ITOM, ITSM, and monitoring within a single platform.
Digitate ignio also provides out-of-the-box capabilities for automated patching, compliance checks, provisioning, and health diagnostics. The solution uses intelligent automation to cover multiple IT towers, including traditional data centers, public and private cloud, batch managers, end-user devices, and SAP systems.
Strengths
Digitate ignio scored high in advanced analytics and anomaly detection, using AI/ML algorithms to profile time-series data, including seasonality. Its algorithms calculate normal behavior across KPIs and metrics, highlighting anomalies and changes. Users can define customized models, with automatic or manual training occurring daily after new data ingestion. Models can be fine-tuned by adjusting parameters.
Digitate’s strong native automation engine employs a closed-loop approach with event-based, time-based, or on-demand triggers. Its excellent automated remediation features self-healing capabilities that dynamically generate dependency hierarchies for incidents, diagnose probable causes, resolve issues using built-in knowledge, validate closure, and close incidents. Digitate ignio also supports collaboration through channels like Teams or Slack.
ignio leverages model-based reasoning for causal inference, using situational data such as CMDBs, relationship data, inventory lists, and technology models or metamodels. As a SaaS application, deployment is straightforward, requiring collectors, agents, and data ingestion. Public documentation is limited, but self-paced tutorials and videos are available to get started. The ease of use is good, and the workflow management and auto-remediation interfaces are better than average. Digitate ignio offers robust analytics, anomaly detection, and automated remediation capabilities, making it a strong AIOps solution.
Challenges
Integration is possible with SIEM and SOAR tooling, but push-and-pull support is inconsistent. Some configuration changes can be detected, but finding parameter-level alterations outside change management processes is not possible. In some areas, such as self-paced introductory tutorials, the vendor ecosystem is very good, including partnerships, integrations, resellers, certifications, community tools, and professional services. Some compliance management abilities are present, but there is no alerting on compliance violations.
Purchase Considerations
Evaluation of Digitate before purchase requires a proof of concept and professional services consultation. The lack of public documentation will hinder comparison with other solutions to finalize a short list of vendors. Digitate ignio uses a subscription licensing model with pricing based on the number of nodes. Regional support for the SaaS deployment includes the Americas, Europe, the UK, and APAC.
Digitate ignio AIOps has broad appeal across all enterprise sizes, including service providers. Business health monitoring is built into the solution, along with business continuity. ignio provides vendor-agnostic visibility across multicloud environments with cost analytics and lifecycle automation. Common verticals for Digitate ignio include retail, life sciences and healthcare, manufacturing, transportation, and entertainment.
Radar Chart Overview
Digitate is positioned in the Innovation/Platform Play quadrant. It’s flexible and responsive to the market and is investing in emerging features in this space. It’s also broadly applicable to many use cases. It has an average aggregate score in our decision criteria, making it a Challenger in this report.
Dynatrace
Solution Overview
Dynatrace provides out-of-the-box solutions for infrastructure observability, application observability, security protection, security analytics, digital experience monitoring, business analytics, and automation for operations, development, security, and business personas. The open platform can be extended to cover custom use cases via the Dynatrace AppEngine and the Dynatrace AutomationEngine.
Dynatrace is delivered as SaaS on AWS, Azure, and GCP with extensive worldwide regional support. Dynatrace also provides options to deploy the platform at the edge in customer-provisioned infrastructure with Dynatrace Managed. Managed customers run clusters in private and public clouds. The company pushes 25 functional releases annually for SaaS and 12 for Dynatrace Managed.
With Dynatrace’s OneAgent, a single agent is deployed once on a host and begins collecting all relevant metrics, traces, and logs across infrastructure and the full application-delivery chain. The solution can also ingest data from sources using webhooks and APIs. The Davis AI engine offers fact-based, predictive, and causal AI insights together with generative AI. Dynatrace Grail is a data lakehouse for logs, metrics, traces, events, and more. All data stored in Grail is interconnected within a real-time model that reflects the topology and dependencies within a monitored environment. Dynatrace AutomationEngine is a low-code, answer-driven automation technology that leverages Davis AI to power BizDevSecOps workflows intelligently. PurePath is a patented distributed tracing and code-level analysis technology that integrates distributed tracing with user experience data and data from open source technologies, including OpenTelemetry, OpenLLMetry, and code-level analytics.
Strengths
Dynatrace scored high across many of the criteria. Using OneAgent, OpenTelemetry, API ingestion, and the Davis AI engine gives Dynatrace advanced data aggregation, normalization, and contextualization capabilities. Davis uses causal AI for rapid issue identification and resolution, predictive AI for forecasting and anomaly detection, and GenAI for productivity enhancements like natural language processing (NLP) and AI recommendations.
Using Dynatrace Managed enables the use of edge AI. The self-managed option gives Dynatrace a high-quality capability for organizations with devices and applications on the edge of the environment. Dynatrace Managed and SaaS provide a single interface across cloud and on-premises workloads.
AutomationEngine and its visual workflow front end offer an extensible no-code approach to workflow automation using Davis causal AI answers. Notebooks enable customers to create data-driven documents for custom analytics and use cases.
AutomationEngine and AppEngine go far beyond comparable “custom app” solutions. With its secure serverless auto-scaling runtime environment, Dynatrace enables customers to build apps for custom use cases while leveraging all data ingested into Dynatrace, including from Smartscape and Davis AI.
Challenges
Dynatrace can see many environmental changes, but parameter and configuration changes outside the change process are currently unavailable. This visibility will become available with the integration of security posture management capabilities acquired from Runecast, which are expected by the first half of 2025. Though Dynatrace supports compliance standards, the ability to alert on compliance violations is currently missing; Davis AI will support detecting and prioritizing potential violations of compliance standards, like CIS benchmarks and DORA.
Purchase Considerations
Dynatrace provides useful AIOps capabilities for any general use case. It has consistently improved the startup time and the learning curve for deployment and usage.
The Dynatrace Platform Subscription (DPS) offers customers comprehensive access to its capabilities with straightforward consumption-based pricing. Customers make an annual spend commitment and can use any capability in any quantity based on their day-to-day business and technical needs (no monthly or per-capability minimums, no per-user fees).
Dynatrace is well suited to large organizations and those with edge AI needs. The fully extensible and programmable platform makes custom use cases easy to create.
Radar Chart Overview
Dynatrace is positioned in the Innovation/Platform Play quadrant. It’s flexible and responsive to the market and is investing in emerging features in this space. It’s also broadly applicable to many use cases. It has a higher aggregate score in our decision criteria, making it a Leader in this report.
Elastic: Elastic Search AI Platform
Solution Overview
Elastic Search AI Platform is built on the ELK (Elasticsearch, Logstash, and Kibana) stack and combines search functions and the intelligence of AI to build applications, proactively resolve observability issues, and address complex security threats. Elastic can be consumed using Elastic Cloud as a SaaS application. It is deployable on any cloud provider and can be self-managed on a public, private, or hybrid cloud. Tool displacement is not required, though Elastic does have observability tools. The solution is based on open source tools, with support and proprietary extensions from Elastic. Much of Elastic is built on open source software, and community versions are available.
Strengths
Elastic scored high in the key criterion of data aggregation, ingestion, and normalization. The solution achieves this efficiently across various sources by combining Elasticsearch, Logstash, and Elastic Agent. Kibana, Elastic’s visualization tool, provides extensive capabilities for creating interactive dashboards that visualize complex datasets in real time. Out-of-the-box one-click integrations and dashboards facilitate speedy onboarding of new data sources.
Another strength is the solution’s flexibility. The platform supports a wide range of data types and sources, offers extensive APIs for customization, and integrates well with numerous other tools and systems. This adaptability extends to deployment with options from self-managed to SaaS.
Elastic benefits from a strong vendor ecosystem, with robust support for various integrations and a community contributing to its extensive plugin architecture. This ecosystem enhances the platform’s capabilities and integration potential, particularly with regard to generative AI using a chatbot, and with incorporating AI into the solution. Elastic’s generative AI Assistants for Observability and Security further benefit from retrieval augmented generation (RAG) based on its Elasticsearch Relevance Engine (ESRE) technology. The ESRE technology ensures highly accurate and relevant results based on integration of the organization’s own data, including runbooks, ticketing systems, and operating procedures. Training and certifications are available, and professional services are provided when needed.
Challenges
Automation is not a strong feature of Elastic. Although Elastic can detect issues and support alerting mechanisms, automated remediation requires integration with other IT automation tools. However, Elastic’s Alerting framework can trigger workflows that integrate with external systems for remediation tasks. Its machine learning can perform anomaly detection and correlation analysis; however, causal inference requires custom implementation or integration with specialized tools.
Purchase Considerations
Elastic is best consumed as a public cloud-managed service and is available on a preferred cloud provider—AWS, Azure, or Google Cloud. Customers who want to manage the software themselves, whether on public, private, or hybrid cloud, can download the Elastic Stack; support is available.
Elastic is based on open source software, so enterprises familiar with it are more likely to implement the solution successfully. Elastic provides full enterprise-level support and professional services, which mitigates the enterprise’s need for expertise, except possibly for a self-managed installation. The SaaS application and agent and collector management (Beats) have less of an impact on enterprise resources.
Radar Chart Overview
Elastic is positioned in the Maturity/Feature Play quadrant. The rapid movement in AI reflects Elastic’s community involvement and the strong in-house team it has assembled, leading to an Outperformer classification on the Radar. It has an average aggregate score in our decision criteria, making it a Challenger in this report.
Evolven: Configuration Risk Intelligence
Solution Overview
Evolven specializes in managing configuration and parameter changes, whether planned or unplanned, and providing the essential functions of an AIOps solution. Its ability to track changes with a granularity that extends to the parameter level of applications, files, repositories, databases, and code is unique in the AIOps space. Evolven can be deployed as a self-managed solution on public or private clouds, plus it is available as a SaaS solution with instance management by Evolven.
Evolven scans Windows registries, reads configuration files, scans database schemas, and retrieves master application data in database tables. It also executes system and application CLIs to retrieve configurations and artifacts, and it polls platform APIs to query Kubernetes and cloud configurations to create a baseline of the environment. In addition to change risk analysis, Evolven tracks granular changes to baselines, whether executed manually or automatically, planned or unauthorized. Evolven’s blended agent-based and agentless collection mechanisms for visibility require no tool displacement. The solution can ingest data from any existing tools.
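The idea of tracking changes against a baseline at the parameter level can be illustrated with a minimal sketch. It is not Evolven's implementation: the function names (`flatten`, `diff_against_baseline`) and the nested-dictionary representation of a configuration are assumptions made purely for illustration.

```python
from typing import Any


def flatten(config: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten a nested configuration into dotted parameter paths."""
    flat: dict[str, Any] = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_against_baseline(baseline: dict, current: dict) -> dict[str, list[str]]:
    """Report which individual parameters were added, removed, or changed."""
    base, cur = flatten(baseline), flatten(current)
    return {
        "added": sorted(cur.keys() - base.keys()),
        "removed": sorted(base.keys() - cur.keys()),
        "changed": sorted(k for k in base.keys() & cur.keys() if base[k] != cur[k]),
    }


if __name__ == "__main__":
    # Hypothetical baseline and current snapshots of one application.
    baseline = {"db": {"pool_size": 20, "timeout": 30}, "tls": {"enabled": True}}
    current = {"db": {"pool_size": 50, "timeout": 30}, "tls": {"enabled": True}, "debug": True}
    print(diff_against_baseline(baseline, current))
```

Comparing flattened parameter paths rather than whole files is what lets a single changed value, such as `db.pool_size` here, be reported on its own.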
Evolven serves a unique space within AIOps: it not only handles standard AIOps functions but can also detect and alert on unauthorized changes and drift at any level of detail.
Strengths
Evolven is a data-agnostic solution, except for configuration and parameter data collected by its native engine. It excels in data ingestion and normalization, using AutoML for interactive analysis of data from various tools and the environment. The ability to analyze configuration changes down to the parameter level sets it apart. However, the lack of online documentation for non-customers is a drawback.
Evolven scored a 4/5 in advanced analytics with the ability to aggregate data from diverse IT systems, applying patented algorithms to uncover patterns, correlations, and anomalies to assess stability, compliance, and security risks. This enables proactive issue identification, rapid root cause triage, and potential optimization opportunities. Anomalies detected at the parameter level within the data ingestion stream earn Evolven an exceptional rating.
Evolven has a demonstrated focus on emerging features and leads the field in ghost change detection, identifying unauthorized changes not approved by change requests. It is the only AIOps tool that thoroughly handles this use case. It reconciles actual changes with approved tickets or other authorization sources and evaluates change risks before deployment. Drift detection and consistency analysis identify configuration drift and inconsistencies across IT environments. The solution’s “golden baseline” management ensures alignment between discovered assets and defined baselines.
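Reconciling detected changes with approved tickets can be sketched as follows. This is an illustrative assumption of how such matching might work, not the vendor's algorithm: the record types, their fields, and the rule (same host, same parameter, observed inside the approved window) are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DetectedChange:
    host: str
    parameter: str
    observed_at: datetime


@dataclass(frozen=True)
class ApprovedChange:
    ticket_id: str
    host: str
    parameter: str
    window_start: datetime
    window_end: datetime


def find_ghost_changes(
    detected: list[DetectedChange], approved: list[ApprovedChange]
) -> list[DetectedChange]:
    """Return detected changes that no approved ticket covers.

    A change counts as authorized only if a ticket names the same host and
    parameter and the change was observed inside the ticket's change window.
    """
    ghosts = []
    for change in detected:
        authorized = any(
            ticket.host == change.host
            and ticket.parameter == change.parameter
            and ticket.window_start <= change.observed_at <= ticket.window_end
            for ticket in approved
        )
        if not authorized:
            ghosts.append(change)
    return ghosts
```

In this sketch a change made to an approved parameter but outside the approved window is still reported, which reflects the distinction between planned and unauthorized changes.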
Evolven scored high in flexibility, ingesting data from any source, deploying on-premises or as a SaaS application, and supporting complex environments with edge AI. It uses both agent and agentless technologies and a no-code approach for integrations. The EvoGPT chatbot provides insights in Slack and Teams, enabling natural language data interaction.
In compliance management, Evolven compares configurations against user-defined and industry standards, examines digital certificates, and uses vulnerability detection to match configurations against the National Vulnerability Database. Its blind spot detection capabilities identify configuration assets not covered by security or performance tools.
Challenges
Evolven’s SIEM integration with tools like Splunk and QRadar may require extensive customization, complicating real-time threat response. The SOAR integration aims to automate remediation through playbooks, but frequent manual oversight limits its effectiveness, reducing time savings and causing inconsistencies in response actions. Additionally, the lack of prebuilt playbooks places a heavy burden on security teams to develop their own. For GRC integrations, while Evolven can trigger corrective actions for compliance violations, its REST API is underutilized, revealing a gap between its automated remediation capabilities and real-world application. While no edge AI support exists, data can be collected from the edge using on-premises tools.
Purchase Considerations
Although customers typically purchase complete solutions, there is a case to be made for adding Evolven to existing deployments for its change-detection ability alone. Evolven is also a competent AIOps solution that doesn’t displace existing tools. The solution is licensed per operating system instance, with network devices and other devices typically licensed in a 5:1 instance-to-OS manner to reduce costs and align with perceived value.
Evolven targets cloud service providers, managed service providers, large enterprises, and the public sector but is less suitable for SMBs. The company’s technical users include IT Ops, DevOps, CloudOps, and SecOps teams. Within any of these, unauthorized change detection capabilities identify and highlight unplanned and undocumented changes at the service desk, within CI/CD pipeline processes, during automated deployments, or in other sources.
Radar Chart Overview
Evolven is positioned in the Innovation/Feature Play quadrant, as it’s focused on specific functionality and use cases. Its unique abilities suggest it could threaten more established AIOps players in the future. The vendor is flexible and responsive to the market, and it’s likely that the solution will look and feel different year over year. It has an average aggregate score in our decision criteria, making it a Challenger in this report.
Grokstream: Grok AIOps
Solution Overview
Grok is an AIOps platform with a SaaS deployment model and on-premises option. Designed with a cognitive architecture rooted in research on reasoning and adaptability, Grok integrates neuroscience principles with machine learning to deliver autonomous (AI) operations with self-healing capabilities. It offers alarm and incident compression, proactive resolution, predictive analytics, and intelligent automation. It observes, interacts, and continuously learns from its environment by employing machine-learning models that react, learn, and adapt. Tool displacement is not required for this data-agnostic solution.
Grokstream has quarterly minor releases and at least one major release per year for associative clustering, reinforced classification, the Grok platform, GrokFix (automation), and Grok StreamToStream (data ingestion and transformation).
Strengths
Grok features a native data aggregation and normalization engine, Grok StreamToStream (STS), with a user-friendly workflow interface for flexibility. Grok STS allows users to merge and operationalize data from any source via Grok Collection Blend. Grok uses associative clustering across multiple third-party tools to perform correlation on common causes and employs a classification model to further analyze and refine causation. The training model looping builds more intelligence into predictive causality analysis.
Grok utilizes adaptive models for performance data, continuously analyzing real-time streams to detect variations and seasonal patterns and identifying relevant anomalies early. Its architecture offers algorithm flexibility and acts as an early warning system, providing timely alerts. Grok prioritizes responses by distinguishing between unusual anomalies and genuine threats.
Grok’s anomaly detection identifies abnormal infrastructure behavior, signaling potential failures or performance issues. Using dynamic thresholding and seasonality considerations, it adapts to changing conditions and detects abnormal patterns, enabling early warnings and proactive interventions. This approach reduces reliance on static management tools, allowing Grok to continuously learn and detect nuanced changes that may indicate issues before they escalate.
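To make dynamic thresholding with seasonality concrete, the sketch below learns a separate expected band for each slot of a repeating period (for example, each hour of the day) and flags values that fall outside it. It is a generic illustration, not Grok's models: the function names, the mean plus or minus `k` standard deviations rule, and the default period of 24 are assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_seasonal_baseline(history, period=24):
    """Learn (mean, std) per seasonal slot from (time_index, value) samples."""
    slots = defaultdict(list)
    for t, value in history:
        slots[t % period].append(value)
    return {slot: (mean(vals), pstdev(vals)) for slot, vals in slots.items()}


def is_anomalous(baseline, t, value, k=3.0, period=24, min_band=1e-9):
    """Flag a value that deviates more than k standard deviations from
    the learned mean of its seasonal slot."""
    mu, sigma = baseline[t % period]
    band = max(k * sigma, min_band)  # avoid a zero-width band on flat data
    return abs(value - mu) > band


if __name__ == "__main__":
    # Hypothetical week of hourly load: busy from 09:00 to 17:00, quiet otherwise.
    history = [
        (day * 24 + hour, (100 if 9 <= hour < 17 else 10) + day % 3)
        for day in range(7)
        for hour in range(24)
    ]
    baseline = build_seasonal_baseline(history)
    print(is_anomalous(baseline, 7 * 24 + 10, 100))  # daytime load of 100
    print(is_anomalous(baseline, 7 * 24 + 3, 100))   # same load at 03:00
```

The same reading of 100 is normal during business hours and anomalous at night, which a single static threshold cannot express.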
Grok also offers a remote collection layer for high-volume data from edge devices and lightweight anomaly detection models for processing time-series metrics, ensuring high performance. Its horizontally scalable architecture supports large environments and MSP settings, automatically learning and adapting to new or changed IT services without human intervention. Grok promotes customer innovation with its open-box AIOps approach, providing a data science toolkit for model experimentation and refinement.
Challenges
Grok does not focus on cybersecurity; however, the platform is designed to be data agnostic and can ingest SIEM and SOAR data. There is no ghost change support besides custom code to see configuration changes in the infrastructure ingested. There is no information on regional data centers for PRDC compliance or other local data requirements. Grok offers root cause identification as a feature of the classification layer; however, there is limited support for automated causal inference based on symptom and post-ticket close analysis. Grok conforms to all standard security compliance requirements but does not manage compliance standards or violations. There is no alerting for events surrounding compliance.
Purchase Considerations
Prospective customers should closely examine the licensing structure, which often follows a subscription-based model with costs that may scale with data volume and feature utilization, to avoid unexpected expenses as usage grows. Implementation can be complex, requiring professional services for integration with existing IT operations and monitoring tools, particularly in hybrid environments. Grok eases this challenge by offering template integration workflows for a range of IT Ops systems, including ServiceNow, BigPanda, LogicMonitor, Netcool, and Spectrum.
Organizations should prepare for Grokstream’s configuration demands, which may include custom scripting and API integrations. They should also plan for comprehensive training sessions, as the platform’s advanced AI features can have a steep learning curve that affects initial deployment timelines and effectiveness.
Use cases for Grok are general and do not have an industry focus. They include self-healing ITOps, continuous service availability, predictive operations, and intelligent incident response.
Radar Chart Overview
Grokstream is positioned in the Maturity/Platform Play quadrant. It’s data-agnostic, providing broad functionality and use case support. The vendor is responsive to market needs and is investing in emerging features, and so is positioned close to the Innovation axis. It has an average aggregate score in our decision criteria, making it a Challenger in this report.
HCL: DRYiCE IntelliOps
Solution Overview
DRYiCE IntelliOps is a data-agnostic AIOps offering with observability tooling for servers and networks. Its features include runbook and workload automation, as well as FinOps. Business workflow observability is an additional feature not typically found in other AIOps solutions. Data from external sources can be integrated using APIs and webhooks.
DRYiCE IntelliOps operates through key components such as AI-based monitoring, event correlation, predictive analytics, and automated incident resolution. It uses machine learning algorithms to analyze large volumes of operational data, identify patterns, and predict potential issues before they impact business operations. The solution’s automated workflows allow for seamless incident resolution, reducing manual intervention and enhancing operational agility.
Strengths
IntelliOps provides correlation and causality analysis, creating and deduplicating events to generate unique alerts. It consolidates these alerts into actionable items and initiates automated resolution. IntelliOps’ robust correlation engine and condition-based correlation system automatically group and map alerts efficiently, minimizing irrelevant alerts.
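The flow described above (deduplicate events into unique alerts, then group related alerts into one actionable item) can be sketched in a few lines. This is a simplified illustration rather than the IntelliOps engine: the event fields (`host`, `check`, `service`, `ts`) and the correlation condition (same service within a five-minute window) are assumptions.

```python
def deduplicate(events):
    """Collapse events that share (host, check) into one alert with a count."""
    unique = {}
    for event in events:
        key = (event["host"], event["check"])
        if key in unique:
            unique[key]["count"] += 1
            unique[key]["last_seen"] = max(unique[key]["last_seen"], event["ts"])
        else:
            unique[key] = {
                "host": event["host"],
                "check": event["check"],
                "service": event["service"],
                "first_seen": event["ts"],
                "last_seen": event["ts"],
                "count": 1,
            }
    return list(unique.values())


def correlate(alerts, window_seconds=300):
    """Group alerts on the same service that start within the window of the
    first alert in the group (a simple condition-based rule)."""
    groups = []
    for alert in sorted(alerts, key=lambda a: a["first_seen"]):
        for group in groups:
            same_service = group["service"] == alert["service"]
            in_window = alert["first_seen"] - group["start"] <= window_seconds
            if same_service and in_window:
                group["alerts"].append(alert)
                break
        else:
            # No existing group matched, so this alert opens a new one.
            groups.append(
                {"service": alert["service"], "start": alert["first_seen"], "alerts": [alert]}
            )
    return groups
```

Five raw events about two hosts of one service and one host of another would, under these rules, become three alerts and then two actionable groups.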
IntelliOps offers real-time observability of IT systems and business processes for proactive problem identification and remediation. It uses AI to analyze complex data relationships, identify root causes, and illustrate cause-and-effect relationships. Automated remediation is a key feature, with an automation engine that uses AI to analyze tickets and identify automation candidates, learning and driving the automation process. Its endpoint management module uses scripts for automated healing, understanding the context of each issue, recommending solutions, and triggering them without manual intervention. This intelligent management of the resolution lifecycle reduces effort and improves productivity.
IntelliOps integrates a feedback loop using supervised learning to enhance runbook recommendations and record confidence scores. It also employs an intelligent customer value analysis (CVA) to gauge customer satisfaction, which fits well with its effective business data integration capabilities.
Collaboration and workflow integration are strong, with IntelliOps integrating well with collaboration platforms and incorporating workflow automation tools.
Challenges
HCL has no support for edge AI, automated causal inference (cause-and-effect, not root cause), or predictive security posture management. There are no specific capabilities for ghost change detection, but using logs and integration with an ITSM system may provide an exploratory outlook on unapproved changes. Compliance management is not a strong point, but it scores 2/5 because audit data is available for compliance reporting.
Purchase Considerations
IntelliOps is offered both on-premises and as a hosted, managed cloud version consumed as a service, with a requirement for professional deployment services. The automation engine stands out for those needing robust automation within their AIOps solution. Licenses are based on per-OS (server) per-month pricing. The standard pack is for 2,501-5,000 endpoints, while the enterprise pack is for 5,001-10,000. IntelliOps has a professional services plan for both packs.
Professional services are required for deployment, but tool displacement is not necessary. IntelliOps can integrate with existing monitoring and observability tools so they are not impacted.
IntelliOps supports a wide range of industry verticals and use cases, including full-stack observability, automation, assistance, insights, and notifications based on GenAI and hybrid cloud management with FinOps.
Radar Chart Overview
HCL is positioned in the Maturity/Platform Play quadrant. It’s a data-agnostic solution and supports most use cases, and the vendor prioritizes stability and continuity over breakneck advancement. It has an average aggregate score in our decision criteria, making it a Challenger in this report.
IBM: IBM Cloud Pak for AIOps
Solution Overview
The IBM Cloud Pak for AIOps is deployable anywhere, including public and private clouds, on-premises, and as a SaaS application. Tool replacement is unnecessary, as the solution can ingest data from any source. Note that IBM sells tools for monitoring all aspects of IT, and an all-IBM solution is possible. The target market is any size enterprise; however, large enterprises are more likely to prefer all-IBM tooling.
The vendor has a unique approach that leverages IBM’s robust AI and natural language processing technologies, offering more advanced insights compared to many of its peers. The solution comprises several components, including AI-driven event management, anomaly detection, and predictive analytics, which work together to provide a holistic view of IT operations. It uses ML to analyze data across environments, identify anomalies, and correlate events in real time, enabling proactive incident management.
Strengths
IBM’s AIOps solution scored a 5/5 in advanced analytics, powered by Watson, which excels in normalization (including time normalization) and aggregation across multiple data sources. The analytics offers deep insights, predictive capabilities, and prescriptive recommendations. IBM’s multidimensional evaluation enables anomaly detection without manual data stream identification. Using AI, IBM analyzes relationships among data points for accurate causality and root cause analysis, reducing the time to and improving the accuracy of problem resolution.
IBM Cloud Pak for AIOps supports automated remediation through AI-driven insights and automation workflows. It integrates with ITSM tools to automatically create and resolve incidents based on predefined rules and AI recommendations. The platform supports hybrid deployments, combining on-premises and public cloud setups.
Ease of deployment is a strong point, with options including on-premises, public cloud, hybrid cloud, multicloud, and edge deployments. Using Kubernetes, IBM provides detailed documentation, guided installation processes, and prebuilt Helm charts. Red Hat OpenShift further simplifies deployment with enterprise-grade features. IBM Cloud Pak for AIOps is customizable, supports a wide range of data sources, and integrates with IBM and third-party tools, scaling easily across various environments to handle complex, large-scale IT infrastructures.
IBM has a robust ecosystem of partners and integrations, including major cloud providers (AWS, Azure, Google Cloud), ITSM tools (ServiceNow, Jira), and security solutions (IBM QRadar). It offers extensive support, consulting, training, and managed services, leveraging its global presence and expertise. The active community around Red Hat expands the ecosystem, providing additional resources and support.
Challenges
IBM is weak in predictive security posture management, though it can use an external tool like its own QRadar for security analytics and incident response, and other PSPM tools can also be integrated. IBM Cloud Pak for AIOps includes methods for detecting unauthorized or unintentional changes in the IT environment; however, these methods require extensive manual effort.
Purchase Considerations
Using IBM Cloud Pak for AIOps as a data-agnostic AIOps solution will yield good results; however, using an IBM-centric IT operations strategy may yield even better results. The added benefit of a single vendor is worth consideration. Designing a solution that uses all of Watson AIOps’ power will likely require support from professional services.
Licensing is done by endpoint but can become more complicated when including all IBM products. In that case, discounts and other licensing avenues become available.
IBM Cloud Pak for AIOps is a general-use AIOps solution, but its ability to handle edge AI is a strong use case wherever low-latency processing is required, such as IoT, industrial automation, and remote site management.
Radar Chart Overview
IBM is in the Innovation/Platform Play quadrant. It’s flexible and responsive to the market and is investing in emerging features in this space. It’s also broadly applicable to many use cases. It has a higher aggregate score in our decision criteria, making it a Leader in this report.
Interlink Software: AIOps
Solution Overview
Interlink Software’s AIOps platform can integrate seamlessly with an existing IT ecosystem. It can collect metrics, logs, and traces for monitoring the landscape, delivering a single, service-aligned view of information and insights. Interlink supports open standards for data collection. Its universal integration capability leverages APIs, SDKs, TCP/IP applications, and devices to monitor services from multiple perspectives. The solution infers the health of a service from both instrumented monitoring (systems management agents, logs, traces) and end-user perspectives, including security, capacity, APM, and NPM.
Strengths
Interlink aggregates data from diverse sources, including logs, metrics, and events from cloud and on-premises environments. The Interlink Integration Hub facilitates data integration from any source.
Interlink has automated workflows for common issues, with automated remediation and self-healing capabilities. The user interface is graphical and low-code, creating a highly effective remediation capability. Interlink AIOps has flexible deployment options, including SaaS, on-premises, and hybrid models. Self-service is possible for on-premises and public cloud deployments. The Integration Hub smooths the integration of non-native data.
Challenges
There are no specific security capabilities built in, though Interlink can integrate with other tools to provide SIEM and SOAR functionality.
While Interlink synthesizes data from various sources well, it is oriented toward cloud and on-premises environments rather than the edge. Support for edge AI or edge computing does not exist, though it’s possible to gather data from edge devices. Interlink’s machine learning can perform anomaly detection, correlation, and root cause analysis; however, causal inference requires custom implementation or integration with specialized tools. The solution integrates well with SIEM and SOAR systems to enhance security operations but provides no native predictive security posture management functionality.
There is no specific mention of non-process changes, but integration with ITSM changes is possible. Limited ghost change detection is facilitated via a federated CMDB through Integration Hub.
There is no alerting on compliance violations, though the solution supports several compliance requirements, such as HIPAA and GDPR.
Purchase Considerations
The solution’s lack of public details makes it difficult to compare it with others without interacting with the Interlink sales organization. Licensing is based on the environments (such as development, staging, production) and the number of concurrent users. Deployment is versatile, allowing both self-management and SaaS installation.
The ability to deploy in various environments (cloud, on-premises, hybrid) and the customizable nature of its workflows and dashboards make Interlink highly adaptable to different organizational needs. The use of low-code and no-code for most operations is a plus. The general-purpose nature of the Interlink offering allows it to fit into any size organization. Target marketing is primarily in the UK and Europe.
Radar Chart Overview
Interlink is positioned in the Maturity/Platform Play quadrant. It’s a general-purpose AIOps solution providing broad functionality and use case support. Interlink prioritizes stability and continuity over breakneck advancement. It has an average aggregate score in our decision criteria, making it a Challenger in this report.
ITRS: Geneos
Solution Overview
ITRS Geneos is a real-time monitoring and AIOps platform for financial services and other industries. It offers data aggregation and normalization, advanced analytics, anomaly detection, and automated remediation. Geneos excels in visualization with customizable dashboards and robust alerting mechanisms. It stands out for its real-time processing capabilities and low-latency monitoring, which are crucial for environments like trading floors. Deployment options include on-premises, cloud, and hybrid models, providing flexibility to meet diverse infrastructure needs. Its integration with various data sources and systems makes it a versatile and powerful tool in the AIOps market. Tool displacement is required.
Strengths
Visualization and dashboards require a console setup using a Java C++ app for installation and configuration on a Windows, Linux, or Mac system. Dashboards can be viewed in a browser. With Geneos 7, data visualizations can also be viewed in a browser with no console installation required. The methodology is dated, but once the administration work is complete, user dashboards and visualizations are good. They are no-code, and where low-code is needed, most items are simple selections and fill-in-the-blanks.
Challenges
ITRS does not do well in emerging technologies. There is no support for edge AI, automated causal inference, predictive security posture management, or business data integration. Ghost change detection is also limited; however, integration with an ITSM system and creating a custom application within Geneos can enable detection of some changes.
Currently, ITRS cannot ingest business metrics. It has indicated that other features within AIOps have a higher priority, and business metrics are not on its short-term roadmap. Geneos has no generative AI capabilities. However, it is investigating and experimenting with GenAI techniques outside the core product, including translating text displayed in the UI and documentation translation. These items are on its 12- to 18-month roadmap. Compliance auditing and reporting are not provided out of the box. Data can be extracted for some compliance reporting forms, but compliance management is not a feature.
Purchase Considerations
SaaS, managed, and self-managed deployment options provide flexibility. Root cause analysis is available without cause-and-effect analysis (causality). Professional services are not required; licensing is by object or device.
ITRS has a strong presence in banking services, detecting issues affecting messaging, transactions, databases, and hosting. Market data monitoring within ITRS Geneos MDM enables sell-side firms to monitor trades with live pricing and buy-side firms to make informed investment decisions. It also enables market data providers to monitor the quality of service to their customers. Additionally, transaction monitoring follows the lifecycle of a transaction across both on-premises and cloud environments in a single view.
Radar Chart Overview
ITRS is positioned in the Maturity/Feature Play quadrant, as it’s focused on specific functionality and use cases (banking). It has a lower aggregate score in our decision criteria, making it an Entrant in this report.
LogicMonitor: AIOps
Solution Overview
LogicMonitor AIOps is powered by Edwin AI, an AI solution that analyzes events and alerts and provides insights. Leveraging advanced ML and NLP algorithms, Edwin AI helps ITOps teams effortlessly identify problems, determine the root cause of those problems faster than ever before, and prevent events from exploding into business-critical incidents. LogicMonitor has a strong presence in mid-market companies but is often targeted at large organizations and managed service providers. Tool displacement is required to make use of all LogicMonitor functionality. LogicMonitor has strong integrations with ServiceNow for ITSM and PagerDuty for workflow automation.
Strengths
LogicMonitor uses machine learning algorithms to establish expected patterns for data points and identify data that falls outside of these patterns. Anomaly detection provides another avenue of insight into resource behavior, allowing users to catch issues before they escalate into more potentially severe events. Dynamic thresholds add functionality where static thresholds are limited. Alerts are dynamically generated when these thresholds are exceeded.
As a cloud-based SaaS platform, LogicMonitor requires minimal on-premises setup, with LogicMonitor Collectors serving as lightweight software agents. The agents can be quickly integrated with existing IT environments. The platform supports various devices and applications, making the deployment process relatively smooth and efficient. However, like any comprehensive monitoring solution, some complexity may be involved in larger, more intricate environments.
The flexible platform supports many IT infrastructure components, including on-premises, cloud, and hybrid environments. Users can create custom DataSources and LogicModules to tailor monitoring to specific needs. LogicMonitor has a strong ecosystem with partnerships, integrations, and third-party solutions. Online documentation, training, certifications, community tools, and professional services are available.
Challenges
Integration with third-party tools is possible, although LogicMonitor offers no native SIEM or SOAR tools. LogicMonitor’s edge AI capabilities are limited compared to other solutions: data can be collected but not analyzed at the edge. Automated remediation is not part of the LogicMonitor platform, but easy integration with ServiceNow improves workflow and remediation capabilities. LogicMonitor includes automated root cause analysis and, while it is effective, it is merely comparable to the industry standard and does not significantly exceed the capabilities provided by other AIOps platforms. It efficiently correlates data and events to identify root causes, but advanced, fully automated causal inference with minimal human intervention is still under development. Cause-and-effect inference has little support within LogicMonitor. LogicMonitor does not provide predictive security posture management, and it can’t detect non-process changes. Integration with ITSM changes is possible, and topology and infrastructure modifications can be seen. LogicMonitor provides audit logs, user access controls, and reporting tools but no specific compliance management tools.
Purchase Considerations
LogicMonitor is an adequate AIOps solution without some of the more advanced AI features of other solutions. Professional services are required. Training ranges from free courses (after signing up) to on-site training and certifications. Pricing is by resource (object) or events per month, depending on product selection.
LogicMonitor has a background in monitoring devices and applications. New technologies have allowed it to move into the AIOps space. As such, it is a general-purpose AIOps tool with a background in real-time monitoring.
Radar Chart Overview
LogicMonitor is positioned in the Maturity/Platform Play quadrant. It prioritizes stability and continuity over breakneck advancement, and offers broad functionality and use case support. It has an average aggregate score in our decision criteria, making it a Challenger in this report.
Logz.io: Open 360
Solution Overview
Logz.io’s AIOps solution leverages ML and advanced analytics to enhance IT operations and streamline incident management. It features real-time log analytics, metrics monitoring, and distributed tracing, providing comprehensive system performance and health visibility. Key capabilities include anomaly detection, predictive analytics, and root cause analysis, which help identify issues before they impact the business. The platform also integrates seamlessly with popular DevOps tools and cloud services, enabling automated responses and facilitating proactive management of IT environments. The Logz.io Observability IQ AI product component delivers advanced AI through the IQ Assistant, its copilot and querying interface, which is powered by a GenAI integration and delivered within the Open 360 Log Management, Kubernetes 360, and App 360 platform solutions. Open 360 fully supports open-source technologies such as Prometheus, OpenTelemetry, Fluentd, and others while building on top of popular open-source tools such as OpenSearch. Logz, using a single agent, gathers data from public and private clouds, on-premises, or on-site wherever the data is generated.
Strengths
Logz.io Open 360 offers robust data aggregation and normalization, efficiently handling a wide array of data sources and formats. The solution can automatically parse and normalize data, which ensures uniformity, simplifies subsequent analysis, and enhances data correlation. A standout feature is its capability to convert log data into metric-like information, providing deeper operational insights. Logz.io integrates with GenAI for sophisticated anomaly detection, predictive analytics, and pattern recognition. The solution’s foundation on OpenSearch allows for rapid data indexing and querying, while its integration with Kibana enables users to perform detailed analytics, craft complex queries, and create insightful visualizations. Notably, forecasting is focused specifically on capacity thresholds, aiding in proactive resource management.
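To make the idea of converting log data into metric-like information concrete, the following is a minimal sketch rather than Logz.io's implementation. It assumes a hypothetical log format of `<ISO timestamp> <LEVEL> <message>` and a hypothetical helper named `logs_to_metric`, and it simply counts ERROR lines per minute to produce a small time series.

```python
from collections import Counter
from datetime import datetime


def logs_to_metric(lines):
    """Count ERROR log lines per minute, yielding a metric-like time series."""
    counts = Counter()
    for line in lines:
        # Assumed (hypothetical) format: "<ISO timestamp> <LEVEL> <message>"
        timestamp, level, _message = line.split(" ", 2)
        if level == "ERROR":
            # Truncate the timestamp to the start of its minute.
            minute = datetime.fromisoformat(timestamp).replace(second=0, microsecond=0)
            counts[minute.isoformat()] += 1
    return dict(sorted(counts.items()))


sample = [
    "2024-05-01T10:00:05 ERROR db timeout",
    "2024-05-01T10:00:40 INFO request ok",
    "2024-05-01T10:00:59 ERROR db timeout",
    "2024-05-01T10:01:10 ERROR cache miss storm",
]
print(logs_to_metric(sample))
```

Storing only the per-minute counts, rather than every raw line, is one way such a conversion can reduce the volume of data retained while keeping the signal needed for alerting.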
Challenges
There is no automation engine or support for runbooks; however, integration with orchestration and remediation tools is possible. Logz can send data to collaboration tools such as Teams and Slack, which are endpoints for notification and not bidirectional workflows. The SIEM product supports some workflow automation.
Logz ingests data for centralized data processing and analytics, but its capabilities in edge AI are limited compared to platforms specifically designed for edge computing environments.
Logz analyzes correlation and can provide a root cause through its advanced analytics and machine learning algorithms. However, fully automated causal inference, which requires understanding complex cause-and-effect relationships in real time, is on the roadmap. Logz has solid security analytics and integrations, especially if its SIEM module is used, but it does not have predictive security posture management. Logz can detect changes and anomalies in log data, which can indirectly help identify ghost changes. Its machine-learning algorithms assist in detecting unusual patterns.
Purchase Considerations
Logz’s Open 360 is a SaaS-only deployment that relies on open source tools and OpenTelemetry. Its method of handling logs saves space and cost by keeping only relevant log data.
Open 360 is priced by the volume of logs, metrics, and traces Logz ingests and has a strong presence in the SMB market. That said, Logz is targeted at enterprises of any size, including service providers.
Radar Chart Overview
Logz is positioned in the Maturity/Feature Play quadrant, as it’s focused on specific functionality and use cases. Logz has demonstrated a lower rate of tech delivery than the market, with only semiannual major releases in the last 12 months, positioning it as a Forward Mover. It has a lower aggregate score in our decision criteria, making it an Entrant in this report.
MeshIQ*: AIOps
Solution Overview
MeshIQ is a middleware observability and management platform for messaging, event processing, and streaming across hybrid clouds. The platform consists of three modules: observe, manage, and track. These are installed and licensed together, although customers may use only some functionality. Tool displacement is unnecessary except for message queue monitoring systems that may already exist in the organization.
meshIQ can track transactions across multiple middleware and infrastructure environments, stitching related messages together to visualize the business flow and to identify and alert on anomalies, latency, and lost and diverted messages using real-time data analytics. Transactions can include applications such as IBM MQ, Apache Kafka, Solace, RabbitMQ, ActiveMQ, and many more. meshIQ can ingest data from OpenTelemetry. It uses agents to collect data.
The meshIQ solution can automate corrective and preventive actions via scripting and APIs. Integrations with ticketing systems such as ServiceNow, event management systems, and collaboration tools enable automation wherever it is relevant. It provides out-of-the-box, configurable dashboards, views, and other visualizations and reporting to meet different business needs.
The platform is a general-purpose, data-agnostic AIOps solution with a particular focus on message queue monitoring and analysis.
Strengths
The AutoPilot M6 engine allows the aggregation, sorting, filtering, merging, and joining of various events and metric streams in real time using a wizard-driven GUI.
MeshIQ excels in aggregating and normalizing data from various sources, including message queues. It collects and integrates data from multiple middleware systems, ensuring a unified view of the IT environment.
MeshIQ can see inside message queues and topics to monitor the performance of middleware systems, including message flows, message rates, and availability. It can proactively catch problems and bottlenecks within message queues before they impact applications, a unique feature in the AIOps marketplace. The solution can handle standard analytics, including forecasting. Although configuring anomaly detection may not be as simple as with other vendors' solutions, it delivers above-average results where message queues are concerned.
Challenges
No specific security capabilities exist, but MeshIQ can integrate with external tools. Edge AI is not supported, but data can be gathered from devices and applications on the edge. MeshIQ does not handle automated causal inference except within messages where cause-and-effect can be seen.
MeshIQ does not support predictive security posture management. GenAI abilities are lacking, but they are on the roadmap for 2024. Ghost changes within message queues are visible, but changes made outside change management methodologies are invisible.
Purchase Considerations
Any organization dependent upon transaction message queues and needing an AIOps solution should look at MeshIQ carefully. It is the only AIOps solution that provides both analysis inside message queues and AIOps functionality. Though most portions of the solution are easy to install and configure, professional services are recommended to design the most secure solutions.
MeshIQ is heavily used in industries where mandatory secure transactions are made, such as banking, retail, and financial services. The target market is financial institutions and large enterprises needing insight into business transactions within message queues. Deployment is via SaaS or a self-managed installation on-premises or in a private or public cloud.
Radar Chart Overview
MeshIQ is positioned in the Innovation/Feature Play quadrant, as it’s focused on specific functionality and use cases. The vendor is flexible and responsive to the market, and it’s likely that the solution will look and feel different year over year. It has an average aggregate score in our decision criteria, making it a Challenger in this report.
New Relic*
Solution Overview
New Relic offers an AI-enabled observability platform with more than 30 named features for AIOps, infrastructure monitoring, and application performance management for cloud-based, on-premises, and hybrid environments. Features include APM, synthetic transactions, code profiling, real user monitoring, and more. The AI integration includes AI analytics and natural language interaction. Capabilities may be added anytime, and data can be ingested from other sources before replacing them with native New Relic tools. Business observability adds customer browsing, product selection, checkout, and post-sale activities to the platform.
New Relic has four types of agents: APM agents for server-side applications, browser agents for browser applications, infrastructure agents for hosts and on-host integrations, and mobile agents for mobile applications. It can handle a wide range of data sources, including logs, metrics, traces, and events, and can use structured and unstructured data within the New Relic Database, which supports its own query language. In addition, the solution has built-in support for industry standards such as OpenTelemetry and Prometheus. Out-of-the-box integrations are extensive, and New Relic supports network monitoring via SNMP, network flows, and network syslog.
Strengths
New Relic scored high in anomaly detection, correlation, and causality analysis. Anomaly detection supports multiple dimensions, allowing data from more than one source to be combined into a single data point for detection. Alerting also supports multiple dimensions. Root cause analysis is supported; however, forecasting future metric values is not.
Deployment is simple and can be completed in minutes. Additional tasks such as agent deployment and other data ingestions may be required.
New Relic’s day-to-day ease of use is strong, with flexible dashboards, easy customization, and an AI assistant on most dashboards. Natural language can query the system, and tabular and graphical output is available.
New Relic has a strong ecosystem with partnerships, integrations, and third-party solutions. It also offers online documentation, training, certifications, community tools, and professional services as part of its strong support ecosystem.
Challenges
New Relic Vulnerability Management provides runtime software composition analysis (SCA) and vulnerability assessment prioritization with no additional configuration when using supported New Relic APM agents. This is not SIEM or SOAR but does add security. SIEM and SOAR are available through integration with other products.
New Relic can ingest data from agents on the edge but provides no edge AI capabilities.
The root cause of incidents is handled well. However, the emerging technology of automated causal inference (cause-and-effect inference and analysis) is lacking.
New Relic does not support predictive security posture management but does have a module called Interactive Application Security Testing (IAST). It can find and fix exploitable vulnerabilities.
Though New Relic does not support ghost change detection, there is extensive support for change tracking to capture changes within deployments and use them to contextualize performance data. Change tracking can record changes via API or configure deployment pipelines. The changes display as markers on charts for APM, browser, mobile monitoring, service level management, custom dashboards, and many more experiences across New Relic.
Purchase Considerations
The 30 named features are all included with a single license. The cost of data consumption determines the final cost. A free version with limited data consumption is available.
New Relic provides excellent integration with DevOps processes and remains a strong observability tool. Professional services are not needed; however, administrative oversight is important as the ease of adding capabilities can increase costs. Deployment is via SaaS and New Relic agents.
It is a pure SaaS solution that targets MSPs, SMBs, and large enterprises and is supported in international regions. Common use cases for New Relic include e-commerce, retail, and healthcare. It also provides a use case called New Relic for Startups.
Radar Chart Overview
New Relic is positioned in the Maturity/Platform Play quadrant. New Relic’s release cadence continues to be rapid compared to the market, earning it an Outperformer rating. It prioritizes stability and continuity over breakneck advancement and offers broad functionality and use case support. It has an average aggregate score in our decision criteria, making it a Challenger in this report.
OpenText: Operations Bridge
Solution Overview
Operations Bridge is a single product with multiple modules. It is built on the OpenText OPTIC platform, which provides a single data lake for rapid data ingestion to support real-time analytics. It can be deployed on-premises, in a customer’s public cloud instance, or on a private cloud, or consumed as a SaaS service. Tools may be displaced, but Operations Bridge can ingest from other sources and has OpenTelemetry native data ingestion featuring agent and agentless monitoring. Operations Bridge also integrates more than 200 third-party tools to support existing investments.
Operations Bridge provides consolidated performance and event management and optimization, along with customizable dashboards. It monitors IT environments, consolidates and normalizes data from third-party tools, and applies automated discovery, monitoring, analytics, and remediation to private, public, multicloud, and container-based infrastructure data.
The solution automates AIOps with machine learning and analytics, which reduces events and accelerates root cause identification and remediation. GenAI finds and summarizes information and will provide users with the steps required to troubleshoot and remediate issues faster. Users, including executive stakeholders, gain actionable insight on any device with a browser within tailored dashboards showing key status, business, and IT KPIs.
Strengths
OpenText offers robust analytics capabilities, including seasonality, log analytics, automatic event correlation (AEC), predictive forecasting, GenAI, and multivariate anomaly detection. It supports both agentless and agent monitoring, normalizing data from multiple sources. Operators can fine-tune operations using the AEC Explained UI, which shows AI-made correlations and allows for the promotion or prohibition of these correlations.
AEC takes about two weeks to learn the environment and adjust ML models for operational changes. Contextual information is integrated into analytics and troubleshooting workflows. Historical performance data is used for anomaly detection, baseline calculation, threshold breaches, and trend analysis in the context of application or infrastructure issues.
Native automated remediation includes policy-based automation that uses role-based access, as well as rule-based automation. Users can define remediation steps with role-based restrictions on execution. To enhance GenAI capabilities, a feedback loop integration is planned for 2024.
OpenText is user-friendly, with various OOTB methods for data ingestion, an integration hub, and a point-and-click wizard. It supports ITSM integration for incident lifecycle management, with runbook automation triggered by operators or running automatically. The OOTB setup reduces configuration time, and OOTB dashboards and reports provide immediate usability. The unified OPTIC One UI centralizes configuration and operational tasks. Integrated online help assists in upskilling staff.
The solution is flexible, offering on-site and SaaS deployments, and its modular nature enables customization and expansion. The vendor ecosystem is comprehensive but rated slightly lower because documentation is hosted on the OpenText site, sometimes making it difficult to find complete information.
Challenges
Though SOAR is available within Operations Bridge, SIEM integration is not, but it is planned for October 2024. Similarly, while the solution features ML-based anomaly detection and causal inference for events, automatically correlating events with probable causes, GenAI-based remediation suggestions and advanced causal inference models for automated causal analysis are not yet available, though planned for 2024 and 2025, respectively.
While ghost change detection is not supported, the system can infer infrastructure changes made outside the change management process. Operations Bridge lacks built-in security compliance monitoring but can integrate with compliance tools.
Purchase Considerations
Operations Bridge appears to have found a final home at OpenText. It has always been a capable AIOps solution, but support has been difficult and inconsistent. Professional services, though not required, may be needed in complex environments.
The licensing system uses units consumed by agents, agentless, synthetic, and real-user monitoring. The analytics add-on provides automatic event correlation, anomaly detection, and Operations Bridge analytics (search, forecasting, log/event analytics, and alerts). Application Observability is an OpenTelemetry-powered add-on solution for traces, metrics, and logs.
OpenText Operations Bridge is used in various industries and organizational sizes. It is a general-purpose AIOps solution not focused on a particular vertical segment.
Radar Chart Overview
OpenText is positioned in the Maturity/Platform Play quadrant. It prioritizes stability and continuity over breakneck advancement, and offers broad functionality and use case support. It has an average aggregate score in our decision criteria, making it a Challenger in this report.
PagerDuty: PagerDuty AIOps
Solution Overview
PagerDuty AIOps is a domain-agnostic SaaS-only solution built on the Terraform open source infrastructure. Its target market spans organizations of all sizes and industries, including but not limited to cloud and network service providers, technology, finance, healthcare, retail, and manufacturing. The platform serves any business dealing with complex IT environments and requiring generative observability data to drive automated insights and operational efficiency.
The out-of-the-box data ingestion options are extensive and may verge on “any data from anywhere” capabilities. Data from BI tools, such as Power BI, Snowflake, Looker, and Tableau, can be ingested via the PagerDuty API.
PagerDuty AIOps offers powerful capabilities to deflect work from teams, using automation and ML-powered noise reduction to avoid interrupting human responders until they’re absolutely required. At that point, the solution surfaces actionable insights and built-in automation to accelerate triage and speed resolution. PagerDuty achieves this with event-driven automation and noise reduction that can be implemented per service or span across services. Noise reduction options range from out-of-the-box alert grouping, which is enabled at the push of a button and learns groupings over time, to more customizable options that use time windows or event orchestration for greater precision. The PagerDuty Visibility Console allows users to add, remove, and resize modules on any dashboard. It filters by team or service and prioritizes various modules, including the incidents and service activity modules. The event orchestration feature drives toward the “next best action,” including diagnostics, incident actions, and remediations, by triggering user-defined webhooks, incident workflows, or on-platform automation actions.
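As an illustration of the time-window style of alert grouping mentioned above, here is a minimal sketch. It is not PagerDuty's algorithm or API: the `Incident` class, the `group_alerts` function, and the 300-second default window are all hypothetical, chosen only to show how alerts for the same service that arrive close together can be folded into one incident.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    service: str
    first_seen: float
    alerts: list = field(default_factory=list)


def group_alerts(alerts, window_seconds=300):
    """Group (timestamp, service, summary) alerts per service by time window.

    An alert joins the most recent incident for its service if it arrives
    within window_seconds of that incident's first alert; otherwise a new
    incident is opened.
    """
    incidents = []
    latest_by_service = {}
    for ts, service, summary in sorted(alerts):
        incident = latest_by_service.get(service)
        if incident is None or ts - incident.first_seen > window_seconds:
            incident = Incident(service=service, first_seen=ts)
            incidents.append(incident)
            latest_by_service[service] = incident
        incident.alerts.append(summary)
    return incidents


alerts = [
    (0, "checkout", "latency high"),
    (60, "checkout", "error rate high"),
    (90, "search", "disk full"),
    (400, "checkout", "latency high"),
]
incidents = group_alerts(alerts)
for inc in incidents:
    print(inc.service, inc.first_seen, inc.alerts)
```

In this sample, four alerts collapse into three incidents, so a responder would be notified three times instead of four; a learned grouping approach would replace the fixed window with a model of which alerts tend to occur together.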
Automation capabilities can be customized to the customer’s needs, including Runbook Automation for automated diagnostics and remediation and sophisticated process automation for cloud, on-premises, or hybrid environments. Integrations with customer service operations and status pages are available, providing a unified platform for handling critical work before, during, and after incidents.
PagerDuty integrates with over 700 technology partners, fitting into any technology stack based on specific customer environments and use cases. It is a data-agnostic solution that requires no tool displacement. It offers automation and orchestration connected to incident management, GenAI in Slack, customer service operations, and infrastructure automation tools like Ansible.
Strengths
With the addition of AI, including its chatbot, PagerDuty has strong correlation and causality analysis. Root cause analysis is guided by AI to shorten the time needed to resolve incidents. The AI assists with incident summaries and after-action reports.
PagerDuty excels in automated remediation, supporting it across various services through policies, rules, human actors, and workflows. Managed via event orchestration, automated action or runbooks, and incident workflows, the solution allows custom field population with event payload data.
PagerDuty is renowned for its collaboration and workflow capabilities. It tightly integrates with its incident management and automation solutions. It supports bidirectional integrations with tools like Teams and Slack, achieving high ratings.
PagerDuty’s AIOps solution installation is straightforward, with an intuitive UI and a comprehensive online knowledge base. Professional services are available for complex environments. Integrations are set up quickly via Global Orchestration Keys, and the solution can deliver organization-wide value within 30 days. Agent installation is fast and well-documented.
Designed for out-of-the-box value, PagerDuty features Intelligent Alert Grouping and Auto-Pausing for transient alerts. Its machine learning handles deduplication, suppression, and noise reduction, learning from user interactions. Automated incident workflows are configured with a drag-and-drop interface, and the mobile app supports grouping related incidents. Training requirements are minimal, with free PagerDuty University courses and extensive knowledge base articles. Initial training for complex roles can be completed in under two hours.
PagerDuty boasts a comprehensive vendor ecosystem, including partnerships, integrations, third-party solutions, training, certifications, community tools, and professional services.
Challenges
PagerDuty currently does not provide edge AI capabilities. The event orchestration variables and probable origin modules offer some capabilities related to automated causal inference. Though both aid in finding and resolving issues, they do not provide true automated causal inference, that is, cause-and-effect analysis.
PagerDuty does not directly support discovering changes made outside of defined process channels—ghost changes. Integration with an ITSM system allows knowledge of planned changes. Audit trail reports are available, and an audit API can extract additional information. However, predictive security posture management is not available.
Purchase Considerations
PagerDuty is offered as a SaaS deployment, and no on-premises components are required (excluding data-gathering agents). Customers can choose between consumption-based AIOps licensing or the legacy Digital Operations product for AIOps capabilities, both delivered as SaaS and seamlessly integrated with incident management.
PagerDuty is a strong performer in data-agnostic AIOps. Coupled with its well-respected collaboration and workflow abilities, the solution presents a strong case for large organizations with entrenched monitoring and observability tooling to gain an AI-assisted view of the enterprise.
In most cases, professional services are not needed. As with any AIOps solution, a very complicated organization may require help from the vendor.
PagerDuty is a general-purpose AIOps tool. Professional services may offer specialized assistance for use cases in retail, financial services, or healthcare.
Radar Chart Overview
PagerDuty is positioned in the Maturity/Platform Play quadrant. It is an Outperformer with a high rate of delivery over the last 6 to 12 months, often releasing upgrades daily, and this pace of delivery is set to continue over the coming year. It’s SaaS-only, which limits its applicability across all use cases. It has a higher aggregate score in our decision criteria, making it a Leader in this report.
Riverbed: IQ
Solution Overview
Riverbed IQ is the AIOps product for the Riverbed Unified Observability and Optimization platform, an open, programmable AIOps solution that captures enterprise user experience, application, and network observability metrics. It applies various AI/ML techniques to establish baselines, detect anomalies, and contextually correlate across disparate data streams to identify issues. The AI powers the no/low-code workflows (runbooks) that automate and replicate troubleshooting investigations and execute remediation actions.
Tool displacement is not necessary, a fact that is emphasized by Riverbed; however, Riverbed does offer observability tools for cloud, networks, and infrastructure. The solution has several enabling technologies, including a unified agent for managing Riverbed and third-party agents; a topology viewer for dynamic mapping of devices in context; Edge Collector, which serves as a data broker between Riverbed IQ and Riverbed’s on-premises-based network performance monitoring (NPM) products; multimodal AI and automation for problem identification and resolution; and dashboards for real-time insights.
Strengths
A runbook is a workflow that is built using Riverbed’s low-code/no-code runbook editor or by using a graphical drag-and-drop method to assemble the workflow. Riverbed IQ supports runbooks for a number of tasks:
- Incident automation: Runbooks that execute automatically whenever a new incident is created
- Incident lifecycle automation: Runbooks that execute automatically whenever a triggering lifecycle event occurs on an incident
- API-driven automation: Runbooks triggered in response to a configured API call
- On-demand automation: Runbooks triggered on-demand or as scheduled
- Subflows: Runbook “macros,” which are chunks of reusable automation code that can be used to perform common functions or to implement integrations with third-party systems
The graphical interface to workflow automation is particularly intuitive.
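The trigger types listed above can be pictured as a simple dispatch table. The sketch below is a hypothetical illustration only, not Riverbed's runbook engine or API: the `Trigger` enum, the `runbook` decorator, and the `fire` function are invented names, and the `enrich` function stands in for a reusable subflow.

```python
from collections import defaultdict
from enum import Enum


class Trigger(Enum):
    INCIDENT_CREATED = "incident_created"
    LIFECYCLE_EVENT = "lifecycle_event"
    API_CALL = "api_call"
    ON_DEMAND = "on_demand"


# Maps each trigger type to the runbooks registered for it.
registry = defaultdict(list)


def runbook(trigger):
    """Decorator that registers a function as a runbook for a trigger type."""
    def register(func):
        registry[trigger].append(func)
        return func
    return register


def enrich(incident):
    """Subflow: a reusable step that any runbook can call."""
    return {**incident, "enriched": True}


@runbook(Trigger.INCIDENT_CREATED)
def triage(incident):
    incident = enrich(incident)
    return f"triaged {incident['id']} (enriched={incident['enriched']})"


@runbook(Trigger.ON_DEMAND)
def collect_diagnostics(incident):
    return f"diagnostics collected for {incident['id']}"


def fire(trigger, incident):
    """Run every runbook registered for the given trigger."""
    return [rb(incident) for rb in registry[trigger]]


print(fire(Trigger.INCIDENT_CREATED, {"id": "INC-1"}))
```

A low-code editor hides this wiring behind a graphical canvas, but the underlying model is the same: a trigger selects which workflows run, and subflows keep common steps in one place.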
As a SaaS application, Riverbed IQ is easily set up. Its data sources are network monitoring applications such as Riverbed Technology’s Riverbed NetIM, Riverbed NetProfiler, or Riverbed Edge, which handles data source connections for IQ. Wizards are available to facilitate adding data sources, and wizards can be created as necessary.
Riverbed is data agnostic, though observability add-ons for networks, applications, infrastructure, and user experience are available. The vendor ecosystem for Riverbed is on the stronger side of the market.
Challenges
SIEM/SOAR systems are one category of third-party systems to which IQ can be connected. IQ can investigate threats found in traditional security tools by leveraging no/low-code runbooks to automate forensics data collection.
Riverbed IQ currently does not provide edge AI capabilities. Data can be collected from any location, but processing and intelligence occur in the SaaS application.
Riverbed IQ does have advanced analytics, but there is no support for cause-and-effect analysis. Predictive security posture management is not supported unless integration with a third-party tool can supply the functionality. GenAI using LLMs is not available. Ghost change detection may be possible using workflows, but there is little indication of interaction with the change process other than ITSM integration. There are no compliance management features available. Audit logs can be used to create reports but are not defined by default.
Purchase Considerations
Riverbed is primarily focused on meeting the needs of large enterprises, although it does serve SMBs via its partner network. Based on its capabilities and complex feature set, Riverbed IQ may require professional services to design a complete solution. Training is readily available, and online documentation is complete, if difficult to follow. Riverbed Edge collectors are the only on-premises software for deployment. Riverbed IQ is a subscription SaaS offering, with volume-based pricing in tiers of runbooks consumed per month (25,000, 50,000, and so on).
Riverbed is well-known for its expertise in network monitoring and observability. Complex networking environments can take advantage of its networking capabilities and gather other monitoring and observation data to create a viable AIOps solution.
Radar Chart Overview
Riverbed IQ is positioned in the Maturity/Feature Play quadrant. It is classified as an Outperformer, with software updates pushed to production often daily, and most feature enhancements available at the end of two-week sprint cycles. This rapid feature-release schedule should help the company become well-established in the AIOps market. It has an average aggregate score in our decision criteria, making it a Challenger in this report.
ScienceLogic: SL1
Solution Overview
ScienceLogic SL1 is a scalable modular platform deployed and self-managed on-premises (using virtual machines or bare metal) or as a SaaS in AWS or Azure regions worldwide. Automations are a key feature of SL1. Runbook automation allows SL1 to execute actions automatically when specific event conditions are met. Automation in SL1 is handled by two kinds of policies: automation policies that define the event conditions that can trigger an automatic action and action policies that can be triggered when certain criteria are met.
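The split between the two policy types can be sketched as follows. This is a hypothetical illustration, not the SL1 data model or API: the `AutomationPolicy` and `ActionPolicy` classes, the `handle_event` function, and the event fields are assumptions made for the example. The automation policy holds the event condition, and the action policies hold what to do when that condition is met.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Event = Dict[str, object]


@dataclass
class ActionPolicy:
    """What to do: a named action that runs against an event."""
    name: str
    run: Callable[[Event], str]


@dataclass
class AutomationPolicy:
    """When to do it: an event condition plus the actions it triggers."""
    name: str
    matches: Callable[[Event], bool]
    actions: List[ActionPolicy] = field(default_factory=list)


def handle_event(event, automation_policies):
    """Run the actions of every automation policy whose condition matches."""
    results = []
    for policy in automation_policies:
        if policy.matches(event):
            results.extend(action.run(event) for action in policy.actions)
    return results


restart = ActionPolicy("restart-service", lambda e: f"restart {e['service']} on {e['device']}")
notify = ActionPolicy("notify-oncall", lambda e: f"notify on-call about {e['service']}")
service_down = AutomationPolicy(
    "service-down",
    matches=lambda e: e["severity"] == "critical" and e["type"] == "service_down",
    actions=[restart, notify],
)

event = {"device": "web-01", "service": "nginx", "severity": "critical", "type": "service_down"}
actions_taken = handle_event(event, [service_down])
print(actions_taken)
```

Keeping conditions and actions separate means one action, such as a restart, can be reused by several automation policies without being redefined.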
SL1 offers a number of add-ons. Restorepoint is a disaster recovery and secure configuration management appliance for network devices, such as routers, switches, proxies, and firewalls. SL1 PowerFlow provides a generic platform for integrations between SL1 and third-party applications, such as ServiceNow, xMatters, Opsgenie, or Cherwell Service Management. Zebrium Root Cause as a Service (RCaaS) uses unsupervised machine learning on logs to automatically find the root cause of software problems. It does not require manual rules or training. As Zebrium ingests logs, its AI/ML engine analyzes the logs, looking for abnormal log line clusters that resemble problems, such as abnormally correlated rare and error events across all log streams.
Strengths
SL1’s native orchestration and automation services approach the depth and breadth of a SOAP platform. The solution can combine runbooks with its own PowerFlow platform to enhance automation tasks. SL1 PowerFlow can handle complex automation with bidirectional and man-in-the-middle logic. However, the lack of closed-loop learning for automation keeps it from achieving the highest possible score in this category.
SL1 collaboration tools include Teams, Slack, and others. The number of OOTB-configured integrations is better than average, and collaboration tools can trigger workflows. Metrics can automatically trigger workflows within the workflow engine for autonomous issue remediation.
Overall, the solution is easy to use. Runbooks and PowerFlow automation are easy to create and maintain. The orchestration and automation features use low code—essentially no code—for creating and running automation. ScienceLogic handles SaaS upgrades, although the user is responsible for upgrading collectors and agents. Collectors are upgraded via download, while the agent upgrade is a redeploy and takes advantage of the automated bulk deployment process.
The number of deployment options and the orchestration and automation capabilities make SL1 highly flexible. The solution can be installed almost anywhere, whether self-managed or managed by ScienceLogic. However, the downside of that flexibility may be the need for professional services in complex environments.
ScienceLogic has a complete ecosystem to support partners, enterprises, and users with documentation, certifications, community, blogs, and professional services.
Challenges
ScienceLogic does not support edge AI, but it can collect telemetry from edge devices and apply AI/ML within the SaaS application. Automated causal inference is available for log data but not with other ingestions. There is no method to detect changes made outside of change management processes, but integration with ITSM changes is possible.
Purchase Considerations
ScienceLogic SL1 straddles the line between data-agnostic AIOps and data-centric AIOps. It provides monitoring and observability tooling, though the tooling is not required. Using PowerFlow, data and events can be ingested from anywhere. There is no requirement for professional services, although planning and implementation services are available.
Deployment options abound. The solution can be deployed on a physical or virtual appliance, to the public cloud, as software only (hardware requirements exist), SaaS, or self-managed.
SL1 is licensed via a tiered SKU model—Base, Standard, Advanced, and Premium—which offers escalating functionality. Within each tier, licensing is done per device and available via perpetual or subscription licensing.
ScienceLogic SL1 has a strong presence in managed service organizations due to its agent and agentless technologies and tenant segregation policies. Otherwise, it is a general-purpose tool suited for larger environments. Midsize companies may find the solution overwhelming, but the professional services and partnership teams are strong.
Radar Chart Overview
ScienceLogic is positioned in the Maturity/Feature Play quadrant. It has a strong presence in managed services but doesn’t target MSPs exclusively and so is on the Feature Play side but close to the Platform Play axis. The vendor prioritizes stability and continuity over breakneck advancement. It has an average aggregate score in our decision criteria, making it a Challenger in this report.
ServiceNow*: ITOM AIOps Enterprise
Solution Overview
ServiceNow ITOM AIOps Enterprise leverages AI and machine learning to enhance IT operations management. It excels in data aggregation and normalization, collecting data from diverse sources to provide a unified view of the organization’s digital estate. Advanced analytics and anomaly detection capabilities enable proactive identification of issues. Correlation and causality analysis help pinpoint root causes, enabling efficient problem resolution. The platform supports automated remediation, reducing manual intervention. Visualization and dashboards offer intuitive insights, while collaboration and workflow integration streamline incident management. Additionally, the solution integrates with SIEM and SOAR for comprehensive security operations. Emerging features like edge AI, automated causal inference, and predictive security posture management further enhance its capabilities, making it a robust solution for modern IT environments.
ServiceNow ITOM AIOps Enterprise is based on ServiceNow ITOM and other modules, creating a complete offering that meets this report’s feature requirements. The CMDB is used for data aggregation and normalization. The Now Intelligence Suite and Performance Analytics provide advanced analytics. The Operational Intelligence module is used for anomaly detection. The Event Management and Operational Intelligence modules offer correlation and causality analysis. ServiceNow Orchestration and ITOM provide automated remediation; Performance Analytics, Reporting, and Dashboards allow visualization; ITSM, IntegrationHub, and Collaboration Integrations add collaboration and workflow integration; and Security Operations and IntegrationHub provide SIEM and SOAR integration.
Strengths
ServiceNow ITOM AIOps Enterprise excels in data aggregation and normalization, advanced analytics, anomaly detection, correlation and causality analysis, automated remediation, collaboration, and workflow integration. It uses the CMDB and IntegrationHub to aggregate data from multiple sources and normalize it into consistent formats. Performance Analytics and Predictive Intelligence offer advanced reporting, trend analysis, KPI tracking, and predictive insights.
Operational Intelligence monitors the IT infrastructure, using machine learning to detect real-time anomalies and identify potential issues. Root cause and cause-and-effect analysis are integral features. ServiceNow Orchestration automates IT tasks and workflows, reducing manual intervention, while ITOM extends this by automating remediation.
ITSM provides a framework for managing IT workflows and processes. IntegrationHub enhances team communication through integration with tools like Slack and Microsoft Teams. The platform supports seamless business data integration with ERP and CRM systems.
ServiceNow is cloud-based, simplifying deployment and scaling, though complex setups may require professional services. It is highly flexible, supporting extensive customization and integration with third-party applications. Its strong vendor ecosystem includes partnerships, prebuilt integrations, comprehensive training, certifications, a vibrant community, and professional services. The Policy and Compliance Management module offers robust compliance features, including automated alerts.
Challenges
ServiceNow has strong AI capabilities but does not offer edge AI within its SaaS platform. Nor does it yet provide automated causal (cause-and-effect) inference, although it will likely do so in the next 12 months.
Purchase Considerations
ServiceNow is one of the market’s largest and most comprehensive IT support platforms, and it is completely cloud-based. ServiceNow provides nearly every function, although the range of modules and their deployment can be complicated to administer. Its strong partnerships and professional services support can help organizations of any size to implement AIOps. ServiceNow ITOM licensing is based on subscription units. Though it is a general-purpose solution targeted at organizations of any size, the AIOps solution is a better fit for the largest businesses.
ServiceNow has broad use case applicability with modules for every part of IT, from monitoring to people management.
Radar Chart Position
ServiceNow is positioned in the Maturity/Platform Play quadrant. It is a data-centric, general-purpose solution. ServiceNow prioritizes stability and continuity over breakneck advancement. It offers broad functionality and use case support. It has a higher aggregate score in our decision criteria, making it a Leader in this report.
Splunk, a Cisco Company: Splunk Platform and Splunk Observability Cloud
Solution Overview
Splunk’s AIOps solution spans a combination of two integrated products: Splunk Platform and Splunk Observability Cloud. Cisco acquired Splunk in March 2024, and Splunk is currently operating as a division of Cisco. The Splunk Platform includes Splunk Enterprise and Splunk Cloud and features the premium apps Splunk IT Service Intelligence (ITSI) and Splunk Enterprise Security. It provides comprehensive log analytics and management, centralized event correlation for alert noise reduction, anomaly detection, predictive analytics, high-level business service monitoring, and automation to achieve AIOps.
Splunk Observability Cloud is an integrated product that uses any log data available on the Splunk Platform, together with metrics and trace data. It provides ITOps and engineering teams with capabilities such as application performance monitoring, infrastructure monitoring, network monitoring, real user monitoring, synthetic monitoring, point-and-click log analysis, and incident response. These capabilities can also be purchased à la carte.
The Splunk Platform can be consumed through a SaaS model in Splunk Cloud or an on-premises deployment in Splunk Enterprise. Splunk Observability Cloud is available only through a SaaS model. This is a platform play with tool displacement required for full capability.
Strengths
The Splunk Platform excels in advanced analytics, anomaly detection, correlation and causality analysis, and automated remediation. Its AutoML capabilities dynamically identify thresholding parameters for metrics or KPIs, provide explanations, and adapt models as the IT environment evolves. The Configuration Assistant and forthcoming Drift Detection will help manage thresholding models and detect operational changes. Splunk’s Smart Mode groups notable events using ML, and the Machine Learning Toolkit (MLTK) offers a low-code experience with hundreds of ML algorithms and the ability to upload pretrained models.
Splunk ITSI aggregates and correlates data from diverse IT sources to visualize interdependencies and train predictive models for identifying causal links between IT events and outcomes. It automates IT and security tasks with trigger-based remediation and integrates with tools like Splunk SOAR, Ansible, and Puppet for comprehensive automation. Chatbot integration with Slack and Teams enhances collaboration.
Emerging technologies where Splunk excels include edge AI, automated causal inference, and business intelligence. The Splunk App for Data Science and Deep Learning provides forecasting, clustering, NLP, and causal inference techniques.
Splunk ITSI integrates business and technical data, offering executive-level real-time views and operational dashboards for regulatory reporting. The platform supports on-site and SaaS deployments. The robust Splunk ecosystem includes extensive documentation, community support, partnership programs, training, and professional services.
Challenges
While Splunk has robust change detection capabilities, we do not see where those changes are reconciled against change management records to determine whether they are intentional and approved or are ghost changes. Integration with ITSM changes is possible. Splunk does have observability, so topology changes are also visible. Also, the Splunk security products may be able to see some changes with the right tool configuration.
Splunk allows for the creation of compliance management visibility within the platform, but it is not offered as a standalone console within the Splunk Platform. Audit logs do allow the creation of compliance reports, but there are no alerts on compliance violations.
Purchase Considerations
Following the March 2024 acquisition, Splunk is currently operating as a division of Cisco. In June, at Cisco Live, Cisco announced its full-stack observability vision for Cisco and Splunk customers. As part of that announcement, Cisco stated that it is no longer selling the Cisco observability solution for cloud-native APM to net-new customers and is replacing it directly with Splunk Observability Cloud. As part of that strategy, Cisco is merging AppDynamics into Splunk.
The Splunk Platform, both the on-premises and SaaS versions, may require some professional services support for complex enterprises. However, smaller companies may find the training and documentation sufficient. The Splunk Platform is available on-premises (Splunk Enterprise) or as SaaS (Splunk Cloud), while Splunk Observability Cloud is SaaS only. The Splunk Platform is supported on physical and virtual machines on-premises or in the cloud. Self-management is an option in any deployment scenario.
Splunk targets organizations of any size. Given its history in log management, it should be considered a strong option for organizations with a significant log volume.
Radar Chart Position
Splunk is a Leader in the Maturity/Platform Play quadrant. The company has demonstrated strong delivery over the last year with an update cadence of six minor monthly and six weekly patch releases, and this looks set to continue, classifying Splunk as an Outperformer. Splunk prioritizes stability and continuity over breakneck advancement and offers broad functionality and use case support. It has a higher aggregate score in our decision criteria, making it a Leader in this report.
Sumo Logic
Solution Overview
Sumo Logic is a cloud-native, multitenant SaaS solution hosted on AWS and available in multiple regions. The AIOps solution comprises three products: Troubleshooting and Monitoring, Cloud Infrastructure Security, and Cloud SIEM. Cloud SOAR can be added. This report focuses only on the Enterprise Suite offering. Lower tiers do not include sufficient features for an enterprise AIOps solution. Sumo Logic provides a monitoring and observability solution that displaces existing tools in these areas.
There is an OpenTelemetry collector for data aggregation of all logs, metrics, traces, and events. Sumo Logic can collect data from many cloud and on-premises technologies, using OpenTelemetry, Fluentd, and agentless technologies.
Strengths
SIEM and SOAR can be activated as part of the enterprise licensing tier, subject to minimum volume and service requirements.
Sumo Logic offers Cloud SIEM, with Insight Rules Engine (including 900+ out-of-the-box rules), Entity Timeline, Entity Relationship Graph, Insight Global Confidence Scores, Automation Service (playbooks for Insight enrichment, notifications, and containment actions), and MITRE ATT&CK Coverage Explorer. Cloud SOAR has Playbooks (including a complete Sumo Logic playbook catalog), Progressive Automation, Case Manager, Supervised Active Intelligence, and War Room features.
Deployment is via a multitenant, multiregion SaaS offering. Collectors are available for Kubernetes, AWS Observability, tracing, and custom or app component metrics (using Telegraf). Sumo Logic manages all of them. Ease of use is better than average, with drag-and-drop workflows and easy customization using low-code methods. The Sumo Logic ecosystem includes documentation, training, certification, community, partners, and professional services.
Sumo Logic uses prebuilt apps or custom queries for audit and compliance. It supports both internal audits, which are conducted by members of the organization, and external audits, where a government or other independent authority checks data to verify compliance standards. Both internal and external audits can be scheduled or random, and can be included in Sumo Logic’s queries, alerts, and dashboards to monitor data and ensure only authorized users have access to sensitive information.
Challenges
Sumo Logic can ingest data from edge devices, but the use of AI at the edge is not available. However, Sumo Logic does offer Sensu, an open-core project that enables observability as code. Sensu can filter telemetry at the edge using proprietary pipelines to enact use cases such as alert noise reduction.
Sumo Logic does not yet provide direct support for automated causal inference. However, its roadmap includes additional capabilities to understand cause-effect relationships, including event analytics, which directly correlate application/infrastructure telemetry with user- and system-induced changes.
The Sumo Logic Alert Response experience allows first responders (on-call engineers, SOC analysts, security engineers) to view the details of an alert as well as prebuilt cause-effect analytics, called “Alert Context.” Its security offerings for SIEM and SOAR do not have predictive security posture management. It is possible to integrate with third-party solutions.
Changes made outside of change management processes cannot be detected. However, with proper configurations, the Sumo Logic security products may be able to see some types of changes.
Purchase Considerations
Sumo Logic has added AI to its observability tools without calling the result an AIOps solution. This may be a marketing oversight or a specific direction for Sumo Logic. The available professional services are not required to deploy and operate Sumo Logic.
In March 2024, Sumo Logic launched a new Flex pricing model available for new customers. Like the previous model, this one is credit-based; however, there are significant changes. Ingest is priced at zero dollars: credits are now consumed by scanning data (i.e., generating analytics), and there is a nominal storage fee. Data tiering complexity has been removed, and all data is instantaneously available at maximum performance. The model also simplifies the credit-to-dollar ratio to one credit = $1 (MSRP) and consolidates licensing into a single plan with all available capabilities, Enterprise Suite Flex (though Cloud SIEM and Cloud SOAR require separate activation).
Current customers can continue to use the previous pricing model. Under that model, credits are consumed according to multiple variables, including the data type (such as logs, metrics, and traces) and the destination data tier (Continuous, Frequent, or Infrequent).
Sumo Logic markets to organizations of any size. Typical use cases include DevOps, SecOps (SIEM and SOAR), and business analytics. As a general-purpose solution, Sumo Logic does not target specific industries or vertical markets.
Radar Chart Overview
Sumo Logic is positioned in the Maturity/Platform Play quadrant. It provides broad functionality and use case support but does require tool displacement. The vendor is responsive to market needs and is investing in emerging features, and so is positioned close to the Innovation axis. It has an average aggregate score in our decision criteria, making it a Challenger in this report.
Zenoss*: Zenoss Cloud and Zenoss Service Dynamics
Solution Overview
The Zenoss AIOps solution offers a platform for optimizing IT operations through intelligent automation and real-time insights. It is available as a SaaS or on-premises deployment, providing flexibility to meet various organizational needs. The platform’s extensive integration capabilities allow it to work with diverse IT environments and third-party tools. Zenoss provides comprehensive visibility and control, enabling proactive issue detection and resolution. Its intuitive dashboards and customizable workflows streamline daily operations, making it user-friendly for administrators and operators. The solution can correlate IT performance with business outcomes to enhance decision-making and operational efficiency. Tool displacement is required, but data can be ingested from other tools. The SaaS application is Zenoss Cloud, while the on-premises platform is Zenoss Service Dynamics. The latter offers the highest level of customization. Both are required to meet the criteria in this Radar.
Strengths
The Zenoss AIOps solution excels in four key features: data aggregation and normalization, advanced analytics, anomaly detection, and visualization and dashboards. Zenoss Cloud integrates data from diverse IT infrastructure components, ensuring consistency and accuracy through streaming data collection and a unified data model. The platform’s machine learning algorithms and advanced analytics process large data volumes in real time, offering predictive analytics and pattern recognition to forecast potential issues and identify trends. Robust anomaly detection is achieved through advanced machine learning models, dynamic thresholds, and baseline deviation techniques.
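To make "dynamic thresholds" and "baseline deviation" concrete, the following is a minimal sketch of the general technique, not Zenoss's implementation. It treats the previous few samples as the baseline and flags a point whose deviation exceeds a multiple of the baseline's standard deviation. The function name, window size, threshold, and sample data are all illustrative assumptions.

```python
from statistics import mean, stdev


def detect_anomalies(values, window=5, z_threshold=3.0):
    """Return indices of points that deviate from a rolling baseline.

    The baseline is the previous `window` points. The dynamic threshold is
    `z_threshold` standard deviations around the baseline mean. Both
    parameters are illustrative assumptions, not vendor defaults.
    """
    anomalies = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        # A perfectly flat baseline has no spread to compare against, so
        # this sketch skips it rather than dividing by zero.
        if sigma == 0:
            continue
        if abs(values[i] - mu) / sigma > z_threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical CPU utilization samples with one spike at index 6.
cpu_percent = [40, 42, 41, 39, 40, 41, 95, 40, 42]
anomalous_indices = detect_anomalies(cpu_percent)
```

Because the threshold is recomputed from recent data at every step, it adapts as normal behavior drifts. The trade-off is that a spike inflates the baseline's spread for the next few samples, which production systems typically handle with more robust statistics or seasonality-aware models.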
Zenoss provides comprehensive and customizable dashboards that deliver real-time insights into IT health and performance, and feature prebuilt and customizable widgets. The platform is highly flexible, supports various integrations and customizations, and can be deployed as SaaS or on-premises to meet specific organizational needs. Partnerships with major technology providers and extensive integrations with third-party tools, including ITSM and cloud services, ensure seamless data flow and unified management.
Challenges
Zenoss supports integrations with popular ITSM platforms, communication tools (Slack, Teams), and other collaboration software but does not offer a SIEM or SOAR solution. It has some integrations with SIEM and SOAR tools, although it lacks some of the more advanced automations, workflow triggers, and RBAC passed notifications. Zenoss does not have a dedicated edge AI solution, but its ability to collect and analyze data from edge devices is on par with the industry standard. Zenoss has event correlation and root cause analysis, which are forms of causal inference, but it does not support automatic cause-and-effect determination. There is no support for detecting changes outside established DevOps or ITSM change management. Zenoss provides compliance monitoring and can alert users to non-compliance issues with topology objects.
Purchase Considerations
Zenoss incorporates third-party solutions to provide a comprehensive view of the IT environment. Training programs and certifications are available. The user community offers collaboration and knowledge-sharing tools, but its activity level is moderate. Zenoss provides professional deployment, customization, and optimization services.
Zenoss provides a powerful and flexible AIOps solution suitable for diverse IT environments, with notable data handling, analytics, and integration strengths. The SaaS deployment option is quick to enable, while the on-premises version provides self-service capabilities.
Zenoss uses a subscription-based licensing model for its SaaS platform and traditional device-based licensing for on-premises deployments. The cost structure is competitive but can be higher than some alternatives, especially for large-scale implementations.
Zenoss is a general-purpose AIOps solution, though it has a strong presence in federal, financial, and healthcare environments, and with service providers.
Radar Chart Overview
Zenoss is positioned in the Maturity/Platform Play quadrant. Zenoss’s feature release rate is somewhat slower than the market. Releases for Service Dynamics appear only yearly, although the SaaS solution, Zenoss Cloud, is updated more often, leading to an overall classification of Forward Mover on our Radar. It has an average aggregate score in our decision criteria, making it a Challenger in this report.
ZIF: AIOps
Solution Overview
The Zero Incident Framework (ZIF) is an AIOps-based platform delivering enhanced network operations management around asset discovery, infrastructure monitoring, proactive detection, prediction, and automation. The platform uses artificial intelligence, machine learning, and intelligent automation.
ZIF offers a unified platform with various modules that provide a comprehensive solution for IT operations. It does not operate its own private network for service delivery. Instead, it leverages public cloud infrastructure and networks to provide its services. ZIF can be deployed in various cloud environments, such as AWS, Microsoft Azure, and GCP, allowing users to monitor applications and services across different cloud providers. Additionally, ZIF has a private cloud instance in GCP for all SaaS implementations. The platform is designed to be flexible and compatible with various cloud architectures to meet the diverse needs of its users.
Strengths
ZIF is competent in most key features, with better-than-average capabilities in anomaly detection and automated remediation.
The solution excels in anomaly detection, focusing on identifying deviations from normal behavior using advanced algorithms. It detects exceptions before they become incidents and cause negative impact, and it processes data in real time, enabling the identification of anomalies as they occur. ZIF’s algorithms are capable of pattern recognition, enabling the detection of complex and subtle anomalies that may indicate emerging issues or optimization opportunities. It allows the customization of detection thresholds and parameters. ZIF employs machine learning models that learn from data over time to identify normal behavior, and it uses predictive analytics to forecast potential issues based on detected anomalies.
ZIF provides automated remediation functionality, enabling automated response actions to be triggered. It implements policy-based frameworks, allowing IT teams to define rules and conditions for automated remediation actions. ZIF can automate tasks and configurations based on predefined rules or policies. Its actionable remediation leverages 250+ prebuilt bots across applications, infrastructure, databases, networks, and security. With a workflow creator, clients can build custom workflows and bots. Bots can be triggered manually from the GUI or automatically based on received alerts. In addition, ZIF monitors the outcome of automated remediation actions, providing a feedback loop that allows the system to continuously learn and improve effectiveness, leveraging intelligent automation capabilities for quick issue resolution and task completion.
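The policy-based pattern described above (rules that match alerts, trigger an action, and record the outcome for feedback) can be sketched in a few lines. This is a hedged illustration of the general pattern only. The `Policy` class, the `restart_service` placeholder, and the alert fields are hypothetical and do not represent ZIF's bots or API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Policy:
    name: str
    matches: Callable[[dict], bool]  # condition evaluated against an alert
    action: Callable[[dict], bool]   # remediation step; returns True on success
    outcomes: list = field(default_factory=list)  # feedback history

    def success_rate(self):
        """Share of recorded runs that succeeded, or None if never run."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)


def restart_service(alert):
    # Placeholder action: a real bot would call an automation tool here.
    return alert.get("service") is not None


def remediate(alert, policies):
    """Run the first matching policy and record its outcome."""
    for policy in policies:
        if policy.matches(alert):
            succeeded = policy.action(alert)
            policy.outcomes.append(succeeded)
            return policy.name, succeeded
    return None, False


policies = [
    Policy(
        name="restart-on-service-down",
        matches=lambda a: a.get("type") == "service_down",
        action=restart_service,
    )
]

result = remediate({"type": "service_down", "service": "web"}, policies)
unmatched = remediate({"type": "disk_full"}, policies)
```

Recording each outcome on the policy is the simplest form of a feedback loop: a low `success_rate()` can prompt a review of the rule or a fallback to manual handling.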
Challenges
ZIF lags behind the competition with regard to the emerging technologies defined for this Radar. It does not support edge AI, predictive security posture management, business data integration, ghost change detection, or compliance management.
Regarding business data integration, ZIF should be able to import any time series data and correlate it with IT alerts and events. Change management systems within integrated ITSM solutions can be seen and monitored, though changes outside of the process are invisible. ZIF complies with many security standards but has no compliance management features available. Some data is available for audit purposes, but reports are not defined by default.
Purchase Considerations
Various deployment options allow organizations to choose how, and by whom, their ZIF software is managed:
- Physical appliance: This deployment model involves installing the solution on dedicated servers, storage drives, endpoint devices, and networking equipment on-premises.
- Virtual appliance: ZIF runs on virtualized infrastructure, such as VMware or Hyper-V.
- Public cloud image: ZIF can be deployed on major cloud platforms.
- Software only: ZIF can be installed on existing servers, offering greater flexibility but requiring more configuration.
- SaaS: ZIF is a SaaS application that can be installed on a public cloud.
Software-only self-managed deployments may require professional services in complex environments. ZIF offers two pricing structures: a perpetual model with a low upfront cost and flexible device-based pricing for more dynamic environments, and a subscription-based model with long-term value and minimal total cost of ownership.
ZIF should be regarded as a general-purpose solution for AIOps and is not aligned to specific verticals or specific deployments. It has good support among service providers.
Radar Chart Overview
ZIF is positioned in the Entrant ring of the Maturity/Platform Play quadrant. A Forward Mover in this report, ZIF has displayed a slower-than-average rate of tech delivery (such as introducing new features and feature enhancements).
6. Analyst’s Outlook
In the evolving landscape of AIOps, the integration of GenAI and LLMs is becoming commonplace among vendors. However, the outcomes vary widely due to the lack of standardized implementation methods. Most vendors have at least incorporated chatbots to facilitate interaction with their data sets. Some have enhanced their anomaly detection methods and closed-loop remediation processes for continuous learning, while a few allow chatbots to create visualizations without requiring knowledge of the underlying data.
A notable trend is the rebranding of observability solutions as AIOps by adding minimal AI capabilities. The journey from monitoring to observability and intelligence varies significantly across vendors. True intelligence is not a simple add-on feature; it requires a comprehensive understanding of business and IT operations. Solutions that best integrate business data with IT intelligence will offer the most value.
One critical area where AIOps solutions are making strides is in detecting ghost changes—unauthorized modifications that often lead to incidents. While GenAI holds promise, only a single vendor, Evolven, can detect ghost changes down to the parameter level. This feature benefits large, complex environments where such changes can cause significant disruptions.
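Conceptually, detecting a ghost change means reconciling changes observed in the environment against approved change records. The sketch below illustrates that reconciliation under simplifying assumptions; it is not any vendor's method. The field names (`ci`, `parameter`, `detected_at`, `window_start`, `window_end`) are hypothetical, and times are plain integers such as epoch seconds.

```python
def find_ghost_changes(detected_changes, approved_changes):
    """Return detected changes that have no matching approved change record.

    A detected change counts as approved when a record covers the same
    configuration item and parameter, and the detection time falls inside
    the approved window. All field names are hypothetical.
    """
    ghosts = []
    for change in detected_changes:
        approved = any(
            record["ci"] == change["ci"]
            and record["parameter"] == change["parameter"]
            and record["window_start"] <= change["detected_at"] <= record["window_end"]
            for record in approved_changes
        )
        if not approved:
            ghosts.append(change)
    return ghosts


# One approved change window for a single parameter on one server.
approved_changes = [
    {"ci": "web-01", "parameter": "max_connections",
     "window_start": 100, "window_end": 200},
]
detected_changes = [
    # Inside the approved window: not a ghost change.
    {"ci": "web-01", "parameter": "max_connections", "detected_at": 150},
    # No approval exists for this parameter: ghost change.
    {"ci": "web-01", "parameter": "tls_version", "detected_at": 160},
    # Approved parameter, but changed outside the window: ghost change.
    {"ci": "web-01", "parameter": "max_connections", "detected_at": 450},
]
ghosts = find_ghost_changes(detected_changes, approved_changes)
```

Matching at the parameter level is what separates this from coarse device-level reconciliation: an approved change to one setting does not excuse an unapproved change to another setting on the same system.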
The debate between data-agnostic and data-centric AIOps approaches continues to shape deployment strategies. A single-vendor solution covering the entire monitoring-to-intelligence chain can streamline operations and reduce costs. However, for larger organizations, integrating AIOps with existing tools is crucial. Displacing incumbent solutions requires navigating political landscapes and achieving consensus among stakeholders with differing priorities.
DevOps teams aim to expedite code deployment, while ITOps focuses on maintaining system stability. Infrastructure administrators, already equipped with monitoring tools, may resist additional solutions. Consequently, AIOps initiatives often require support from C-level executives who prioritize strategic organization-wide benefits over tactical gains.
A data-agnostic approach allows AIOps solutions to ingest data from any source, often sidestepping the need for immediate consensus. This method can gradually build a comprehensive enterprise view by incorporating data feeds over time. Although it requires ongoing effort, the initial implementation is less disruptive. In contrast, data-centric solutions may quickly replace existing tools, causing short-term upheaval but potentially offering faster integration.
Selecting an AIOps solution hinges on an organization’s specific needs and strategic goals. A thorough review of each vendor’s key features, emerging technologies, and business criteria will guide the decision-making process. Whether opting for a data-agnostic or data-centric approach, the ultimate goal is to enhance operational intelligence and drive meaningful improvements in IT and business performance.
To learn about related topics in this space, check out the following GigaOm Radar reports:
- GigaOm Radar for Cloud Observability
- GigaOm Radar for IT Service Management (ITSM)
- GigaOm Radar for Cloud FinOps
7. Methodology
*Vendors marked with an asterisk did not participate in our research process for the Radar report, and their capsules and scoring were compiled via desk research.
For more information about our research process for Key Criteria and Radar reports, please visit our Methodology.
8. About Dr. Shane C. Archiquette
Dr. Shane C. Archiquette is dedicated to driving technological innovation and advanced AI to provide sustainable, outcome-focused solutions for global markets.
9. About GigaOm
GigaOm provides technical, operational, and business advice for IT’s strategic digital enterprise and business initiatives. Enterprise business leaders, CIOs, and technology organizations partner with GigaOm for practical, actionable, strategic, and visionary advice for modernizing and transforming their business. GigaOm’s advice empowers enterprises to successfully compete in an increasingly complicated business atmosphere that requires a solid understanding of constantly changing customer demands.
GigaOm works directly with enterprises both inside and outside of the IT organization to apply proven research and methodologies designed to avoid pitfalls and roadblocks while balancing risk and innovation. Research methodologies include but are not limited to adoption and benchmarking surveys, use cases, interviews, ROI/TCO, market landscapes, strategic trends, and technical benchmarks. Our analysts possess 20+ years of experience advising a spectrum of clients from early adopters to mainstream enterprises.
GigaOm’s perspective is that of the unbiased enterprise practitioner. Through this perspective, GigaOm connects with engaged and loyal subscribers on a deep and meaningful level.
10. Copyright
© Knowingly, Inc. 2024. "GigaOm Radar for AIOps" is a trademark of Knowingly, Inc. For permission to reproduce this report, please contact sales@gigaom.com.