This GigaOm Research Reprint Expires Oct 18, 2024

GigaOm Radar for Application Performance Management (APM) v2.0

1. Summary

Bespoke applications are the backbone of technology-enabled businesses. However, the increasing complexity of modern applications places pressure on IT operations monitoring teams to ensure these applications are running properly. Today’s applications may execute in on-premises infrastructure or public or private cloud environments, with SaaS applications, in low-code/no-code environments, and in various combinations.

Moreover, with the additional emphasis on DevOps, developers are now a part of the operations landscape and must have the tools they need to participate. Developers must handle the application code complexity as well as the use of virtualized and as-code infrastructure. Application performance management (APM) tools and solutions are able to respond to this need, offering insights to both IT operations staff and development teams. These tools must handle a much more complex environment that now includes both operations personnel and developers as well as citizen programmers.

Current APM solutions often interact with the code via code injection or an application agent. These full-stack solutions provide detailed insights into the actual code responsible for generating metrics and other observations.

Many APM solutions have been folded into observability tools, moving the needle from monitoring to observability and eventually to awareness. Organizational awareness requires the monitoring and observability of IT systems and applications. The relationships among monitoring, observability, and awareness (MOA) are shown in Figure 1. MOA refers to the process by which data about operations (IT) and business (people and processes) is tracked and evaluated to enable a company to develop organizational awareness.

  • Monitoring provides the state of a single system (or service) and metrics about it (performance or a break/fix condition).
  • Observability looks at the state of multiple systems and asks additional questions about the health of these systems as a whole (such as why devices, systems, or applications are behaving a certain way within the context of IT).
  • Awareness brings together all information about the company to evaluate whether operations (IT) and business (people and processes) are performing in an acceptable way, what is likely to break next, and how to prevent problems before they are monitored or observed.

APM solutions are now situated squarely within the observability space, with some vendors moving toward awareness by adding more AI-based analytical tooling.

Figure 1. Monitoring, Observability, and Awareness Relationship

This GigaOm Radar report highlights key APM vendors and equips IT decision-makers with the information needed to select the best fit for their business and use case requirements. In the corresponding GigaOm report “Key Criteria for Evaluating APM Solutions,” we describe in more detail the capabilities and metrics that are used to evaluate vendors in this market.

This is our second year evaluating the APM space in the context of our Key Criteria and Radar reports. This report builds on our previous analysis and considers how the market has evolved over the last year.

All solutions included in this Radar report meet the following table stakes—capabilities widely adopted and well implemented in the sector:

  • Application metrics and availability
  • Infrastructure discovery and monitoring
  • Transaction traceability and monitoring
  • User experience (UX) monitoring
  • Access controls

Inside the GigaOm Radar

The GigaOm Radar weighs each vendor’s execution, roadmap, and ability to innovate to plot solutions along two axes, each set as opposing pairs. On the Y axis, Maturity recognizes solution stability, strength of ecosystem, and a conservative stance, while Innovation highlights technical innovation and a more aggressive approach. On the X axis, Feature Play connotes a narrow focus on niche or cutting-edge functionality, while Platform Play displays a broader platform focus and commitment to a comprehensive feature set.

The closer to center a solution sits, the better its execution and value, with top performers occupying the inner Leaders circle. The centermost circle is almost always empty, reserved for highly mature and consolidated markets that lack space for further innovation.

The GigaOm Radar offers a forward-looking assessment, plotting the current and projected position of each solution over a 12- to 18-month window. Arrows indicate travel based on strategy and pace of innovation, with vendors designated as Forward Movers, Fast Movers, or Outperformers based on their rate of progression.

Note that the Radar excludes vendor market share as a metric. The focus is on forward-looking analysis that emphasizes the value of innovation and differentiation over incumbent market position.

2. Market Categories and Deployment Types

To better understand the market and vendor positioning (Table 1), we assess how well APM solutions are positioned to serve specific market segments and deployment models.

For this report, we recognize the following market segments:

  • Small-to-medium business (SMB): In this category, we assess solutions on their ability to meet the needs of organizations ranging from small businesses to medium-sized companies. Also assessed are departmental use cases in large enterprises, where ease of use and deployment are more important than extensive management functionality, data mobility, and feature set.
  • Large enterprise: Here, offerings are assessed on their ability to support large and business-critical projects. Optimal solutions in this category have a strong focus on flexibility, performance, data services, and features to improve security and data protection. Scalability is another big differentiator, as is the ability to deploy the same service in different environments.
  • Managed service provider (MSP): MSPs remotely manage a customer’s network operations and deal with maintenance, upgrades, and other day-to-day activities, including ITSM. Their needs may align with those in the above categories, and solutions are assessed on ability to meet them.

In addition, we recognize two deployment models for solutions in this report:

  • Software as a service (SaaS): These are cloud-based solutions, designed, deployed, and managed by the software provider.
  • Self-managed: The solution may be deployed on customer-owned infrastructure or a public cloud and is managed by the enterprise.

Table 1. Vendor Positioning: Market Segment and Deployment Model

Market Segment: SMB, Large Enterprise, MSP
Deployment Model: SaaS, Self-Managed
Broadcom
Cisco
Datadog
Dynatrace
IBM
ManageEngine
Netreo
New Relic
SolarWinds
Splunk
Sumo Logic
3 Exceptional: Outstanding focus and execution
2 Capable: Good but with room for improvement
1 Limited: Lacking in execution and use cases
0 Not applicable or absent

For this evaluation, we looked at offerings in a binary way, rating vendors (++) if they support that market segment and deployment model and (-) if they do not.

3. Key Criteria Comparison

Building on the findings from the GigaOm report, “Key Criteria for Evaluating APM Solutions,” Tables 2, 3, and 4 summarize how each vendor included in this research performs in the areas we consider differentiating and critical in this sector.

  • Key criteria differentiate solutions based on features and capabilities, outlining the primary criteria to be considered when evaluating an APM solution.
  • Evaluation metrics provide insight into the non-functional requirements that factor into a purchase decision and determine a solution’s impact on an organization.
  • Emerging technologies show how well each vendor takes advantage of technologies that are not yet mainstream but are expected to become more widespread and compelling within the next 12 to 18 months.

The objective is to give the reader a snapshot of the technical capabilities of available solutions, define the perimeter of the market landscape, and gauge the potential impact on the business.

Table 2. Key Criteria Comparison

Key Criteria

Vendor        Business      Analytics &  Data Masking  Integrations  AI/ML  OpenTelemetry
              Transactions  Reporting    & Logging
Broadcom      2             2            2             3             2      2
Cisco         3             3            2             3             2      3
Datadog       3             3            2             3             2      3
Dynatrace     3             3            3             3             3      3
IBM           3             2            2             2             3      2
ManageEngine  3             2            2             2             2      1
Netreo        2             2            2             2             1      0
New Relic     3             3            3             3             3      3
SolarWinds    3             3            2             2             3      2
Splunk        3             3            3             3             3      3
Sumo Logic    3             3            2             2             2      3
3 Exceptional: Outstanding focus and execution
2 Capable: Good but with room for improvement
1 Limited: Lacking in execution and use cases
0 Not applicable or absent

Table 3. Evaluation Metrics Comparison

Evaluation Metrics

Vendor        Flexibility  Scalability  Ease of Setup &  Ease of Use  Cost
                                        Implementation
Broadcom      3            3            2                3            2
Cisco         3            3            3                3            2
Datadog       2            3            2                3            3
Dynatrace     3            3            3                3            2
IBM           3            3            2                2            2
ManageEngine  2            2            3                3            3
Netreo        2            2            3                2            3
New Relic     3            3            3                3            3
SolarWinds    3            2            2                3            2
Splunk        3            2            2                3            2
Sumo Logic    2            3            3                3            3
3 Exceptional: Outstanding focus and execution
2 Capable: Good but with room for improvement
1 Limited: Lacking in execution and use cases
0 Not applicable or absent

Table 4. Emerging Technologies Comparison

Emerging Tech: Operational Awareness, Shadow Changes
Broadcom
Cisco
Datadog
Dynatrace
IBM
ManageEngine
Netreo
New Relic
SolarWinds
Splunk
Sumo Logic
3 Exceptional: Outstanding focus and execution
2 Capable: Good but with room for improvement
1 Limited: Lacking in execution and use cases
0 Not applicable or absent

By combining the information provided in the tables above, the reader can develop a clear understanding of the technical solutions available in the market.

4. GigaOm Radar

This report synthesizes the analysis of key criteria and their impact on evaluation metrics to inform the GigaOm Radar graphic in Figure 2. The resulting chart is a forward-looking perspective on all the vendors in this report, based on their products’ technical capabilities and feature sets.

The GigaOm Radar plots vendor solutions across a series of concentric rings, with those set closer to the center judged to be of higher overall value. The chart characterizes each vendor on two axes—balancing Maturity versus Innovation and Feature Play versus Platform Play—while providing an arrow that projects each solution’s evolution over the coming 12 to 18 months.

Figure 2. GigaOm Radar for APM Solutions

As you can see in the Radar chart in Figure 2, all but one of the solutions fall within the Maturity/Platform Play quadrant. That wasn’t always the case. In the past, APM tools were distinct standalone solutions for application performance management only. However, the growth and increasing complexity of modern environments meant APM solutions had to take on additional functions. They are now part of broad platforms that also include capabilities such as infrastructure monitoring, network monitoring, and AI analysis. In some cases, the additional functionality may add to the license fees; in others, there may be data consumption impacts.

Many vendors have transformed their APM solution into an observability solution. The transition from a point APM solution to an observability tool places APM in new territory. There’s a clear progression in the market of customers and vendors moving from monitoring to observability to organizational awareness.

Many Leaders—such as Cisco, Dynatrace, New Relic, Splunk, Datadog, and Sumo Logic—have positioned themselves as observability tools and appear to be making another progression toward organizational awareness. These vendors are bringing additional AI capabilities such as large language models (LLMs) to the table and evolving to include AIOps without letting go of their strong APM capabilities.

All the vendors in this Radar provide solid APM solutions. The details of the environment, its tolerance for change, and the internal working of the company may have more bearing on a choice than the technical differences among solutions.

Overall, APM is a very mature market; vendors have consolidated around one approach to solving organizations’ APM challenges, and there isn’t much disruption or innovation occurring in this space. Vendors are focused on improving existing feature sets rather than investing in new features or approaches.

5. Vendor Insights

Broadcom DX Application Performance Management

Broadcom Inc. is a 50-year-old company with roots in AT&T/Bell Labs, Lucent, and Hewlett-Packard/Agilent. Via acquisitions, Broadcom now includes LSI, Broadcom Corporation, Brocade, CA Technologies, and Symantec.

DX Application Performance Management is designed for SMBs, large enterprises, and MSPs. The solution supports multiple deployment options, including the Broadcom SaaS offering, software-only for on-premises, and virtual appliances, allowing enterprises to select from a self-managed, SaaS, or hybrid approach.

DX APM consists of three products: App Synthetic Monitoring (ASM), DX Operational Intelligence, and DX Dashboards from Grafana. The platform provides multiple agent deployment methods, including a single zero-touch agent for cloud-native applications. It also includes integration into build pipelines, agent bundle management, and a self-service download UI that lets users select the technologies to be monitored.

DX APM supports ingestion of data from various data sources, including cloud-aware zero-touch agents, language-based agents (Java, .NET, NodeJS), and infrastructure agents. The use of a cloud proxy allows the aggregation of the network connectivity for “walled garden” environments to report data through a secure channel.

The platform traces transactions of suspected issues or abnormal behavior. Power users can set the criteria manually and start a transaction trace session to collect deep data on application transactions. Transaction traces are displayed in the WebUI in a multilayer “upside down wedding cake,” which can represent cross-process and system tracing. Data can also be viewed in a table or tree format or exported as JSON.
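The tree and JSON views of a transaction trace can be illustrated with a minimal Python sketch. The span fields used here (`id`, `parent_id`, `name`, `ms`) and the helper names are assumptions made for illustration; they are not DX APM's actual export schema or API.

```python
import json
from collections import defaultdict


def build_tree(spans):
    """Nest flat spans (dicts with id, parent_id, name, ms) under their parents."""
    children = defaultdict(list)
    for span in spans:
        children[span["parent_id"]].append(span)

    def attach(span):
        # Recursively copy a span and everything called beneath it.
        return {
            "name": span["name"],
            "ms": span["ms"],
            "children": [attach(child) for child in children[span["id"]]],
        }

    # Root spans are the ones with no parent.
    return [attach(root) for root in children[None]]


def render(nodes, depth=0):
    """Return the tree as indented text lines, one line per span."""
    lines = []
    for node in nodes:
        lines.append(f"{'  ' * depth}{node['name']} ({node['ms']} ms)")
        lines.extend(render(node["children"], depth + 1))
    return lines


def to_json(nodes):
    """Serialize the nested trace for export."""
    return json.dumps(nodes, indent=2)
```

Calling `render(build_tree(spans))` gives the tree view as text, and `to_json(build_tree(spans))` gives the same structure as JSON. A cross-process trace would simply carry spans from several services under one root.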

DX APM can integrate with Slack via the RestMon API. The platform can remotely collect infrastructure data and JMX (Java Management Extensions) data for systems where an agent can’t be used (F5, routers, and so on).

Free offers of online training from Broadcom Academy are available for customers and partners. Additionally, the Broadcom Learning Portal is available at no charge for customers and offers additional in-depth training. Users can also access documentation for free. New DX APM customers should use professional services to accelerate roll-out and time-to-value for the solution.

Licensing is based on data ingestion. All capabilities come without additional charges and with free access to the entire observability suite. Subscription licensing is based on the monitored entities across the stack, such as an application, a user, cloud instances, storage, or a container. The subscription-based licensing model combined with a consumption metric provides flexibility for a number of use cases.

Broadcom uses AI/ML capabilities to normalize a topology model to automatically discover environmental changes, giving it some skill in operational awareness. The addition of non-technical business information will push it further into the awareness domain.

Strengths: Broadcom’s DX APM is part of a complete platform spanning from mainframes to cloud deployments. This can be a strength for larger organizations because it allows them to use a single vendor for many IT operations functions. DX APM covers most application environments and provides OpenTelemetry support for working with additional data.

Challenges: The breadth of the Broadcom DX platform means it can require professional services to take full advantage of all of the offerings. Not all portions of the solution are provided by Broadcom; DX Dashboards, for example, is from Grafana.

Cisco

Cisco began providing APM with AppDynamics for monitoring hybrid applications and has since expanded to also provide Cloud Native Application Observability for monitoring cloud applications. Cisco has further evolved its APM offerings by integrating AppDynamics, Cloud Native Application Observability, and other Cisco network and security offerings, and it is now moving toward full-stack observability.

Cisco has always provided a good balance between ease of use and technical capability, and this remains true of the larger product suite as well. The solution can map business transactions and provides better-than-average analytics. Integration with external data sources, including OpenTelemetry, is also a standout feature. The solution provides a number of data collector types and can ingest metrics from any source. Flexibility and ease of use on both a daily and long-term basis are also good.

Anomaly detection and root cause analysis are managed using AI/ML modules that can be customized as needed. Anomaly detection, which is not turned on by default, includes event detection, which may encompass any number of logs, traces, and metrics to define the conditions of the event.

Cisco University is available for training and certifications. For complex environments, professional services can assist in design and implementation.

Cisco FSO Platform, applications, and modules are metered based on ingest metrics and number of users but are not limited to these meters as more applications are built on the platform. Cisco AppDynamics licensing is based on the capacity (virtual CPUs) of the unique hosts that are monitored by AppDynamics. Customers can monitor an unlimited number of applications, databases, or containers hosted on those servers. Additionally, Real User Monitoring (RUM) modules are priced based on the number of page views or mobile active agents (tokens are a common unit of measure).

The SaaS application is hosted in the Americas (Oregon and São Paulo), EMEA (Frankfurt, London, and Cape Town), and APAC (Hong Kong, Mumbai, Singapore, and Sydney), thus satisfying national security requirements in the respective countries.

Sensitive data within log files must be masked before the files are sent to Cisco. This is especially important for log files that may contain payment card industry (PCI) data, personally identifiable information (PII), and other confidential data.
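Source-side masking of this kind can be sketched in a few lines of Python. The `mask_sensitive` helper and its two patterns are hypothetical and deliberately narrow; they are not part of any Cisco tooling, and real PCI or PII scrubbing would need a vetted and much broader rule set.

```python
import re

# Illustrative patterns only (assumptions, not a compliance-grade rule set).
# CARD_RE matches 13 to 16 digits, optionally separated by spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def mask_card(match):
    """Keep only the last four digits of a card-like number."""
    digits = re.sub(r"\D", "", match.group())
    return "*" * (len(digits) - 4) + digits[-4:]


def mask_sensitive(line):
    """Mask card-like numbers and email addresses in a single log line."""
    line = CARD_RE.sub(mask_card, line)
    return EMAIL_RE.sub("<email>", line)
```

A log shipper would pass each line through `mask_sensitive` before forwarding it. Note that a digit-run pattern this simple can also mask long numeric IDs, which is one reason source-side masking takes care to get right.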

Strengths: Cisco is a strong APM player with support for OpenTelemetry, UX monitoring, and application security visibility. Moreover, the full platform provides additional functionality in networking and security.

Challenges: Data masking in logs and other data could be improved. The current method of dealing with these at the source may make compliance more difficult.

Datadog

Founded in 2010, Datadog was one of the first cloud monitoring solutions. The platform includes tools for troubleshooting, monitoring, and data collection in enterprise environments.

Datadog is a cloud-based SaaS APM, observability, and security platform that brings metrics, traces, logs, events, code profiles, security signals, and metadata from across the stack into one unified solution. It includes more than 600 vendor-backed integrations and encompasses capabilities for managing application performance, clouds, and app security. It also does code profiling and monitors infrastructure, networks, databases, and digital experience.

The solution can run on multiple clouds and in multiple regions. It supports a software-only deployment through its RUM, synthetic monitoring, continuous testing, and incident management products. To support on-premises telemetry governance, Datadog acquired Timber Technologies’ OSS product Vector, which allows teams to parse, scrub, enrich, and sample their data from the Datadog UI before it leaves their on-premises environments, regardless of the destination.

Datadog contributes regularly to the open source community. All the agents and libraries running within its customers’ systems are open source, and it participates in projects like statsD, OpenTracing, OpenCensus, OpenMetrics, and OpenTelemetry (to which Datadog contributed its tracing libraries).

Datadog is better than average for business transactions and analytics and reporting, allowing users to customize and schedule reports as needed. Datadog’s Watchdog AI engine uses ML to automatically analyze infrastructure and application performance. Integration with other tools and services is excellent.

The Datadog platform includes a number of products. Customers have the flexibility to license only what they need, which can hold down costs, but knowing which products best serve those needs may take professional services support.

Datadog offers subscription-based licensing for its SaaS products, with annual, month-to-month, and on-demand pricing available. Customers can choose among a high-watermark plan, an hourly plan, or a combination of the two.

Per-host licensing—by which a host is any monitored physical or virtual OS instance, including servers, VMs, nodes (in the case of Kubernetes), or App Service plan instance (in the case of Azure App Service)—is available. For telemetry and logs, licenses are by the volume of data ingested. Custom licenses can be negotiated.

Strengths: Datadog has a strong platform that competes well with other large players in the APM space. The solution is most adept at automatic correlations, analytics (insights), reporting, and third-party integrations. It is a consistent supporter of OpenTelemetry.

Challenges: The cost of Datadog services can creep up unnoticed due to the ease of adding features. The licensing is flexible, but enterprises will need to pay attention to the number of devices and the amount of data ingested to control costs.

Dynatrace

Dynatrace headquarters are located in Massachusetts, US, with 60 global offices including those in Australia, Brazil, Singapore, Austria, Spain, and the UK. Dynatrace is a strong player in the APM world with market-leading APM metrics, monitoring, reporting, and transaction and tracing capabilities, including RUM and synthetic monitoring.

The Dynatrace platform includes the following components:

  • OneAgent for automated data collection
  • Smartscape for continuously updated topology mapping and visualization
  • PurePath for code-level distributed tracing
  • Davis AI for real-time analytics
  • AutomationEngine for workflow automation

It takes advantage of its Grail data lake house with index-less, schema-on-read storage for contextual data analytics and management, using massively parallel processing and the proprietary Dynatrace query language (DQL). Grail is built to work with Davis, its proprietary AI engine, for automatic root-cause, fault-tree analysis.

Dynatrace can be deployed either as a SaaS solution or as a managed solution for Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform (GCP), or on-premises. Both methods use OpenTelemetry or Dynatrace OneAgent to instrument applications, Docker containers, microservices architectures, or cloud-based infrastructure. OneAgent makes deployment and implementation easy. Collecting and sending data from a walled garden with no access to the internet requires a relay on-site, which is only mildly complicated.

Auto-discovered transactional metadata is used to define a service topology, and Dynatrace automatically detects and names the server-side services of applications. Relationships between microservices and application components are dynamically detected.

Dashboards can be included declaratively in extensions and shipped “as-code” as part of the CI/CD pipeline. This enables easier integration with DevOps tooling, which is a plus. Out-of-the-box ITOM and CI/CD integrations include Bitbucket, Jenkins, Puppet, Chef, Spinnaker, ServiceNow, Ansible, and JMeter. The Davis AI engine enables customers to fully automate service level objective (SLO) validation and quality gates. This ensures high-quality code across the delivery pipeline and protects error budgets in production.
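The arithmetic behind an SLO-based quality gate can be shown with a short Python sketch. It uses the common definition of an error budget (the share of requests allowed to fail under the SLO target). The function names and the 25% threshold are assumptions for illustration and do not represent the Dynatrace API or how Davis AI evaluates gates.

```python
def error_budget_remaining(slo_target, total, failed):
    """Return the unspent fraction of the error budget (negative if overspent).

    slo_target: required success ratio, e.g. 0.99
    total:      requests observed in the evaluation window
    failed:     requests that violated the SLO
    """
    if total <= 0:
        raise ValueError("total must be positive")
    allowed_failures = (1.0 - slo_target) * total  # budget expressed in requests
    if allowed_failures == 0:
        # A 100% target leaves no budget at all.
        return 1.0 if failed == 0 else float("-inf")
    return 1.0 - failed / allowed_failures


def passes_quality_gate(slo_target, total, failed, min_remaining=0.25):
    """Hypothetical gate: promote a build only if enough budget is left."""
    return error_budget_remaining(slo_target, total, failed) >= min_remaining
```

With a 99% target and 100,000 requests, about 1,000 failures are allowed. A build that causes 400 failures leaves roughly 60% of the budget and passes this gate, while one that causes 900 leaves roughly 10% and is held back.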

Dynatrace provides GDPR-compliant data privacy using data heuristics to automatically mask information such as IDs or credit card information by default in captured HTTP traffic and PurePath data. This applies to both default captured data and additional captured data that can be configured. Organizations can customize data masking at capture (before it leaves customer environments), at storage (after processing but prior to storage), and at display (presented only to users with appropriate permissions).

Dynatrace provides notifications via email and customizable webhook integrations. The open Dynatrace API and custom extensions allow secure bidirectional integration with other tools and support supplying configuration as code by exposing the entire configuration via a REST API. All configuration is maintained centrally and can be defined via the web UI or with secure, token-based REST APIs.

Dynatrace uses a subscription-based licensing model and provides consumption-based pricing across all modules, using the unit pricing relevant to each module. In the past, Dynatrace offered less flexibility in pricing than many of its competitors. Now, Dynatrace Platform Subscription (DPS) enables annual commit levels on a single contract with flexible access to any platform offering in any combination. DPS requires no monthly commits, and there are no hidden fees, surcharges, or per user charges.

Training and support, including professional services, are robust. Associate and professional certifications are available to support the rather steep learning curve for Dynatrace.

Strengths: Dynatrace continues to be a strong user of and contributor to OpenTelemetry. It scores well on all key criteria. Dynatrace Platform Subscription (DPS) adds needed flexibility to licensing.

Challenges: OneAgent has made Dynatrace easier to deploy and implement, but the learning curve in everyday usage can be steep for those not familiar with the offering.

IBM Cloud Application Performance Management

IBM is an American multinational technology corporation headquartered in New York, with operations in over 175 countries. IBM Cloud Application Performance Management (Cloud APM) helps organizations manage the performance and availability of applications that are deployed on-premises, in a public cloud, or as a hybrid combination.

This is a complete solution that provides full visibility and control over applications and infrastructure, enabling line-of-business owners to manage critical applications and end-user experience in production. Application developers can view transaction details and diagnose application problems. Business transactions can be added via integrations, thus strengthening the offering. AI/ML has been a strong part of IBM for many years and continues to be one of the strengths of Cloud APM.

The solution consists of two offerings: Cloud APM (for SaaS deployments) and Private (for on-premises deployments). The offerings and add-ons contain agents and data collectors. Both offerings have a base and an advanced version, with the base version being a subset and offering fewer features and add-ons than the advanced version. More add-ons are available for Cloud APM.

The advanced version of Cloud APM is the most comprehensive offering and includes all agents, data collectors, and dashboard pages. It covers end-user experience, transaction tracking, and resource monitoring of all application components. It also provides code-level visibility into applications and the health of application servers. The diagnostics dashboards can be used to find performance bottlenecks in application code and for managing applications. To further test end-user experience, synthetic transaction capabilities allow the recording of scripts using the Firefox web browser with the Selenium IDE add-on.

The base version of Cloud APM contains agents and data collectors for monitoring applications, transactions, and other resources installed in the enterprise. Resource monitoring finds and addresses slow transactions, capacity issues, and outages. Users can monitor up to 10,000 managed systems using the base version of Cloud APM. Advanced versions of Cloud APM or Private have higher limits on managed systems.

Agent upgrades are more complicated than expected and depend on the collector. All collectors, except J2EE, must be unconfigured before the data collector can be upgraded.

Strengths: IBM Cloud APM is comprehensive in coverage. Add-ons, such as those for SAP and zSystems, enable unique capabilities. The AI/ML is good, as is the integration of business transactions.

Challenges: IBM Cloud APM is a complicated product. Without pre-sales analysis and professional services for deployment and implementation, performing a cost analysis and determining the correct product mix are difficult.

ManageEngine Applications Manager and Site24x7

Zoho Corporation is the umbrella firm for four distinct divisions: ManageEngine, Zoho.com, Qntrl, and Trainer Central.

ManageEngine offers two APM products: Applications Manager is the on-premises version and Site24x7 is the cloud version. While the products have slightly different UIs, they offer similar capabilities, so we review them as one here.

ManageEngine can auto-discover business service components in the IT infrastructure and map dependencies and connections. This can be done using both agent-based and agentless (polling) discovery methods.

The solutions provide a graphical representation of transactions and the subsequent infrastructure components as a service view of the application, infrastructure mapping, traffic details, and the way the application is connected to other dependent resources.

ManageEngine monitors a number of metrics, including query performance, jobs, sessions, thread pools, connection pools, nodes, transactions, and logs. The tools also support app parameters (tags), which help with ingesting custom metrics into the system. The solution also monitors network performance, network traffic, latency, jitter, bandwidth, packets, routers, firewalls, switches, VPNs, and storage.

Transaction tracing, RUM, and synthetic transactions are included. Network monitoring for on-premises networks can be added via the OPManager product, which uses SNMP and APIs where possible.

ManageEngine uses the AI-powered Zia framework for anomaly detection using robust principal component analysis (RPCA) and matrix sketching algorithms to detect any unusual spikes or aberrations in critical performance attributes, such as response time and CPU and memory use. KPIs are compared against seasonal benchmarked values. Anomaly reporting allows fine-tuning of resource performance and infrastructure to avoid problems from unforeseen issues. Reports can be embedded in customizable dashboards and can be shared among users via email, CSV, or PDF.
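The idea of comparing KPIs against seasonal benchmarked values can be illustrated with a simple Python sketch. This is a plain per-hour mean and standard deviation baseline, swapped in for clarity; it is not RPCA or matrix sketching, and it does not represent how Zia is implemented. All names and the threshold of three standard deviations are assumptions.

```python
from collections import defaultdict
from statistics import mean, stdev


def seasonal_baseline(history):
    """Build {hour_of_day: (mean, stdev)} from (hour_of_day, value) samples."""
    buckets = defaultdict(list)
    for hour, value in history:
        buckets[hour].append(value)
    # stdev needs at least two samples, so sparse slots get no benchmark.
    return {h: (mean(v), stdev(v)) for h, v in buckets.items() if len(v) >= 2}


def is_anomalous(baseline, hour, value, threshold=3.0):
    """Flag a value more than `threshold` standard deviations from its slot's mean."""
    if hour not in baseline:
        return False  # no benchmark for this time slot
    mu, sigma = baseline[hour]
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold
```

The point of the seasonal bucket is that the same reading can be normal at one time and unusual at another: a response time that is typical during a 9 a.m. peak may be flagged if it appears at 3 a.m., when the benchmark is much lower.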

ManageEngine provides out-of-the-box integration with third-party applications such as OpsGenie, ServiceNow, PagerDuty, and Moogsoft. Although the product uses AI/ML, it can’t use it to process data from outside of the ManageEngine application sphere. OpenTelemetry can be a source for metrics, events, and traces; however, the company does not contribute to OpenTelemetry.

ManageEngine has three licensing options: annual and perpetual for on-premises deployments (Applications Manager) and all-in-one for cloud (Site24x7). Applications Manager’s annual subscription is based on the number of instances of the APM, while perpetual licensing supports distributed monitoring capabilities. The Site24x7 all-in-one cloud monitoring pack is available in starter, pro, classic, elite, and enterprise editions, with cost and functionality increasing as offerings scale. Both products include RUM in all plans, and RUM licensing is based on pageviews. The number of pageviews differs based on the particular plan.

Strengths: A SaaS version is available, with the inherent ease of setup of SaaS applications, and there’s an on-premises version for those who need complete control of their APM solution. ManageEngine has always been strong in SMBs and smaller enterprises.

Challenges: Large enterprises may find the solution lacking in depth and technical range. OpenTelemetry has only tacit support, and ManageEngine does not contribute to OpenTelemetry development.

Netreo Retrace

Founded in 2000, Netreo provides performance and availability monitoring for large enterprise networks, infrastructure, applications, and business services. Retrace is a SaaS APM solution that provides end-to-end application visibility and enhanced troubleshooting at the code level, across hybrid, cloud, and on-premises environments.

The solution supports over 350 integrations, including those for networks, servers, storage, virtualization, and cloud platforms. Similar to other vendors in this space, common integrations include Jira, Slack, Azure Resource API, SSO, Axosoft, Webhooks, Azure DevOps, AWS auto-scaling, and Azure auto-scaling.

Retrace is a good APM tool for developers; it provides detailed code-level application tracing, centralized logging, application error tracking, code profiling, application and server monitoring, and RUM. Synthetic transaction monitoring is also available. While Retrace doesn’t monitor databases, it will monitor the server containing the database.

Implementation requires agent installs on all servers to collect the application data. The collected data is then used to provide the output of the code-level profilers that power Retrace. Depending on the programming language, the profiler may be installed globally on the server, or it may require small modifications to applications or the code itself.

Retrace incorporates integrations for deployment and scaling, including those for Azure and AWS, plus added capabilities for managing instances of Linux (RedHat, CentOS, Ubuntu, Debian), Windows, and Docker. Once the solution is deployed, administrators can set user rights and permissions, and both users and administrators can customize settings for logging, APM, dashboards, alerts, and reports.

The solution can alert on overall KPI metrics for an app, such as its Apdex score and availability, as well as on key transactions and requests. Performance widgets can be added to a custom dashboard to increase visibility for defined endpoints, apps, or environments.
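For readers unfamiliar with Apdex, the sketch below shows the standard Apdex calculation: requests at or under a target threshold T count as satisfied, requests between T and 4T count as tolerating (half weight), and the rest are frustrated. How Retrace computes or configures its own score is not described in this report, so the function name, the threshold, and the alert level are illustrative assumptions.

```python
def apdex(response_times, threshold):
    """Standard Apdex: (satisfied + tolerating / 2) / total samples.

    response_times: request durations in seconds.
    threshold:      the target time T; 4 * T is the frustration boundary.
    """
    if not response_times:
        raise ValueError("Apdex is undefined for an empty sample set")
    satisfied = sum(1 for t in response_times if t <= threshold)
    tolerating = sum(1 for t in response_times if threshold < t <= 4 * threshold)
    return (satisfied + tolerating / 2) / len(response_times)


# Hypothetical sample: three satisfied, two tolerating, one frustrated request.
durations = [0.2, 0.4, 0.5, 1.0, 1.9, 2.5]
score = apdex(durations, threshold=0.5)

# Hypothetical alert rule: notify when the score drops below 0.85.
should_alert = score < 0.85
```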

Retrace provides an application score to indicate the condition of an application. Service discovery is not available; however, upstream and downstream infrastructure can be seen. It may suffer at the enterprise level due to the lack of a service view; however, this is on the roadmap for delivery within the next 12 months.

Retrace allows regular expression (regex)-based masking for logs and automatic masking of some sensitive information from captured traces. Three roles are included in the out-of-box implementation: account admin, dashboard admin, and billing. Access is granted through a tiered, role-based system in which users may hold more than one role to allow or deny access to data. Retrace includes a password vault for database account access.
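As a general illustration of regex-based log masking, the sketch below applies a short list of patterns to each log line before it is stored. It does not reflect Retrace's configuration format or built-in rules; the patterns, placeholder tokens, and the `mask_line` helper are hypothetical.

```python
import re

# Hypothetical masking rules: (compiled pattern, replacement text).
MASK_RULES = [
    # Email addresses.
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "<email>"),
    # key=value secrets; the key name is kept and the value is redacted.
    (re.compile(r"\b(password|token)=\S+", re.IGNORECASE), r"\1=<redacted>"),
    # Runs of 13 to 16 digits, optionally separated by spaces or hyphens.
    (re.compile(r"\b\d(?:[ -]?\d){12,15}\b"), "<card>"),
]


def mask_line(line):
    """Apply every masking rule to a single log line, in order."""
    for pattern, replacement in MASK_RULES:
        line = pattern.sub(replacement, line)
    return line


raw = "user=alice@example.com password=hunter2 card 4111 1111 1111 1111 declined"
masked = mask_line(raw)
```

Rule order matters in a scheme like this: broader patterns placed early can consume text that a later, more specific rule was meant to catch.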

OpenTelemetry support is currently under development, with the first release expected in October 2023. Netreo does not contribute to OpenTelemetry, but the company is investigating the possibility. Without OpenTelemetry support, Retrace data collection is via proprietary code, which limits customer flexibility and prevents distributed tracing capabilities.

Licensing is via monthly or annual subscription, depending on the tier level. Pricing can be based either on consumption or on host hours (a combination of production host monitoring plus non-production host monitoring, which is billed at one-third the cost of production hosts).
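The host-hours arithmetic can be sketched as follows. Only the one-third ratio for non-production hosts comes from the description above; the hourly rate, host counts, and function name are placeholders, not Netreo's published pricing.

```python
def host_hours_cost(prod_host_hours, nonprod_host_hours, prod_rate_per_hour):
    """Estimate a bill where non-production hours cost one-third of production hours."""
    nonprod_rate_per_hour = prod_rate_per_hour / 3
    return (prod_host_hours * prod_rate_per_hour
            + nonprod_host_hours * nonprod_rate_per_hour)


# Hypothetical month: 10 production and 6 non-production hosts running 720 hours each,
# at an assumed production rate of 0.03 currency units per host hour.
estimate = host_hours_cost(
    prod_host_hours=10 * 720,
    nonprod_host_hours=6 * 720,
    prod_rate_per_hour=0.03,
)
```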

Strengths: Netreo Retrace provides lightweight code-level profiling, centralized logging, and the ability to view logs and exceptions side-by-side, with a compelling licensing cost. The solution allows enterprises to identify and optimize unique exceptions, monitor and improve exception rates, and proactively identify application bugs. It is a good developer-level tool.

Challenges: Retrace doesn’t currently offer OpenTelemetry support, which it indicates will be available in Q4 2023. Retrace provides no direct database monitoring, though it can monitor the server. Service discovery is not available, but it is on the current roadmap for within the next 12 months.

New Relic

New Relic was founded in 2008 to deliver cloud-based application performance management services and went public in 2014. New Relic’s APM solution provides a unified monitoring service for applications and microservices. The product is a SaaS solution and customers can choose among data centers in the US and UK. It is deployed with instrumentation through agents.

The New Relic platform includes more than 30 tools and over 700 integrations to monitor, debug, and troubleshoot the entire application stack. Users can start with one tool and easily add more as their needs change. DevOps users will be attracted to New Relic because of the easy integration and code injection into web browsers.

The APM module stands out with distributed tracing of application interactions, examination of log data within the context of the application under inspection, interactive service maps that are easily navigated both down and up the stack, full stack error tracking that allows examination down to the code level, change tracking, and the management of service levels.

In the area of AI/ML, New Relic has introduced model performance monitoring, a feature that monitors ML models in production and can detect model drift. New Relic supports OpenTelemetry ingestion and is a significant contributor to the OpenTelemetry standard.

New Relic emphasizes cost, scalability, and ease of use. The current cost model has one per-user price for all products and a per-GB price for all data, with users paying only for actual monthly usage. If only APM is required, there are no additional licensing costs for other features, though there will be additional costs associated with data used.

The cloud-based architecture can scale with little enterprise effort other than managing the required agents, which are easy to install and secure. Moreover, there is no dedicated on-premises infrastructure to manage. The inability to see into walled garden environments—secured areas with no access to the internet—can be a problem for SaaS solutions, but New Relic supports the use of an on-site relay system that has access to the secure area.

The full stack nature of the solution allows it to see changes to the code; however, changes that fall in the category of shadow or ghost changes (changes that are not made by normal processes such as CI/CD or change management) are not seen.

Integration with ITSM systems enables approved change requests to integrate with APM data. The introduction of PathPoint provides methods to integrate business data into New Relic to improve operational awareness. New Relic Grok—currently in preview but slated for full delivery in the coming two to four months—may enhance enterprise awareness by providing answers to questions like, “Which application is likely to fail in the next 24 hours?”

Strengths: New Relic presents a strong APM solution embedded within a platform of other tools. Full stack analysis to the code level will be appealing to developers, while other features of the platform may be useful to IT operations in general.

Challenges: As a SaaS-only platform, New Relic may have issues seeing into walled gardens without the addition of on-site relays, which create additional infrastructure management load.

SolarWinds Observability

Founded in 1999, SolarWinds has grown from products that monitor and manage network infrastructure to full-stack monitoring and observability solutions. SolarWinds Observability is a single, unified platform that includes APM as one of its components. It is a cloud-native SaaS offering, and customers can choose between Azure and AWS for deployment. The platform has been built with a common backend, UI, ML engine, alerts, dashboards, topology maps, health scores, and entity structure. Individual components of the platform include:

  • APM (instrumentation for distributed traces and metrics)
  • Infrastructure (cloud and on-premises, Kubernetes, servers, VMs, containers)
  • Logs (scalable, multisource log management)
  • Databases (database performance analysis with root cause diagnostics)
  • Digital Experience Monitoring (synthetic monitoring, availability, reachability, UX, performance insights, RUM, application performance, and browser-level real UX)
  • Networking (network performance and flow, visibility across multiple-vendor on-premises networks)
  • Observability

SolarWinds Observability takes a standard approach, but it does have a few features that set it apart. SolarWinds APM implementation is based on OpenTelemetry libraries with additional enhancements, intuitive UI, and code-level profiling. Additionally, Kubernetes monitoring is available, which may eliminate the need for a separate Kubernetes observability tool.

SolarWinds does well with business transactions, analytics, and reporting, scoring better than average amid a very competitive APM market. The company’s efforts in AI/ML include solid predictability features and good analytics. The solution provides good flexibility in deployment and features and is easy to use and manage.

Overall, the package is strong; however, the transition from observability to awareness is not complete. The integration of business data (more than just business intelligence information) is also not complete. There is no ability to see and alert on shadow changes.

With seven components to the platform, cost predictability may be an issue. There is “a la carte” licensing with a monthly or annual SaaS subscription which, though very flexible, makes it difficult to predict costs prior to starting an implementation without extensive planning. Also, in a rapidly changing environment, costs may be difficult to forecast from month to month. Those on a limited budget may find SolarWinds an attractive solution.

SolarWinds Observability is available via subscription-based licensing. Users can purchase any combination of the individual components of the SolarWinds Observability platform (APM, Infrastructure, Logs, and so on) via subscription. Usage of each module is measured in its relevant units (service, DEM check, RUM check, network device, GB/months, instance, database). A number of features are available via subscription, including hosts and containers, which use the number of active infrastructure hosts, with containers counting at a 10:1 ratio to hosts and any non-host cloud service (AWS, Azure) counting at a 3:1 ratio. Log volume and retention time are also based on subscription, using the volume of logs ingested in a calendar month (in GB) with a retention period in days (for example, 3, 7, 15, 21, 30 days). An additional subscription is needed for database observability based on the number of database instances observed. Network devices can also be added via a subscription based on the number of active network devices.
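One way to read the infrastructure ratios above is as a conversion into billable host units. The sketch below assumes that 10 containers count as one host and that 3 non-host cloud services count as one host, with each group rounded up; the rounding rule and the function name are assumptions, since the report does not specify them.

```python
import math


def billable_host_units(hosts, containers, cloud_services,
                        containers_per_host=10, services_per_host=3):
    """Convert a mixed inventory into host units using the stated ratios.

    Assumption: partial groups are rounded up to the next whole host unit.
    """
    container_units = math.ceil(containers / containers_per_host)
    service_units = math.ceil(cloud_services / services_per_host)
    return hosts + container_units + service_units


# Hypothetical inventory: 20 hosts, 45 containers, 7 non-host cloud services.
units = billable_host_units(hosts=20, containers=45, cloud_services=7)
```

Running a calculation like this against a real inventory before purchase is one way to reduce the cost-forecasting difficulty noted for this vendor.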

Strengths: SolarWinds Observability provides good flexibility and ease of use.

Challenges: Cost predictability may be an issue due to the same flexibility that is a strength. With seven modules in the package, all with different subscription metrics, keeping track of costs may prove more challenging than expected.

Splunk Observability Cloud

Splunk was founded in 2003. Splunk Observability Cloud is a SaaS APM solution that collects, indexes, and analyzes real-time data. Data is stored in a searchable repository from which the solution can generate graphs and visualizations, dashboards, reports, and alerts. The solution includes incident response, on-call management, and mobile alerting. By adding Splunk Enterprise, on-premises and SaaS-based IT log aggregation, security, service intelligence, and event analytics are also available.

Together, Splunk Observability Cloud and Splunk Enterprise provide APM, infrastructure monitoring, RUM, synthetics, log analysis, incident response management, and AIOps. All capabilities can be purchased and used independently or together in an integrated offering.

Splunk Observability’s scores for business transactions and reporting are better than average. Integration with other systems and tools is also good, and its full support for OpenTelemetry lets it accept data streams from non-platform sources that use OpenTelemetry, or OpenTelemetry can be integrated into the platform itself.

While there are no glaring weaknesses in the offering, data masking for metrics and traces could be stronger. Also, Splunk IT Service Intelligence, part of Splunk’s AIOps solution, is only a beginning of operational awareness; it does not fully integrate non-technical business data into the solution. This isn’t unusual in this space.

Splunk’s pricing model is flexible, with options to scale as an organization’s needs change. Splunk Observability Cloud can be purchased in host-based bundles, or each capability can be purchased a la carte via host-based or volume-based pricing. Similarly, Splunk Enterprise offers both workload- and volume-based pricing. For those needing to adhere to specific budgets, multiyear pooled capacity subscriptions and enterprise license agreements (ELAs) are available with annual rollover. Splunk also provides software education units that are consumed as either a flat fee subscription service by renting instructors (hourly rates) or by acquiring a set amount of education “credits.” Splunk provides professional services options on-site or remotely, and these services are delivered on a subscription basis on-demand, from its assigned expert service, or via more traditional project-based services offerings.

Strengths: Splunk Observability Cloud is a solid SaaS APM solution, and the addition of Splunk Enterprise allows full integration with on-premises log management. The inclusion of on-call management and mobile alerting is a plus.

Challenges: While Splunk Enterprise is not needed to complete the solution, enterprises that want to use both may need to maintain two separate licenses or subscriptions. Also, there is an added cost to support on-premises portions of Splunk Enterprise for log aggregation, security, service intelligence, and event analytics.

Sumo Logic Observability

Founded in 2010, Sumo Logic is a cloud-native data analytics service that provides log management, monitoring, and analytics to help manage and secure modern applications across cloud and on-premises environments. The Sumo Logic Observability platform is an integrated portfolio of capabilities for monitoring and diagnosing all telemetry ingested. It is both a full stack observability solution and a full stack security solution. The observability tool includes log analytics, infrastructure monitoring, application observability (which includes APM), and end-user monitoring (including RUM). The security tool provides cloud security operations with analytics, security information and event management (SIEM), and security orchestration, automation, and response (SOAR).

Sumo Logic uses collectors to gather data from sources using agents on devices that may reside in a cloud or on-site. Collectors are in the cloud (SaaS) or on-site for self-managed implementations. It uses OpenTelemetry agents as well as its own proprietary agents. Sumo Logic is a contributor to OpenTelemetry, and it supports OpenTelemetry for metrics, events, logs, and traces.

Sumo Logic’s application observability is better than average in a number of areas. Business transactions are easy to see, understand, and navigate, as are analytics and reporting capabilities. Drill-down to the code level is easy to accomplish, with good contextual information and suggestions regarding the root cause of the problem under investigation.

As a SaaS solution, setup and implementation are relatively simple, and it scales easily; a hosted agent is all that is required for cloud applications. An installed collector is necessary to gather on-site data, while Sumo Logic hosts the SaaS collector on its end in AWS. An installed collector is on-premises and managed by the enterprise. After installing a collector, users add sources (devices), from which the collector obtains data to send to the Sumo Logic service. Installed collectors run on Windows, macOS, and Linux. These options give Sumo Logic good manageability and configuration abilities.

The cost of implementing Sumo Logic Application Observability is low for either the SaaS or on-premises collection of data. Licensing is by device and by the total amount of data ingested. Multiple editions (tiers) are available, from a free trial up to an enterprise tier, and costs and functionality scale along with the tiers. The higher tiers have customer-defined data retention limits, with a minimum of 365 days available. They also include platform security, which supports compliance requirements such as PCI, SOC, CSA, ISO, and HIPAA.

Sumo Logic addresses operational awareness and provides tools to measure business metrics such as orders per minute and average order value. The solution can ingest meaningful business data and display it with relevant IT metrics, and it can uncover so-called shadow or unknown changes or anomalous behavior in the underlying applications or systems, via schema-on-demand. It also has a release baseline. Using patented LogCompare and TimeCompare analytics, Sumo Logic identifies performance or log definition anomalies introduced by any method. Ephemeral instances are automatically monitored and managed (cloud, Kubernetes, serverless) and can alert operators if patterns change.

There are some downsides to Sumo Logic. It does not provide a self-managed option; only a SaaS version is available. Onboarding, though relatively simple, isn’t as easy as it could be, but Sumo Logic plans to add UI wizards within the next 12 months to improve setup. Sumo Logic does not support synthetic transactions natively. It relies on integration with Catchpoint, at an additional cost, to execute scripted transactions. However, the test results flow seamlessly to the Sumo Logic back end and are available for analysis together with other APM KPIs on the same dashboards. As with all SaaS-only solutions, Sumo Logic is responsible for disaster recovery. Business continuity is part of the customer’s planning for outages should they occur.

Strengths: Sumo Logic provides a strong APM solution within its suite of observability tools. Application Observability provides code-level tracing for any stack supported by OpenTelemetry. The support for OpenTelemetry is very good. The enterprise platform offerings include security analytics along with SIEM and SOAR.

Challenges: There is no native support for synthetic transactions. The Sumo Logic Observability platform includes much more than just APM, which may involve more tools than an organization actually needs.

6. Analyst’s Take

The entire APM marketplace is maturing, with broad platforms dominating the landscape. Platforms have the potential to displace existing tools and may come with additional political capital costs; however, they provide additional tools outside of APM that are worth examining during the selection process because they can provide significant future benefits.

The most common marketing strategy for APM is via the DevOps toolchain. With platforms offering more than APM, other teams will be interested in solution selection, and the resulting expense of political capital can make the selection process more time-consuming and possibly more fraught. APM-only solution vendors will also use DevOps as an entry point, but they may not spark as many internal political challenges.

Not all vendors require tools to be implemented separately within their platform. For Dynatrace and New Relic, for example, the license covers the entire observability suite, while data consumption determines the cost of using additional modules. Cisco and BMC also have consumption-based pricing along with subscription licenses.

Enterprises searching for an APM solution should consider whether a platform, which may replace existing tooling, or a standalone solution, which may meet their needs only for the short term, is best for them within the DevOps context or more broadly. In either case, additional modules offered by a vendor will impact decision-making. The cost of licensing, consumption of data, and other factors such as the internal environment and the need to expend political capital should be considered.

7. Methodology

For more information about our research process for Key Criteria and Radar reports, please visit our Methodology.

8. About Ron Williams

Ron Williams is an astute technology leader with more than 30 years’ experience providing innovative solutions for high-growth organizations. He is a highly analytical and accomplished professional who has directed the design and implementation of solutions across diverse sectors. Ron has a proven history of excellence propelling organizational success by establishing and executing strategic initiatives that optimize performance. He has demonstrated expertise in planning and implementing solutions for enterprises and business applications, developing key architectural components, performing risk analysis, and leading all phases of projects from initialization to completion. He has been recognized for promoting effective governance and positive change that improved operational efficiency, revenues, and cost savings. As an elite communicator and design architect, Ron has transformed strategic ideas into reality through close coordination with engineering teams, stakeholders, and C-level executives.

Ron has worked for the US Department of Defense (Star Wars initiative), NASA, Mary Kay Cosmetics, Texas Instruments, Sprint, TopGolf, and American Airlines, and participated in international consulting in Qatar, Brazil, and the U.K. He has led remote software and infrastructure teams in India, China, and Ghana.

Ron is a pioneer in enterprise architecture who improved response and resolution of enterprise-wide problems by deploying “smart” tools and platforms. In his current role as an analyst, Ron provides innovative technology and strategy solutions in both enterprise and SMB settings. He is currently using his expertise to analyze the IT processes of the future with particular interest in how machine learning and artificial intelligence can improve IT operations.

9. About GigaOm

GigaOm provides technical, operational, and business advice for IT’s strategic digital enterprise and business initiatives. Enterprise business leaders, CIOs, and technology organizations partner with GigaOm for practical, actionable, strategic, and visionary advice for modernizing and transforming their business. GigaOm’s advice empowers enterprises to successfully compete in an increasingly complicated business atmosphere that requires a solid understanding of constantly changing customer demands.

GigaOm works directly with enterprises both inside and outside of the IT organization to apply proven research and methodologies designed to avoid pitfalls and roadblocks while balancing risk and innovation. Research methodologies include but are not limited to adoption and benchmarking surveys, use cases, interviews, ROI/TCO, market landscapes, strategic trends, and technical benchmarks. Our analysts possess 20+ years of experience advising a spectrum of clients from early adopters to mainstream enterprises.

GigaOm’s perspective is that of the unbiased enterprise practitioner. Through this perspective, GigaOm connects with engaged and loyal subscribers on a deep and meaningful level.

10. Copyright

© Knowingly, Inc. 2023. "GigaOm Radar for Application Performance Management (APM)" is a trademark of Knowingly, Inc. For permission to reproduce this report, please contact sales@gigaom.com.