This GigaOm Research Reprint Expires Aug 30, 2024

GigaOm Radar for Service Mesh v3.01

Interconnecting the Enterprise

1. Summary

Playing an essential role in cloud-native development, a service mesh enables fast, reliable, and secure communications between microservices. Unlike other systems for managing inter-service communications, a service mesh is a dedicated infrastructure layer integrated with the application. Implemented either as a sidecar proxy running alongside each workload or directly within the service or platform itself, a service mesh eliminates the complexity, fragmentation, and security vulnerabilities of repeatedly coding service-to-service communications by offloading request management to an out-of-process component.

As this is an emerging technology undergoing rapid innovation, decision makers must evaluate the landscape carefully when choosing a service mesh, weighing the additional complexity, latency, and resource consumption it introduces. With various open-source and commercial vendors targeting a broad range of application environments and deployment options, this GigaOm Radar report provides an overview of the service mesh landscape, evaluating open-source projects and vendors against table stakes, key criteria, and evaluation metrics. Figure 1 lists the service meshes included in this report and their acquisition options.

Figure 1. Service Mesh Projects and Vendors

Note: Providing governance for open-source, vendor-neutral cloud-native projects, the Cloud Native Computing Foundation (CNCF) hosts several community-driven open-source projects with varying maturity levels: sandbox (early stage), incubating (stable), or graduated (widely deployed in production environments).

With different service mesh options and a rapidly evolving landscape, choosing the best service mesh for your organization depends on your use cases, existing software stack, architectural choices, and in-house capabilities. In addition, your internal resources and skill sets will influence your decision on whether you adopt a lightweight, developer-friendly service mesh or a full-featured solution requiring professional services.

This GigaOm Radar report highlights key service mesh projects and vendors and provides the information IT decision-makers need to select the best fit for their business and use case requirements. The corresponding GigaOm report “Key Criteria for Evaluating Service Mesh Solutions” describes in more detail the capabilities and metrics used to evaluate vendors in this market.

This is our third year evaluating the service mesh space in the context of our Key Criteria and Radar reports. All solutions included in this Radar report meet the following table stakes—capabilities widely adopted and well implemented in the sector:

  • Dedicated infrastructure layer
  • Service-to-service authentication
  • Control plane configuration
  • Control plane telemetry

How to Read this Report

This GigaOm report is one of a series of documents that helps IT organizations assess competing solutions in the context of well-defined features and criteria. For a fuller understanding, consider reviewing the following reports:

Key Criteria report: A detailed market sector analysis that assesses the impact that key product features and criteria have on top-line solution characteristics—such as scalability, performance, and TCO—that drive purchase decisions.

GigaOm Radar report: A forward-looking analysis that plots the relative value and progression of vendor solutions along multiple axes based on strategy and execution. The Radar report includes a breakdown of each vendor’s offering in the sector.

2. Market Categories and Deployment Types

To better understand the market and vendor positioning (Table 1), we assess how well an open-source or vendor service mesh supports different target market segments and deployment models.

For this report, we recognize the following market segments:

  • Cloud service provider (CSP): Providers delivering on-demand, pay-per-use services to customers over the internet, including infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS).
  • Network service provider (NSP): Service providers selling network services (such as network access and bandwidth) provide entry points to backbone infrastructure or network access points (NAPs). In this report, NSPs include data carriers, ISPs, telcos, and wireless providers.
  • Managed service provider (MSP): Service providers delivering managed application, communication, IT infrastructure, network, and security services and support for businesses, either at the customer premises or in MSP-owned (hosting) or third-party (colocation) data centers.
  • Large enterprise: Enterprises of 1,000 or more employees with dedicated IT teams responsible for planning, building, deploying, and managing their applications, IT infrastructure, networks, and security in either an on-premises data center or a colocation facility.
  • Small-to-medium business (SMB): Small businesses (fewer than 100 employees) and medium-sized businesses (100-1,000 employees) with limited budgets and constrained in-house resources for planning, building, deploying, and managing their applications, IT infrastructure, networks, and security in either an on-premises data center or a colocation facility.

In addition, we recognize the following deployment models:

  • Single or multiple cluster: A service mesh can be deployed within a single cluster or as a single mesh spanning multiple clusters. A single-cluster deployment may offer simplicity, but it lacks features such as fault isolation, failover, and project isolation that are available in a multicluster deployment.
  • Single or multiple network: Workload instances directly connected without using a gateway reside in a single network, enabling the uniform configuration of service consumers across the mesh. A multinetwork approach allows a service mesh to span various network topologies or subnets, providing compliance, isolation, high availability, and scalability.
  • Single or multiple control plane: The control plane configures all communication between workload instances within the mesh. Deploying multiple control planes across clusters, regions, or zones provides configuration isolation, fine-grained control over configuration rollouts, and service-level isolation. Moreover, if one control plane becomes unavailable, the impact of the outage is limited to the workloads managed by that control plane.
  • Single or multiple mesh: While a single mesh can span one or more clusters or networks, service names must be unique within the mesh, and namespaces are used for tenancy. Discovering services and communicating across mesh boundaries therefore requires a federated mesh, in which each mesh exposes services that can be consumed by services in other meshes, providing line-of-business boundaries and isolation between test and production workloads.
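To make the fault-isolation argument for multiple control planes concrete, the following minimal Python sketch computes the share of workloads affected when a single control plane fails. The function name `outage_share` and the control plane names are hypothetical, and the sketch assumes each workload is managed by exactly one control plane.

```python
from typing import Dict

def outage_share(workloads_per_control_plane: Dict[str, int], failed: str) -> float:
    """Return the fraction of mesh workloads managed by the failed control plane.

    Assumes every workload is managed by exactly one control plane, so an outage
    affects only the workloads attached to the plane that went down.
    """
    total = sum(workloads_per_control_plane.values())
    if total == 0:
        return 0.0
    return workloads_per_control_plane.get(failed, 0) / total

# One control plane managing everything: a single outage affects all workloads.
single = {"cp-global": 900}
# The same workloads split across three regional control planes (hypothetical names).
multi = {"cp-us-east": 300, "cp-eu-west": 300, "cp-ap-south": 300}

print(outage_share(single, "cp-global"))            # 1.0
print(round(outage_share(multi, "cp-eu-west"), 3))  # 0.333
```

Splitting the same 900 workloads across three control planes reduces the workloads affected by any one outage from all of them to one third.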

Table 1. Vendor Positioning: Market Segment and Deployment Model

Market Segment

Deployment Model

Vendor | CSP | NSP | MSP | Large Enterprise | SMB | Single or Multiple Cluster | Single or Multiple Network | Single or Multiple Control Plane | Single or Multiple Mesh
Amazon
Cilium (CNCF)
F5
Google
greymatter.io
HashiCorp
Istio (CNCF)
Kong
Kuma (CNCF)
Linkerd (CNCF)
Network Service Mesh (CNCF)
Red Hat
Solo.io
Traefik Labs
VMware
3 Exceptional: Outstanding focus and execution
2 Capable: Good but with room for improvement
1 Limited: Lacking in execution and use cases
0 Not applicable or absent

3. Key Criteria Comparison

Building on the findings from the GigaOm report, “Key Criteria for Evaluating Service Mesh Solutions,” Tables 2, 3, 4, and 5 summarize how well each project or vendor included in this research performs in the areas we consider differentiating and critical in this sector.

  • Key criteria differentiate solutions based on features and capabilities, outlining the primary criteria to be considered when evaluating a service mesh, including built-in resilience, converged security, and AIOps automation.
  • Evaluation metrics provide insight into the non-functional requirements relevant to purchase decisions, reflecting fundamental aspects including configurability, interoperability, and observability.
  • Emerging technologies identify the most compelling and potentially impactful technologies emerging in a product or service sector over the next 12 to 18 months.
  • Specific service mesh capabilities differentiate one service mesh from another based on the specific functionality required to deliver fast, resilient, and secure service-to-service communications.

The objective is to give the reader a snapshot of the technical capabilities of available solutions, define the perimeter of the market landscape, and gauge the potential impact on the business.

Table 2. Key Criteria Comparison

Key Criteria

Vendor | Platform Support | Sidecar Implementation | Resource Consumption | Low Latency | Built-In Resilience | Converged Security | AIOps Automation
Amazon | 1 | 2 | 2 | 3 | 2 | 2 | 1
Cilium (CNCF) | 2 | 1 | 2 | 3 | 1 | 2 | 0
F5 | 2 | 2 | 1 | 1 | 2 | 3 | 1
Google | 1 | 2 | 2 | 2 | 3 | 2 | 0
greymatter.io | 3 | 3 | 2 | 2 | 2 | 3 | 3
HashiCorp | 3 | 2 | 3 | 3 | 2 | 3 | 0
Istio (CNCF) | 3 | 2 | 2 | 2 | 2 | 2 | 1
Kong | 3 | 2 | 2 | 2 | 3 | 3 | 2
Kuma (CNCF) | 3 | 2 | 3 | 2 | 2 | 2 | 0
Linkerd (CNCF) | 1 | 3 | 3 | 3 | 3 | 3 | 0
Network Service Mesh (CNCF) | 2 | 2 | 3 | 3 | 2 | 2 | 0
Red Hat | 1 | 2 | 2 | 2 | 2 | 2 | 0
Solo.io | 3 | 3 | 2 | 2 | 3 | 3 | 2
Traefik Labs | 2 | 0 | 3 | 2 | 2 | 1 | 1
VMware | 3 | 3 | 2 | 2 | 3 | 3 | 3
3 Exceptional: Outstanding focus and execution
2 Capable: Good but with room for improvement
1 Limited: Lacking in execution and use cases
0 Not applicable or absent

Table 3. Evaluation Metrics Comparison

Evaluation Metrics

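Vendor | Flexibility | Configurability | Interoperability | Openness | Observability | Manageability | Support | Cost
Amazon | 2 | 2 | 2 | 1 | 3 | 2 | 3 | 3
Cilium (CNCF) | 2 | 2 | 3 | 2 | 3 | 2 | 3 | 2
F5 | 3 | 1 | 2 | 3 | 3 | 2 | 3 | 2
Google | 1 | 2 | 1 | 1 | 1 | 1 | 2 | 1
greymatter.io | 3 | 3 | 3 | 2 | 3 | 3 | 2 | 3
HashiCorp | 3 | 2 | 3 | 2 | 2 | 2 | 3 | 3
Istio (CNCF) | 3 | 2 | 2 | 3 | 2 | 1 | 1 | 2
Kong | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 2
Kuma (CNCF) | 3 | 3 | 3 | 3 | 3 | 3 | 1 | 3
Linkerd (CNCF) | 3 | 2 | 2 | 3 | 2 | 3 | 3 | 3
Network Service Mesh (CNCF) | 2 | 3 | 3 | 3 | 2 | 2 | 1 | 3
Red Hat | 1 | 2 | 2 | 1 | 3 | 2 | 3 | 3
Solo.io | 3 | 3 | 3 | 3 | 2 | 3 | 3 | 2
Traefik Labs | 2 | 3 | 2 | 3 | 2 | 3 | 2 | 2
VMware | 2 | 3 | 2 | 2 | 3 | 3 | 3 | 2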
3 Exceptional: Outstanding focus and execution
2 Capable: Good but with room for improvement
1 Limited: Lacking in execution and use cases
0 Not applicable or absent

Table 4. Emerging Technologies Comparison

Emerging Tech

Vendor | Sidecarless Implementation | WebAssembly | OPA | 5G & Edge Networks | SMaaS
Amazon
Cilium (CNCF)
F5
Google
greymatter.io
HashiCorp
Istio (CNCF)
Kong
Kuma (CNCF)
Linkerd (CNCF)
Network Service Mesh (CNCF)
Red Hat
Solo.io
Traefik Labs
VMware
3 Exceptional: Outstanding focus and execution
2 Capable: Good but with room for improvement
1 Limited: Lacking in execution and use cases
0 Not applicable or absent

Table 5. Specific Service Mesh Capabilities Comparison

Specific Service Mesh Capabilities

Vendor | Service Discovery | Advanced Routing | Distributed Tracing | Encryption | Circuit Breaker | Fault Injection | Load Balancing
Amazon
Cilium (CNCF)
F5
Google
greymatter.io
HashiCorp
Istio (CNCF)
Kong
Kuma (CNCF)
Linkerd (CNCF)
Network Service Mesh (CNCF)
Red Hat
Solo.io
Traefik Labs
VMware
3 Exceptional: Outstanding focus and execution
2 Capable: Good but with room for improvement
1 Limited: Lacking in execution and use cases
0 Not applicable or absent

By combining the information provided in the tables above, the reader can develop a clear understanding of the technical solutions available in the market.

4. GigaOm Radar

This report synthesizes the analysis of key criteria and their impact on evaluation metrics to inform the GigaOm Radar graphic in Figure 2. The resulting chart is a forward-looking perspective on all the vendors in this report based on their products’ technical capabilities and feature sets.

The GigaOm Radar plots vendor solutions across a series of concentric rings, with those set closer to the center judged to be of higher overall value. The chart characterizes each vendor on two axes—balancing Maturity versus Innovation and Feature Play versus Platform Play—while providing an arrow that projects each solution’s evolution over the coming 12 to 18 months.

Figure 2. GigaOm Radar for Service Mesh

It should be noted that Maturity does not exclude Innovation. Instead, it identifies the solution as being proven in a production setting compared to a newer solution undergoing innovation to achieve customer acceptance and adoption. In addition, the length of the arrow (Forward Mover, Fast Mover, or Outperformer) is based on execution against roadmap and vision (according to project or vendor input from last year’s report and in comparison to improvements made across the industry in general).

Furthermore, positioning in the Platform-Play quadrants indicates that the service mesh includes the functionality generally expected from a service mesh and can be deployed on a wide range of platforms even if the project or vendor is focused on a limited set of use cases. In contrast, some service meshes are positioned in the Feature-Play quadrants for the following reasons:

  • The service mesh supports a limited range of platforms (AWS App Mesh, Anthos Service Mesh, Linkerd, OpenShift Service Mesh, and Traefik Service Mesh).
  • The service mesh has a limited set of features (Network Service Mesh).
  • The service mesh includes the functionality generally expected from a service mesh but with an architecture that is new and evolving (Cilium Service Mesh).

As seen in Figure 2, Gloo Mesh, Greymatter, Kong Mesh, Linkerd, and Tanzu Service Mesh are recognized as Outperformers. Both Gloo Mesh and Tanzu Service Mesh continue to be the leading Istio-based service meshes. Gloo Mesh incorporates Istio Ambient Mesh’s sidecarless architecture, built-in best practices for extensibility and security, and simplified, centralized Istio and Envoy lifecycle management, while Tanzu Service Mesh is rapidly evolving as a core component of VMware’s cloud-native microservices strategy. Pushing the boundaries through continuous innovation, Greymatter offers exceptional Layer 3, 4, and 7 visibility, unmatched intelligence, built-in support for emerging use cases, and automated performance optimization. A highly portable, cloud-agnostic full-stack platform running everywhere, Kong Mesh offers ease of use and built-in automation capabilities as an alternative to more complex open-source solutions that are difficult to deploy and manage. Last but not least, Linkerd continues to gain rapid adoption because it is ultralight, ultrafast, and operationally simple to deploy.

One service mesh to keep an eye on is Cilium Service Mesh. Competing with Istio Ambient Mesh, Cilium Service Mesh was the first service mesh to offer the flexibility of running in either a sidecar model leveraging the Istio control plane or a sidecarless model with a choice of control planes for increased efficiency. While the jury is still out on the benefits and risks of incorporating eBPF into a service mesh, several projects and vendors are either doing so or have included it on their roadmaps.

Since the 2022 GigaOm Radar for Service Mesh was published, Istio has slipped from Leader to Challenger because Istio-based service meshes have leapfrogged the community-based project in terms of innovation. In addition, CNCF’s Open Service Mesh and NGINX Service Mesh have both been removed from this report. NGINX Service Mesh has been defunded and is no longer supported by F5, while the Open Service Mesh project has been archived and its maintainers reassigned to the Istio project.

Inside the GigaOm Radar

The GigaOm Radar weighs each vendor’s execution, roadmap, and ability to innovate to plot solutions along two axes, each set as opposing pairs. On the Y axis, Maturity recognizes solution stability, strength of ecosystem, and a conservative stance, while Innovation highlights technical innovation and a more aggressive approach. On the X axis, Feature Play connotes a narrow focus on niche or cutting-edge functionality, while Platform Play displays a broader platform focus and commitment to a comprehensive feature set.

The closer to the center a solution sits, the better its execution and value, with top performers occupying the inner Leaders circle. The centermost circle is almost always empty, reserved for highly mature and consolidated markets that lack space for further innovation.

The GigaOm Radar offers a forward-looking assessment, plotting the current and projected position of each solution over a 12- to 18-month window. Arrows indicate travel based on strategy and pace of innovation, with vendors designated as Forward Movers, Fast Movers, or Outperformers based on their rate of progression.

Note that the Radar excludes vendor market share as a metric. The focus is on forward-looking analysis that emphasizes the value of innovation and differentiation over incumbent market position.

5. Vendor Insights

Amazon: AWS App Mesh

Launched at AWS re:Invent 2018, AWS App Mesh is a fully managed service bringing the benefits of a service mesh to Amazon Web Services (AWS) customers using compute and container services. Providing application-level networking for running applications at scale, AWS App Mesh can be used with microservice containers managed by Amazon Elastic Container Service (ECS), Amazon Elastic Kubernetes Service (EKS), AWS Fargate, and Kubernetes running on Amazon Elastic Compute Cloud (EC2), as well as with services running directly on EC2 instances. App Mesh also integrates with AWS Outposts for applications running on-premises, and it uses a customized version of the open-source Envoy proxy, making it compatible with a wide range of AWS partners and open-source tools.

Supporting both containers and VMs, AWS App Mesh creates an abstraction layer based on virtual nodes, virtual routers, routes, and virtual services. The App Mesh control plane is designed to support AWS compute services, and the Envoy proxy is customized to support the control plane. Users include the proxy as part of each microservice’s task or pod definition and configure the service’s application container to communicate directly with the proxy. The Agent for Envoy monitors Envoy proxies and helps keep them healthy, making applications more resilient to failures. In addition, App Mesh provides an API (which can be accessed via AWS PrivateLink to avoid exposing data to the public internet) to configure traffic routes and other controls among mesh-enabled microservices, allowing users to route traffic to specific service versions based on path or weight.
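Weight-based routing to service versions can be illustrated with a minimal, deterministic Python sketch. It is not the App Mesh API: the `WeightedTarget` type, the version names, and the caller-supplied `draw` value standing in for a random number are assumptions made so the behavior is easy to follow and test.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class WeightedTarget:
    name: str    # hypothetical service version, e.g. "reviews-v1"
    weight: int  # relative, non-negative weight; weights need not sum to 100

def pick_target(targets: List[WeightedTarget], draw: int) -> str:
    """Map draw (0 <= draw < total weight) onto a target by cumulative weight."""
    total = sum(t.weight for t in targets)
    if not 0 <= draw < total:
        raise ValueError(f"draw must be in [0, {total})")
    cumulative = 0
    for target in targets:
        cumulative += target.weight
        if draw < cumulative:
            return target.name
    raise AssertionError("unreachable when weights are non-negative")

# Canary-style split: 90% of requests to v1, 10% to v2.
routes = [WeightedTarget("reviews-v1", 90), WeightedTarget("reviews-v2", 10)]
counts = {"reviews-v1": 0, "reviews-v2": 0}
for draw in range(100):  # one pass over every possible draw value
    counts[pick_target(routes, draw)] += 1
print(counts)  # {'reviews-v1': 90, 'reviews-v2': 10}
```

In a real proxy the draw would typically come from a random number generator or a hash of a request attribute; shifting the weights (for example to 50/50, then 0/100) is how a gradual version rollout would proceed.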

Customers can deploy App Mesh in one of three ways: by adding the Envoy proxy image to the task definition (Amazon ECS and AWS Fargate); by using the open-source AWS App Mesh controller, whose mutating webhook admission controller injects the proxy (Amazon EKS); or by running the Envoy proxy as a container or process on an EC2 instance and redirecting network traffic through the proxy. When each service starts, the proxy automatically connects with the control plane and is configured by App Mesh. Once configured, App Mesh manages the Envoy configuration to provide service mesh capabilities, automatically load balancing traffic from all clients in the mesh and exporting metrics, logs, and traces to the endpoints specified in the Envoy bootstrap configuration.

App Mesh uses mutual transport layer security (mTLS) for service-to-service transport layer authentication, allowing customers to extend the security perimeter by provisioning certificates from AWS Certificate Manager Private Certificate Authority or a customer-managed CA and enforcing automatic authentication for client applications connecting to services. In addition, the telemetry generated by AWS App Mesh (such as error rates and connections per second) can be exported to Amazon CloudWatch and AWS X-Ray or streamed to third-party tools, including Flagger, Grafana, Jaeger, Prometheus, and Splunk, as well as OpenTracing-compatible solutions such as LightStep and Zipkin.
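What makes mTLS "mutual" is that the server also demands and verifies a client certificate. The minimal sketch below uses Python’s standard `ssl` module to build a server-side context that rejects clients lacking a certificate from a trusted CA. It is a generic illustration, not how App Mesh configures Envoy, and the function name and certificate paths are hypothetical placeholders.

```python
import ssl
from typing import Optional

def mtls_server_context(
    ca_file: Optional[str] = None,
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a TLS server context that requires and verifies client certificates."""
    # Purpose.CLIENT_AUTH creates a context for the server side of a connection.
    # Without ca_file it falls back to the system trust store for client certs.
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # Require a client certificate; the handshake fails without a valid one.
    ctx.verify_mode = ssl.CERT_REQUIRED
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if ca_file:
        # Trust only client certificates issued by this (e.g. private) CA.
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file and key_file:
        # The server's own identity, presented to connecting clients.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx

ctx = mtls_server_context()  # paths omitted here; supply real files in practice
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # True
```

In practice `ca_file` should point at the private CA that issues workload certificates, so that only mesh identities are accepted; the client side would mirror this by loading its own certificate chain with `load_cert_chain`.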

Strengths: A highly available Kubernetes (K8s)-pluggable managed service, AWS App Mesh is fully integrated into the AWS landscape, making it easy for customers to monitor and manage communications for microservices without the need to install or manage additional application-level infrastructure. The extensive AWS ecosystem, installed base, and market position will drive the adoption and development of AWS App Mesh. AWS App Mesh is free for AWS customers.

Challenges: As a managed service, AWS App Mesh is limited to support for applications running on AWS and cannot be migrated to other environments. App Mesh is also proprietary, uses a customized version of Envoy, does not support the Service Mesh Interface (SMI), and can be more complex to set up than other K8s-native service meshes.

Cilium Service Mesh (CNCF Project)

Launched in July 2022, Cilium Service Mesh extends Cilium’s networking, security, and observability capabilities to the application protocol level and is the first service mesh to offer the flexibility of running either a sidecar model leveraging the Istio control plane or a sidecarless model with a choice of control planes. Created by Isovalent and donated to the CNCF as an incubation project in October 2021, Cilium is an open-source plug-in providing networking, observability, and security for bare metal servers, K8s clusters, other container orchestration platforms, and VMs. As a Container Network Interface (CNI) plugin, Cilium uses eBPF to dynamically insert powerful control logic into the Linux kernel, enabling Cilium security policies to be applied and updated without requiring any changes to the application code or container configuration.

Cilium Service Mesh allows enterprises to choose between an Envoy-based sidecar model and an Envoy plus eBPF-based sidecarless model. While a typical proxy-based service mesh decouples numerous functions (including service discovery, transport layer security (TLS), retries, and load balancing) from the application code and puts them in a sidecar, Cilium Service Mesh takes decoupling one step further. Combining the Layer 7 policies, observability, and traffic management capabilities of the Envoy proxy with kernel-level eBPF capabilities for Layer 4 and below, Cilium allows those same functions to be run per node rather than per pod.
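The effect of moving proxy functions from per pod to per node is easiest to see with a back-of-the-envelope count. The cluster size and per-proxy memory figure below are hypothetical inputs chosen for illustration, not measurements of Cilium or Envoy, and the function name is likewise hypothetical.

```python
def proxy_footprint(nodes: int, pods_per_node: int, mib_per_proxy: float) -> dict:
    """Compare proxy counts and memory for sidecar (per-pod) vs per-node models.

    Assumes, for simplicity, that a per-node proxy costs the same as one sidecar.
    """
    sidecar_proxies = nodes * pods_per_node  # one proxy injected into every pod
    per_node_proxies = nodes                 # one shared proxy on every node
    return {
        "sidecar_proxies": sidecar_proxies,
        "per_node_proxies": per_node_proxies,
        "sidecar_mib": sidecar_proxies * mib_per_proxy,
        "per_node_mib": per_node_proxies * mib_per_proxy,
    }

# Hypothetical cluster: 20 nodes, 30 pods per node, 50 MiB per proxy instance.
print(proxy_footprint(nodes=20, pods_per_node=30, mib_per_proxy=50.0))
# {'sidecar_proxies': 600, 'per_node_proxies': 20, 'sidecar_mib': 30000.0, 'per_node_mib': 1000.0}
```

The equal-cost assumption flatters the per-node model: a shared proxy handles the traffic of every pod on its node and may need more resources than a single sidecar, so the real saving is likely smaller than the raw 30:1 ratio.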

The sidecarless model supports the Gateway API and Secure Production Identity Framework for Everyone (SPIFFE) as control plane options. In addition, all Envoy data plane functionality is available via a Kubernetes custom resource definition (CRD). The integrated Ingress controller, which leverages Envoy and eBPF, can be applied to traffic entering a K8s cluster and across clusters for rich Layer 7-aware load balancing and traffic management, including path-based routing, TLS termination, and sharing a single load-balancer IP for multiple services. Future releases will support additional service mesh control planes, starting with SMI and the K8s Gateway API’s GAMMA initiative for service mesh use cases.
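Path-based routing behind a single load-balancer IP comes down to matching each request path against a table of route prefixes. The minimal sketch below follows the element-by-element semantics of the Kubernetes Ingress `Prefix` path type, with the longest matching prefix winning; the route table and service names are hypothetical, and this is not Cilium’s implementation.

```python
from typing import Dict, List, Optional

def _elements(path: str) -> List[str]:
    """Split a URL path into non-empty elements: "/api/v1/" -> ["api", "v1"]."""
    return [part for part in path.split("/") if part]

def route(path: str, prefixes: Dict[str, str]) -> Optional[str]:
    """Return the backend service for path using longest element-wise prefix match."""
    request = _elements(path)
    best_service, best_len = None, -1
    for prefix, service in prefixes.items():
        wanted = _elements(prefix)
        # "/api" matches "/api" and "/api/users" but not "/apis".
        if request[: len(wanted)] == wanted and len(wanted) > best_len:
            best_service, best_len = service, len(wanted)
    return best_service

# Several services sharing one load-balancer IP (hypothetical names).
prefix_table = {"/": "frontend", "/api": "api-gateway", "/api/orders": "orders"}
print(route("/api/orders/42", prefix_table))  # orders
print(route("/apis", prefix_table))           # frontend
print(route("/api", prefix_table))            # api-gateway
```

Matching on whole path elements rather than raw string prefixes is what keeps `/apis` from being captured by the `/api` route.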

The sidecarless approach promises reduced complexity, lower latency, and more efficient resource consumption by eliminating the start-up, shut-down, and resource-contention overhead of per-pod sidecars. Since many packets don’t need to pass through the proxy for Layer 7 processing, eBPF can forward them straight to the network interface, reducing latency and accelerating pod start-up. A high-performance Layer 7 HTTP parser with OpenTelemetry support provides tracing at a fraction of the cost of a proxy-based solution.
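The per-flow decision behind that performance claim can be modeled roughly as follows: only flows that need Layer 7 processing are redirected to the proxy, and everything else stays on the kernel fast path. This Python sketch is a conceptual model, not eBPF code or Cilium’s actual datapath logic; the `Flow` type and the port-based policy set are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Set, Tuple

@dataclass(frozen=True)
class Flow:
    dst_service: str
    dst_port: int

def datapath(flow: Flow, l7_policies: Set[Tuple[str, int]]) -> str:
    """Decide whether a flow must visit the Layer 7 proxy or can stay in the kernel."""
    if (flow.dst_service, flow.dst_port) in l7_policies:
        return "redirect-to-proxy"  # HTTP-aware policy, routing, or tracing needed
    return "kernel-fast-path"       # Layer 3/4 forwarding and policy only

# Hypothetical policy: only HTTP traffic to the payments service needs Layer 7 handling.
l7 = {("payments", 8080)}
print(datapath(Flow("payments", 8080), l7))  # redirect-to-proxy
print(datapath(Flow("postgres", 5432), l7))  # kernel-fast-path
```

The fewer flows that match a Layer 7 rule, the more traffic avoids the proxy hop entirely, which is where the latency saving comes from.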

A feature for power users, Cilium Service Mesh includes CiliumEnvoyConfig (CEC), a low-level abstraction for programming Envoy proxies directly with a new K8s custom resource definition (CRD) for advanced Layer 7 use cases to make the full Envoy feature set available to users. Simplifying the integration of additional service mesh control planes, CEC allows the Cilium Ingress Controller to specify Envoy listeners and other resources, making it possible to transparently redirect traffic destined to specific K8s services to these Envoy listeners. Moreover, Cilium ClusterMesh allows services running across multiple clusters to be grouped as a single global service, providing the ability to see security events across multiple clusters.

Strengths: Cilium Service Mesh allows users to run a service mesh with or without sidecars based on availability, resource management, and security considerations. A choice of control planes offers a balance between simplicity (Kubernetes Ingress and Gateway API) and power (Envoy and Istio). Isovalent offers enterprise support and various enhancements, including advanced network observability and a highly available DNS proxy. Cilium Service Mesh is available as a free download from GitHub or as an enterprise version with support from Isovalent.

Challenges: While offloading work to eBPF where it makes sense is reasonable, decoupling the proxy from the application in the sidecarless model introduces an additional layer of operational and security complexity and unpredictability. The performance benefits of Cilium’s simpler, low-latency, and efficient sidecarless option may be offset when users choose to run resource-intensive Istio on top of Cilium for sidecar-based Layer 7 use cases.

F5: F5 Aspen Mesh

A startup incubated within F5 (previously F5 Networks), F5 Aspen Mesh was released in December 2017 as a fully supported carrier-grade, production-ready, and security-hardened Istio-based service mesh distribution built to handle complex mature K8s infrastructures. Supporting mobile service providers requiring dual-stack IPv4/IPv6 ingress and egress for control, data, and signaling, F5 Aspen Mesh incorporates multicloud zero-trust security, compliance policy enforcement, protocol-level observability, and SRE-based application optimization. Leveraging F5’s global infrastructure, F5 Aspen Mesh offers 24/7 white-glove and concierge support for production environments, with follow-the-sun options and on-demand, native-speaking support engineers.

F5 Aspen Mesh reduces the complexity of Istio through lifecycle management, long-term support (LTS) releases, and additional services, adding advanced features to the open-source distribution. These include simplified mTLS management, fine-grained role-based access control (RBAC), Istio Vet (for discovering incompatible user application and Istio component configurations in a K8s cluster), and single sign-on (SSO). In addition, objective-driven, AI/ML-powered insight recognition policy frameworks allow users to specify, measure, and enforce business goals.

A cloud-native dashboard offers an intuitive user experience, simplifying day-to-day operations and making it possible to securely run thousands of containers with standardized deployment, scaling, security policy enforcement, and issue resolution. Supporting a distributed and highly scalable, data-driven infrastructure, F5 Aspen Mesh’s observability framework, Rapid Resolve, uses robust data analytics and ML designed to deliver actionable insights in real time and reduce the mean-time-to-resolution (MTTR) with advanced troubleshooting and environment reporting capabilities. F5 Aspen Mesh’s Packet Inspector also provides protocol-level observability with telemetry data delivered in standardized formats for the telecom industry.

A joint solution with F5, BIG-IP Next Service Proxy for Kubernetes (BIG-IP Next SPK) brings critical network capabilities to a K8s environment, meeting the demands of a service provider network. BIG-IP Next SPK supports ingress/egress control for 4G and 5G signaling while streamlining transitions to both standalone (5G-SA) and non-standalone (5G-NSA) 5G and leveraging existing investments in 4G. The solution offers authentication, encryption, observability, security, policy management, and packet capture of east/west traffic within each 5G core K8s cluster. At the same time, a per-service secure proxy and firewall protect north/south traffic flowing into and out of containerized 5G services. In addition, F5 Aspen Mesh has added customized Istio capabilities, including Elliptic Curve Cryptography and advanced certificate management.

One of the primary contributors to the Istio and Envoy communities, F5 was the first non-founding vendor to release and manage a version of Istio. The company defunded its original proprietary service mesh, NGINX Service Mesh, in favor of F5 Aspen Mesh based on open-source Istio and Envoy. Pricing is based on an OpEx per-node subscription model with optional paid services.

Strengths: A leader in the service provider market, F5 Aspen Mesh is the only Istio-based solution deployable as part of a 5G-SA or 5G-NSA core supporting the migration of 4G virtualized network functions (VNFs) to the containerized network functions (CNFs) of 5G’s service-based architecture (SBA). BIG-IP Next SPK brings critical carrier-grade capabilities to a Kubernetes environment, enabling network service providers to create a bridge from their existing 4G networks to a cloud-native 5G core network.

Challenges: With several vendors providing enterprise-grade support for Istio, F5 Aspen Mesh must find ways to differentiate itself for enterprise customers. At the same time, however, F5 Aspen Mesh can often be accessed through service providers delivering service mesh-delivered infrastructure to enterprises and service mesh-derived services to consumers on mobile platforms. Moreover, while BIG-IP Next SPK is a crucial differentiator for NSPs, F5 Aspen Mesh should simplify the Istio experience for mature enterprises with complex infrastructures and develop actionable, machine-assisted insights to help address customers’ challenges.

Google: Anthos Service Mesh

Announced in September 2019, Anthos Service Mesh (ASM) is a limited Anthos-tested Istio distribution enabling customers to deploy a fully supported service mesh on-premises using Google Kubernetes Engine (GKE) On-Prem, on Google Cloud, or as a hybrid solution. Enforcing authentication via mTLS, ASM leverages Istio APIs and core components to deliver agility, observability, and security for services deployed to Anthos GKE or to hybrid cloud and on-premises deployments with container- and VM-based services.

Replacing Istio on GKE, Google offers Anthos Service Mesh as an on-premises, in-cluster control plane, a fully managed service mesh, or a hybrid service mesh spanning both Google Cloud and on-premises deployments. Catering to the needs of existing VMware customers with familiar management and operating environments, the on-premises version uses GKE On-Prem running on top of VMware vSphere on customer hardware.

Comprising Traffic Director, Managed CA, and Google Cloud’s operations tooling, the fully managed version of ASM provides an optionally managed data plane and a Google-managed control plane operating outside of Anthos GKE clusters, reducing the management overhead while ensuring the highest possible availability. Minimizing manual user maintenance, Google manages the control plane’s availability, scalability, and security, including software patching and upgrades. Using the Google-managed control plane simplifies multicluster mesh configuration and reduces the Kubernetes Engine privileges needed to install Anthos Service Mesh.

The Google-managed data plane is enabled by simply adding an annotation to the namespaces, which installs an in-cluster controller to manage the sidecar proxies. The data plane is deployed as a set of distributed proxies that mediate all inbound and outbound network traffic between individual services. The proxies are configured using a centralized control plane and an open API, enabling the automation of everyday networking tasks, including implementing traffic splitting or steering between services and enabling service-to-service authentication and encryption.

While fully managed ASM reduces the need for in-house resources and increases availability and stability, it has numerous limitations, including no support for custom Envoy filters, IPv6, TCP in-proxy cloud monitoring, whitebox sidecars, or multinetwork environments. Environments external to the Google Cloud (including Anthos on-premises, Anthos on other public clouds, Amazon EKS, Microsoft AKS, and other Kubernetes clusters) are not supported. Tracing is limited to Google Cloud Trace, with Jaeger and Zipkin tracing available only as a customer-managed option. Additionally, all GKE clusters must be contained in a single region with a limit of 1,000 services and 5,000 workloads per cluster.
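Because the managed option caps each cluster at 1,000 services and 5,000 workloads, a quick pre-migration check against those figures can flag clusters that would need to be split. The limits below are the ones cited in this report and may change, so confirm them against current Google Cloud documentation; the function and cluster inventory are hypothetical.

```python
from typing import Dict, List

# Per-cluster limits for fully managed ASM as cited in this report (verify before use).
MAX_SERVICES = 1_000
MAX_WORKLOADS = 5_000

def over_limit(clusters: Dict[str, Dict[str, int]]) -> List[str]:
    """Return the names of clusters that exceed either per-cluster limit."""
    flagged = []
    for name, usage in clusters.items():
        if usage["services"] > MAX_SERVICES or usage["workloads"] > MAX_WORKLOADS:
            flagged.append(name)
    return flagged

# Hypothetical inventory gathered from existing clusters.
inventory = {
    "prod-a": {"services": 850, "workloads": 4_200},
    "prod-b": {"services": 1_200, "workloads": 3_000},
    "batch": {"services": 40, "workloads": 6_500},
}
print(over_limit(inventory))  # ['prod-b', 'batch']
```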

Strengths: Fully managed Anthos Service Mesh delivers basic service mesh capabilities for existing Google Anthos customers. Leveraging GKE On-Prem, the on-premises version caters mainly to existing VMware customers looking for familiar management and operating environments. Anthos Service Mesh is included with Anthos subscriptions, and cluster- and client-based pricing is available for standalone deployments.

Challenges: Tying users to the Google ecosystem, Anthos Service Mesh is a light version of Istio with numerous features removed, including support for Istio CA and Istio Operator. In addition, certain UI elements and features in the Google Cloud console are available only to Anthos subscribers. Potential users should carefully evaluate ASM’s limitations before initiating a PoC, especially considering the uncertainty surrounding the future of Anthos given its limited adoption.

greymatter.io: Greymatter

Developed in-house from the ground up and released by greymatter.io in February 2019, Greymatter is an enterprise-proven application networking platform offering zero-trust security, visibility at Layers 3, 4, and 7, business intelligence, and automated performance optimization. Addressing many of the challenges introduced by a service-based architecture (SBA), Greymatter is built on cloud-native principles and open-source technologies, enabling granular, mesh-enabled observability, analytic heuristics and insights, and automated traffic-throughput optimization across on-premises, multicloud, or hybrid environments.

Bridging the gap between legacy and modern software applications, the platform comprises an internally developed control plane for SBAs and an Envoy-proxy sidecar data plane with extended filters for east/west internal traffic routing. An API gateway controls north/south traffic flows. The Greymatter platform provides developer-friendly, template-driven, declarative app network layer integration with CI/CD delivery pipelines spanning any on-premises and multicloud environments. In addition to providing a comprehensive out-of-the-box cybersecurity mesh architecture, the platform integrates with Open Policy Agent (OPA) for zero-trust, policy-based access control at every point on the mesh and is flexible and open enough to interoperate with other service meshes.

Designed to treat proxy-based service mesh telemetry as a source of business intelligence, Greymatter leverages AI and ML to analyze data, including Layer 3, 4, and 7 network insights, for automated performance optimization and resource control. Powered by recurrent neural autoencoders, the platform’s anomaly-detection capabilities capture minute operational inconsistencies, predict potential issues, and alert users through an intuitive, contextual UI so they can take remedial action.

The Greymatter platform supports a variety of emerging use cases, including cybersecurity and data meshes. A cybersecurity mesh is a foundational layer enabling discrete security services to work together seamlessly, creating a dynamic security environment based on a zero-trust architecture. Enabling federated data ownership and distributed governance, a data mesh facilitates rapid, zero-trust secure sharing of sensitive data objects and capabilities, including policy-based data provenance and lineage tracking. In both cases, the platform works with third parties to enable intelligent network decision-making for enhanced cybersecurity and data protection.

Greymatter is designed to be platform agnostic and fluent in many languages. The platform wraps existing IT investments in a ubiquitous Layer 3, 4, and 7 network, securely connecting existing operations and business support system (OSS/BSS) layers to cloud-native technologies. Capable of operating on any public, private, hybrid, or multicloud infrastructure or container orchestration platform, it comes with built-in support for K8s, AWS EKS, AKS, OpenShift OCP, OKD, Konvoy, and bare metal. It is also container agnostic, supporting Docker, CoreOS, K8s, OpenShift, Rancher, and other container technologies, or no containers at all. The platform also integrates with enterprise observability frameworks, including Datadog, Elasticsearch, Grafana, Jaeger, LightStep, Splunk, and Zipkin.

Delivering a comprehensive audit-compliance engine and SPIFFE/SPIRE identity authorization out of the box, Greymatter provides service audit compliance reporting without special instrumentation. Real-time audit taps at Layers 3, 4, and 7 provide a single source of truth for every user and action on the mesh throughout the lifespan of each object. Supporting qualified customers in both privately hosted and public clouds, greymatter.io offers both SMaaS and fully managed services. Subscription pricing is environment-based.

Strengths: In addition to providing a robust, enterprise-ready, container-agnostic, multiple-environment platform, Greymatter’s heuristics-based AI health sub-system offers insights into the overall well-being of the network with the ability to conduct root-cause analysis and discover new operational knowledge about how the network is being used. In addition, out-of-the-box GitOps infrastructure-as-code (IaC) capabilities enable the seamless and consistent application of service fixes and release upgrades while reducing operational risks such as workload configuration drift.

Challenges: Greymatter.io is in a growth phase as it shifts from a small, bootstrapped company predominantly focused on US government and Department of Defense clients to a venture-backed company supporting global customers across a variety of industry sectors. Moreover, the company must expand its application networking modules to orchestrate cloud provider IAM services, API gateways, and ingress controllers.

HashiCorp: HashiCorp Consul

Developed internally from the ground up and released as a service mesh in October 2018, HashiCorp Consul provides consistent discovery capabilities and secure service-to-service communication across any environment. As the primary maintainer, HashiCorp offers an open-source version of Consul and an enterprise version with additional functionality and support. HCP Consul is a fully managed service mesh as a service running on the HashiCorp Cloud Platform (HCP), offering push-button and self-service deployments.

HashiCorp Consul provides a full-featured control plane with service discovery, configuration, dynamic load balancing, and segmentation functionality, allowing each feature to be used independently as needed. Closing the gap between applications and networking, Consul supports a step-by-step approach, allowing organizations to adopt service discovery and a service registry before building out a full service mesh. It also offers networking infrastructure automation for dynamic IP environments. The platform works out of the box with a simple built-in Layer 4 proxy and supports third-party proxy integrations, including Envoy. Built on the Kubernetes Gateway API, the Consul API Gateway controls how clients access Consul service mesh applications. Unlike many other service meshes, Consul can run on bare metal, in a pure K8s environment, in a hybrid K8s and VM environment, or in a VM-only environment without requiring K8s.

Offered as either a self-managed or managed solution (providing flexibility for enterprises of all sizes), HashiCorp Consul provides discovery and secure connectivity for any application running on any infrastructure or runtime. Consul enforces mutual authentication between services using ACLs, mTLS, and CA distribution, provides multitenancy capabilities, and supports granular traffic management rules based on service identity and request attributes. Consul can also use HashiCorp Vault as its CA to generate, store, and automatically rotate TLS certificates for both the Consul control plane and data plane.
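Consul and Vault perform rotation automatically, so operators do not write this logic themselves. As a rough illustration of the scheduling idea behind automatic rotation, the Python sketch below renews a certificate once a fixed fraction of its validity period has elapsed. The function names and the two-thirds threshold are hypothetical choices for this example, not Consul’s or Vault’s documented settings.

```python
from datetime import datetime, timedelta, timezone


def renewal_time(not_before: datetime, not_after: datetime, fraction: float = 2 / 3) -> datetime:
    """Return the moment at which `fraction` of the certificate's lifetime has elapsed."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    if not_after <= not_before:
        raise ValueError("certificate validity period is empty")
    return not_before + (not_after - not_before) * fraction


def needs_rotation(not_before: datetime, not_after: datetime, now: datetime,
                   fraction: float = 2 / 3) -> bool:
    """True once the renewal point is reached, well before the certificate expires."""
    return now >= renewal_time(not_before, not_after, fraction)


# Usage: a hypothetical 72-hour leaf certificate is renewed after about 48 hours.
issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(hours=72)
print(needs_rotation(issued, expires, issued + timedelta(hours=47)))  # False
print(needs_rotation(issued, expires, issued + timedelta(hours=49)))  # True
```

Renewing well before expiry leaves time to retry if the CA is briefly unreachable.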

HashiCorp Consul also offers progressive delivery capabilities—supporting canary deployments, Layers 4 and 7 traffic management, and advanced observability—for containers, VMs, and bare-metal environments. While not a typical service mesh feature, Consul can also automate Layer 3 networking tasks, including dynamic firewalling, automated load balancing, and endpoint visibility. HashiCorp Consul integrates with Terraform for automating networking tasks via a daemon called Consul-Terraform-Sync (CTS). As services scale or new services become available on the network, CTS will automatically update network load balancers and firewalls, enabling new services to be seamlessly discovered and consumed.
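CTS follows a watch-and-reconcile pattern: it observes Consul’s service catalog and, when the catalog changes, runs Terraform to bring network devices in line with it. The Python sketch below illustrates only the diffing step of that pattern. The `Instance` type, the `plan_pool_changes` function, and the modeling of a load balancer pool as a set of address/port pairs are hypothetical simplifications, not CTS’s actual Terraform module interface.

```python
from dataclasses import dataclass

Endpoint = tuple[str, int]  # (address, port)


@dataclass(frozen=True)
class Instance:
    """A service instance as it might appear in a simplified service catalog."""
    address: str
    port: int
    healthy: bool


def plan_pool_changes(catalog: list[Instance],
                      pool: set[Endpoint]) -> tuple[set[Endpoint], set[Endpoint]]:
    """Return (to_add, to_remove) so the pool matches the healthy catalog entries."""
    desired = {(i.address, i.port) for i in catalog if i.healthy}
    return desired - pool, pool - desired


catalog = [
    Instance("10.0.0.11", 8080, healthy=True),
    Instance("10.0.0.12", 8080, healthy=True),   # newly registered instance
    Instance("10.0.0.13", 8080, healthy=False),  # failing health checks
]
current_pool = {("10.0.0.11", 8080), ("10.0.0.99", 8080)}  # .99 was deregistered
to_add, to_remove = plan_pool_changes(catalog, current_pool)
print(sorted(to_add))     # [('10.0.0.12', 8080)]
print(sorted(to_remove))  # [('10.0.0.99', 8080)]
```

Computing the difference, rather than rewriting the whole pool, leaves unchanged backends untouched as services scale.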

HashiCorp Consul provides a consistent view of all services on the network, including non-mesh workloads, regardless of programming language or framework, enabling real-time health and location monitoring. Consul captures service-level data and presents it to users via a built-in UI or through integrations with third-party application tracing solutions, including Jaeger, OpenTelemetry, and Zipkin.

An extensible, multiplatform solution with flexible procurement options, HashiCorp Consul supports both on-premises (virtualized and bare metal) and cloud deployments, as well as multiple runtimes, including Amazon ECS, AWS Lambda, HashiCorp Nomad, K8s distributions, and VMs. It also offers native capabilities and integrations for proxies (including Envoy, HAProxy, and NGINX), ingress solutions (including Ambassador and NGINX), and application performance monitoring (APM) solutions such as AppDynamics, Datadog, Dynatrace, Grafana, Prometheus, and Splunk.

Strengths: HashiCorp Consul is a simple, flexible service mesh offering multicluster support and integrations with external non-service-mesh workloads. Unlike many other service meshes, Consul can run in a VM-only environment without requiring K8s. HashiCorp Consul is tightly integrated with HashiCorp’s portfolio, and the SMaaS offering, HCP Consul, is an attractive option for HashiCorp customers looking for push-button and self-service deployments. HashiCorp Consul is available as a free download, as pay-as-you-go SaaS, or via consumption-based pricing.

Challenges: With only a small open-source community providing support for non-HashiCorp users, HashiCorp Consul’s primary value is for existing HashiCorp users wishing to incorporate K8s into their HashiCorp stack. Consul’s ecosystem is limited compared to its competitors, lacking support for K8s integrations such as Flagger. HashiCorp is currently developing out-of-the-box observability capabilities and simplifying its current model for federating HashiCorp Consul data centers to eliminate customer complexity.

Istio (CNCF Project)

Released in May 2017, Istio was co-founded by Google, IBM, Lyft, and other key contributors. Following concerns expressed by IBM, Oracle, and the open-source and cloud-native communities over the project’s governance and Google’s decision to donate the trademark to the Open Usage Commons (OUC), Istio was accepted by the CNCF as an incubating project in September 2022. The move transfers control of the trademark and licensing from Google to a neutral entity, opening the door to broader adoption, and unites Istio with Envoy and K8s under a single umbrella and common governance.

One of the more mature and complex service meshes available, Istio offers a rich feature set based on the Envoy Proxy, including dynamic service discovery, service-to-service authentication, load balancing, monitoring, policy creation, and traffic routing. Designed for extensibility, Istio offers a robust, unified K8s-based control plane for managing K8s (in either public or on-premises clouds), VM, and bare metal data planes, supporting a diverse range of deployment needs.

The Istio project also offers Istio Ambient Mesh, a layered, sidecarless architecture that interoperates seamlessly with Istio’s sidecar-based data plane. Though not yet production ready, Istio Ambient Mesh lets users mix and match sidecar and sidecarless capabilities based on the specific needs of each application. A lightweight per-node ztunnel proxy handles Layer 4 features (such as Layer 4 observability, pod-level mTLS encryption, and service-level policies), while an optional per-service-account waypoint Envoy proxy provides Layer 7 application-level policy, observability, and traffic management.

Istio has strong out-of-the-box identity-based authentication, authorization, and encryption capabilities, with service communications secured by default for consistent policy enforcement. Istio also offers fine-grained control of traffic behavior, supporting A/B testing, canary rollouts, and staged rollouts with percentage-based traffic splits. It also provides out-of-the-box failure recovery features with advanced routing policies and management, including circuit breakers, failovers, fault injection, health checks, and retries. Moreover, its configuration API and policy layer support access controls, quotas, and rate limits, while detailed logs, metrics, and traces provide in-depth observability throughout the cluster, with integrated, preconfigured Grafana and Prometheus dashboards.
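In Istio, a percentage-based split is declared as route weights in a VirtualService, and Envoy applies those weights to live traffic. The Python sketch below is only a conceptual illustration of weighted selection for a 90/10 canary. It hashes a hypothetical per-request key so the same key always lands on the same subset; that is a simplifying assumption for the example, not a description of how Envoy chooses a destination.

```python
import hashlib


def pick_subset(request_key: str, weights: dict[str, int]) -> str:
    """Map a request key to a subset according to integer percentage weights."""
    if sum(weights.values()) != 100 or any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative and sum to 100")
    # Derive a stable bucket in [0, 100) from the key.
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    cumulative = 0
    for subset, weight in weights.items():
        cumulative += weight
        if bucket < cumulative:  # a zero-weight subset is never selected
            return subset
    raise AssertionError("unreachable: weights sum to 100")


canary = {"v1": 90, "v2": 10}
counts = {"v1": 0, "v2": 0}
for n in range(10_000):
    counts[pick_subset(f"user-{n}", canary)] += 1
print(counts)  # roughly 9,000 for v1 and 1,000 for v2
```

Shifting more traffic to the canary means changing only the weights, for example to 50/50, without touching application code.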

However, as new features and functions have been added, Istio has become notoriously difficult to install, configure, and manage. To address this complexity, Istio consolidated its previously separate control plane components (Pilot, Citadel, and Galley) into a single binary, istiod, replacing its microservices-based control plane with a monolithic deployment. The code still retains strict boundaries between what were formerly independent services, but these functions are now presented to the cluster administrator as a single process. Note, however, that Istio does not have a built-in dashboard; a third-party solution, Kiali, is designed as an add-on for managing, visualizing, validating, and troubleshooting Istio.

While this approach may be good from an engineering perspective, Istio’s quarterly release cycle may impact operational stability. Moreover, Istio’s complexity has resulted in a growing ecosystem with several vendors—including F5 (F5 Aspen Mesh), Google (Anthos Service Mesh), Red Hat (OpenShift Service Mesh), Solo (Gloo Mesh), Tetrate (Tetrate Service Bridge), and VMware (Tanzu Service Mesh)—emerging to provide Istio-based service meshes underpinned by enterprise-grade services and support. Istio is also offered as a managed add-on for IBM Cloud.

Strengths: Due to the marketing efforts of Google and IBM, “Istio” is often used interchangeably with “service mesh,” positioning it as the go-to solution for adding observability, security, and traffic management to the cloud-native stack. Istio is a free download with open-source community support. Istio is also offered as a managed service by F5, Google, IBM, and Red Hat for various environments. Tetrate provides a complete portfolio of design, deployment, and management services.

Challenges: Due to its advanced features and complex configuration requirements, Istio is not as user- or developer-friendly as other service meshes. The re-architecture of Istio is ongoing, with a centralized multicluster controller, additional enhancements for supporting VMs, and security and stability improvements included in recent releases. Because the Istio project supports only the three latest releases (N-2), its quarterly release cycle can be overwhelming for teams with limited capacity and skills.
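Under an N-2 policy, only the newest minor release and the two before it receive patches. A minimal sketch of that window, assuming versions are given as `major.minor[.patch]` strings; the function name and the version numbers in the usage line are illustrative only.

```python
def supported_releases(versions: list[str], window: int = 3) -> list[str]:
    """Return the newest `window` minor releases (N, N-1, N-2 by default)."""
    # Compare numerically so that 1.10 sorts after 1.9.
    minors = {tuple(int(part) for part in v.split(".")[:2]) for v in versions}
    newest_first = sorted(minors, reverse=True)
    return [f"{major}.{minor}" for major, minor in newest_first[:window]]


print(supported_releases(["1.9.4", "1.10.2", "1.11.0", "1.8.6"]))  # ['1.11', '1.10', '1.9']
```

Under this simplified model, each new quarterly release pushes the oldest version out of the supported window, so upgrades must be planned continuously.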

Kong: Kong Mesh

Released for general availability in August 2020, Kong Mesh is a modern, enterprise-ready control plane for service mesh and microservices built on top of Envoy and Kuma, the open-source project authored by Kong and donated to the CNCF. Kong Mesh extends Kuma’s existing advanced feature set by including critical functionality and support for running enterprise workloads. Kong Mesh also provides additional service mesh features and integrations for the Kong Konnect platform, a full-stack connectivity platform delivered as-a-service for multicloud environments.

Once installed, Kong Mesh improves service connectivity via policies that can be applied to each mesh, service, or any attribute that qualifies a traffic path, improving developer efficiency and supporting cost reduction, General Data Protection Regulation (GDPR) compliance, and zero-trust security. Deployed as a turnkey service mesh via a single command across any cloud, Kubernetes cluster, or VM-based infrastructure, Kong Mesh supports both a single control plane deployment (standalone) and a multizone deployment with global/zone control plane separation.

Effectively a single pane of glass for the entire enterprise, the global control plane acts as the primary control plane, onboarding new resources and automatically propagating service mesh policies to zone control planes. The global control plane can be integrated with existing CI/CD workflows via CRDs, an HTTPS API, and Kuma’s CLI. Zone control planes are deployed in their respective zones operating as secondary control planes with read-only access for data plane proxies in the same zone.

The latest release includes integrations with OPA, the open-source, policy-as-code tool for Layer 7 policy support, automatic configuration of Envoy for FIPS 140-2 compliance, and authentication between global and isolated control planes. Furthermore, Kong Mesh automates the distribution of those policies throughout multicluster and multiregion deployments, eliminating the need for manual configuration. It also extends the service mesh and OPA to include legacy infrastructure such as VMs.

Focused on ease of use, Kong Mesh leverages Kuma to deliver a supported, multimesh product that can scale across teams and lines of business while simultaneously providing cross-cluster and cross-cloud connectivity for modern architectures. Accelerating configuration and deployment, Kong Mesh abstracts away the complexity of setting up a service mesh by encapsulating Envoy within its own processes. A native GUI provides quick visual feedback on what is happening in the system.

Kong Mesh supports both K8s and VM workloads, and Kong’s “run anywhere” philosophy allows it to be deployed across any environment: multicluster, multicloud, and multiplatform. Organizations can either use Kong Mesh’s CRDs to natively manage service meshes in K8s or start with a service mesh in VM environments and migrate to K8s at their own pace. In a multizone deployment, Kong Mesh supports multiple environments without increasing complexity. Kong offers per-zone, pay-as-you-go pricing, allowing customers to scale the number of data plane proxies up or down within a zone without licensing constraints.

Strengths: Kong Mesh’s ease of use and built-in automation capabilities offer an alternative to some complex open-source solutions that are difficult to deploy and manage. Security-conscious enterprises will be attracted by Kong Mesh’s FIPS 140-2 compliance and consistent application of security policies across all environments. Kong’s customer reliability engineering (CRE) team offers 24/7 support using an industry-standard, follow-the-sun model for all Kong products.

Challenges: Kong Mesh, a relatively new entrant built on a CNCF sandbox project, currently offers visualization only through a Grafana plugin. However, Kong is developing an interactive mesh explorer tool as part of its SMaaS offering. To compete successfully, Kong must add Wasm plugins and serverless support. Furthermore, because Kong is best known as a leader in the API gateway space, the success of its service mesh depends on customer adoption.

Kuma (CNCF Project)

Created by Kong and donated to CNCF as a sandbox project in June 2020, Kuma is an open-source service mesh using Envoy as the data plane proxy and a control plane developed by Kong. Built to support both greenfield and legacy enterprise applications, Kuma offers scalable, multizone connectivity across multiple clusters and clouds using bare metal, K8s, or VMs with one-click transparent proxying. In addition, Kuma automatically keeps an inventory of all data plane proxy sidecars running across every zone, allowing the service mesh to scale to any number of zones and sidecars.

Unlike many other service mesh solutions, Kuma provides native support for both K8s and VMs on both the control and data planes, with multimesh support spanning boundaries such as K8s namespaces. Designed for the enterprise architect, Kuma ships with both standalone and advanced multizone and multimesh deployment modes, enabling cross-zone communication across different clusters and clouds through its separation of global and zone control planes. Flexible traffic routing can be applied to entire zones, individual services, or custom traffic paths using source and destination selectors.
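Kuma’s source and destination selectors match on data plane proxy tags such as `kuma.io/service`, with `*` acting as a wildcard. The Python sketch below shows the general idea of choosing the most specific matching policy for a traffic path. The `Policy` type, the policy names, and the scoring rule (count the non-wildcard tags that matched) are simplifying assumptions for illustration; Kuma’s documented precedence rules are more detailed.

```python
from dataclasses import dataclass
from typing import Optional

Tags = dict[str, str]


@dataclass
class Policy:
    name: str
    source: Tags       # selector applied to the calling proxy's tags
    destination: Tags  # selector applied to the target service's tags


def matches(selector: Tags, tags: Tags) -> bool:
    """Every selector key must be a wildcard or equal the proxy's tag value."""
    return all(value == "*" or tags.get(key) == value for key, value in selector.items())


def specificity(selector: Tags) -> int:
    """Number of non-wildcard entries (simplified precedence score)."""
    return sum(1 for value in selector.values() if value != "*")


def select_policy(policies: list[Policy], source: Tags, destination: Tags) -> Optional[Policy]:
    candidates = [p for p in policies
                  if matches(p.source, source) and matches(p.destination, destination)]
    if not candidates:
        return None
    return max(candidates, key=lambda p: specificity(p.source) + specificity(p.destination))


policies = [
    Policy("default-route", {"kuma.io/service": "*"}, {"kuma.io/service": "*"}),
    Policy("web-to-backend-canary", {"kuma.io/service": "web"}, {"kuma.io/service": "backend"}),
]
web = {"kuma.io/service": "web", "kuma.io/zone": "us-east"}
backend = {"kuma.io/service": "backend"}
billing = {"kuma.io/service": "billing"}
print(select_policy(policies, web, backend).name)  # web-to-backend-canary
print(select_policy(policies, web, billing).name)  # default-route
```

A catch-all policy provides sensible defaults, while narrower selectors override it only for the traffic paths they name.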

Kuma’s architecture separates the control plane by zone: each zone is allocated its own horizontally scalable control plane, minimizing the possibility that an outage in one zone affects other zones. The global control plane centrally manages all zones and automatically propagates service mesh policies across every zone, including automated handling of failures and reconciliations, while each zone’s control plane rapidly applies those policies to the zone’s data plane proxies. Kuma scales linearly and horizontally by adding more control planes, scaling to over 100,000 data planes spanning ten or more zones.
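The failure-isolation property described above can be illustrated with a small model in which the global control plane pushes the current policy set to each zone independently, so an unreachable zone is reported as pending and retried on the next pass without blocking the others. All names here are hypothetical, and Kuma’s real global-to-zone synchronization uses its own protocol, which this sketch does not model.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    name: str
    reachable: bool = True
    # Policy name -> version currently applied in this zone.
    applied: dict[str, int] = field(default_factory=dict)


@dataclass
class GlobalControlPlane:
    zones: list[Zone]
    # Source of truth: policy name -> latest version.
    policies: dict[str, int] = field(default_factory=dict)

    def upsert_policy(self, name: str) -> None:
        """Create a policy or bump its version; zones pick it up on the next reconcile."""
        self.policies[name] = self.policies.get(name, 0) + 1

    def reconcile(self) -> list[str]:
        """Push the policy set to every reachable zone; return zones still out of sync."""
        pending = []
        for zone in self.zones:
            if zone.applied == self.policies:
                continue  # already in sync
            if not zone.reachable:
                pending.append(zone.name)  # retried next pass; other zones are unaffected
                continue
            zone.applied = dict(self.policies)
        return pending


us, eu = Zone("us-east"), Zone("eu-west", reachable=False)
cp = GlobalControlPlane(zones=[us, eu])
cp.upsert_policy("mtls")
print(cp.reconcile())  # ['eu-west']
eu.reachable = True
print(cp.reconcile())  # []
```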

A single pane of glass for the entire enterprise, the global control plane can be integrated with existing CI/CD workflows via CRDs, an HTTP API, or Kuma’s command-line interface (CLI). With an out-of-the-box Layer 4 and Layer 7 policy architecture enabling discovery, decentralized load balancing, automated self-healing, observability, routing, traffic reliability, and zero-trust security, Kuma abstracts everyday use cases and automatically propagates service mesh policies across the infrastructure to support a multimesh, multitenant environment on the same control plane. In addition, out-of-the-box multicloud, multicluster, and multizone support with attribute-based policies provides automatic policy synchronization and connectivity, and supports custom workload attributes for GDPR and payment card industry (PCI) compliance.

Kuma provides foundational authentication, authorization, encryption, and policy controls spanning environments, containers, and virtual machines. It integrates natively with API gateways to support other authN/authZ schemes when exposing services to other applications, teams, or external parties at the edge. Offering native service discovery, Kuma supports a wide range of containers, operating systems, and cloud infrastructures, each running its own mesh or participating in a hybrid service mesh spanning bare metal, K8s, and VMs, with simplified migration between environments. Easy to use with no Envoy expertise required, Kuma bundles Envoy with every installation, automatically injects the sidecar proxy into workloads in both global and remote deployment modes, and integrates natively with API management solutions.

Strengths: While most service meshes prioritize K8s/container-driven applications, Kuma also supports any existing applications running on bare metal, K8s, or VMs. Kuma supports single and multiple clusters through its standalone and multizone deployment options without increasing deployment or management complexity. In addition to being deployed in Fortune 500 companies, Kong estimates Kuma to be the fastest growing of the second wave of service meshes based on GitHub public stars. Kuma is available free of charge as a download on GitHub.

Challenges: While claiming to address the limitations of first-generation service mesh technologies by enabling seamless management of any service on the network, Kuma is a relatively new entrant compared to other cloud-native service meshes such as Consul, Istio, and Linkerd. Kuma’s success will depend mainly on its adoption by the open-source community and its promotion by Kong as the underlying technology of Kong Mesh. As a CNCF sandbox project, Kuma does not provide enterprise support.

Linkerd (CNCF Project)

The original “service mesh” released in 2016, Linkerd is an open-source, CNCF-hosted security-first service mesh providing observability, reliability, and security for K8s applications running on bare metal or in the cloud without adding complexity. As the only CNCF-graduated service mesh, Linkerd offers an ultralight, ultrafast, and operationally simple approach to deploying a service mesh on any existing platform. Targeting every K8s adopter irrespective of the organization’s size, Linkerd installs in minutes, requires zero configuration, and can be added incrementally to an application without disruption. Linkerd also comes with preconfigured, out-of-the-box Grafana and Prometheus dashboards and support for OpenTelemetry.

Adopting a problem-centric approach, Linkerd’s strategy is to solve immediate, concrete problems, in as general a way as possible, without attempting to build the ultimate platform addressing all use cases. While other service meshes trend toward adding features that support multiple use cases but require extensive configuration and tuning, Linkerd concentrates on a limited set of use cases to reduce its footprint, automate as much as possible, and minimize the operational burden.

Much of Linkerd’s simplicity can be attributed to its data plane implementation using the internally developed Linkerd2-proxy—a lean, modern, scalable, and high-performance Rust-based network “micro-proxy”—rather than the commonly used Envoy Proxy. Since a fully deployed service mesh can run thousands—or tens of thousands—of micro-proxies, the impact on resource consumption and latency compounds quickly. Using the Linkerd2 proxy allows Linkerd to maximize the speed and security of the data plane while optimizing resource consumption. Benchmarks conducted by Kinvolk GmbH (an open-source engineering and technology company recently acquired by Microsoft) found Linkerd to be significantly faster than open-source Istio while consuming an order of magnitude less data plane memory and CPU.

Leveraging K8s’ security primitives rather than inventing new ones, Linkerd’s security-first approach is designed to improve the overall security of the environment. Zero-trust ready, Linkerd uses mTLS to provide workload identity authentication, confidentiality, and integrity for all communication between meshed pods. By using Rust as its data plane programming language, Linkerd avoids the memory-safety vulnerabilities common to C and C++ projects such as Envoy, protecting sensitive customer data within a minimalist runtime footprint while retaining native code performance. Linkerd’s simplicity also reduces the risk of misconfiguration, as well as the risk that teams skip security features because they are costly to adopt.

As the original creator of Linkerd, Buoyant launched Buoyant Cloud, a fully automated and unified service mesh dashboard built to monitor, assess, and validate the health of Linkerd clusters. Tracking data and control plane metrics, Buoyant Cloud identifies data plane inconsistencies, manages mesh lifecycles and versions, and proactively issues alerts. Enterprise support for Linkerd is available from Buoyant and other third-party companies.

Strengths: Designed from the ground up as a lightweight, security-first service mesh supporting mission-critical features for cloud-native applications using K8s, Linkerd is the only service mesh committed to operational simplicity and low resource consumption. Linkerd is deployed long-term in tens of thousands of K8s clusters worldwide, with CNCF predicting faster adoption than other service meshes. Linkerd has an aggressive roadmap, including recently released support for the Gateway API, zero-trust route-based policies, dynamic request routing, circuit breaking, and FIPS-140. Linkerd is available free of charge as a download on GitHub.

Challenges: Linkerd’s focus on limited use cases may restrict its application for particular enterprises and organizations. Moreover, Linkerd’s data plane proxy currently supports only K8s workloads running on bare metal or in the cloud. Support for Linkerd is primarily provided by the open-source community. However, Linkerd’s creator, Buoyant, and other third-party companies offer paid support for enterprise clients.

Network Service Mesh (CNCF Project)

Donated to the CNCF in April 2019, Network Service Mesh (NSM) is a community-driven sandbox project rapidly gaining momentum because of its ability to simplify connectivity among workloads regardless of where they are running. As a hybrid, multicloud IP service mesh, NSM extends IP reachability to workloads running on-premises, in legacy environments, across multiple clusters, and in public clouds, communicating using existing protocols. Furthermore, since individual workloads need connectivity to only a limited selection of other workloads, NSM provides hybrid, multicloud IP connectivity for applications and application service meshes without requiring any changes to them.

Built from the ground up, NSM shifts IP networking from infrastructure to a selection of network services. By connecting an individual workload (or K8s pod) to a network service via a simple set of APIs, NSM enables the infrastructure to remain immutable while meeting a wide variety of requirements. NSM also allows individual workloads to connect to a network service via a WireGuard vWire injected into the pod as a secondary, non-conflicting interface. Finally, by matching the selection of network services to the granularity of the workload rather than the cluster, NSM allows different workloads to consume different, potentially conflicting network services.

As an additional infrastructure layer running on top of out-of-the-box K8s, NSM maps the concept of a service mesh from Layer 7 workloads to Layer 2 and Layer 3 workloads, providing additional connectivity, observability, and security at the network layers. NSM complements higher-level application service meshes by treating them as part of a network service: a Consul, Istio, Linkerd, or other service mesh can run as a single instance on top of NSM’s virtual Layer 3 network spanning multiple clusters, clouds, or organizations.

NSM loosely couples workloads to relevant network services independently of the underlying environment, enabling individual workloads to join multiple network services simultaneously, with each network service having its own control plane segmented along the logical lines of the service. As a result, the service mesh delivers the operational simplicity of a single cluster solution while allowing workloads running in multiple clusters across multiple clouds to connect via a shared network service, irrespective of location.

When installed on a K8s cluster, NSM simplifies sophisticated network connectivity for the developer. Designed to operate at internet scale, network service endpoints running anywhere can advertise network services in a network service registry domain. In turn, NSM allows any authorized workload—located anywhere—to request a published network service from one or more service registries. No changes are made to either K8s or to the CNI plug-in being used.
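The advertise-then-request flow can be modeled with a small in-memory registry in which endpoints publish a network service and only authorized workloads can look it up. This is a hypothetical sketch for illustration only: the class, its methods, and the string identities are invented here, and real NSM registration goes through the gRPC APIs of its registry server.

```python
from collections import defaultdict


class NetworkServiceRegistry:
    """Toy registry: endpoints advertise services; authorized workloads request them."""

    def __init__(self) -> None:
        self._endpoints: dict[str, list[str]] = defaultdict(list)
        self._authorized: dict[str, set[str]] = defaultdict(set)

    def advertise(self, service: str, endpoint: str, authorized_clients: set[str]) -> None:
        """Publish an endpoint for a service and record which workloads may use it."""
        self._endpoints[service].append(endpoint)
        self._authorized[service] |= authorized_clients

    def request(self, service: str, client: str) -> list[str]:
        """Return the endpoints of a published service if the client is authorized."""
        # `in` on a defaultdict does not create missing keys.
        if service not in self._endpoints:
            raise LookupError(f"no endpoints advertise {service!r}")
        if client not in self._authorized[service]:
            raise PermissionError(f"{client!r} may not use {service!r}")
        return list(self._endpoints[service])


registry = NetworkServiceRegistry()
registry.advertise("secure-intranet", "vpn-gw-eu", {"payments-pod"})
registry.advertise("secure-intranet", "vpn-gw-us", {"payments-pod", "reports-pod"})
print(registry.request("secure-intranet", "reports-pod"))  # ['vpn-gw-eu', 'vpn-gw-us']
```

The workload asks for a service by name and never needs to know where the endpoints run, which mirrors the location independence described above.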

In addition to running on bare metal, NSM has been tested with Amazon EKS, GKE, Microsoft Azure Kubernetes Service (AKS), and across public clusters. NSM is managed via a CLI and well-defined gRPC APIs for registering network services and network service endpoints with its registry server. NSM also includes auto-healing capabilities, uses OPA to enforce admission policies based on SPIFFE and SPIRE identities, and integrates with Prometheus and OpenTelemetry for observability. (Note: SPIRE is a production-ready implementation of SPIFFE.)
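NSM expresses its admission policies in OPA’s Rego language; the Python below only illustrates the kind of check such a policy might perform. The SPIFFE standard defines a workload identity as a URI of the form `spiffe://<trust-domain>/<path>`. The trusted domain and path prefixes in the usage are hypothetical, and the validation is deliberately partial rather than a full implementation of the SPIFFE ID rules.

```python
import re
from urllib.parse import urlsplit

# Lowercase letters, digits, dots, dashes, and underscores (so no port or user info).
_TRUST_DOMAIN = re.compile(r"[a-z0-9._-]+")


def parse_spiffe_id(spiffe_id: str) -> tuple[str, str]:
    """Split a SPIFFE ID into (trust_domain, path); reject clearly invalid IDs."""
    parts = urlsplit(spiffe_id)
    if parts.scheme != "spiffe":
        raise ValueError("scheme must be 'spiffe'")
    if parts.query or parts.fragment:
        raise ValueError("SPIFFE IDs may not contain a query or fragment")
    if not _TRUST_DOMAIN.fullmatch(parts.netloc):
        raise ValueError("invalid trust domain")
    if not parts.path or parts.path == "/":
        raise ValueError("workload IDs need a path")
    return parts.netloc, parts.path


def is_admitted(spiffe_id: str, trusted_domains: set[str], allowed_prefixes: list[str]) -> bool:
    """Admit only IDs from a trusted domain whose path falls under an allowed prefix."""
    try:
        domain, path = parse_spiffe_id(spiffe_id)
    except ValueError:
        return False
    # Match whole path segments so "/ns/prod" does not admit "/ns/production".
    in_scope = any(path == p or path.startswith(p.rstrip("/") + "/") for p in allowed_prefixes)
    return domain in trusted_domains and in_scope


trusted, prefixes = {"example.org"}, ["/ns/prod"]
print(is_admitted("spiffe://example.org/ns/prod/sa/web", trusted, prefixes))        # True
print(is_admitted("spiffe://example.org/ns/production/sa/web", trusted, prefixes))  # False
print(is_admitted("spiffe://untrusted.example/ns/prod/sa/web", trusted, prefixes))  # False
```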

Strengths: Network Service Mesh is the only service mesh operating on Layer 2 and 3 workloads. Adopted by Cisco, Ericsson, and Intel for next-generation architectures, NSM complements Layer 7 service meshes by providing additional connectivity, observability, and security. In addition, Ericsson is actively contributing to NSM to enable 5G-specific use cases for cloud-native network functions. Network Service Mesh is available free of charge as a download on GitHub.

Challenges: While NSM offers tangible benefits and has attracted significant interest from leading industry players, it lacks widespread adoption. However, with several NSM-based solutions targeted for live deployment, we expect adoption to increase.

Red Hat: OpenShift Service Mesh

Announced in August 2019, Red Hat OpenShift Service Mesh (OSSM) provides a uniform way to connect, manage, and observe microservices-based applications running within the OpenShift Container Platform, Red Hat’s private PaaS for enterprises, which can run on on-premises or public cloud infrastructure. Based on the open-source Istio project, OSSM provides behavioral insight and operational control through Maistra Service Mesh, an opinionated distribution of Istio designed to work with OpenShift. OpenShift Service Mesh bundles Maistra Service Mesh, which incorporates specific Istio features using the Envoy proxy, with Jaeger and Kiali into a platform providing discovery, service-to-service authentication, load balancing, failure recovery, metrics, and monitoring.

Engineered to be production-ready, OSSM increases developer productivity and accelerates application time to value by integrating policy-based service-to-service communications without modifying application code or integrating language-specific libraries. Tested with other Red Hat products, OSSM installs easily on Red Hat OpenShift and comes with enterprise-grade support, simplifying and streamlining management for operations personnel.

OSSM uses Grafana, Jaeger, Kiali, and out-of-the-box security to trace, observe, and secure intra-service communications. An open, composable, and interactive observability and data visualization platform, Grafana enables users to query, visualize, understand, and trigger alerts for metrics regardless of where they are stored. Jaeger, an open-source, end-to-end distributed tracing system, monitors and troubleshoots transactions in complex distributed systems. Optional but installed by default, Jaeger allows users to track a single request as it makes its way among different services—or even inside a service—providing insight into the entire request process from start to finish.

Kiali, another open-source project and the management console for OSSM, is designed specifically for configuring, validating, visualizing, monitoring, and troubleshooting Istio service meshes in near-real time to increase availability and performance. Delivering an intuitive, end-to-end view of all microservices, Kiali displays the structure of the service mesh by inferring traffic topology and using service metrics to indicate application health, reliability, and performance, providing visibility into features such as circuit breakers and request rates. In addition, Kiali integrates with Jaeger to troubleshoot and isolate bottlenecks in end-to-end request paths.

Providing out-of-the-box security for distributed applications, OSSM securely connects services by default using transparent mTLS encryption and enforces a zero-trust network security model with fine-grained traffic policies based on application identities. In addition, the service mesh offers traffic management capabilities to facilitate failovers, canary deployments, traffic mirroring, and A/B testing. Controlling the flow of traffic and API calls between services, OSSM improves service reliability with automatic request retries, timeouts, and circuit breakers, making applications more resilient.
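Canary deployments in Istio-based meshes are typically expressed as weighted routes between service versions, with the proxies performing the split. A minimal Python sketch of that weighted selection, assuming two hypothetical versions named `stable` and `canary`; it illustrates the arithmetic only and is not how OSSM or Envoy implement routing.

```python
import random
from collections import Counter


def pick_version(weights: dict[str, int], rng: random.Random) -> str:
    """Choose a destination version with probability proportional to its weight."""
    versions = list(weights)
    return rng.choices(versions, weights=[weights[v] for v in versions], k=1)[0]


def simulate(weights: dict[str, int], requests: int, seed: int = 0) -> Counter:
    """Route a batch of requests and count how many reach each version."""
    rng = random.Random(seed)
    return Counter(pick_version(weights, rng) for _ in range(requests))


# Usage: send roughly 10% of traffic to the canary.
counts = simulate({"stable": 90, "canary": 10}, requests=10_000)
print(counts["canary"] / 10_000)  # close to 0.10; the exact value depends on the seed
```

Raising the canary weight step by step (10, 25, 50, 100) is the usual way such a rollout progresses once metrics look healthy.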

Differing from upstream Istio deployments, OSSM offers features to ease deployment on Red Hat OpenShift and help resolve issues, including the installation of a multitenant control plane, extending RBAC features, replacing BoringSSL—an OpenSSL derivative—with OpenSSL, and enabling Kiali and Jaeger by default. Rather than automatically injecting Envoy sidecars into K8s pods, OSSM requires an annotation, providing more control by allowing users to select the services to be included in the mesh. OSSM recently added support for the Kubernetes Gateway API, an OpenShift Service Mesh Console, and a cluster-wide topology option consistent with upstream Istio’s deployment topology.
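The opt-in behavior reduces to a simple per-pod rule. A minimal sketch, assuming the `sidecar.istio.io/inject: "true"` annotation used by Istio-based meshes; in practice an admission webhook performs the injection, which is not modeled here, and the pod names and helper functions are hypothetical.

```python
INJECT_ANNOTATION = "sidecar.istio.io/inject"


def should_inject(pod_annotations: dict[str, str]) -> bool:
    """Opt-in rule: inject only when the pod explicitly requests a sidecar.

    Simplification: only the exact string "true" counts as opting in.
    """
    return pod_annotations.get(INJECT_ANNOTATION) == "true"


def mesh_members(pods: dict[str, dict[str, str]]) -> list[str]:
    """Return the (hypothetical) pod names that would receive an Envoy sidecar."""
    return sorted(name for name, annotations in pods.items() if should_inject(annotations))


pods = {
    "checkout": {INJECT_ANNOTATION: "true"},
    "legacy-batch": {},                       # no annotation: stays out of the mesh
    "reports": {INJECT_ANNOTATION: "false"},  # explicit opt-out
}
print(mesh_members(pods))  # ['checkout']
```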

Strengths: OSSM provides a uniform way to connect, manage, and observe microservices-based applications running within a Red Hat OpenShift environment. Designed to integrate with other Red Hat products, OSSM installs easily on Red Hat OpenShift and includes enterprise-grade support, simplifying and streamlining management for operations personnel. OSSM is available free of charge for OpenShift users from the Red Hat Ecosystem Catalog.

Challenges: OSSM’s underlying Maistra Service Mesh is a clone of Istio and lags behind upstream Istio releases. However, Red Hat has started work on a major release with the goal of converging OSSM with community Istio for increased compatibility and faster development. OSSM does not support VMs. In addition, OSSM does not support control plane canary upgrades, although it does use the OpenShift Service Mesh operator for installation and upgrades.

Solo.io: Gloo Mesh

Launched in early 2019, Gloo Mesh is a modern K8s-native control plane enabling the configuration and operational management of multiple heterogeneous service meshes across multiple clusters via a unified API. Designed as a drop-in replacement for any existing Istio environment, Gloo Mesh can be run either in its own cluster or colocated with an existing mesh, enabling global traffic routing, load balancing, access control, and centralized observability of multicluster environments. It discovers meshes and workloads and establishes a federated identity, facilitating the configuration of different service meshes through a single API.

An enhanced version of open-source Istio (as opposed to a fork), Gloo Mesh Enterprise includes an extended version of the Envoy proxy. Together, these enable the consistent configuration and orchestration of services across multiple VMs, clusters, clouds, and data centers from a single control point. Focusing on ease of use, Gloo Mesh Enterprise validates upstream Istio software and incorporates built-in best practices for extensibility and security, including role-based APIs.

Gloo Mesh Enterprise includes a FIPS 140-2-ready Istio-based service mesh with automated service and API discovery enforcing zero-trust security with authentication, authorization, and encryption. The Gloo Mesh Gateway offers end-to-end encryption, security, and traffic control, incorporating traffic management into both east/west and north/south data transfer flows. In addition, Gloo Mesh extensions allow customers to extend and customize their API infrastructure with pre-built extensions and tooling for WebAssembly, plug-ins, and operators, extending custom Envoy proxy capabilities. A self-service portal enables developers to catalog, publish, and share APIs in a secure environment.

Gloo Mesh supports Istio Ambient Mesh (co-developed by Solo.io and Google), a new, sidecarless Istio data plane architecture offering simplified operations, broader application compatibility, and reduced costs. An alternative to Envoy sidecars, Istio Ambient Mesh splits Istio’s functionality into a secure overlay layer and a Layer 7 processing layer, each offering relevant telemetry, traffic management, and zero-trust security capabilities. Fully interoperable with sidecar deployments, Ambient Mesh’s layered approach allows users to adopt Istio incrementally per namespace, transitioning first to a secure overlay before implementing full Layer 7 processing.
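Per-namespace, incremental adoption can be summarized as a classification step. A simplified sketch: the `istio.io/dataplane-mode: ambient` and `istio-injection: enabled` namespace labels are ones Istio uses, but the `has_waypoint` flag is a hypothetical stand-in for however a Layer 7 waypoint proxy is attached (the mechanism has varied across Istio releases), and the namespace names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Namespace:
    name: str
    labels: dict[str, str] = field(default_factory=dict)
    has_waypoint: bool = False  # hypothetical flag: an L7 waypoint proxy serves this namespace


def data_plane_mode(ns: Namespace) -> str:
    """Classify how a namespace's traffic is handled during incremental adoption."""
    if ns.labels.get("istio.io/dataplane-mode") == "ambient":
        # The secure L4 overlay always applies; L7 processing needs a waypoint.
        return "ambient: L4 overlay + L7" if ns.has_waypoint else "ambient: L4 overlay"
    if ns.labels.get("istio-injection") == "enabled":
        return "sidecar"
    return "not in mesh"


namespaces = [
    Namespace("payments", {"istio.io/dataplane-mode": "ambient"}, has_waypoint=True),
    Namespace("catalog", {"istio.io/dataplane-mode": "ambient"}),
    Namespace("legacy", {"istio-injection": "enabled"}),
    Namespace("tools"),
]
for ns in namespaces:
    print(f"{ns.name}: {data_plane_mode(ns)}")
```

Sidecar and ambient namespaces can coexist in the same mesh, which is what makes the step-by-step migration possible.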

Providing a single interface for cloud-native application networking, Gloo Platform is Solo.io’s complete application networking platform, comprising Gloo Mesh, Gloo Gateway (API gateway), Gloo Network (Cilium CNI), and Gloo Fabric (multicloud). While the components can be purchased and deployed individually, they share a unified control plane, a unified management plane, and a single provisioning API. The Gloo Mesh API integrates with leading service meshes and abstracts away differences between their disparate APIs, streamlining the configuration, operation, and lifecycle management of multicloud, multimesh, and multitenant environments.

Gloo Fabric, an extension to Gloo Platform, allows applications running on containers, VMs, or serverless compute to be brought together in a secure, isolated virtual cloud workspace to accelerate application migrations or manage applications spanning multiple environments. As a decoupled control plane for the Envoy proxy, Gloo Edge allows customers to iteratively add service mesh capabilities to their cluster ingress without investing in a full-blown service mesh. Solo.io also offers Gloo GraphQL, which it describes as the industry’s only implementation of a GraphQL engine embedded in Envoy.

Strengths: Gloo Mesh Enterprise is an Istio-based service mesh and management plane that simplifies and unifies the configuration, operation, and visibility of the service-to-service connectivity within distributed applications. Solo.io offers enhanced distributions of upstream open-source Istio (including FIPS, ARM, LTS) and Envoy Proxy, production support, and simplified, centralized Istio and Envoy lifecycle management for greenfield and brownfield environments. Solo.io is the first vendor to support Istio Ambient Mesh, offering a sidecarless alternative for Gloo Mesh Enterprise. Solo.io offers a cluster-based subscription model. A free open-source distribution is also available.

Challenges: With all the options available, Solo.io’s product portfolio can be confusing and difficult to navigate. Despite being contributors to the Istio and Envoy projects and investing heavily in talent and innovation, Solo.io is still dependent on open-source Envoy and Istio for its core offerings. And while Solo.io offers extended Istio support, forced periodic refreshes have the potential for disruption. However, with Istio’s move to the CNCF, Solo.io has 3 of 6 seats on the Istio Technical Oversight Committee, providing the opportunity to stamp its authority and take the lead in influencing Istio’s direction.

Traefik Labs: Traefik Mesh

Released in September 2019 and known previously as Maesh, Traefik Mesh is a simple, straightforward, and non-invasive service mesh utilizing Traefik Proxy, rather than Envoy, to manage service-to-service communications inside a K8s cluster. Created and maintained primarily by Traefik Labs (previously known as Containous), Traefik Proxy is one of the most widely used cloud-native application proxies, with over 3 billion downloads and over 44,000 GitHub stars. Traefik Labs claims Traefik Mesh is the simplest and easiest service mesh to deploy for enhanced control, security, and observability across all east/west traffic flows with minimal overhead.

Integrating natively with K8s, Traefik Mesh is a lightweight, yet full-featured, service mesh supporting the latest SMI specification. Traefik Mesh uses a per-node architecture instead of a sidecar proxy for simplicity and resource conservation. Because Traefik Mesh is opt-in by default, existing services are unaffected until they are explicitly added to the service mesh; nothing is injected into applications automatically.
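To see why a per-node design conserves resources, compare proxy counts. A back-of-the-envelope sketch with illustrative numbers (10 nodes, 30 meshed pods per node); it counts instances only, and a shared node proxy may need more CPU and memory than a single sidecar, so real savings depend on the workload.

```python
def proxies_needed(nodes: int, meshed_pods_per_node: int, architecture: str) -> int:
    """Count proxy instances under each data-plane architecture.

    Simplifying assumption: every node runs the same number of meshed pods.
    """
    if architecture == "sidecar":
        return nodes * meshed_pods_per_node  # one proxy container per meshed pod
    if architecture == "per-node":
        return nodes  # one shared proxy per node
    raise ValueError(f"unknown architecture: {architecture!r}")


# Illustrative cluster: 10 nodes, each running 30 meshed pods.
print(proxies_needed(10, 30, "sidecar"))   # 300
print(proxies_needed(10, 30, "per-node"))  # 10
```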

Since Traefik Mesh does not use sidecar containers, routing is handled through proxy endpoints on each node. Because traffic flows through these node-level endpoints, Traefik Mesh does not modify K8s objects or traffic without the user’s knowledge. The mesh controller supports multiple configuration options, including annotations on user service objects and SMI objects; it runs in a dedicated pod and handles all configuration parsing and deployment to the proxy nodes.

Designed for simplicity with a focus on efficiency and low-resource utilization, Traefik Mesh is easy to install and configure via a CLI. Its feature set includes traffic management capabilities, such as circuit breakers, load balancing, retries and failovers, and rate limiting. In addition, Traefik Mesh provides observability with out-of-the-box metrics preinstalled with Grafana and Prometheus and is compatible with Datadog, InfluxData, and StatsD. Tracing is supplied through OpenTelemetry, delivering full compatibility with Haystack, Instana, Jaeger, and Zipkin for resilient, scalable tracing and analysis.
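Rate limiting in proxies is commonly built on a token bucket, where an average rate refills tokens and a burst size caps how many can accumulate. The sketch below is a generic version of that algorithm, written to show the mechanics; it is not Traefik Mesh’s implementation, and the class and parameter names are hypothetical.

```python
class TokenBucket:
    """Generic token-bucket rate limiter (illustrative; not Traefik's implementation)."""

    def __init__(self, rate_per_sec: float, burst: int) -> None:
        self.rate = rate_per_sec      # tokens added per second (the average rate)
        self.capacity = float(burst)  # most tokens the bucket can hold (the burst size)
        self.tokens = float(burst)    # start with a full bucket
        self.last = 0.0               # time of the previous check, in seconds

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if one is available."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Usage: average 5 requests/s with bursts of up to 2.
bucket = TokenBucket(rate_per_sec=5, burst=2)
print([bucket.allow(now=0.0) for _ in range(3)])  # [True, True, False]
print(bucket.allow(now=1.0))                      # True (bucket refilled)
```

Passing the clock in explicitly keeps the example deterministic; a proxy would read a monotonic clock instead.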

In addition to basic security in the form of mTLS, Traefik Mesh is SMI-compliant and facilitates fine-tuning traffic permissions via access control. A specification for service meshes running on K8s, SMI defines a common standard for service mesh providers, covering the most common capabilities and enabling flexibility and interoperability. Furthermore, since SMI is specified as a collection of K8s APIs, users who know K8s can use Traefik Mesh.
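In the SMI Access specification, policies are additive and traffic is denied by default: requests flow only when a TrafficTarget grants them. A minimal Python model of that check, assuming exact path matches in place of SMI’s HTTPRouteGroup matching rules; the class layout, service-account names, and paths are hypothetical, and Traefik Mesh’s actual enforcement is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrafficTarget:
    """Simplified model of an SMI TrafficTarget; field names are illustrative."""

    destination: str               # service account that receives traffic
    sources: frozenset[str]        # service accounts allowed to send traffic
    allowed_paths: frozenset[str]  # stands in for the routes an HTTPRouteGroup matches


def is_allowed(targets: list[TrafficTarget], source: str, destination: str, path: str) -> bool:
    """Deny by default; traffic is allowed if any target permits it (targets are additive)."""
    return any(
        t.destination == destination and source in t.sources and path in t.allowed_paths
        for t in targets
    )


policy = [TrafficTarget("api", frozenset({"frontend"}), frozenset({"/orders"}))]
print(is_allowed(policy, "frontend", "api", "/orders"))   # True
print(is_allowed(policy, "batch-job", "api", "/orders"))  # False: no matching source
```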

Built on open-source Traefik Proxy and Traefik Mesh, Traefik Enterprise consolidates API gateway, ingress control, and service mesh within one simple control plane. A unified, cloud-native connectivity solution, Traefik Enterprise simplifies microservices networking complexity with distributed, highly available, and scalable features combined with premium, subscription-based bundled support for enterprise-grade deployments. In addition, Traefik Enterprise includes an enhanced dashboard with service mesh observability of internal east/west traffic.

Strengths: Traefik Mesh combines a range of features for good usability and performance, including circuit breakers, load balancing, rate limiting, retries and failovers, and security, as well as observability with out-of-the-box metrics. Built on the popular Traefik Proxy, it offers lightweight, SMI-compliant, non-invasive traffic management. Instead of a sidecar proxy, Traefik Mesh uses an opt-in, per-node proxy to connect services, providing increased control while minimizing resource consumption. Traefik Mesh is available with open-source community support or with subscription-based enterprise support from Traefik Labs.

Challenges: Traefik Mesh lacks multicluster capabilities, so users requiring a unified control plane spanning clusters, clouds, and meshes should look elsewhere. While Traefik Mesh supports SMI access control, it doesn’t offer transparent, end-to-end encryption, and it does not support VMs.

VMware: Tanzu Service Mesh

Announced in December 2018 as NSX Service Mesh and launched as Tanzu Service Mesh (TSM) in March 2020, TSM is an Istio-based, enterprise-class service mesh providing consistent connectivity and security for microservices across multicluster and multicloud K8s environments. TSM integrates with Tanzu Mission Control (TMC) in a loosely coupled model to provide standard service mesh capabilities via the Istio API. TSM also supports Tanzu Kubernetes Grid and VMware’s K8s platform, in addition to AKS, EKS, GKE, OpenShift, and other K8s distributions, to create a cross-platform service mesh. On top of these capabilities, TSM adds end-to-end use case support and integrated solutions that are difficult to achieve with service mesh technologies alone. TSM’s global controller is delivered as a fully managed SaaS, operated and maintained by VMware.

Tanzu Service Mesh includes the TSM Global Controller—a control plane provided as a SaaS managed by VMware—and the TSM Data Plane running across customers’ K8s clusters. Based on open-source Istio and Envoy, the TSM Data Plane delivers typical services such as authentication and authorization, circuit breaking, rate limiting, timeouts and retries, traffic shifting, and other features. The TSM Data Plane also includes the TSM Agent, which provides a secure connection between the customers’ clusters and the TSM Global Controller for managing the configuration and policies enforced in the TSM Data Plane.
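Circuit breaking stops sending requests to a failing dependency so that errors do not cascade. Envoy and Istio implement it through connection-pool limits and outlier detection; the Python sketch below instead shows the classic consecutive-failure pattern (closed, open, then trial requests after a cool-down) to illustrate the idea. The class, thresholds, and timings are hypothetical and do not reflect TSM configuration.

```python
from typing import Optional


class CircuitBreaker:
    """Classic consecutive-failure circuit breaker with a cool-down period."""

    def __init__(self, failure_threshold: int, reset_after: float) -> None:
        self.failure_threshold = failure_threshold  # consecutive failures that trip it
        self.reset_after = reset_after              # seconds to stay open before trials
        self.failures = 0
        self.opened_at: Optional[float] = None      # None means the circuit is closed

    def allow_request(self, now: float) -> bool:
        """Closed: allow. Open: reject until the cool-down elapses, then allow trials."""
        if self.opened_at is None:
            return True
        return now - self.opened_at >= self.reset_after

    def record(self, success: bool, now: float) -> None:
        """Update state from the outcome of a request."""
        if success:
            self.failures = 0
            self.opened_at = None  # a success closes the circuit
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = now  # trip, or re-trip after a failed trial


cb = CircuitBreaker(failure_threshold=3, reset_after=10.0)
for t in (0.0, 1.0, 2.0):
    cb.record(success=False, now=t)  # the third consecutive failure trips the breaker
print(cb.allow_request(now=5.0))     # False: still cooling down
print(cb.allow_request(now=12.0))    # True: trial request allowed
cb.record(success=True, now=12.0)    # the successful trial closes the circuit
print(cb.allow_request(now=13.0))    # True
```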

Tanzu Service Mesh includes a unique application abstraction layer called Global Namespace (GNS), which acts as a logical grouping for microservices. Managed through a declarative API model and an intuitive UI, GNSs provide modern applications with simplified configurability, API-driven automation, isolation, and operational consistency for DevOps and security, irrespective of the underlying platform or cloud. They also provide automated service discovery and naming (DNS), resiliency policies, security policies, service graphs, and traffic routing.
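Conceptually, a GNS maps one logical name to service instances that may run in several clusters or clouds. The sketch below models only that name-to-endpoints mapping; the class, the `acme.example` domain, the cluster names, and the addresses are all hypothetical, and TSM’s actual DNS integration, health checks, and policies are not represented.

```python
from collections import defaultdict


class GlobalNamespace:
    """Hypothetical model of a logical grouping that spans several clusters."""

    def __init__(self, domain: str) -> None:
        self.domain = domain
        # service name -> list of (cluster, address) pairs
        self._endpoints: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add(self, service: str, cluster: str, address: str) -> None:
        """Record that an instance of a service runs in a given cluster."""
        self._endpoints[service].append((cluster, address))

    def resolve(self, hostname: str) -> list[tuple[str, str]]:
        """Map <service>.<domain> to its endpoints, wherever they run."""
        suffix = "." + self.domain
        if not hostname.endswith(suffix):
            return []
        service = hostname[: -len(suffix)]
        return list(self._endpoints.get(service, []))


gns = GlobalNamespace("acme.example")
gns.add("catalog", "aws-east", "10.0.1.5")
gns.add("catalog", "azure-west", "10.8.2.7")
print(gns.resolve("catalog.acme.example"))
# [('aws-east', '10.0.1.5'), ('azure-west', '10.8.2.7')]
```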

Enabling full automation of multicluster configurations (in a federated model where each cluster’s control plane is independent and cross-cluster traffic is restricted to the data plane), ingress and egress configurations, and seamless cross-cloud application portability, GNS supports microservices within a single cluster and microservices distributed across multiple clusters and clouds. Integration with the Tanzu Application Platform (TAP) provides an enhanced developer experience, enabling connectivity, resiliency, and security intent to be pre-configured into a GNS and then automatically deployed to TAP applications.

TSM offers complete lifecycle management of the service mesh with automated cluster onboarding during the Istio installation; one-click operations to upgrade, patch, roll back, or remove the TSM Data Plane from clusters on any K8s platform or cloud environment; and automated data plane health checks and management to minimize configuration drift. TSM operates either standalone or as part of a fully integrated lifecycle management workflow managed by TMC. In addition, TSM works with VMware’s NSX Advanced Load Balancer (formerly Avi Networks) to provide multicloud support, unified policies, load balancing, ingress, container networking, and observability across VMware and third-party K8s environments.
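Configuration drift occurs when a cluster’s actual data plane state diverges from the intended state. A minimal sketch of the comparison such health checks rely on, assuming desired and observed settings are available as flat key/value maps; the setting names and version strings are hypothetical, and this is not how TSM performs its checks.

```python
from typing import Optional


def detect_drift(
    desired: dict[str, str], observed: dict[str, str]
) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Return {setting: (desired, observed)} for every setting that differs.

    A value of None means the setting is missing on that side.
    """
    drift: dict[str, tuple[Optional[str], Optional[str]]] = {}
    for key in sorted(desired.keys() | observed.keys()):
        want, have = desired.get(key), observed.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


desired = {"proxy_version": "1.16.1", "mtls_mode": "STRICT"}
observed = {"proxy_version": "1.15.3", "mtls_mode": "STRICT", "debug_logging": "on"}
print(detect_drift(desired, observed))
# {'debug_logging': (None, 'on'), 'proxy_version': ('1.16.1', '1.15.3')}
```

A remediation step would then reapply the desired values for each reported key.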

Contextual API security (based on VMware’s March 2021 acquisition of Mesh7) allows developers and security teams to better understand when, where, and how applications and microservices are communicating via APIs—even across multicloud environments—enabling better DevSecOps. Intel and VMware are also working together to optimize and accelerate the microservices middleware and infrastructure with software—including eBPF—with a focus on improving performance, crypto accelerations, and security for building distributed workloads.

Strengths: Leveraging open-source Istio, Tanzu Service Mesh provides robust enterprise services, including autoscaling, across multiple K8s clusters, offering operational simplification and automation with advanced resiliency and security functions. In addition to supporting various application platforms, public clouds, and runtime environments, Tanzu Service Mesh supports federation across multiple clusters for end-to-end connectivity, resiliency, and security. VMware offers a subscription model priced per Kubernetes node core, with production support included.

Challenges: While Tanzu Service Mesh can run fully independently of VMware’s technology stack, it offers more value to VMware’s installed base than to a broader audience. TSM currently lacks end-to-end security capabilities, such as extensibility to third-party and VMware endpoint detection and response (EDR) and mobile device management (MDM) solutions, including VMware Carbon Black. TSM also includes multiple overlapping gateway and load-balancing technologies, which present configuration challenges.

6. Analyst’s Take

Despite emerging only in 2016, the service mesh landscape is turning into a battleground as open-source projects and commercial vendors strive to address the complexity of microservices-based cloud-native deployments. Focused on making their service meshes faster, more efficient, and easier to manage, suppliers are increasing the pace of innovation to meet customer demands.

The past year has seen many innovations, including the emergence of Cilium Service Mesh, which leverages eBPF to increase performance and reduce resource contention, and Istio Ambient Mesh, a new, layered Istio data plane architecture offering simplified operations, broader application compatibility, and reduced costs. An alternative to Envoy sidecars, Istio Ambient Mesh splits Istio’s functionality into a secure Layer 4 overlay layer and a Layer 7 processing layer, each offering relevant telemetry, traffic management, and zero-trust security capabilities.

Several suppliers have either incorporated eBPF technology in their platforms or included it on their roadmaps, while Solo.io—a co-creator with Google—already supports Istio Ambient Mesh in Gloo Mesh 2.1. Greymatter.io continues to push the boundaries with a heuristics-based AI health subsystem and out-of-the-box GitOps IaC capabilities. Several other suppliers have also significantly enhanced their platforms.

Leading cloud vendors are also rolling out service mesh capabilities embedded within their portfolios to provide consistent network traffic controls, observability, and security. These include AWS App Mesh and Google’s Anthos Service Mesh. While this diversification creates a new battleground, efforts have been made to standardize interfaces and unify various workloads deployed across different service meshes. For example, SMI, an open, K8s-native specification project launched by HashiCorp, Kinvolk, Linkerd, Microsoft, Solo.io, and Weaveworks, comprises a set of standard, portable APIs providing developers with interoperability across different service mesh technologies. Furthermore, the significant overlap between the SMI APIs and the K8s Gateway API is driving integration.

However, a CNCF microsurvey indicates that among the most significant challenges enterprises face are a shortage of engineering expertise and experience, architectural and technical complexity, and the choice between open-source projects and commercial products. In addition, the survey indicated that ultra-fast, ultra-light service meshes that are easy to deploy and manage, such as Linkerd, Kuma, and Traefik Mesh, are at the top of the shortlist when it comes to addressing customers’ security, observability, reliability, and traffic management concerns.

While Istio is the most widely deployed service mesh today, we expect its market leadership to decline over the next 18 to 36 months as customers migrate to alternatives. Moreover, with Istio coming under the governance of the CNCF, we expect to see significant changes as Istio-based vendors such as Solo.io have a greater say in the direction of the platform. And while purists might claim that Istio-based vendors such as F5, Solo.io, and VMware are not service mesh “providers” in the true sense of the word, we believe that they have established themselves as inextricable cogs in the wheel and will continue to exert significant influence over not just Istio, but the industry as a whole.

However, while an Envoy- or Istio-based service mesh, or one with widespread support, may be considered the safe choice, that should not be the determining factor. Many use cases can be supported with an easy-to-use, lightweight, and infrastructure-agnostic service mesh incorporating essential functionality and supporting both east/west and north/south traffic. Some newer vendors, such as greymatter.io, offer the same service mesh capabilities but with significant differentiation from an AIOps perspective.

Avoid adopting a service mesh based purely on consumer trends, industry hype, or widespread adoption. Instead, take the time to understand the problem you’re trying to solve. Explore the potential tradeoffs in terms of performance and resource consumption. Evaluate your support requirements against your in-house resources and skills (many open-source service meshes rely on community support). Once you’ve created a short list, choose a service mesh—and microservices-based application development partner—that works best with your software stack.

7. Methodology

For more information about our research process for Key Criteria and Radar reports, please visit our Methodology.

8. About Ivan McPhee

Formerly an enterprise architect and management consultant focused on accelerating time-to-value by implementing emerging technologies and cost optimization strategies, Ivan has over 20 years’ experience working with some of the world’s leading Fortune 500 high-tech companies crafting strategy, positioning, messaging, and premium content. His client list includes 3D Systems, Accenture, Aruba, AWS, Bespin Global, Capgemini, CSC, Citrix, DXC Technology, Fujitsu, HP, HPE, Infosys, Innso, Intel, Intelligent Waves, Kalray, Microsoft, Oracle, Palette Software, Red Hat, Region Authority Corp, SafetyCulture, SAP, SentinelOne, SUSE, TE Connectivity, and VMware.

An avid researcher with a wide breadth of international expertise and experience, Ivan works closely with technology startups and enterprises across the world to help transform and position great ideas to drive engagement and increase revenue.

9. About GigaOm

GigaOm provides technical, operational, and business advice for IT’s strategic digital enterprise and business initiatives. Enterprise business leaders, CIOs, and technology organizations partner with GigaOm for practical, actionable, strategic, and visionary advice for modernizing and transforming their business. GigaOm’s advice empowers enterprises to successfully compete in an increasingly complicated business atmosphere that requires a solid understanding of constantly changing customer demands.

GigaOm works directly with enterprises both inside and outside of the IT organization to apply proven research and methodologies designed to avoid pitfalls and roadblocks while balancing risk and innovation. Research methodologies include but are not limited to adoption and benchmarking surveys, use cases, interviews, ROI/TCO, market landscapes, strategic trends, and technical benchmarks. Our analysts possess 20+ years of experience advising a spectrum of clients from early adopters to mainstream enterprises.

GigaOm’s perspective is that of the unbiased enterprise practitioner. Through this perspective, GigaOm connects with engaged and loyal subscribers on a deep and meaningful level.

10. Copyright

© Knowingly, Inc. 2023 "GigaOm Radar for Service Mesh" is a trademark of Knowingly, Inc. For permission to reproduce this report, please contact sales@gigaom.com.