Google Anthos Bare Metal vs. Do-It-Yourself Deployments v1.0

1. Executive Summary

The transformative impact of the cloud on businesses has prompted a broad and rapid migration to cloud-first strategies. Yet as organizations transition to cloud-aligned infrastructures, they are challenged by older, monolithic applications with large data volume and processing capacity needs.

These applications are not an easy or natural fit for the cloud, for a myriad of reasons that include application size, complexity, regulatory and compliance requirements, and cost.

While the rest of the application landscape improves at an ever-faster pace, these laggards can become a technical and business liability. How do architects make sure the technical debt and friction don’t grow into a threat to the business?

To get closer to a solution, we need to understand the reasons these applications stay on-premises, and consider the steps we can take to derive benefits from cloud-based models without adding undue overhead or operational burden. In short, what are the options for modernizing apps on-premises—with all the benefits of modern cloud-native technologies like containers and Kubernetes—and what are the implications of different on-premises modernization models?

In this report, we take a look at some of the options at our disposal, and explore the capabilities and benefits of each on-premises app modernization approach. To this end, we ran a field test that compares the process of deploying a sample application on the Google Anthos Bare Metal platform to the process required to deploy a similar application on a do-it-yourself (DIY) infrastructure. We learned that:

  • Kubernetes is a core enabler but should not be your core business. It is a means to an end, helping organizations achieve their strategic goals faster, cheaper, and better. The business value of Kubernetes isn’t Kubernetes itself but the technological potential it unlocks. It has massive potential to deliver business value, but the many difficult, time-consuming tasks required to operationalize Kubernetes at scale distract engineers from realizing that business value.
  • Self-managed Kubernetes is more expensive. Organizations seek to maximize the value they get out of Kubernetes while minimizing their investments; DIY platforms run counter to these goals, requiring expensive, specialized knowledge while the toil of day-to-day operations distracts the team from meaningful work. A managed, cloud-connected Kubernetes platform lowers onboarding and operational costs. Streamlined operations, simplified administration, and proximity to advanced cloud services are additional benefits.

This GigaOm Field Test report enables us to assess different approaches to running these applications on-premises and weigh the pros and cons of each. We determined that the fully managed Google Anthos Bare Metal solution provides a measurable TCO advantage over DIY environments, and that Anthos Bare Metal offers additional benefits beyond TCO. Among the cost benefits we found:

  • 60% lower server and storage costs
  • 81% reduction in labor costs over a three-year period
  • 92% reduction in Help Desk Level 1-3 support costs

Overall, we learned that a managed platform frees up engineers’ time and resources to work on application modernization, instead of just keeping the infrastructure running. As a result, Anthos Bare Metal can be viewed as an essential element of a forward-thinking application modernization strategy.

2. Drivers of On-Premises Cloud

The cloud is here to stay, and it has become a first-class citizen for the deployment of new and existing applications. The simplicity of consuming a service, instead of building the infrastructure yourself, is a huge advantage, saving both time and money. Many organizations are now cloud-first, leveraging the cloud’s advantages wherever they can.

Cloud-native approaches are becoming increasingly popular, too, encompassing more and more use cases: refactoring existing applications, creating container-based microservices architectures, and cost-effectively running legacy applications in the cloud that previously would not have been economically viable there.

This broadening of use cases means that organizations are not only cloud-first but that the majority of their application landscapes run in the cloud, with fewer and fewer exceptions that cannot be moved to cloud. No large organization operates with only modern, cloud-based applications. They need to work with, through, and around existing systems and applications, some of which cannot be migrated to the cloud immediately—or ever. These applications are often the mission-critical systems at the heart of an organization’s commercial operations. There’s no simple lift-and-shift strategy that meets the needs of every app.

This application inertia usually has one or more of the following causes:

Specific Compliance, Data Sovereignty, and Data Security Requirements
In many cases, data sovereignty, data security, or similar restrictions due to law, security policies, or compliance rules prohibit an application from running in the public cloud. This proscription often extends to personally identifiable information (PII), medical, or other sensitive data. Often, these rules have yet to be adapted to cloud-computing standards for historical rather than technical reasons. In some cases, competitive or commercial considerations are the decisive factor in not moving to the cloud.

Monolithic or Inefficient Application Design
Some application architectures simply don’t align with the pricing model of cloud computing, and the cost and timelines of refactoring prevent organizations from adapting them, at least in the short term. That means that some applications, while in the queue to be refactored or replaced, must stay on-premises for the time being. Typical cost drivers include network egress charges and excessive storage costs for some legacy application architectures.

Demand for Very Low Networking Latency
On the technical side, some highly transactional, latency-sensitive systems like those in banking or transportation take an unacceptable performance hit if they are too far away from their users, their data, or the next-hop data processor in the application’s flow.

High Residual Economic Value in Legacy Infrastructure
Sometimes it just comes down to budgets and existing investments in data center equipment, like servers, networking gear, or storage devices. In these cases, it doesn’t make sense, yet, to migrate from CapEx to OpEx, and the application is economically best served on the existing on-premises infrastructure. That doesn’t mean there are no gains to be made here; a switch from bespoke, virtualization-based architectures to more standardized, container-based architectures often helps reduce cost.

However, these on-premises workloads don’t need to remain in the self-managed, self-hosted, do-it-yourself environments we’re accustomed to in our own data centers, nor should these applications hold organizations back in their efforts to modernize, go to market faster, be more competitive, or reduce operating costs.

The choice is often between continuing to run them on legacy, virtualized infrastructure, or standing up a DIY cloud-native (containers, Kubernetes) environment and migrating the application there for modernization. This is the arena of on-premises cloud solutions like Google’s Anthos Bare Metal: extending the cloud’s capabilities and services to your on-premises data center so that the application can leverage many of the advantages of cloud computing while keeping operations consistent across locations.

Anthos Bare Metal is especially interesting to organizations already using Google Cloud Platform and looking for a solution to seamlessly integrate their remaining on-premises applications into their cloud operations framework. Anthos enables organizations to derive the benefits of cloud-based architectures while significantly reducing the costs associated with a DIY approach, across operations, lifecycle management, monitoring, and visibility—all traditionally hard areas of IT operations.

3. Options and Benefits for On-Premises Cloud

Let’s dive into the different options available to an organization evaluating the best course of action for on-premises applications: do nothing and continue running as a VM-based application, create a bespoke, DIY Kubernetes platform, or use a cloud-connected, packaged platform like Google Anthos Bare Metal.

Figure 1: Options for On-Premises Applications

Do Nothing
The simplest course of action, with the largest potential negative impact, is doing nothing.

We will not explore this option further in this report, although it may be valid for some applications that are already slated to be sunset. Doing nothing is a short-term solution for only a small subset of applications; for all others, the technical friction and future migration cost will be prohibitive.

Build and Manage a DIY Kubernetes Platform
Building your own Kubernetes-based platform will bring many of the cloud-native advantages to the on-premises world.

However, building a platform yourself, from scratch, is a daunting task. The cognitive load means that a specialized team must build and manage the platform, which requires hard-to-find expertise (especially in the current market) engaged in constant upkeep. The upsides over a managed, commercial off-the-shelf solution are marginal, while the toil of day-to-day operations distracts the team from more meaningful work.

For most enterprises, Kubernetes is not the core business. It’s a core enabler, helping organizations achieve their strategic goals faster, cheaper, and better. As such, organizations seek to maximize the value they get out of Kubernetes while minimizing their investments: the business value of Kubernetes isn’t Kubernetes itself, but what it unlocks in terms of technological potential. It has massive potential to deliver business value, but the many difficult, time-consuming tasks of operationalizing Kubernetes at scale distract engineers from realizing that business value.

Building a platform from scratch, using both open-source components and commercially supported products, is expensive, time-consuming, and difficult. A DIY platform needs much more than just Kubernetes, including identity and access management, security tooling, monitoring, service-mesh functionality, a serverless runtime, and more, drawn from a scattered, fragmented landscape of tools. Even where all these requirements are addressed, the inherent complexity makes it likely that the result will fall short in capability.

If organizations are cloud-first, they will already be using many of these hard-to-implement tools—as managed services in the cloud. And many of these quick-and-easy cloud options are not (readily) available for DIY platforms, requiring duplication of effort, additional operational overhead, and inconsistency of management.

Finally, a DIY platform is barely integrated into the cloud estate, from either a networking or a management perspective. Clusters have to be managed separately from a different management portal, and networking is completely separate from the cloud environment, making deep network connectivity between applications much harder to achieve. This not only incurs additional operational overhead, it also creates a disconnect that makes it much harder to migrate applications to the cloud once the time is right. None of this aligns with the cloud-first approach that many organizations are taking, creating additional friction and work when the app is eventually migrated.

Leverage an On-Premises Cloud Platform
Conversely, an on-premises cloud platform extends the cloud’s capabilities and services to on-premises data centers and any other environment (including other clouds and edge). This transforms the on-premises data center into a true first-class citizen: a fully integrated and cloud-managed development and application modernization platform.

The main advantage is that solutions like Anthos Bare Metal create consistent operations across environments in a cloud-first fashion, so that all the cloud’s advantages stretch to those applications that must (for whatever reason) stay on-premises.

That means that on-premises clusters are fully managed as part of the cloud’s managed services, making it easy to roll out configuration and security policies across cloud clusters and on-premises clusters alike, perform lifecycle management like updating and upgrading multiple clusters from a single console, apply networking configurations and service mesh policies uniformly, and monitor on-premises infrastructure and applications as first-class citizens of the cloud’s monitoring environment and dashboards.

The deep networking (and service mesh) integration means on-premises clusters and the applications running on top of them are fully integrated with cloud clusters, allowing for a simpler networking architecture, a unified method for connecting applications, and better application connectivity. It also enables more consistent and more granular security policies and better out-of-the-box observability. These features are essential to maintain an organization’s security posture from the start, as the platform is adopted across more development teams.

While there are numerous platforms out there, most are restricted to their own vendor’s cloud services or require specialized hardware, both of which severely limit the cost-effectiveness and applicability of the solutions. Multi-cloud capabilities are core to solving the challenges discussed above, especially in the realms of regulatory compliance, data sovereignty, and networking latency. As a result, applications must be able to run anywhere: on bare metal in the data center, on top of a virtualization platform, in the cloud, and at the edge.

These solutions are, in part, a packaging of open-source technologies like Kubernetes, Istio, OpenTelemetry, OIDC, and Knative, but it’s the validation, support, streamlined installation, and fully managed lifecycle that bring value. Companies seek robust, hardened solutions based on common open-source tools from trusted vendors, and these platforms deliver a cloud-native developer experience that stretches beyond just running on-premises applications: teams can modernize existing applications and develop new cloud-native applications on the same platform. Additionally, these solutions allow applications to be deployed from the respective cloud’s marketplace, giving easy access to many different applications, with billing going through the cloud vendor’s marketplace.

This way, the underlying infrastructure is abstracted away completely, and software development teams can create, deploy, and run applications with a single, consistent workflow, using the same pipelines and tools anywhere. In turn, this level of consistency enables true application portability and migration, so organizations can move their applications between environments without friction, with the freedom to choose between container-based and serverless architectures.

The ability to consistently create and apply security policies across clusters helps to maintain security posture and compliance, without adding to the workload of the security team. With the rich security tools in cloud services, it is easy to forget how cumbersome security could be in the bespoke on-premises world. These platforms help security teams reassert control over on-premises data center environments, allowing them to apply the same policies used in the cloud to the on-premises world. Security teams can focus on building policies as code and applying them at scale, instead of continuously performing manual checks of each individual cluster or deployment.
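To make “policies as code, applied at scale” concrete, here is a minimal Python sketch under stated assumptions: the `Workload` and `Container` types, the `require_non_root` rule, and the cluster names are hypothetical stand-ins, not the Anthos policy API. What it illustrates is that one rule, written once, is evaluated identically against every cluster, whether that cluster runs in the cloud or on-premises.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    name: str
    run_as_non_root: bool  # hypothetical stand-in for securityContext.runAsNonRoot


@dataclass
class Workload:
    name: str
    containers: list[Container] = field(default_factory=list)


def require_non_root(workload: Workload) -> list[str]:
    """Hypothetical rule: every container must run as a non-root user."""
    return [
        f"{workload.name}/{c.name}: must run as non-root"
        for c in workload.containers
        if not c.run_as_non_root
    ]


# Rules are written once and applied to every cluster in the fleet.
POLICIES = [require_non_root]


def audit(clusters: dict[str, list[Workload]]) -> dict[str, list[str]]:
    """Evaluate every policy against every workload in every cluster."""
    return {
        cluster: [msg for w in workloads for rule in POLICIES for msg in rule(w)]
        for cluster, workloads in clusters.items()
    }


# Hypothetical fleet: one cloud cluster and one on-premises cluster.
fleet = {
    "gke-cloud": [Workload("web", [Container("app", True)])],
    "onprem-bare-metal": [Workload("batch", [Container("job", False)])],
}
print(audit(fleet))
```

Adding a rule means appending one function to `POLICIES`; every cluster is checked against it on the next audit, which is what replaces manual, per-cluster reviews.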

4. Google Anthos Bare Metal: an On-Premises Platform

The goal of this testing is to determine how well Google’s Anthos Bare Metal meets the needs outlined above. Anthos Bare Metal is a cloud-native application modernization platform, helping organizations move applications from legacy environments to container-based microservices architectures. Anthos is based on the same hardened, open-source technologies that underpin Google Cloud. It includes Google Kubernetes Engine, Google’s enterprise container runtime, and centralized configuration and security policy management and enforcement. It also offers a service mesh for observability, security, and control, a multi-cluster management console, a serverless runtime, and migration services to onboard legacy apps into containers.

In summary, as shown in Table 1, Anthos represents the option with the quickest time-to-value and least engineering effort for modernizing legacy applications in an on-premises environment. Companies get the consistent management and automated lifecycle of a managed service, on their own hardware and in their own data center.

Table 1. Benefits and Challenges

Option | Benefits | Challenges
Do Nothing | Zero effort. | Negative ROI, accrues technical debt, creates friction.
DIY Platform | Maximum flexibility. | Maximum effort, requires expertise, very complex to create and maintain. Zero cloud benefits.
Google Anthos Bare Metal | Simple and rapid, works on-prem and in the cloud, security built-in. Fully managed lifecycle, deep network integration for seamless connectivity with cloud-native apps, fully integrated with Google Cloud. |
Source: GigaOm 2022

Anthos Bare Metal takes a philosophically different approach from DIY to building efficient hosting environments. The Anthos approach brings the cloud to your data center and aligns your on-premises container platform with Google Cloud’s container platforms, whereas DIY approaches result in an infrastructure lacking this alignment. In short, Anthos’ cloud-first approach extends the benefits of the cloud to the on-premises world (instead of vice versa). The goal of Anthos Bare Metal is to turn the on-premises data center into another cloud region.

This approach is most evident in the self-service, on-demand aspects of Anthos Bare Metal. It’s easy and quick to install, making onboarding a fairly trivial task. The included security, monitoring, and migration features, which are rarely available out of the box in on-premises environments, show that it’s an enterprise-ready platform aimed at maximizing Kubernetes’ potential as a modernization platform.

Comparing DIY and Google Anthos Bare Metal

In this benchmark, we’re comparing the DIY option with Google Anthos Bare Metal. The “do-nothing” option will not make any strides towards application modernization, and likely will make the situation worse; therefore it is not being addressed here.

In this testing, we used Ubuntu 20.04.3 hosts on Equinix Metal, a provider of bare-metal servers. For all testing, we sized these machines according to the recommended hardware specifications in the “Configuring hardware for Anthos clusters on bare metal” section of the Anthos documentation. One of these hosts serves as the admin and cluster management workstation, providing an access point into the Kubernetes environment. The remaining hosts are used to create the cluster, running both the control plane and the worker nodes. Figure 2 provides a side-by-side overview of the DIY and Anthos architectures.

Figure 2: Anthos and DIY Architectures

Setting up the Anthos Bare Metal Environment

The initial setup of the admin workstation includes setting up gcloud and bmctl, two command-line utilities needed to create and operate the Anthos Bare Metal cluster. bmctl is a tool that creates, manages, modifies, deletes, and updates Anthos Bare Metal clusters.

Cluster setup is done by editing the supplied default cluster configuration YAML file. Our customization of that file was limited to editing the IP address range to reflect our network configuration and configuring a local network address pool to host the cluster load balancer and future service load balancers.
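The address-pool edit is easy to get wrong, because the load-balancer addresses have to sit inside the node network without colliding with node IPs. The Python sketch below shows one way to sanity-check a pool before creating the cluster. It is an illustrative helper of our own, not part of bmctl; the CIDR and addresses are hypothetical, and it assumes the pool is a single contiguous start-to-end range.

```python
import ipaddress


def validate_lb_pool(node_cidr: str, pool_start: str, pool_end: str,
                     node_ips: list[str]) -> list[str]:
    """Return problems with a load-balancer address pool; an empty list means OK."""
    subnet = ipaddress.ip_network(node_cidr)
    start = ipaddress.ip_address(pool_start)
    end = ipaddress.ip_address(pool_end)
    problems = []
    if start > end:
        problems.append("pool start is after pool end")
    if start not in subnet or end not in subnet:
        problems.append(f"pool {start}-{end} is not inside {subnet}")
    for ip in map(ipaddress.ip_address, node_ips):
        if start <= ip <= end:  # a node address inside the pool would collide
            problems.append(f"node address {ip} overlaps the pool")
    return problems


# Hypothetical values: nodes at .2 and .3, pool .100-.120 in the same /24.
print(validate_lb_pool("10.200.0.0/24", "10.200.0.100", "10.200.0.120",
                       ["10.200.0.2", "10.200.0.3"]))  # []
```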

As part of our preparations, we opened up relevant networking ports in the Equinix environment as seen in the Networking Requirements section of the Anthos documentation.

Issuing a bmctl “create cluster” command runs a pre-flight check on each of the nodes and verifies connectivity and authentication to Google Cloud Platform. Once all checks pass, the cluster is created. This cluster connects back to Google Cloud Platform for centralized management and integration with centrally managed security policies, identity and access management (IAM), Google Kubernetes Engine, and other GCP components.
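As a rough illustration of one small part of what a pre-flight check does, the Python sketch below tests TCP reachability of each node from the admin workstation. The function names, node addresses, and port list are hypothetical; bmctl’s actual pre-flight checks are far broader (they also cover GCP connectivity and authentication), and the authoritative port list is in the Networking Requirements section of the Anthos documentation.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def preflight(nodes: list[str], ports: list[int]) -> dict[str, list[int]]:
    """Map each node to its unreachable ports; an empty dict means all checks passed."""
    failures = {}
    for node in nodes:
        closed = [p for p in ports if not port_open(node, p)]
        if closed:
            failures[node] = closed
    return failures


if __name__ == "__main__":
    # Hypothetical node addresses; port 22 (SSH) is only an example.
    problems = preflight(["10.200.0.2", "10.200.0.3"], [22])
    print("all nodes reachable" if not problems else problems)
```

Only when every node passes would cluster creation proceed, mirroring the “once all checks pass” behavior described above.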

In our testing, we created hybrid clusters, which run user workloads and manage both themselves and other clusters. These clusters are the recommended option in resource-constrained scenarios but involve trade-offs in flexibility and security: because the cluster manages itself and other clusters, it presents a larger attack surface for sensitive material such as SSH keys.

These clusters became visible in GKE’s management interface after installation, letting us manage these on-premises bare metal clusters from the same interface as cloud-based GKE clusters and apply the same policies to them seamlessly. Because monitoring and logging work out of the box, sending metrics and logs from the on-premises cluster to GCP required almost no installation or maintenance effort.

Setting up the Do-it-Yourself Environment

The hardware setup and operating system for our DIY scenario are identical to the Anthos scenario. However, we had to install Kubernetes, Istio, and Knative ourselves and create clusters manually. We used Rancher as a cluster management solution.

For the DIY build, we had to prepare the nodes for Kubernetes manually, which required more specific and specialized knowledge to tweak the underlying (Ubuntu-based) operating system, such as disabling swap and installing common Kubernetes tools (kubelet, kubeadm, and kubernetes-cni). Perhaps more daunting was the selection and installation of a pod networking solution. We chose Weave Net for its simplicity.
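One of those manual preparation steps can be sketched in Python. By default, the kubelet has historically refused to start on a node with swap enabled, which is why swap is usually turned off (for example with `swapoff -a` plus removing swap entries from `/etc/fstab`). The hypothetical helper below verifies the result by reading `/proc/swaps`, whose first line is a header followed by one line per active swap device; the sample content is illustrative.

```python
from pathlib import Path


def swap_devices(proc_swaps_text: str) -> list[str]:
    """Parse /proc/swaps content: skip the header line, return each active device."""
    lines = proc_swaps_text.strip().splitlines()
    return [line.split()[0] for line in lines[1:] if line.strip()]


def swap_disabled(path: str = "/proc/swaps") -> bool:
    """True when no swap device is active on this (Linux) host."""
    return not swap_devices(Path(path).read_text())


# Illustrative /proc/swaps content from a node that still has a swap file.
SAMPLE = (
    "Filename\t\t\t\tType\t\tSize\t\tUsed\t\tPriority\n"
    "/swap.img                               file\t\t2097148\t\t0\t\t-2\n"
)
print(swap_devices(SAMPLE))  # ['/swap.img']
```

In a DIY runbook, a check like this would run on every node before kubeadm is invoked; it is one of many such steps that a packaged installer performs for you.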

An additional host was used to install Rancher and import our DIY Kubernetes cluster. However, the build process wasn’t a simple “from A to B” process; instead, recurring issues forced us to rebuild the cluster multiple times. We often hear similar sentiments from the field: building your own platform isn’t straightforward, takes more time, and requires more experience than estimated in advance.

Setting up and configuring the DIY platform consumed valuable time that we could otherwise have spent on more pertinent testing. This is characteristic of DIY deployments: more time goes to non-value-add work and less to higher-value work.

One of the areas most affected by this dynamic is networking, specifically external access to the internet or other networks. With managed, hosted solutions, networking and compute are highly integrated. With DIY platforms, the integration has to be done manually as a one-off, bespoke project. The products that need to be integrated were likely not chosen with this integration in mind, which can cause significant hardship for management teams.

We saw this in our own testing. We selected MetalLB, a load balancer implementation for bare-metal Kubernetes clusters, to provide the same ease of use and functionality as common cloud-based load balancers; however, installing and configuring it proved much rougher around the edges than expected. With Equinix Metal hosting our on-premises nodes, we needed to enable BGP and deploy the Equinix Metal Cloud Controller Manager (CCM) to our Kubernetes cluster. Unfortunately, this deployment failed, and we were unable to deploy and test the solution successfully within a reasonable amount of engineering time. Instead, we used simple NodePort service configurations to continue with testing. Readers should take note of the difficulties involved in setting up ingress load balancing to an on-premises cluster.
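The NodePort fallback works differently from a load balancer: Kubernetes opens the same port on every node, and clients must address a node IP directly. The Python sketch below lists the resulting endpoints. It assumes the default kube-apiserver `--service-node-port-range` of 30000-32767, and the node addresses and port are hypothetical.

```python
import ipaddress

# Default value of the kube-apiserver --service-node-port-range flag.
NODE_PORT_RANGE = range(30000, 32768)


def node_port_endpoints(node_ips: list[str], node_port: int) -> list[str]:
    """List the host:port pairs on which a NodePort Service is reachable."""
    if node_port not in NODE_PORT_RANGE:
        raise ValueError(f"{node_port} is outside the default range 30000-32767")
    # The same port is opened on every node, so each node IP is a valid entry point.
    return [f"{ipaddress.ip_address(ip)}:{node_port}" for ip in node_ips]


print(node_port_endpoints(["10.200.0.2", "10.200.0.3"], 30080))
# ['10.200.0.2:30080', '10.200.0.3:30080']
```

Unlike a LoadBalancer service, nothing here provides a single virtual IP or steers clients away from a failed node; that is the functionality MetalLB was meant to supply.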

5. Field Test Findings and TCO Analysis

This GigaOm Field Test Report compares and contrasts the two popular options for on-premises application modernization platforms: DIY and Google Anthos Bare Metal. Our top-level findings across features, functionality, operational efficiency, administrative overhead, time-to-value, and TCO are described in this section.

In our hands-on assessment of Google Anthos and DIY approaches, we stood up and configured an on-premises cloud on each platform to assess relative impacts. We scored each platform on a scale of 1 to 3 across four specific areas (node management, cluster management, workload management, and security), where 3 is judged to be outstanding and 1 poor (see Table 2).

We found that Anthos boasted superlative functionality across all four areas, while DIY functionality in cluster and workload management was deemed adequate, and in node management and security was judged to be poor.

Table 2. Capabilities Between Anthos and DIY Compared

Capability | Anthos | DIY
Node Management | 3 | 1
Cluster Management | 3 | 2
Workload Management | 3 | 2
Security | 3 | 1
Aggregate Score | 3 | 1.5
Source: GigaOm 2022
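The report does not state how the aggregate row is computed. An unweighted mean of the four area scores reproduces both values, as this short Python check shows; treat the averaging method as our assumption.

```python
# Table 2 scores (1 = poor, 3 = outstanding).
SCORES = {
    "Node Management": {"Anthos": 3, "DIY": 1},
    "Cluster Management": {"Anthos": 3, "DIY": 2},
    "Workload Management": {"Anthos": 3, "DIY": 2},
    "Security": {"Anthos": 3, "DIY": 1},
}


def aggregate(platform: str) -> float:
    """Unweighted mean of one platform's scores across all areas (assumed method)."""
    return sum(area[platform] for area in SCORES.values()) / len(SCORES)


print(aggregate("Anthos"), aggregate("DIY"))  # 3.0 1.5
```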

  • Node Management refers to capabilities around management of the pool of worker nodes, including lifecycle management of the operating system and Kubernetes deployment, but also dependencies and prerequisites like networking and storage, day-to-day operations, and maintaining node health and availability.
  • Cluster Management capabilities include creating, importing, and managing Kubernetes clusters, cluster health reporting, and cluster lifecycle activities, such as upgrading cluster versions. It also refers to features to auto-scale cluster nodes from a pool.
  • Workload Management refers to capabilities for management and manipulation of workloads and applications and associated Kubernetes objects (ConfigMaps, Secrets, etc.), including the ability to create StatefulSets and CronJobs, or deployments from an integrated marketplace.
  • Security capabilities include features to integrate with OIDC, ability to define and assign role-based access control (RBAC), support for external key management systems, support for running containers as a non-root user, AppArmor security policies, audit logging, and finally standards compliance like PCI DSS, NIST Baseline High, and DoD Cloud Computing SRG Impact Level 2.

Field Test Findings

Our examination of Google Anthos and DIY approaches to cloud-enabled, on-premises infrastructures reveals a number of important observations. To help assess the relative complexity and difficulty of setting up and enabling capabilities across Anthos and DIY infrastructures, we established a sizing chart that does two things: It lists out for each approach the specific chain of tasks required to complete a specific activity, and also depicts the relative difficulty, cognitive load, and time required to complete each task in the chain. For this latter metric, we employ t-shirt sizes ranging from extra-small to extra-extra-large to present the relative workload imposed by each task, as shown in Figure 3.

Figure 3. Sampling of Task-Based Workloads Across Anthos and DIY Deployments
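The sizing chart can be modeled as a simple data structure: a chain of tasks, each tagged with a t-shirt size. In this Python sketch, the `Size` ordering and the `summarize` helper are our own illustration, not GigaOm’s tooling, and the task names in the example are placeholders rather than the contents of Figure 3. Because t-shirt sizes are ordinal, the sketch compares chains by length, heaviest task, and a tally per size instead of adding sizes together.

```python
from collections import Counter
from enum import IntEnum


class Size(IntEnum):
    """T-shirt sizes ordered from least to most effort (an ordinal scale)."""
    XS = 1
    S = 2
    M = 3
    L = 4
    XL = 5
    XXL = 6


def summarize(chain: list[tuple[str, Size]]) -> dict:
    """Describe a task chain by its length, heaviest task, and count per size."""
    if not chain:
        return {"tasks": 0, "heaviest": None, "by_size": {}}
    heaviest_task, _ = max(chain, key=lambda task: task[1])
    return {
        "tasks": len(chain),
        "heaviest": heaviest_task,
        "by_size": dict(Counter(size.name for _, size in chain)),
    }


# Placeholder chain; not the task list from Figure 3.
example = [("task-a", Size.S), ("task-b", Size.XL), ("task-c", Size.S)]
print(summarize(example))
```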

The base functionality for both platforms is identical, sharing commodity features for running applications. In both cases, we were able to run container-based applications without much difference in performance or functionality, scaling them up or down based on user traffic or resilience requirements. However, the amount of engineering effort and expertise required to get the platforms up and running varied wildly.
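Because both platforms run upstream Kubernetes, traffic-based scaling behaves the same on each. The Horizontal Pod Autoscaler’s documented core rule is desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), and no scaling happens when that ratio is within a tolerance of 1.0 (0.1 by default). The Python sketch below implements just that rule; the min/max bounds are hypothetical parameters, and the real controller additionally handles unready pods, missing metrics, and stabilization windows.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Core HPA rule: scale replicas in proportion to metric / target."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target: leave as is
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the (hypothetical) bounds an HPA object would declare.
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(3, current_metric=200, target_metric=100))  # 6
```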

The DIY platform took much longer to set up, required expensive, hard-to-get expertise, and demanded constant engineering time on what can best be described as toil: non-differentiating tasks and boilerplate infrastructure. Our research engineers had to battle technical issue after technical issue while setting up the DIY platform, which is reflected in the long and dense task chain depicted for DIY deployments in Figure 3. Recurring issues forced the engineer to rebuild the cluster multiple times and eventually settle for less-than-ideal technical solutions and implementations to work around them.

Additionally, finding engineers with the right experience to deal with the complexity of the many different software projects and products involved in a DIY platform was nearly impossible, especially on the networking and service mesh side. We expect our experience here to translate into the real world in terms of time to initial value and cost-effectiveness, not to mention decreased developer productivity.

The process to set up Anthos Bare Metal was simpler and required only a few binaries to be installed on an admin’s workstation to deploy a fully featured, fully configured cluster to a set of hardware. The pre-flight checks in the Anthos Bare Metal deployment process proved invaluable, while the hands-off, fully automated deployment meant anyone could deploy Anthos, not just the senior engineers with years of experience deploying Kubernetes.

Anthos’ streamlined and automated operational model is far superior to DIY, with significantly lower operational overhead when considering the entire operator experience. The consistency of operations across cloud and on-premises clusters eliminated the need for additional training, allowing admins to manage the on-premises infrastructure from a single, centralized management system. The self-healing and auto-scaling features of Kubernetes further reduced day-to-day operational overhead.

Anthos’ integration with the Google Cloud management console means newly deployed clusters are consistently manageable alongside existing cloud clusters using the same cloud management tools—no additional work needed. For the DIY platform, we had to set up Rancher to get some of Google’s functionality, requiring additional licensing cost, expertise, and engineering time; but even then the on-premises clusters did not benefit from centralized policies, and we had to configure security policies separately. With Anthos, existing policies in GCP were applied automatically and consistently across Anthos’ on-premises and cloud clusters.

There are significant differences between DIY management tools and Google’s GKE Console. Like Google’s management console, Rancher offers a user interface to manage workloads and maintain the overall health of clusters. However, it is of limited use for scaling Kubernetes in the enterprise, as our DIY solution lacked Anthos features such as autoscaling of clusters and node pools. Also absent is Anthos’ broad support for Windows-based nodes. Both features help scale the number of nodes in a cluster and the number of different cluster types under management, creating more consistent management.

Anthos’ baked-in monitoring proved invaluable compared to DIY. The out-of-the-box integration with existing monitoring services in Google Cloud meant that on-premises clusters and applications automatically sent logs and metrics to the monitoring server, without any configuration or additional monitoring software, which we did have to set up for the DIY option. Not only did this save us valuable time in setting up the cluster; the consistency of operations with a single dashboard across on-premises and cloud applications also gives operations teams better context as they look into potential issues, so they can solve them more quickly.

Security is a first-class citizen in Google Anthos, preventing on-premises clusters from becoming the security exception. Integration with identity and access management solutions is a notoriously difficult aspect of DIY platforms, one that we simplified by leveraging a commercially available solution from Rancher. Anthos’ integrated security model was much easier to get up and running than the DIY platform, where we needed to install and configure Rancher separately. The role-based access control in Anthos was more granular, allowing admins to set access permissions across cloud and on-premises clusters alike.

Anthos clusters run on customer-provided, Google-validated operating systems. This lets enterprises keep their existing processes for deploying hardware and their enterprise-approved Linux distribution (which must itself be Google validated), reducing the stress and complexity of adding Anthos to an existing data center. Anthos allows enterprises to meet PCI DSS, NIST Baseline High, and DoD Cloud Computing SRG Impact Level 2 standards by leveraging Google’s knowledge and continuous learning. Anthos also leverages Docker AppArmor security policies for additional runtime security, protecting against various exploitation opportunities. Audit logging is built in for all events that occur on Anthos, giving administrators access to a wealth of data for forensic analysis or alerting needs.

Google Anthos drastically reduces lifecycle management requirements. The bmctl command lowers the knowledge barrier to creating and modifying clusters. Upgrading is a matter of downloading a new version of the bmctl tool, updating the cluster version in its configuration file, and executing the upgrade command. The rolling upgrade process automatically drained each host, upgraded Kubernetes, and moved on to the next host in the cluster, causing no downtime for resilient applications. Compared to DIY, operating and upgrading clusters was a breeze, saving many engineering hours and reducing the need for highly skilled Kubernetes experts. The DIY platform, by contrast, offers no built-in lifecycle management, leaving engineers to define, automate, and follow their own processes.
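The rolling upgrade described above rests on a simple invariant: take one host out of service at a time, upgrade it, and return it to service before touching the next. The Python sketch below models only that ordering. The `Node` type, host names, and version strings are hypothetical, and this is not how bmctl is implemented; it simply illustrates why applications with replicas spread across hosts can keep running during the upgrade.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a cluster host (not a bmctl or Kubernetes type)."""
    name: str
    version: str
    schedulable: bool = True


def rolling_upgrade(nodes: list[Node], target_version: str) -> list[str]:
    """Upgrade hosts one at a time and return an ordered event log."""
    log: list[str] = []
    for node in nodes:
        if node.version == target_version:
            log.append(f"skip {node.name}")  # already at the target version
            continue
        # Take the host out of service (stands in for cordon + drain).
        node.schedulable = False
        log.append(f"drain {node.name}")
        # Invariant: only the host being upgraded is out of service, so
        # workloads with replicas on other hosts keep running.
        assert [n.name for n in nodes if not n.schedulable] == [node.name]
        node.version = target_version
        log.append(f"upgrade {node.name} -> {target_version}")
        # Return the host to service before moving on to the next one.
        node.schedulable = True
        log.append(f"uncordon {node.name}")
    return log


if __name__ == "__main__":
    cluster = [Node("host-1", "1.0"), Node("host-2", "1.0"), Node("host-3", "1.1")]
    for event in rolling_upgrade(cluster, "1.1"):
        print(event)
```

Processing hosts strictly in sequence trades upgrade speed for availability; a real tool may add health checks between hosts, which this sketch omits.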

Google Anthos Bare Metal offers savings up to 81% compared to the DIY solution over a three-year span, almost exclusively due to lower staff costs. These savings are realized because certain specialized and highly expensive expertise is no longer needed, and because deployment and day-to-day operations are streamlined and automated, reducing the need for dedicated staff.
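The 81% figure appears to correspond to the combined setup and three-year operating labor totals in Tables 4 and 5 of the Appendix. The short Python check below reproduces it from those totals; it covers labor only, since this section does not itemize licensing or hardware costs.

```python
# Labor totals in USD, taken from the Appendix:
# Table 4 (initial setup) and Table 5 (three-year operations).
DIY_SETUP, ANTHOS_SETUP = 129_600, 23_320
DIY_OPS_3Y, ANTHOS_OPS_3Y = 1_477_680, 283_000


def savings_pct(diy: float, anthos: float) -> float:
    """Share of the DIY labor cost avoided by the Anthos option, in percent."""
    return 100 * (diy - anthos) / diy


diy_total = DIY_SETUP + DIY_OPS_3Y  # 1,607,280
anthos_total = ANTHOS_SETUP + ANTHOS_OPS_3Y  # 306,320

print(f"Setup only:          {savings_pct(DIY_SETUP, ANTHOS_SETUP):.1f}%")  # 82.0%
print(f"Operations only:     {savings_pct(DIY_OPS_3Y, ANTHOS_OPS_3Y):.1f}%")  # 80.8%
print(f"Setup + three years: {savings_pct(diy_total, anthos_total):.1f}%")  # 80.9%
```

The combined figure rounds to 81%, and the setup and operating phases taken separately land close to it, so the headline number does not hinge on either phase alone.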

Anthos Bare Metal offers savings of up to 60% for server and storage costs when compared to running virtual machine-based applications on the same hardware, due to increased compute density with container-based applications. It further removes the need for expensive virtualization software (and upgrades), as Anthos Bare Metal can run directly on the hardware and fully supports older versions of virtualization platforms like VMware vSphere and OpenStack.

Anthos’ migration capabilities contribute to an unmatched developer experience compared to the DIY option. Application developers enjoy a standard, documented, and supported migration capability that can quickly and successfully modernize applications without expensive rewrites of application code. It also supports managing Amazon EKS, Azure AKS, OpenShift, and Rancher clusters, providing a true multi-cloud migration experience.

Building new cloud-native applications and components increases release frequency, which results in a faster time-to-market for new features and increased customer revenue. With Anthos Bare Metal, every application is adjacent to Google’s cloud services, including advanced options for machine learning and artificial intelligence that further future-proof applications.

TCO Analysis

To get a handle on the cost outlook for a DIY vs Anthos deployment, we developed a table that sums up the relative cost and time on task for the different roles typically engaged in a Kubernetes deployment. The figures are based on a Fortune 100 company deploying 100 hosts. Table 3 in the Appendix provides a rundown of the roles and hourly labor rates.

Based on those estimated hourly labor rates, Table 4 and Table 5 show the labor costs associated with each engaged role. Table 4 covers the labor required for the initial setup of an on-premises cloud architecture, and Table 5 covers labor over a three-year operating cycle; both compare the costs of a DIY solution and a Google Anthos solution. We use the estimated costs in these Appendix tables to inform the three charts that follow.

Figure 3. Anthos vs. DIY Initial Setup Labor Costs

Figure 4. Anthos vs. DIY 3-Year Operating Labor Costs

Figure 5. Anthos vs. DIY Setup and 3-Year Operating Labor Costs
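The Appendix tables appear to follow a simple model in which each role’s cost equals its hourly rate multiplied by the hours spent. Assuming that model (our reading of the tables, not a stated methodology), dividing a cost in Table 4 by the matching rate in Table 3 recovers the implied effort. The Python sketch below does this for a few representative roles.

```python
# Assumed model (our reading of the Appendix, not a stated methodology):
#     cost = hourly rate x hours
# Rates come from Table 3, initial-setup costs from Table 4 (USD).
SETUP = {
    # role: (DIY rate, Anthos rate, DIY cost, Anthos cost)
    "K8s Engineer": (140, 100, 22_400, 4_000),
    "K8s Architect": (150, 120, 24_000, 2_400),
    "Security Engineer": (120, 80, 14_400, 1_600),
    "Automation Engineer": (120, 80, 18_000, 3_200),
}


def implied_hours(cost: float, rate: float) -> float:
    """Invert cost = rate x hours to recover the hours behind a cost figure."""
    return cost / rate


for role, (diy_rate, anthos_rate, diy_cost, anthos_cost) in SETUP.items():
    diy_h = implied_hours(diy_cost, diy_rate)
    anthos_h = implied_hours(anthos_cost, anthos_rate)
    print(f"{role:<20} DIY {diy_h:>4.0f} h   Anthos {anthos_h:>4.0f} h")
```

For these roles the implied effort drops by a factor of roughly 3.75 to 8, while the hourly rates drop by at most a third, which suggests the setup gap is driven mainly by hours rather than by rates.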

Anthos’ steep advantage over DIY in operating and labor costs is only slightly offset by its higher licensing cost, since licensing is only a fraction of the total cost of ownership. Moreover, DIY solutions lack the ability to offer a one-stop shop for service, support, integration, and especially security.
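This report gives no licensing figures, so the offset claim can be framed as a break-even question: how much more could Anthos licensing cost than DIY licensing over three years before the labor advantage disappears? The Python sketch below answers that from the labor totals in Tables 4 and 5; the licensing premium is a hypothetical input, not data from this report.

```python
# Combined setup + three-year labor totals (USD) from Tables 4 and 5.
DIY_LABOR = 129_600 + 1_477_680  # 1,607,280
ANTHOS_LABOR = 23_320 + 283_000  # 306,320


def breakeven_license_premium(diy_labor: int, anthos_labor: int) -> int:
    """Extra licensing spend at which Anthos exactly matches DIY on total cost."""
    return diy_labor - anthos_labor


def anthos_cheaper(license_premium: int) -> bool:
    """True if Anthos labor plus an assumed licensing premium stays below DIY labor.

    license_premium is hypothetical: Anthos licensing minus DIY licensing
    (for example, Rancher) over the same three years.
    """
    return ANTHOS_LABOR + license_premium < DIY_LABOR


print(breakeven_license_premium(DIY_LABOR, ANTHOS_LABOR))  # 1300960
```

Any three-year licensing premium below roughly $1.3 million would leave Anthos cheaper on these labor estimates; hardware and support costs are outside this sketch.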

6. Conclusion

Organizations that need to keep some applications on-premises can turn to Anthos Bare Metal as a platform to transform the on-premises data center into a fully managed, fully integrated cloud region, leaving behind the days of bespoke, do-it-yourself platforms. With Anthos, organizations can reap the full benefits of cloud computing, even for their on-premises applications. Anthos’ true power lies not only in being a ready-to-go on-premises platform that saves on setup and operating costs, but also in extending the power of Google Kubernetes Engine to bare metal and virtualized on-premises environments, as well as to other public clouds.

As seen in this report’s field test, implementing Anthos Bare Metal is straightforward, saves on expensive dedicated expertise, frees up engineering capacity to work on business-related projects, and offers additional significant benefits over a DIY platform. These additional benefits include consistency of management, improved security features, out-of-the-box observability, and more.

Finally, we recognize that reality is unevenly distributed and that on-premises solutions will be with us for a long time. By designating on-premises data centers as yet another computing location and using Anthos Bare Metal, we can reap the benefits of cloud computing without sidestepping the challenges that keep applications on-premises.

7. Appendix

Table 3 provides an overview of the hourly labor rates for the different roles typically engaged in an enterprise Kubernetes deployment, along with a description of each role. The figures are based on a Fortune 100 company deploying 100 hosts and are used to inform Table 4 and Table 5 below.

Table 3. Hourly Labor Costs by Role

Role | DIY Cost/Hr. | Anthos Cost/Hr. | Description
--- | --- | --- | ---
ITSM Engineer | $100 | $100 | Handles ITIL processes such as incident, CMDB, and problem management.
Product Manager | $120 | $120 | Agile product manager responsible for time and budgets.
Helpdesk lvl 1 | $60 | $60 | 1st level of IT helpdesk on incidents and trouble tickets.
Helpdesk lvl 2 | $80 | $80 | 2nd level of helpdesk running known resolutions.
Helpdesk lvl 3 | $120 | $0 | 3rd level of helpdesk for escalation of unknown outages requiring vendor engagement.
K8s Engineer | $140 | $100 | Kubernetes administrator with run-time experience.
K8s Architect | $150 | $120 | Kubernetes architect with deep configuration knowledge.
Network Engineer | $130 | $130 | Layer 2 network engineer for configuration changes.
Network Architect | $160 | $160 | Layer 3-7 network architect with deep design skills.
Storage Engineer | $120 | $120 | Operational administrator responding to ITSM tickets.
Storage Architect | $150 | $150 | Strategic storage architect responsible for 3-5 year designs.
Security Engineer | $120 | $80 | Security staff responding to ITSM tickets or SIEM events.
Security Architect | $140 | $140 | Security architect responsible for risk mitigation of cyber threats.
Automation Engineer | $120 | $80 | Automation admin who creates, executes, and troubleshoots automation or orchestration tasks.
Optimization Engineer | $120 | $120 | Performance admin responsible for SLA/SLO compliance.
FinOps Engineer | $100 | $100 | Financial reporting expert responsible for budget adherence and planning.
Source: GigaOm 2022

Table 4. Initial Setup Labor Costs Compared: DIY and Google Anthos Solution

Role | DIY Cost | Anthos Cost
--- | --- | ---
ITSM Engineer | $8,000 | $1,600
Product Manager | $4,800 | $2,400
Helpdesk lvl 1 | $0 | $0
Helpdesk lvl 2 | $0 | $0
Helpdesk lvl 3 | $0 | $0
K8s Engineer | $22,400 | $4,000
K8s Architect | $24,000 | $2,400
Network Engineer | $10,400 | $2,600
Network Architect | $6,400 | $1,280
Storage Engineer | $9,600 | $1,920
Storage Architect | $6,000 | $1,200
Security Engineer | $14,400 | $1,600
Security Architect | $5,600 | $1,120
Automation Engineer | $18,000 | $3,200
Optimization Engineer | $0 | $0
FinOps Engineer | $0 | $0
TOTAL | $129,600 | $23,320
Source: GigaOm 2022

Table 5. Three-Year Operating Labor Costs Compared: DIY and Google Anthos Solution

Role | DIY Cost | Anthos Cost
--- | --- | ---
ITSM Engineer | $23,600 | $17,200
Product Manager | $79,680 | $21,120
Helpdesk lvl 1 | $93,600 | $18,720
Helpdesk lvl 2 | $249,600 | $24,960
Helpdesk lvl 3 | $172,800 | $0
K8s Engineer | $224,000 | $32,800
K8s Architect | $67,200 | $6,720
Network Engineer | $16,640 | $8,840
Network Architect | $29,440 | $5,120
Storage Engineer | $65,760 | $20,640
Storage Architect | $13,200 | $4,800
Security Engineer | $164,160 | $26,560
Security Architect | $25,760 | $21,280
Automation Engineer | $52,560 | $10,880
Optimization Engineer | $74,880 | $34,560
FinOps Engineer | $124,800 | $28,800
TOTAL | $1,477,680 | $283,000
Source: GigaOm 2022
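To see which roles drive the operating-cost gap in Table 5, the per-role differences can be ranked. The Python sketch below uses only the figures in that table; grouping the three helpdesk tiers together is our own illustration, not a category the report defines.

```python
# Three-year operating labor costs (USD) from Table 5: role -> (DIY, Anthos).
OPS_3Y = {
    "ITSM Engineer": (23_600, 17_200),
    "Product Manager": (79_680, 21_120),
    "Helpdesk lvl 1": (93_600, 18_720),
    "Helpdesk lvl 2": (249_600, 24_960),
    "Helpdesk lvl 3": (172_800, 0),
    "K8s Engineer": (224_000, 32_800),
    "K8s Architect": (67_200, 6_720),
    "Network Engineer": (16_640, 8_840),
    "Network Architect": (29_440, 5_120),
    "Storage Engineer": (65_760, 20_640),
    "Storage Architect": (13_200, 4_800),
    "Security Engineer": (164_160, 26_560),
    "Security Architect": (25_760, 21_280),
    "Automation Engineer": (52_560, 10_880),
    "Optimization Engineer": (74_880, 34_560),
    "FinOps Engineer": (124_800, 28_800),
}


def savings_by_role(costs: dict[str, tuple[int, int]]) -> list[tuple[str, int]]:
    """Per-role savings (DIY minus Anthos), largest first."""
    savings = [(role, diy - anthos) for role, (diy, anthos) in costs.items()]
    return sorted(savings, key=lambda item: item[1], reverse=True)


ranked = savings_by_role(OPS_3Y)
total = sum(saving for _, saving in ranked)  # 1,194,680 = 1,477,680 - 283,000
helpdesk = sum(s for role, s in ranked if role.startswith("Helpdesk"))

for role, saving in ranked[:3]:
    print(f"{role:<18} ${saving:>9,}  ({saving / total:.0%} of savings)")
print(f"All helpdesk tiers ${helpdesk:>9,}  ({helpdesk / total:.0%} of savings)")
```

On these figures, the three helpdesk tiers together account for roughly 40% of the operating savings, and the Kubernetes engineer role alone for about 16%.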

8. About Joep Piscaer

Joep is a technologist with team-building and tech marketing skills. His background includes roles as a CTO, cloud architect, infrastructure engineer, and DevOps culture coach. He has built many engineering and architecture teams and cultures.

He is the founder of TLA Tech, a tech marketing firm focusing on cloud-native, occasionally co-hosts TheCUBE, and blogs at

9. About GigaOm

GigaOm provides technical, operational, and business advice for IT’s strategic digital enterprise and business initiatives. Enterprise business leaders, CIOs, and technology organizations partner with GigaOm for practical, actionable, strategic, and visionary advice for modernizing and transforming their business. GigaOm’s advice empowers enterprises to successfully compete in an increasingly complicated business atmosphere that requires a solid understanding of constantly changing customer demands.

GigaOm works directly with enterprises both inside and outside of the IT organization to apply proven research and methodologies designed to avoid pitfalls and roadblocks while balancing risk and innovation. Research methodologies include but are not limited to adoption and benchmarking surveys, use cases, interviews, ROI/TCO, market landscapes, strategic trends, and technical benchmarks. Our analysts possess 20+ years of experience advising a spectrum of clients from early adopters to mainstream enterprises.

GigaOm’s perspective is that of the unbiased enterprise practitioner. Through this perspective, GigaOm connects with engaged and loyal subscribers on a deep and meaningful level.

10. Copyright

© Knowingly, Inc. 2022 "Google Anthos Bare Metal vs. Do-It-Yourself Deployments" is a trademark of Knowingly, Inc. For permission to reproduce this report, please contact