Table of Contents
- Summary
- The Analytics New Normal
- Cloud Adoption and the Cloud Data Warehouse
- Distributed Cloud: Building Blocks
- Wrap Up
- About Andrew Brust
- About GigaOm
- Copyright
1. Summary
Data warehouse platforms have been around for decades, with a long and interesting journey from the appliance-based products that typified their early days to the cloud data warehouse counterparts that are popular today. Data warehouses started out as a repurposing of the relational database technology used for operational applications, but the technology has evolved and has become increasingly optimized for analytics with the introduction of massively parallel processing (MPP), columnar storage, and vector processing. Add in cloud-native technology and the data warehouse platforms of today are almost unrecognizable relative to their forebears.
A “new normal” for analytics is now before us, accelerated by the pandemic and characterized by corporate data spread across numerous data sources—multiple operating systems and various cloud, database, and/or analytics platforms. Data protection regulations, meanwhile, may stipulate that certain data must remain within national boundaries, making the physical centralization of an organization’s data an impossibility.
In the past, the mainstream view was that physically siloed data was something to “fix,” and that all warehouse data should be centralized for optimal results. It is now time to challenge that idea. Organizations should embrace the distributed nature of data and integrate at least some of their data logically rather than physically. The data warehouse will need to evolve to bring analytics to the data instead of the other way around, while unifying management so that the distributed pieces operate as one logical cloud.
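As a rough illustration of what "bringing analytics to the data" can mean in practice, the sketch below runs an aggregation inside each regional data store and combines only the partial results. It is a minimal sketch under stated assumptions, not a description of any vendor's product: the in-memory SQLite databases stand in for regional sources, and the `sales` table, the region names, and the helper functions are all hypothetical.

```python
import sqlite3


def make_regional_store(rows):
    """Create an in-memory SQLite database standing in for one regional source (hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def local_summary(conn):
    """Aggregate where the data lives; only a row count and a total leave the region."""
    count, total = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM sales"
    ).fetchone()
    return count, total


def global_average(stores):
    """Combine per-region partial aggregates into one logical result."""
    partials = [local_summary(conn) for conn in stores.values()]
    n = sum(count for count, _ in partials)
    total = sum(amount for _, amount in partials)
    return total / n if n else None


# Two hypothetical regions whose row-level data never leaves its own store.
stores = {
    "eu": make_regional_store([("a", 100.0), ("b", 300.0)]),
    "us": make_regional_store([("c", 200.0)]),
}
print(global_average(stores))
```

The design point is that row-level records stay in place, which is compatible with the data residency constraints described above, while the caller still sees a single answer, as if the data were held in one warehouse.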
When big data technology was first on the rise, many of its proponents thought that data warehouses were obsolete and would soon disappear. Instead, the cloud acted as a major catalyst of innovation. Data warehouses can now handle the same volumes of data as dedicated big data technologies such as Hadoop and Spark, and they are optimized to meet the needs of modern, large-scale enterprise workloads. The data warehouse has developed into a bona fide big data technology in its own right and has been enjoying a strong comeback for at least the last five years.
The cloud data warehousing romance period has now passed, however, as numerous strong competitors have entered the landscape. The initial hype surrounding the pioneers that were once industry darlings has subsided, and they are now viewed with the same skepticism that incumbent data warehouse vendors have faced in recent decades. New trends have emerged in the data warehouse world (see Figure 1), including multi-cloud and hybrid-cloud operation, as well as compatibility with container and orchestration technologies such as Docker and Kubernetes. Edge computing is also on the rise, although data warehousing in that setting remains some way off.
Figure 1. Distributed Cloud Building Blocks
To best position themselves for success given these new trends, organizations will need an approach that accommodates a distributed cloud infrastructure, as well as a unified control plane to manage it, despite its physically distributed nature. Such an approach will need to find an equilibrium between the best of the established technologies and the most innovative of the new. Allies in an organization’s distributed cloud journey include data virtualization, containers, orchestration, and a combination of optimized hardware on-premises and commodity infrastructure in the cloud.
Analytics, data warehousing, and the cloud have changed—and changed each other—in permanent and powerful ways. This report is designed to help you understand these changes, how they are driving the future direction of the industry, and most importantly, the steps organizations should take around cloud data warehousing now in order to position themselves most advantageously for success down the line.