In today’s world, data proliferates, and the individual teams within organizations responsible for analyzing that data usually possess both autonomy and their own preferred tools. This situation can lead to inconsistent definitions of the metrics used in data analysis from tool to tool and from team to team. Semantic layers and metrics stores offer a solution to these pain points, enabling consistent definitions of metrics to be created and used organization-wide.
A semantic layer creates a consolidated representation of an organization’s data, one that makes data understandable in common business terms. A metrics store is a subcomponent of a semantic layer, and it functions primarily as a repository for the definitions of metrics used by an organization in its analytics and reporting. Semantic layers and metrics stores are beneficial for helping business users understand and access data in terms that are familiar to them and useful for workloads that require a higher-level, consolidated perspective on the data or that require data to be understood through common business terms.
Additionally, because semantic layers are essentially abstraction layers over the physical data, they make organizations more flexible and resilient when it comes to change. Even if the data underneath the semantic layer changes, the higher-level concepts and terms this data is associated with in the semantic layer change less frequently, shielding business users from disruption to the way they query their data.
So, what is a semantic layer and how does it work? As Figure 1 shows, a semantic layer functions as a translation layer between the physical data in analytics repositories (including data warehouses, lakes, lakehouses, and other systems) and client applications, such as business intelligence (BI) tools used to query, analyze, visualize, and present the data. Data is unified from across an organization’s disparate data sources, and the semantic layer creates a consolidated conceptual view of the organization’s data estate.
Figure 1. Semantic Layer Diagram
Measures (such as sales, number of page views, or dwell time at a cell tower), dimensions (for example, product category, geographic location, or business segment), and the relationships among these are all defined in the semantic layer. This model is a logical view of the data over the underlying physical storage, which consolidates the data logically, abstracts away its physical location, and effectively translates it into terms and concepts relevant to the business.
The result is what some vendors in this category refer to as a “single source of truth” across an organization. When different teams across the organization build dashboards and reports with their preferred applications and tools, they all work with the same definition of any given business term or concept, because this logic is all stored within the semantic layer. Metrics are defined once and used across the board, eliminating confusion, duplication of work, and inconsistency.
This GigaOm Sonar provides an overview of semantic layer and metrics store vendors and their available offerings, equipping IT decision-makers with the information they need to select the best solution for their business and use case requirements.
About the GigaOm Sonar Report
This GigaOm report focuses on emerging technologies and market segments. It helps organizations of all sizes to understand a new technology, its strengths and its weaknesses, and how it can fit into the overall IT strategy. The report is organized into five sections:
Overview: An overview of the technology, its major benefits, and possible use cases, as well as an exploration of product implementations already available in the market.
Considerations for Adoption: An analysis of the potential risks and benefits of introducing products based on this technology in an enterprise IT scenario. We look at table stakes and key differentiating features, as well as considerations for how to integrate the new product into the existing environment.
GigaOm Sonar Chart: A graphical representation of the market and its most important players, focused on their value proposition and their roadmap for the future.
Vendor Insights: A breakdown of each vendor’s offering in the sector, scored across key characteristics for enterprise adoption.
Near-Term Roadmap: A 12- to 18-month forecast of the future development of the technology, its ecosystem, and the major players in this market segment.
The Purpose of a Semantic Layer
Analytics helps derive meaning from raw data, and semantic layers are a vital part of that process because they map business terminology to the raw data that underlies those concepts. Semantic layers also ensure that definitions are consistent across the board. Organizations rarely use only one data storage system for their data or only one application to query and visualize the data. Different teams, such as sales, finance, and IT, often have their departmental data stored on different platforms and have their own preferred tools for retrieving, analyzing, and building reports from that data. Semantic layers and metrics stores provide a unifying framework across this diversity of tools and autonomous teams by delivering consistent definitions usable by them all.
A semantic layer creates a consolidated logical view of an organization’s data, one that is accessible and understandable in common business terms. It “translates” between the underlying data, which can be stored in tables, files, and so forth, and business applications, assigning meaning to it in business terms. It does so by creating a model of the organization’s data, in which the measures (values) and dimensions (and their hierarchies) are defined, allowing higher-level concepts, such as financial ratios or KPIs, to be defined from these in a consistent and controlled way. The measures and dimensions are defined only once and then used across the board; for example, any two reports or dashboards using a metric such as “revenue by quarter” will always be using the same definition of the term.
How It Works
The semantic layer is a logical layer that sits between physical data storage and the client applications that query the data and visualize it in reports or dashboards.
Without physically moving the data, the semantic layer presents a view from the underlying databases and systems as if it were all from a single source. The semantic model organizes the data in the underlying systems by means of the dimensional schema, including the measures (facts) and dimensions (categories or attributes—and their hierarchies), into a consolidated view that’s organized in business terms. Higher-level concepts such as calculations or KPIs can be defined on top of these building blocks, and that logic is often stored in the semantic layer as well.
When users query data through the applications of their choice, the semantic layer platform acts as an intermediary:
- It receives the user query from the client application (in whatever language the user’s client application of choice is using, typically SQL, but also potentially MDX, DAX, or others).
- It translates the logic of this request and generates the requisite query that will pull the necessary data to satisfy the query from wherever it resides physically.
The semantic layer is optimally placed to perform this function because it is connected to all of the physical data sources and contains all definitions and business logic.
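The translation step described above can be sketched in a few lines of code. The following is a minimal illustration only, with hypothetical names rather than any vendor's actual API: the semantic layer holds centralized definitions of metrics and dimensions and uses them to generate the physical SQL for a business-level request such as "revenue by quarter."

```python
# Hypothetical sketch (not any vendor's API): a semantic layer resolving a
# business-level request for "a metric by a dimension" into physical SQL.

# Centralized definitions: each metric maps to a physical expression and table.
METRIC_DEFINITIONS = {
    "revenue": {"expr": "SUM(order_amount)", "table": "warehouse.fact_orders"},
}
# Dimensions map business names to physical column expressions.
DIMENSION_COLUMNS = {
    "quarter": "DATE_TRUNC('quarter', order_date)",
    "region": "region_code",
}

def translate(metric: str, dimension: str) -> str:
    """Generate the physical query for '<metric> by <dimension>'."""
    m = METRIC_DEFINITIONS[metric]
    d = DIMENSION_COLUMNS[dimension]
    return (
        f"SELECT {d} AS {dimension}, {m['expr']} AS {metric} "
        f"FROM {m['table']} GROUP BY {d}"
    )
```

Because the definitions live in one place, every client asking for "revenue" receives SQL built from the same expression, regardless of which tool issued the request.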
The solution components include the following:
- Semantic model: At the core of the semantic layer platform is the semantic data model, which organizes the data according to business terms so it can be presented and understood in those terms. Semantic models form the foundation for consistency across the organization and help all users, regardless of technical skill, reach a common understanding of the data in business terms.
- Query engine: Though not technically part of the semantic layer, the query engine is a critical part of the semantic layer story because it is used by the semantic layer to pull the requested data from the necessary physical locations according to the logic that the semantic layer provides. Today’s semantic layer platforms are able to make use of more powerful query engines that have developed over the decades since the category originated, such as Apache Spark and Hadoop/MapReduce, as well as proprietary ones. This gives modern semantic layer platforms the ability to perform analytics at a much greater scale than what conventional online analytical processing (OLAP) platforms were capable of.
- Metrics store: The metrics store is a subcomponent of the semantic layer and functions primarily as a repository for the definitions of metrics the organization wants to use in its analysis and reporting. All the BI and analysis tools that an organization possesses have access to this repository, meaning all teams are working with the same definitions. Centralizing the definitions of metrics in this way ensures consistency and allows these definitions to be governed; if a change is needed, it must be made only once, and any report or dashboard containing this metric will have access to the updated definition.
- User interface (UI): There are multiple approaches that the platforms in this category take toward providing a UI for building the semantic model. Some vendors provide a no-code or low-code interface, allowing users to drag and drop components onto a visual canvas; in this case, the platform generates the code for the model under the hood for the user. Others provide a code-first interface, allowing users to determine measures and dimensions and declare definitions programmatically. Some vendors provide both.
- Caching and aggregates: Many platforms make use of materializations and caching to improve query performance. Aggregates (tables of data that have been summarized or aggregated by certain dimensions) offer another way to consolidate large amounts of data. Several platforms automatically suggest which datasets might benefit from aggregates based on user query patterns, while some also let users assist by "hinting" or flagging datasets that would benefit from aggregates or by defining an aggregate table directly.
- Data connections: Situated as it is between data sources and client applications, an essential aspect of the semantic layer is its connections to those data sources as well as to the client applications. While there’s some difference in approach to data source connectivity among the vendors in this category, each vendor’s approach is a deliberate choice designed to provide certain benefits to customers. Some vendors opt for breadth, making their platforms compatible with many different types of databases, while others opt for depth, with fewer but deeper platform-specific integrations. Some vendors find a balance of both. Integration with client applications can be accomplished in a number of ways as well, including ODBC/JDBC interfaces or REST APIs.
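The governance benefit of the metrics store component above can be sketched as follows. This is an illustrative toy, not a real product interface: the point is that consumers build their queries from one governed registry, so a definition changed once propagates everywhere.

```python
# Hypothetical sketch of a metrics store as a governed registry:
# every tool resolves the same definition, and a change made once
# is picked up by all consumers.
class MetricsStore:
    def __init__(self):
        self._definitions = {}

    def define(self, name, expression):
        """Create or update the single governed definition of a metric."""
        self._definitions[name] = expression

    def resolve(self, name):
        return self._definitions[name]

store = MetricsStore()
store.define("revenue", "SUM(order_amount)")

# Two different "tools" build their queries from the shared definition.
def dashboard_sql(s):
    return f"SELECT {s.resolve('revenue')} FROM fact_orders"

def report_sql(s):
    return f"SELECT region, {s.resolve('revenue')} FROM fact_orders GROUP BY region"

# A governed change, made once, propagates to every consumer.
store.define("revenue", "SUM(order_amount) - SUM(refund_amount)")
```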
How We Got Here
Although semantic layers and metrics stores are enjoying a resurgence of interest, they are not new concepts. The term “semantic layer” has existed since the 1990s, when it was coined by Business Objects (later acquired by SAP). Business Objects’ semantic layer, called a “universe,” referred to an intermediate layer designed to make data understandable to non-technical users. Additionally, most BI tools have long had a localized semantic layer within them that allows measures and dimensions to be defined and stored. However, this type of semantic layer is limited to its own particular tool, so the definitions aren’t reusable across tools and teams.
Semantic layers have their roots in OLAP databases and OLAP cubes. An OLAP database requires source data in a star schema, where measures are defined and stored in one central table known as the fact table, and dimensions are defined and stored in multiple related dimension tables. An OLAP cube is a multidimensional representation of the data that consists of measures broken down by each dimension (hence the term “cube,” referencing a multidimensional structure versus a table, which is only two-dimensional), as shown in Figure 2.
Figure 2. OLAP Cube
Setting up an OLAP cube requires substantial up-front data processing: all dimensions and measures must be defined—sometimes in multiple iterations—and many of the cell values (measure values for specific combinations of dimension members) must be precalculated, all of which requires time and effort. Once that work is done, however, the reward is significant: faster analytics, greater consistency, and improved governance.
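The precalculation trade-off described above can be made concrete with a toy example (illustrative data and names only): cell values, that is, a measure totaled for each combination of dimension members, are computed once up front, after which lookups no longer scan the fact table.

```python
from collections import defaultdict

# Illustrative only: precalculating OLAP-cube cells from a tiny star-schema
# fact table. Each fact row carries dimension keys plus a measure (sales).
facts = [
    {"product": "bike", "region": "east", "sales": 100},
    {"product": "bike", "region": "west", "sales": 150},
    {"product": "car",  "region": "east", "sales": 900},
]

def build_cube(rows, dims, measure):
    """Precompute the measure for every combination of dimension members
    present in the data, so later lookups avoid scanning the fact table."""
    cube = defaultdict(float)
    for row in rows:
        cube[tuple(row[d] for d in dims)] += row[measure]
    return dict(cube)

cube = build_cube(facts, ["product", "region"], "sales")
```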
Self-service BI evolved as an alternative methodology to OLAP cubes. It removed the need for the upfront processing work involved in setting up OLAP cubes, instead defining metrics on the fly. While this aspect contributed to the appeal of self-service BI, removing the structure and formality also gave rise to the inconsistencies and pain points that are driving the returning interest in semantic layers and metrics stores we’re seeing now.
Today’s semantic layers have evolved significantly, drawing on the advantages of the OLAP methodology as well as new technologies and more powerful data processing engines, such as Apache Spark (as well as proprietary engines), that can handle much larger data volumes than traditional OLAP platforms could.
The return to semantic layers and metrics stores represents at least a conceptual return to the OLAP methodology, even if not a return to OLAP technology. In hindsight, this shows that what the industry really needed was a technology change, not a methodology change, even if it seemed it couldn’t have one without the other.
As mentioned, semantic layers are beneficial for any use case that might require a higher-level, consolidated view of an organization’s data, or that requires data to be understood in common business terms. These include:
- Data democratization: The semantic model captures meaning and context from raw data by means of organizing it in business terms. By doing so, the semantic model makes the data accessible to users of varying technical backgrounds and useful in the quest for business insights.
- Reduced cloud costs: Because the model is predefined and aggregate tables for large datasets may be pre-materialized, the platform can assess whether an incoming query can be satisfied by an existing aggregate table instead of querying the backend database. When it can, the platform serves the query from the aggregate, potentially lowering cloud computing costs by reducing the number of times the cloud database, cloud data warehouse, or other cloud storage system has to be queried.
- Edge computing: For an organization in an industry such as construction, in which data may be coming from far-flung edge sites in different geographic locations or in special formats such as time series data from sensors, a semantic layer can help create a logical, consolidated view of the organization’s data across all its different data types and disparate sources.
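The aggregate-routing idea behind the cost reduction point above can be sketched as follows. Real platforms do this with far more sophistication; the table names and matching rule here are hypothetical.

```python
# Hypothetical sketch: route a query to a pre-materialized aggregate table
# when that aggregate covers every dimension the query groups by; otherwise
# fall back to the cloud backend. Table names are illustrative.
AGGREGATES = {
    "agg_sales_by_region_quarter": {"region", "quarter"},
}

def route(query_dimensions: set) -> str:
    for table, agg_dimensions in AGGREGATES.items():
        if query_dimensions <= agg_dimensions:  # aggregate covers the query
            return table                        # cheap: no backend scan
    return "cloud_warehouse"                    # must query the cloud backend
```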
3. Considerations for Adoption
Before adopting a semantic layer, customers should understand the benefits and limitations of current offerings, as well as how the market is evolving to accommodate new business needs. Vendors take different approaches to a number of components of the semantic layer platform. Each approach is a deliberate choice designed to give a specific benefit for intended users of the platform. While these approaches are important considerations to keep in mind when choosing a platform, it would be hard for potential customers to choose among the approaches without first understanding what they themselves are looking to get out of a semantic layer platform and why.
Additionally, the vendor landscape in the semantic layer and metrics store category can largely be divided into two main segments. One segment comprises vendors with dedicated semantic layer offerings (the semantic layer platform is their sole or flagship product), whereas the other offers a semantic layer as part of an overall approach to analytics. Consequently, comparisons among the vendors in this category are not purely like-to-like and involve a certain degree of comparing apples to oranges.
Potential customers should develop an understanding of what improvements they are seeking and why and then keep the vendor approach in mind when determining which platform best suits their needs. Some factors to consider include:
- Does the customer already have a significant portion of infrastructure with an existing vendor in this list? Would it make sense to remain with that vendor for this solution as well?
- Does the customer have a significant number of technical users or would a platform with a low-code interface suit it best?
- Would the customer prefer deep integrations with underlying data platforms, or would looser connectivity to a broad range of sources suit it better?
- What kinds of applications will end users select to query the model? Will the platform support these tools?
The key characteristics discussed below can assist with guiding this process.
Key Characteristics for Enterprise Adoption
Here, we explore the key characteristics that may influence enterprise adoption of the technology, based on attributes or capabilities that may be offered by some vendors but not others. These criteria will be the basis on which organizations decide which solutions to adopt for their particular needs.
The key characteristics for semantic layers and metrics stores are:
- Data source and client connectivity
- Analytics preprocessing
- Security and access controls
- Federation and virtualization
- Native API support
- Code and developer orientation
Data Source and Client Connectivity
Data source and client connectivity refers to the solution's ability to connect to many different data sources and BI tools. The more data sources and BI tools a platform can connect to, the more useful it is for creating consolidated views of data and enabling consistent definitions in calculations and reports across teams and tools. For example, whether users are connecting to data in sources such as BigQuery, Trino, MongoDB, or an on-premises SQL Server, and analyzing that data with BI tools such as Power BI or Tableau or via Python code in Jupyter notebooks, they should all be working with the same metrics, defined the same way.
While semantic layers are designed to translate data concepts into terms that are more accessible to business users, building the dimensional model and maintaining the semantic layer requires more technical expertise. Vendors take various approaches to designing UIs: some solutions have a graphical drag-and-drop UI for authoring the semantic layer and designing the dimensional model, others have a code-based interface, and some offer a balance of both. Vendors that score high on this characteristic provide a strong combination of graphical and code-based approaches. Each approach will suit different organizations depending on the skill sets they have available.
Analytics Preprocessing
Different optimizations can speed up performance, such as materializations and caching. Some solutions also analyze query patterns autonomously to pre-aggregate often-analyzed data and/or allow users to indicate which items (that is, specific measures, across particular dimensions, sometimes at specific hierarchical levels) should be pre-aggregated to improve performance. Leading vendors possess one or more of these optimization techniques.
Security and Access Controls
Because a semantic layer sits between the data sources and the applications that analyze the data, some semantic layers also assist with and further an organization’s access control policies: in addition to enabling consistency of definitions, semantic layers also enable consistent, organization-wide application of access policies. Some semantic layers can integrate with a company’s existing authentication and identity management tools, such as Azure Active Directory, and enforce role-based access control (RBAC) policies down to the row or column level.
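What row- and column-level enforcement means in practice can be sketched as follows. This is a toy illustration, not any vendor's policy engine: the role name, allowed columns, and filter rule are all hypothetical.

```python
# Hypothetical sketch of row- and column-level policy enforcement in a
# semantic layer. The role, columns, and filter rule are illustrative.
POLICIES = {
    "regional_analyst": {
        "allowed_columns": {"region", "revenue"},
        "row_filter": lambda row: row["region"] == "east",
    },
}

rows = [
    {"region": "east", "revenue": 100, "customer_email": "a@example.com"},
    {"region": "west", "revenue": 200, "customer_email": "b@example.com"},
]

def apply_policy(role, data):
    """Drop disallowed columns and filter out rows the role may not see."""
    policy = POLICIES[role]
    return [
        {k: v for k, v in row.items() if k in policy["allowed_columns"]}
        for row in data
        if policy["row_filter"](row)
    ]
```

Because every query passes through the semantic layer, the same policy applies no matter which BI tool issued the request.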
Federation and Virtualization
To create a consolidated view of an organization’s data, some semantic layers support federation of data from disparate sources, sometimes through data virtualization (that is, without moving or materializing the actual data). Some semantic layers also support query pushdown to apply semantic logic to data and/or transform data without moving it.
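The federation idea can be illustrated with a toy example (data and names hypothetical): one logical query is answered by pulling only the needed slices from two separate sources and combining them in memory, with neither dataset copied or materialized.

```python
# Hypothetical sketch of federation: one logical query is answered by
# pulling only the needed slices from two separate sources and joining
# them in memory, without copying or materializing either dataset.
warehouse_orders = [
    {"customer_id": 1, "amount": 50},
    {"customer_id": 2, "amount": 75},
]
crm_customers = [
    {"customer_id": 1, "segment": "enterprise"},
    {"customer_id": 2, "segment": "smb"},
]

def revenue_by_segment():
    """Join orders (warehouse) to segments (CRM) and sum per segment."""
    segments = {c["customer_id"]: c["segment"] for c in crm_customers}
    totals = {}
    for order in warehouse_orders:
        seg = segments[order["customer_id"]]
        totals[seg] = totals.get(seg, 0) + order["amount"]
    return totals
```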
Native API Support
In addition to a GUI and standard ODBC/JDBC, some solutions support native APIs (implemented using REST, GraphQL, or SQL) for querying data. This compatibility opens the solution up to use with additional applications, tools, and programming languages. Leading vendors provide support for at least one native API.
Code and Developer Orientation
For organizations that have the requisite skills, a code-first solution provides more control and flexibility in building and customizing the semantic model to suit that organization’s needs. Some solutions support multiple programming languages, such as SQL, Python, and MDX, or vendor-specific languages like Microsoft’s Tabular Model Definition Language (TMDL) or Google’s Looker Modeling Language (LookML). Some solutions enable reusability of the specific code employed to generate the definitions of metrics (in addition to an overall reusability of concepts and definitions that a semantic layer enables in general). Solutions with a code-first approach address the “analytics engineer” persona, making them a good fit for more technical audiences.
Table 1 shows how well the key characteristics are implemented in the solutions assessed in this report.
Table 1. Key Characteristics Affecting Enterprise Adoption
Key characteristic columns: Data Source & Client Connectivity, Security & Access Controls, Federation & Virtualization, Native API Support, Code & Developer Orientation.

Scoring legend:
- Exceptional: Outstanding focus and execution
- Capable: Good but with room for improvement
- Limited: Lacking in execution and use cases
- Not applicable or absent
4. GigaOm Sonar
The GigaOm Sonar provides a forward-looking analysis of vendor solutions in a nascent or emerging technology sector. It assesses each vendor on its architecture approach (Innovation) while determining where each solution sits in terms of enabling rapid time to value (Feature Play) versus delivering a complex and robust solution (Platform Play).
The GigaOm Sonar chart (Figure 3) plots the current position of each solution against these three criteria across a field of concentric semicircles, with solutions set closer to the center judged to be of higher overall value. The forward-looking progress of vendors is further depicted by arrows that show the expected direction of movement over a period of 12 to 18 months.
Figure 3. GigaOm Sonar for Semantic Layers and Metrics Stores
As you can see in the Sonar chart in Figure 3, the majority of vendors are found in the Leaders band, which attests to the strong, comprehensive nature of these vendors’ offerings and highlights one of the key points about this industry: while semantic layers and metrics stores are enjoying a resurgence of interest, these are not entirely new concepts. The underlying technology has been around for a number of decades and has been evolving and transforming all along. Today’s semantic layers are emerging as distinct components of the modern data stack, and the delineations between semantic layers and adjacent categories are being clarified. Today’s semantic layers also include features and technologies such as more powerful query engines that bring them squarely into the modern era and include code-first interfaces and markup languages that seek to address the emerging analytics engineer persona and bring software engineering best practices to data modeling.
There is a certain degree of comparing apples to oranges when considering the vendor landscape in this space, as some vendors provide dedicated semantic layer offerings, whereas others offer a semantic layer as part of an overall approach to analytics.
One final point about this graphic: it represents a snapshot in time, capturing a moment when some vendors are in a state of transition. A few vendors are working on new developments for their offerings, which have been announced but are currently in preview. The degree of investment these vendors have put into their upcoming features and integrations demonstrates their commitment to refining their offerings and highlights the potential for increasingly sophisticated capabilities down the line.
5. Vendor Insights
AtScale was founded in 2013 to address the challenge of modernizing OLAP: bringing its advantages forward and enabling it to perform at scale on modern cloud infrastructure. Today, AtScale offers a universal semantic layer solution that blends the structural optimizations of OLAP with the elasticity and scalability of the modern cloud data platform.
AtScale’s Universal Semantic Layer solution functions as an independent layer between underlying data in its native repositories and BI/analytics tools, providing a centralized location from which business definitions can be governed and managed. It makes use of data virtualization to present an abstracted, consolidated, and logical view of data across an organization without physically moving the data, and it leverages the federation capabilities of the underlying data platforms to unify queries across physically separate data sources.
The AtScale solution supports deep integrations with data sources including Snowflake, Databricks, Amazon Redshift, Google BigQuery, Microsoft Azure Synapse Analytics, Postgres, Oracle, and Microsoft SQL Server. AtScale implements dialect-specific SQL for each of its supported data sources, pushing down computations to the underlying data platform. On the client side, AtScale provides native protocol integrations for client applications to connect to its solution: these include SQL for Tableau, Looker, and Qlik; MDX for Excel, Cognos, MicroStrategy, and SAP BusinessObjects; DAX for Power BI; Python for Jupyter notebooks and other data science applications; and REST interfaces. AtScale also offers connections to data catalog tools including Alation and Collibra.
The AtScale UI provides both no-code and code-first approaches for building the semantic model, making its solution accessible to, and accommodating of, multiple different user personas. AtScale includes a graphical drag-and-drop interface for constructing models, which the vendor says is targeted to BI modelers and business analyst personas. In April 2023, AtScale also announced the AtScale Modeling Language (AML), in preview, for code-first data modeling, which the vendor says addresses the analytics engineer, data scientist, and developer personas who prefer a code-first approach to model building. AML also allows AtScale customers to use Git for version control and CI/CD practices.
Other features of AtScale’s solution that support a code-first approach to data modeling include an object library that allows data modelers to share and reuse model assets; support for dbt metric definitions, wherein AtScale can connect to a dbt project Git repository and access metrics defined as dbt metrics directly from there; and AtScale AI-Link, which allows customers to create, access, and manage AtScale data models using Python.
AtScale provides several features for query acceleration, including the ability to create, update, and/or merge aggregate tables, both autonomously and in a user-assisted or user-defined manner. The system can automatically observe user behavior and query patterns and create aggregate tables on its own to accelerate often-queried items. Data modelers can also “hint” or flag dimensions and measures to indicate to the system that it should create an aggregate from these items, and modelers can also create a user-defined aggregate for the platform to materialize immediately upon publishing the data model without waiting for the system to track user behavior or query patterns. Users can pin any frequently used aggregates in memory on the AtScale server as well.
AtScale can integrate with a wide range of authentication and identity provider solutions, including Azure Active Directory, Lightweight Directory Access Protocol (LDAP), and Okta. AtScale provides RBAC at the row and column levels.
Strengths: AtScale’s strengths include its deep data source and client connectivity, multi-persona approach to UI and platform orientation, and both autonomous and user-assisted or user-defined configuration of aggregates.
Challenges: As a major enterprise-focused player in the semantic layer and metrics store category, AtScale’s offerings are designed to handle enterprise workloads and are geared more toward these customers. As such, they may be out of range for smaller businesses.
Cube was founded in 2019 as an open source project and is a newer entrant in this space. Cube offers a semantic layer solution that is available either as a cloud-native managed service called Cube Cloud or as a Docker-containerized service called Cube Core that is run on the customer’s own infrastructure. Cube supports connectivity to data sources that use SQL to access data, including cloud data warehouses, query engines, transactional databases, time series databases, and streaming data sources.
Cube provides a REST API, GraphQL API, SQL API, and “Orchestration API” for connecting to client applications. Through these APIs, Cube supports connectivity to clients including BI applications, data science notebooks and workspaces, AI platforms, data catalog platforms, and data orchestration tools. The SQL API provides a Postgres-compatible dialect of SQL for client tools that use Postgres as a data source; the vendor says this API is most commonly used to connect to BI tools and applications. The Orchestration API was released in June 2023, and it enables Cube to connect to data orchestration tools such as Apache Airflow, Prefect, and Dagster, allowing these data orchestration tools to push changes to Cube from upstream data sources (instead of having Cube pull changes).
Cube also provides ways to view and interact graphically with the data model, including a GUI and an entity-relationship diagram, called a data graph, which displays a visual representation of the data model and the relationships between components of the model.
Cube provides several optimizations for analytics pre-processing and query acceleration. Cube makes use of caching and pre-aggregations to optimize system performance. The in-memory cache stores query results and helps improve performance when multiple concurrent queries are being run. Pre-aggregations provide another way the solution optimizes query performance. Pre-aggregations in Cube are a condensed version of the source data grouped by specified aggregates and can be stored either in the source database, in an external database supported by Cube, or in Cube Store, which the vendor describes as its own dedicated pre-aggregation storage layer. Pre-aggregations can be defined by users with a code-first approach as well as through a visual interface called Rollup Designer, which also automatically suggests pre-aggregations that users can select to apply.
Cube provides RBAC and integrates with identity providers such as Okta, Google Workspace, and Azure Active Directory for single sign-on.
Strengths: Cube’s strengths include its strong code-first orientation, native API support, and its analytics pre-processing through caching and pre-aggregations.
Challenges: As noted, Cube is the newest vendor in this report. It is still carving out a niche for itself, although to its credit, it has risen to prominence within the vendor landscape quickly and is bolstered by its strong open source community.
dbt, the data build tool, was founded on the idea that analysts can and should work like software engineers. To that end, dbt is a code-first offering that provides a framework for writing analytics code in SQL while applying software engineering best practices, such as modularity, version control, documentation, and testing. With dbt, developers can centralize metric definitions within dbt projects and ensure consistent definitions are used across all teams and tools.
Users define dbt metrics in YAML files within a dbt project, referencing an underlying dbt model, providing descriptions, and adding dimensions and filters. Once a metric has been created, it automatically populates in the directed acyclic graph, or DAG, which is a visual representation of the relationships between data models in dbt. This allows users to view upstream and downstream dependencies. Users can preview what a metric looks like before defining it in a project. Users can query metrics in dbt and verify them before running a job in the deployment environment.
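A hedged sketch of such a YAML metric definition, in the style of the dbt metrics package described here, might look as follows; the model, column, and dimension names are hypothetical:

```yaml
# Illustrative sketch in the dbt metrics package style; names are hypothetical.
metrics:
  - name: total_revenue
    label: Total Revenue
    model: ref('fct_orders')
    description: "Sum of all completed order amounts"
    calculation_method: sum
    expression: amount
    timestamp: ordered_at
    time_grains: [day, week, month]
    dimensions:
      - customer_region
      - order_status
    filters:
      - field: is_completed
        operator: '='
        value: 'true'
```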
The dbt Semantic Layer, launched into public beta in August 2023, is described by the vendor as a combination of several underlying components that create a consolidated experience for interacting with and querying data in the context of business metrics. These components include the metrics node within dbt Core and MetricFlow, the underlying query generation engine. Also included are the Semantic Layer Gateway, which enables dbt Cloud to connect to and relay SQL queries between databases or data warehouses and client applications, and MetricFlow Server, which, according to the vendor’s documentation, “takes metric requests and generates optimized SQL for the specific data platform.”
Semantic layer integrations allow dbt metrics to be queried from partner tools, including Mode, with an integration with Hex forthcoming.
The dbt Semantic Layer underwent some evolution during the writing of this report and immediately prior to its publication. The vendor notes that the dbt metrics package, which is how metrics had historically been defined and used in dbt, has been deprecated in favor of capabilities derived from MetricFlow. dbt Labs acquired Transform Data in 2023 and integrated Transform’s MetricFlow semantic layer capabilities into the dbt Semantic Layer. According to the vendor, this brings support for joins across tables, expanded data platform support, native support for more complex metric types, more optimized query plans and SQL generation, a JDBC interface and a GraphQL API, and quality checks and validation, with further capabilities, such as caching and a REST API, to come later. These capabilities were made available in beta starting in Q3 2023.
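As a hedged sketch of the MetricFlow-derived specification (all model, entity, and column names here are hypothetical), a semantic model and a metric built on it might be declared as:

```yaml
# Illustrative sketch of a MetricFlow-style definition; names are hypothetical.
semantic_models:
  - name: orders
    model: ref('fct_orders')
    defaults:
      agg_time_dimension: ordered_at
    entities:
      - name: order_id
        type: primary
    dimensions:
      - name: ordered_at
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: order_total
        agg: sum

metrics:
  - name: revenue
    label: Revenue
    type: simple
    type_params:
      measure: order_total
```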
Although the dust is still settling from this acquisition, the integration of MetricFlow’s capabilities means that the dbt Semantic Layer is poised to make a great leap in terms of sophistication and capabilities. The vendor notes it is committed to supporting its customers through the transition with tools and mitigation guides. What’s evident at the present time is that the vendor is investing in and growing its semantic layer capabilities and there is potential for increasingly sophisticated features to be introduced and developed to benefit users.
Strengths: dbt’s strengths include its source/client connectivity and its robust code/developer orientation.
Challenges: As noted, the dbt Semantic Layer is currently in a period of transition, with the features derived from dbt Labs’ acquisition of Transform Data and its MetricFlow technology now in beta. Once the transition is complete, dbt’s semantic layer capabilities are poised to be enhanced considerably, but as of now, that transition is still ongoing.
Looker is a BI solution that possesses a SQL-based modeling language for creating semantic data models, called LookML. Looker was acquired by Google in 2019 and that acquisition was completed in 2020. On the front end, Looker enables users to perform analytics and visualizations, and on the back end, users can define measures and dimensions and create data models from disparate sources using LookML.
LookML projects, which contain all the files detailing the objects, database connections, and UI elements used for querying, reside in individual Git repositories, enabling version control. To create a model, users can either start with a blank project and build the model manually, or they can automatically generate a model from a database’s schema. Views contain the definitions of measures and dimensions. An Explore, which Google’s product documentation describes as a starting point for a query, exposes one or more views for querying; users can define the tables joined in an Explore when declaring it.
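As a hedged sketch, a simple LookML view and Explore might be declared as follows; all table, field, and view names are hypothetical, and the joined `customers` view is assumed to be defined elsewhere in the project:

```lookml
# Illustrative sketch only; names are hypothetical.
view: orders {
  sql_table_name: analytics.orders ;;

  dimension: id {
    primary_key: yes
    type: number
    sql: ${TABLE}.id ;;
  }

  dimension: customer_id {
    type: number
    sql: ${TABLE}.customer_id ;;
  }

  dimension: status {
    type: string
    sql: ${TABLE}.status ;;
  }

  measure: total_revenue {
    type: sum
    sql: ${TABLE}.amount ;;
  }
}

# Declared in a model file; the Explore is the starting point for queries.
explore: orders {
  join: customers {
    sql_on: ${orders.customer_id} = ${customers.id} ;;
    relationship: many_to_one
  }
}
```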
Looker supports connections to a wide variety of data sources, including relational databases, data warehouses, data lakes and lakehouses, query engines, MySQL databases, PostgreSQL databases, and on-premises data stores. Specific examples include Amazon Athena, Amazon Redshift, Apache Hive, Apache Spark, Cloudera Impala, Databricks, Dremio, Google BigQuery, IBM DB2, Microsoft Azure Synapse Analytics, Oracle, Presto, and Microsoft SQL Server. The solution includes deep integrations with Looker Studio (formerly Google Data Studio), Google Connected Sheets, Tableau, and Power BI. The company says Looker remains an open solution through its Looker API, a REST API, and that it continues to expand its connectors to other BI tools.
The integrated development environment (IDE) provides features including automatic suggestions of possible parameters and values, automatic formatting, find and replace, and syntax error checking (error checking of the LookML models is provided through the content validator). Looker provides a development mode and a production mode for the Looker data model: in development mode, LookML files are editable, and users can preview the effects of any changes; in production mode, all users access their projects in the same state and project files are read-only.
Portions of LookML code can be reused via the “extends” parameter, which indicates that an object, view, Explore, or dashboard is an extension of an existing item. Additionally, elements called Looker Blocks consist of pre-built data models for common analytical patterns and data sources. Users can browse the Looker Marketplace to find available Looker Blocks, which include full existing LookML models as well as templates (pre-written code) for data cleansing, transformation, and analytics.
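A hedged sketch of the “extends” parameter in action, with hypothetical view and field names:

```lookml
# Illustrative sketch only; names are hypothetical.
view: orders_core {
  sql_table_name: analytics.orders ;;

  dimension: status {
    type: string
    sql: ${TABLE}.status ;;
  }
}

# Inherits all of orders_core's fields and adds a new measure.
view: orders_extended {
  extends: [orders_core]

  measure: total_margin {
    type: sum
    sql: ${TABLE}.margin ;;
  }
}
```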
Looker makes use of caching, accessing the stored results of prior SQL queries, to improve performance. What the vendor calls “aggregate awareness” allows the solution to improve efficiency by identifying the smallest table available to run a query accurately. Developers can define aggregates or summaries of the data in larger tables grouped by different attributes, which Looker can then use instead of the larger table for queries. The LookML Diagram, an entity-relationship diagram of a LookML model, provides a visual representation of the relationships among LookML objects in a model, which can be used to review and consolidate the model where possible to improve performance.
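As a hedged sketch of how a developer-defined aggregate might look (the Explore, fields, and datagroup named here are hypothetical), an aggregate table can be declared inside an Explore so that aggregate awareness can route eligible queries to it:

```lookml
# Illustrative sketch only; names are hypothetical.
explore: orders {
  aggregate_table: monthly_orders {
    query: {
      dimensions: [orders.created_month]
      measures: [orders.total_revenue]
    }
    materialization: {
      datagroup_trigger: orders_datagroup  # rebuild when the datagroup fires
    }
  }
}
```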
Administrators can manage the actions a user can perform, and the datasets and fields within a dataset that a user can access, by assigning a role to a user or group of users, and managing permissions based on the assigned role. User experience can also be customized through user attributes (such as location, ID, timezone, or a custom attribute) which are defined by an administrator and applied to an individual user or group of users.
Additionally, in March 2023, Google announced that Looker will be offering the ability for customers to use the solution as a standalone metrics layer, decoupling LookML’s query engine from the BI layer and allowing the metrics defined within Looker’s modeling layer to be accessed across tools with which it currently has integrations (as of the writing of this report, these are Google Connected Sheets, Looker Studio, Looker Studio Pro, Microsoft Power BI, Tableau, and ThoughtSpot). In addition to these integrations, the vendor also says that there will be a new SQL interface that allows JDBC connections to Looker’s modeling layer from other tools.
Strengths: Looker’s strengths include its code-based modeling with LookML, reusability of code and models through extensions and Blocks, and its code-first user experience.
Challenges: Looker is not a standalone semantic layer; its upcoming standalone offering, Looker Modeler, was announced in March 2023, but it is not yet generally available. Once that offering becomes generally available, it will mitigate this challenge significantly.
Microsoft is a longstanding player in the semantic layer category and its offerings and capabilities there have been developed and refined since the early days of BI. Microsoft’s BI and semantic layer journey has evolved from the original SQL Server Analysis Services (SSAS) Multidimensional mode to the SSAS Tabular Mode, Power BI, and the newly announced Microsoft Fabric (currently in preview). The vendor’s trajectory encompasses what it refers to as “discipline at the core and flexibility at the edge,” enabled by features such as the Power BI composite model, which allows definition and management of a core semantic model to be handled by IT, while maintaining flexibility at the “edge” for analysts to still perform analytics in a more exploratory manner.
SSAS provides different approaches for creating semantic models. These include the multidimensional and tabular modes. Multidimensional modeling in SSAS makes use of OLAP modeling constructs (measures, dimensions, and cubes). It features a visual drag-and-drop interface with wizards to define the elements of the cube. The Analysis Services Scripting Language, ASSL, allows construction and modification of the model in a code-first manner. The multidimensional model makes use of analytics preprocessing and optimization features, including caching and both automatic and manually designed aggregations.
When SSAS runs in ROLAP or HOLAP mode (that is, Relational OLAP or Hybrid OLAP mode, where fact data and dimension data are not materialized in the cube), it behaves very much like the pure-play semantic layer offerings from other vendors in this report. Although MOLAP mode (Multidimensional OLAP mode, where fact data and dimension data are materialized in the cube) is more common due to performance considerations, ROLAP mode is well supported and has existed since the late 1990s. The new vendors in this report are essentially rediscovering that architecture 25 years later.
The tabular model, used by SSAS, Excel Power Pivot, and Power BI, can run in an in-memory Import mode or make use of an option called DirectQuery mode, which connects to and queries the back-end database on the fly. (Import and DirectQuery are tabular’s rough equivalents of MOLAP and ROLAP modes, respectively.) The vendor says this offloading of query execution to the back-end database enables building a larger-scale tabular model because the database does not have to be loaded in memory.
Scripting and modeling languages supported include the aforementioned ASSL for the multidimensional mode and TMSL/TMDL (Tabular Model Scripting Language/Tabular Model Definition Language) for the tabular mode. In addition, MDX (MultiDimensional eXpressions) and DAX (Data Analysis eXpressions) are available for querying, building calculations, and defining measures in the multidimensional and tabular modes, respectively. XMLA (XML for Analysis) is a SOAP web service that Analysis Services uses as its native protocol. Though developed by Microsoft, MDX and XMLA have become industry standards used by multiple vendors.
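To give a flavor of the two expression languages, the following hedged sketches assume a hypothetical Sales model; neither is drawn from a specific Microsoft sample:

```
-- DAX: a measure defined in a tabular model (hypothetical table and column)
Total Sales := SUM ( Sales[SalesAmount] )

-- MDX: an equivalent query against a multidimensional cube (hypothetical cube)
SELECT
  [Measures].[Sales Amount] ON COLUMNS,
  [Date].[Calendar Year].MEMBERS ON ROWS
FROM [Sales]
```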
Power BI XMLA endpoints allow that solution to connect to and be queried by SSAS-compatible applications and tools from Microsoft and third parties alike. The more than 180 data sources to which it can connect include cloud data warehouses, time series databases, files, and SaaS applications and solutions. Though Power BI provides a visual drag-and-drop interface for designing data models, code-first construction of the model is also possible through mastery of TMSL/TMDL and their accompanying APIs.
The Power BI import mode allows caching of data; composite models (tabular’s rough equivalent of HOLAP) enable DirectQuery mode to be used instead, while still allowing the use of local aggregation tables, which can be automatically created based on user query patterns or designed explicitly by the modeler. Power BI allows RBAC to be applied at the dataset, object, and row levels, and it makes use of Azure Active Directory for user authentication. The Power BI composite model allows multiple tables in either import mode or DirectQuery mode to be contained within a dataset, and in some cases, individual tables can leverage both storage models.
In the Microsoft Fabric preview, previously separate cloud services, including Power BI, Azure Synapse Analytics, and Azure Data Factory, are abstracted, enhanced, and unified and operate on top of a single, consolidated data store called “OneLake,” where data is persisted in Apache Parquet/Delta Lake format. Once Fabric is generally available, the intertwining of all these offerings will result in a more seamless experience and one that is easier for the business user to consume.
Strengths: Microsoft has been a major player in the semantic layer category for decades. Its Power BI platform is a powerhouse that is poised to provide even greater benefit to its users once Microsoft Fabric becomes generally available.
Challenges: Microsoft does not position itself as a code-first semantic layer solution, although code-first modeling is indeed supported. Microsoft’s recent investments in the “developer mode” within Power BI Premium stand to improve its capabilities and their visibility in this area. In addition, Microsoft’s modern BI stack is backward-compatible with the original SSAS and Azure Analysis Services APIs, tooling, and ecosystem, which together constitute a predominantly code-first solution.
Oracle has several options in the semantic layer and metrics store category. These include the Oracle Analytics Cloud’s Semantic Modeler and Model Administration Tool, two tools with different experiences for building semantic models. Additionally, there is Oracle Database’s Analytic Views. According to the vendor, Semantic Modeler and the Model Administration Tool are best for use cases in which the semantic layer functions as a “middle tier” between data storage and client BI applications.
Meanwhile, for deep, native data source integration, along with powerful dimensional modeling, the vendor provides Oracle Analytic Views within Oracle Database. The semantic model in Oracle’s solutions federates different physical data sources into a logical model of the data, which can be further structured into subject areas for business users to pull into their analysis and visualizations. Oracle refers to these components as the physical layer, the logical layer, and the presentation layer. The physical layer is where the data sources and relationships among data sources are defined. The logical layer is where the mappings from the physical tables to the logical tables are created and where the measures, dimensions, and hierarchies are defined. In the presentation layer, the logical layer’s objects are organized into subject areas for business users to interact with for business domain-specific visualization and analysis.
Released in October 2022, Oracle Analytics Semantic Modeler, a web browser-based tool for creating semantic data models, is the newest of the technology giant’s semantic layer offerings, designed to create a more modern experience for data modeling. Developers can create data models visually through the Semantic Modeler UI, or directly with code through the JSON-based Semantic Modeler Markup Language (SMML). Semantic Modeler provides a built-in editor for SMML as well as support for the developer’s external text editor of choice, and it integrates with Git-based source code control systems. Search integration and a lineage viewer allow users to view the relationships and mappings between components of the semantic model. Semantic Modeler provides RBAC, and administrators can set other permission constraints, including object permissions and query limits. Connectivity to clients is provided through ODBC interfaces via APIs (including native APIs for Oracle databases).
At the time of this writing, Semantic Modeler offered support for creating models from relational data sources only, with other types of data sources to be supported in future releases. The vendor’s guidance recommends using the Model Administration Tool for non-relational data sources. The documentation describes the Model Administration Tool as a longstanding, developer-focused modeling tool for building governed data models. It is not integrated into the Oracle Analytics interface; rather, it is a separate application that is downloaded and installed onto the user’s computer. The Model Administration Tool allows users to download, edit, and upload semantic model .rpd files to Oracle Analytics Cloud.
Analytic Views are a feature within Oracle Database that allows the creation of a dimensional model, defined using SQL DDL, that is separated from the underlying storage and can be queried by external client applications, with native integration with Oracle Analytics Cloud as well as third-party BI applications such as Tableau. Data sources can include Oracle Database, object stores such as Amazon S3, and other non-Oracle databases and data sources through Oracle Heterogeneous Data Services. The model includes dimensions, hierarchies, levels, attributes, base measures, and calculated measures, and it can be queried using MDX as well as SQL. Analytic Views can pre-calculate aggregations either automatically, based on analysis of user query patterns, or as indicated by the user. Analytic Views provide RBAC at the database object or row level. External tools and applications can connect to Oracle Analytic Views through ODBC/JDBC interfaces, OLE DB for OLAP, and Oracle REST Data Services.
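As a hedged sketch of the SQL DDL involved (assuming an attribute dimension `time_attr_dim` with hierarchy `time_hier` and a fact table `sales_fact` have already been created; all object names are hypothetical):

```sql
-- Illustrative sketch only; all referenced objects are hypothetical.
CREATE OR REPLACE ANALYTIC VIEW sales_av
USING sales_fact
DIMENSION BY (
  time_attr_dim
    KEY month_id REFERENCES month_id
    HIERARCHIES (time_hier DEFAULT)
)
MEASURES (
  sales FACT sales_amount,
  -- A calculated measure: sales for the same member one year earlier
  sales_prior_year AS (LAG(sales) OVER (HIERARCHY time_hier OFFSET 12))
)
DEFAULT MEASURE sales;

-- Querying the analytic view with plain SQL:
SELECT time_hier.member_name, sales
FROM   sales_av HIERARCHIES (time_hier)
WHERE  time_hier.level_name = 'MONTH';
```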
Strengths: The strengths of Oracle’s offerings include the strong code and developer orientation of Semantic Modeler and the analytics preprocessing capabilities of Oracle Analytic Views.
Challenges: Each individual offering can connect to a specific set of data sources—Semantic Modeler to relational data sources, Oracle Analytic Views primarily to Oracle Database and object stores, and the Model Administration Tool to other non-relational sources. The decision tree presented by this trinity of offerings means prospective customers must evaluate each one carefully in order to select the best combinations for their needs.
As mentioned in the introductory sections of this report, Business Objects (BObj) established the term “semantic layer” in the early 1990s with its Business Objects “universe,” a semantic layer designed to sit between data in storage and an analytics or reporting tool on the front end. In 2007, Business Objects was acquired by SAP, and its portfolio of technology was integrated into SAP’s own data and analytics stack. The groundbreaking Business Objects universe still exists today within SAP’s BusinessObjects BI suite. Additionally, SAP Datasphere is the vendor’s cloud services platform that delivers a variety of analytics capabilities, including semantic layer technology rooted in the acquired Business Objects universe technology.
SAP Datasphere allows metrics to be defined in the data and business layers. A data catalog provides a central repository for metadata and is integrated with SAP Analytics Cloud, allowing end-to-end lineage as well as visibility into the lineage of changes to metrics. SAP Datasphere provides a no-code/low-code graphical interface as well as code-first interfaces with support for the SQL and Python programming languages. SAP Datasphere allows data to be persisted on all layers of the semantic model through local tables, by means of real-time data replication or snapshots. Materialized views improve performance for accessing data and reduce the need to query the back-end source system, and the SAP Datasphere Analytic Model makes use of smart caching to optimize query runtimes. Connections available with SAP Datasphere include SAP applications; SAP databases, including SAP BW and SAP BW/4HANA; data lakes; and object storage, including SAP HANA Data Lake, Google Cloud Storage, Azure Blob Storage, and Amazon S3. External BI client applications can connect via SQL or OData.
With regard to the Business Objects universes that exist today under the SAP BusinessObjects BI suite, users can create them via a design environment called the “information design tool” or IDT. From this environment, the vendor says, users can extract metadata and define dimensions, measures, and hierarchies from relational and OLAP sources. All the components of the universe are contained in a workspace known as a “project.” A “connection” is a set of parameters that defines how a universe can access a relational or OLAP data source, and a universe can have one or more connections to data sources.
A “data foundation” enables federation of the data sources; it consists of a schema that includes the tables from one or more databases and the join relationships between them. Users can build on this foundation by adding derived tables, alias tables, calculated columns, additional joins, and other SQL definitions. The data foundation provides the basis for what the vendor calls the “business layer,” which maps the measures, dimensions, hierarchies, attributes, and predefined conditions, through SQL or MDX, to the underlying data sources, enabling the latter to be abstracted and presented to the user in business terms. When the mappings in the business layer are complete, it can be compiled with the associated connections and data foundations and deployed as a universe.
Query performance can be accelerated through use of “aggregate awareness,” which enables the universe to make use of pre-aggregated data (contained in aggregate tables). Aggregate awareness enables the query generator to retrieve the data from the table with the highest aggregation level that accommodates the level of detail in the query at runtime.
Administrators can create data security profiles and secure access to objects and data returned in queries through use of these profiles. A data security profile is a group of settings that define the permissions to objects and data connections within a published universe. Administrators can assign data security profiles to users and groups of users. More than one profile can be assigned to a user or group, and admins can preview the net result of the profiles assigned to any user or group.
Strengths: The original Business Objects universe was a groundbreaking invention for its time and a major milestone in the history of the semantic layer category, which many vendors still, in effect, emulate today. The fact that it still exists attests to its strength and to the continued investment SAP has put into it. A major strong point of the SAP BusinessObjects universe is its UI for building the model and deploying the universe.
Challenges: In keeping with its vintage, the set of data sources to which the SAP BusinessObjects universe offering can connect includes only SAP data sources and OLAP pioneers Oracle Essbase and Microsoft SQL Server Analysis Services.
6. Near-Term Roadmap
The “modern data stack” is a term that has become prominent in recent years; it refers to an assortment of data tools and technologies that span data collection, movement, storage, transformation, processing, and analysis. Since their inception, semantic layers have been components of this collection, but they are often found within another technology of the stack, localized within a BI tool or a database, or implemented as semantic logic in data transformations during the ETL/ELT process.
Now, decades after its original development, the semantic layer is rising to prominence again, in large part as a response to the shortcomings of self-service BI, as noted in the Overview section of this report. The difference now is that semantic layers are being recognized as discrete components of the portfolio of technologies that make up the so-called modern data stack, distinct from BI platforms, data integration platforms, data virtualization platforms, and data transformation tools.
In general, the majority of semantic layer offerings on the market today provide a layer decoupled from data sources and from BI and analytics tools, situated between the two. Based on vendors’ upcoming releases, we anticipate this arrangement will endure, with continued support for modeling languages that allow semantic models to be queried independently.
By means of the semantic model, semantic layers consolidate the view of the organization’s data and capture business entities and analysis terms. Thus, the designated audience of the semantic layer isn’t limited to a specific subset of personnel with certain skill sets; it’s actually the entire organization, regardless of individuals’ technical ability. This is playing out in the vendor landscape in an interesting way: at the same time that vendors are adding support for modeling languages to support the emergence of the analytics engineer persona who prefers a code-first approach, they are also adding more low-code/no-code features, designed to make interacting with the solution and model, and defining metrics, simpler for non-technical users. We expect these dual trends to continue as semantic layer solutions evolve to further encompass both technical and non-technical user personas.
7. Analyst’s Take
The resurgence of interest in semantic layers and metrics stores represents, in many ways, a full-circle return to a concept that was originally developed decades prior. Industry professionals investigating whether a semantic layer is a good fit for their organization’s goals should not be put off by the connotations surrounding terms such as OLAP and cube: the industry is rediscovering the benefits that defining the semantic model up front can have on ease of adoption and consumption, query performance and speed, consistency, data democratization, and governance.
Semantic layers are benefiting from technologies and features that bring them squarely into the modern era. Code-first interfaces and markup languages seek to address the emerging analytics engineer persona and bring software engineering best practices, such as version control, to data modeling. Additionally, today’s semantic layers are benefiting from powerful improvements in query engine technology. The modern semantic layer solution represents the best of both worlds, in terms of combining the advantages of the dimensional modeling approach with improved technology and features that have developed in the ensuing years.
Semantic layers are also being recognized as independent components of organizations’ portfolios of data technologies. Vendors are clarifying the delineations between semantic layers and adjacent categories such as data virtualization and transformation tools. By doing so, they’re clarifying how a semantic layer can harmonize and work in tandem with other components of the stack to help organizations achieve their goals.
Our intent in this Sonar report is to assist readers with understanding the history and context of the semantic layer, and the details of the technology involved and how it works, in addition to providing key characteristics for evaluating existing semantic layer offerings. To that end, we have walked through an overview detailing the purpose, technology fundamentals, main components, and history of the semantic layer; provided the key characteristics that potential customers can use to identify what’s most important to them and their organization’s goals; and included a survey of the current vendor landscape. This information is intended to help potential customers see how a semantic layer would fit into their own stack and arm them with the knowledge ultimately needed to select the offering that best matches their organization’s circumstances, use cases, and requirements.
8. Report Methodology
A GigaOm Sonar report analyzes emerging technology trends and sectors, providing decision-makers with the information they need to build forward-looking—and rewarding—IT strategies. Sonar reports provide analysis of the risks posed by the adoption of products that are not yet fully validated by the market or available from established players.
In exploring bleeding edge technology and addressing market segments still lacking clear categorization, Sonar reports aim to eliminate hype, educate on technology, and equip readers with insight that allows them to navigate different product implementations. The analysis highlights core technologies, use cases, and differentiating features, rather than drawing feature comparisons. This approach is taken mostly because the overlap among solutions in nascent technology sectors can be minimal. In fact, product implementations based on the same core technology tend to take unique approaches and focus on narrow use cases.
The Sonar report defines the basic features that users should expect from products that satisfactorily implement an emerging technology, while taking note of characteristics that will have a role in building differentiating value over time.
In this regard, readers will find similarities with the GigaOm Key Criteria and Radar reports. Sonar reports, however, are specifically designed to provide an early assessment of recently introduced technologies and market segments. The evaluation of the emerging technology is based on:
- Core technology: Table stakes
- Differentiating features: Potential value and key criteria
Over the years, depending on technology maturation and user adoption, a particular emerging technology may either remain niche or evolve to become mainstream (see Figure 4). GigaOm Sonar reports intercept new technology trends before they become mainstream and provide insight to help readers understand their value for potential early adoption and the highest ROI.
Figure 4. Evolution of Technology
9. About Andrew Brust
Andrew Brust has held developer, CTO, analyst, research director, and market strategist positions at organizations ranging from the City of New York and Cap Gemini to GigaOm and Datameer. He has worked with small, medium, and Fortune 1000 clients in numerous industries and with software companies ranging from small ISVs to large clients like Microsoft. The understanding of technology and the way customers use it that resulted from this experience makes his market and product analyses relevant, credible, and empathetic.
Andrew has tracked the Big Data and Analytics industry since its inception, as GigaOm’s Research Director and as ZDNet’s original blogger for Big Data and Analytics. Andrew co-chairs Visual Studio Live!, one of the nation’s longest-running developer conferences, and currently covers data and analytics for The New Stack and VentureBeat. As a seasoned technical author and speaker in the database field, Andrew understands today’s market in the context of its extensive enterprise underpinnings.
10. About GigaOm
GigaOm provides technical, operational, and business advice for IT’s strategic digital enterprise and business initiatives. Enterprise business leaders, CIOs, and technology organizations partner with GigaOm for practical, actionable, strategic, and visionary advice for modernizing and transforming their business. GigaOm’s advice empowers enterprises to successfully compete in an increasingly complicated business atmosphere that requires a solid understanding of constantly changing customer demands.
GigaOm works directly with enterprises both inside and outside of the IT organization to apply proven research and methodologies designed to avoid pitfalls and roadblocks while balancing risk and innovation. Research methodologies include but are not limited to adoption and benchmarking surveys, use cases, interviews, ROI/TCO, market landscapes, strategic trends, and technical benchmarks. Our analysts possess 20+ years of experience advising a spectrum of clients from early adopters to mainstream enterprises.
GigaOm’s perspective is that of the unbiased enterprise practitioner. Through this perspective, GigaOm connects with engaged and loyal subscribers on a deep and meaningful level.