This GigaOm Research Reprint Expires Mar 24, 2024

GigaOm Radar for Data Pipelines v2.0

1. Summary

Today’s organizations are grappling with a growing volume of data—data of various types and formats that are stored in an increasing number of repositories. Somehow, organizations need to ensure the quality and manage the flow of all this data throughout their company so they can successfully derive valid insights from it. Data pipelines offer a solution to this challenge: taking source data, transforming it from its source format into a uniform format suitable for its target system, and loading it into a target database or storage system for downstream consumption and analysis.

Building a data pipeline from scratch is time-consuming and code-intensive, but the market now offers a broad array of solutions for moving and processing data. These products streamline and automate the process from start to finish: they facilitate pipeline authoring and orchestration; the movement, transformation, and monitoring of data; and error handling. Many of them do so through no-code interfaces, allowing even data novices to be involved.

Data pipeline solutions have their origins in traditional extract, transform, and load (ETL) tools, but the category has evolved beyond that approach to encompass additional frameworks and types of pipelines. The category is robust, mature, and stable but by no means stagnant. Innovation continues in the form of ever-widening arrays of data source connections, more refined tools for data transformations, platforms that support a variety of deployment scenarios, and improved security features.

This GigaOm Radar report highlights key data pipeline vendors and equips IT decision-makers with the information needed to select the best fit for their business and use case requirements. In the corresponding GigaOm report “Key Criteria for Evaluating Data Pipeline Solutions,” we describe in more detail the key features and metrics that are used to evaluate vendors in this market.

How to Read this Report

This GigaOm report is one of a series of documents that help IT organizations assess competing solutions in the context of well-defined features and criteria. For a fuller understanding, consider reviewing the following reports:

Key Criteria report: A detailed market sector analysis that assesses the impact that key product features and criteria have on top-line solution characteristics—such as scalability, performance, and TCO—that drive purchase decisions.

GigaOm Radar report: A forward-looking analysis that plots the relative value and progression of vendor solutions along multiple axes based on strategy and execution. The Radar report includes a breakdown of each vendor’s offering in the sector.

Solution Profile: An in-depth vendor analysis that builds on the framework developed in the Key Criteria and Radar reports to assess a company’s engagement within a technology sector. This analysis includes forward-looking guidance around both strategy and product.

2. Market Categories and User Segments

To better understand the market and vendor positioning (Table 1), we assess how well data pipeline solutions are positioned to serve specific market segments and user groups.

For this report, we recognize the following market segments:

  • Small-to-medium business (SMB): In this category, we assess solutions on their ability to meet the needs of organizations ranging from small businesses to medium-sized companies. Also assessed are departmental use cases in large enterprises, where ease of use and deployment are more important than extensive management functionality, data mobility, and feature set.
  • Large enterprise: Here, offerings are assessed on their ability to support large and business-critical projects. Optimal solutions in this category have a strong focus on flexibility, performance, data services, and features to improve security and data protection. Scalability is another big differentiator, as is the ability to deploy the same service in different environments.
  • Specialized: Optimal solutions are designed for specific workloads and use cases, such as big data analytics and high-performance computing (HPC).

In addition, we recognize four user groups for solutions in this report:

  • Business user: Business users are typically beginners in the realm of data and analytics. While these employees may occasionally need to use analytical tools to perform self-service exploration and analysis, they rely on others to handle the technical aspects of configuring and provisioning them.
  • Business analyst: These users have some knowledge of data analysis tasks and are familiar with using self-service tools to perform analytics. They evaluate data from the perspective of deriving business insights and making recommendations for improvements, such as better performance or cost reductions.
  • Data analyst: These users review data to look for trends and patterns that can benefit organizations at the corporate level. While not as technical as data engineers, data analysts possess knowledge of data preparation, visualization, and analysis that can be applied to inform organizational strategy.
  • Data engineer: Data engineers are very well versed technically, and they apply their specialized knowledge to helping prepare, organize, and model data, transforming it into actionable information for the organizations they support.

Table 1. Vendor Positioning

Market Segment: SMB | Large Enterprise | Specialized
User Segment: Business User | Business Analyst | Data Analyst | Data Engineer
Alteryx
Ascend
AWS
dbt Labs
Fivetran
Google Cloud
Hitachi Vantara
IBM
Informatica
Matillion
Microsoft
Qlik / Talend
SAP
3 Exceptional: Outstanding focus and execution
2 Capable: Good but with room for improvement
1 Limited: Lacking in execution and use cases
0 Not applicable or absent

3. Key Criteria Comparison

Building on the findings from the GigaOm report, “Key Criteria for Evaluating Data Pipeline Solutions,” Table 2 summarizes how each vendor included in this research performs in the areas we consider differentiating and critical in this sector. Table 3 follows this summary with insight into each product’s evaluation metrics—the top-line characteristics that define the impact each will have on the organization.

The objective is to give the reader a snapshot of the technical capabilities of available solutions, define the perimeter of the market landscape, and gauge the potential impact on the business.

Table 2. Key Criteria Comparison

Key Criteria

Vendor | Hybrid & Multicloud Capabilities | Managed Services Offerings | Multimodal Data Pipelines | Data Management | Extended Data Transformation | Native Source & Destination Integration
Alteryx | 2 | 2 | 2 | 3 | 3 | 3
Ascend | 2 | 3 | 3 | 3 | 2 | 2
AWS | 0 | 3 | 3 | 3 | 3 | 2
dbt Labs | 2 | 2 | 0 | 3 | 3 | 2
Fivetran | 2 | 3 | 2 | 2 | 2 | 2
Google Cloud | 1 | 3 | 2 | 2 | 2 | 2
Hitachi Vantara | 2 | 2 | 2 | 3 | 2 | 3
IBM | 2 | 2 | 2 | 2 | 3 | 2
Informatica | 3 | 3 | 3 | 3 | 3 | 3
Matillion | 2 | 3 | 2 | 2 | 2 | 3
Microsoft | 1 | 3 | 2 | 2 | 3 | 3
Qlik / Talend | 2 | 3 | 3 | 3 | 2 | 3
SAP | 2 | 3 | 2 | 3 | 3 | 3
3 Exceptional: Outstanding focus and execution
2 Capable: Good but with room for improvement
1 Limited: Lacking in execution and use cases
0 Not applicable or absent

Table 3. Evaluation Metrics Comparison

Evaluation Metrics

Vendor | User-Friendliness | Connectivity | Security | Reusability | Extensibility & Ecosystem
Alteryx | 3 | 3 | 3 | 3 | 2
Ascend | 3 | 3 | 3 | 2 | 2
AWS | 3 | 2 | 2 | 3 | 2
dbt Labs | 3 | 2 | 3 | 3 | 3
Fivetran | 3 | 3 | 3 | 2 | 3
Google Cloud | 3 | 2 | 2 | 3 | 2
Hitachi Vantara | 3 | 3 | 2 | 2 | 2
IBM | 3 | 3 | 2 | 2 | 2
Informatica | 3 | 3 | 3 | 3 | 3
Matillion | 3 | 3 | 3 | 3 | 2
Microsoft | 3 | 3 | 2 | 3 | 3
Qlik / Talend | 2 | 3 | 3 | 2 | 3
SAP | 3 | 3 | 2 | 3 | 2
3 Exceptional: Outstanding focus and execution
2 Capable: Good but with room for improvement
1 Limited: Lacking in execution and use cases
0 Not applicable or absent

By combining the information provided in the tables above, the reader can develop a clear understanding of the technical solutions available in the market.

4. GigaOm Radar

This report synthesizes the analysis of key criteria and their impact on evaluation metrics to inform the GigaOm Radar graphic in Figure 1. The resulting chart is a forward-looking perspective on all the vendors in this report, based on their products’ technical capabilities and feature sets.

The GigaOm Radar plots vendor solutions across a series of concentric rings, with those set closer to the center judged to be of higher overall value. The chart characterizes each vendor on two axes—balancing Maturity versus Innovation, and Feature Play versus Platform Play—while providing an arrow that projects each solution’s evolution over the coming 12 to 18 months.

Figure 1. GigaOm Radar for Data Pipelines

As you can see in the Radar chart in Figure 1, the majority of the vendors in this report are found in the Innovation half of the Radar. This attests to the overall trends and pace of development in the data pipeline category. Although it has its roots in traditional ETL tools that have been around for many decades, the category continues to evolve to keep pace with the demands of modern enterprises.

The ETL process has had a fair amount of time to be refined, so it’s no surprise that a number of vendors who pioneered or specialize in ETL are in the Leaders circle of this Radar. Other frameworks, most notably ELT, have evolved to address the challenges posed by the sheer volume of data that enterprises are grappling with. ELT loads the raw data first and then applies transformations using the decoupled storage and compute power of the target system, often a data warehouse. ELT is a newer framework that has rapidly gained prominence in the industry, so it’s also no surprise that many of the vendors who specialize (although not exclusively) in ELT are Outperformers poised to enter the Leaders circle in the near term.
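The difference between the two frameworks can be illustrated with a minimal sketch. It assumes Python’s built-in sqlite3 module as a stand-in for the target warehouse; the table names, sample rows, and the cents-to-dollars transformation are hypothetical and are not drawn from any vendor in this report.

```python
import sqlite3

# Hypothetical source rows: (order_id, amount_in_cents)
SOURCE_ROWS = [(1, 1250), (2, 9900), (3, 400)]


def run_etl(conn):
    """ETL: transform inside the pipeline, then load the finished rows."""
    transformed = [(order_id, cents / 100.0) for order_id, cents in SOURCE_ROWS]
    conn.execute("CREATE TABLE orders_etl (order_id INTEGER, amount_usd REAL)")
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?)", transformed)


def run_elt(conn):
    """ELT: load the raw rows first, then transform with the target's SQL engine."""
    conn.execute("CREATE TABLE orders_raw (order_id INTEGER, amount_cents INTEGER)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?)", SOURCE_ROWS)
    conn.execute(
        "CREATE TABLE orders_elt AS "
        "SELECT order_id, amount_cents / 100.0 AS amount_usd FROM orders_raw"
    )


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)
etl_rows = conn.execute("SELECT * FROM orders_etl ORDER BY order_id").fetchall()
elt_rows = conn.execute("SELECT * FROM orders_elt ORDER BY order_id").fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM orders_raw").fetchone()[0]
```

Both paths produce the same transformed table. The ELT path also leaves the raw table in the target, so further transformations can be run there later without re-extracting from the source.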

In this Radar, there are a handful of vendors in the Leaders circle, all of whom have been long-standing players in the industry. Their presence in the Leaders circle signifies the comprehensive nature of their platforms and the strong performance of their data pipeline offerings across the key criteria and evaluation metrics we assess. In addition, the majority of the Leaders have implemented one or both of the emerging technologies in this landscape, indicating that they remain on the cutting edge of development in the industry; there appears to be no risk of stagnation here.

Several other vendors are poised to enter the Leaders circle in the short term. This is due to the fast-growing and well-rounded nature of their platforms. As Outperformers, these vendors have been evolving at an accelerated rate, introducing new features and refining their existing ones at a speed that exceeds the overall industry pace. The majority of these vendors also possess one or both of the emerging technologies, indicating their awareness of the direction the industry is headed and commitment to ensuring their platforms can help customers get there.

Overall, this is great news for customers because it means that there are strong offerings across the board. While each solution in this report has its own unique characteristics and areas in which it excels, readers should be encouraged because numerous solutions would likely accommodate most of their requirements, and it’s just a matter of selecting what is ultimately the best fit.

Inside the GigaOm Radar

The GigaOm Radar weighs each vendor’s execution, roadmap, and ability to innovate to plot solutions along two axes, each set as opposing pairs. On the Y axis, Maturity recognizes solution stability, strength of ecosystem, and a conservative stance, while Innovation highlights technical innovation and a more aggressive approach. On the X axis, Feature Play connotes a narrow focus on niche or cutting-edge functionality, while Platform Play displays a broader platform focus and commitment to a comprehensive feature set.

The closer to center a solution sits, the better its execution and value, with top performers occupying the inner Leaders circle. The centermost circle is almost always empty, reserved for highly mature and consolidated markets that lack space for further innovation.

The GigaOm Radar offers a forward-looking assessment, plotting the current and projected position of each solution over a 12- to 18-month window. Arrows indicate travel based on strategy and pace of innovation, with vendors designated as Forward Movers, Fast Movers, or Outperformers based on their rate of progression.

Note that the Radar excludes vendor market share as a metric. The focus is on forward-looking analysis that emphasizes the value of innovation and differentiation over incumbent market position.

5. Vendor Insights

Alteryx

Alteryx’s on-premises and desktop Designer offering provides a no-code drag-and-drop interface for self-service construction of end-to-end data pipelines. At the time of this writing, Designer can connect to over 90 read-only and read/write data sources and destinations. Alteryx Designer includes over 300 functions to help transform data and tools to work with data, including built-in spatial analytics capabilities and the Alteryx Intelligence Suite, which provides ML, text mining, and computer vision capabilities. Designer also provides a suite of tools for predictive analytics, which uses R and covers data exploration, data preparation, predictive modeling, comparison and assessment of models, systematic grouping of records and fields, and more. Alteryx Designer also supports building analytic apps—self-contained programs designed with simple interfaces and intended to perform limited, specific functions for the user.

Alteryx Analytics Cloud Platform brings Alteryx’s cloud-based apps together into one centralized platform. These modules have their roots in technologies Alteryx has acquired over the years. One of these, Alteryx Designer Cloud, is based in part on the company’s acquisition of Trifacta, and it provides a data prep, data wrangling, and data transformation solution geared toward users of all technical skill levels. Alteryx Designer Cloud enables the self-service creation of workflows that connect to data sources and execute a series of transformations on the data in either an ELT or ETL fashion. Designer Cloud also includes predictive transformation capabilities, which are context-specific suggestions for transforming selected data that are automatically surfaced by the solution when a user selects any given data. Selections of transformations made by the user also inform and improve the solution’s ML prediction algorithms.

Alteryx’s acquisition of Trifacta provides a SaaS architecture that allows access to the Designer interface through any web browser. The integration of Trifacta’s cloud-native architecture into the Alteryx product suite allows it to provide customers with more flexible deployment options, including on-premises, cloud, and hybrid-cloud scenarios.

Strengths: Alteryx’s strengths include its ease of use; its rich prebuilt tools and transformations, such as predictive transformation capabilities, advanced analytics, and spatial analytics; and the more flexible deployment options resulting from its acquisition of Trifacta.

Challenges: Alteryx acquired Trifacta in 2022 and has already integrated some of that vendor’s offerings into its stack; further integration and addition of new features and capabilities is ongoing.

Ascend.io

Ascend’s fully managed, automated platform provides a comprehensive offering to empower users to build and operate their data pipelines. Ascend provides data ingestion, transformation, delivery, orchestration, and observability, together in a single platform.

Ascend offers connectivity to a wide variety of data sources through its own library of prebuilt connectors, and it integrates with the data connectivity solution CData to give its customers access to that vendor’s library of connectors as well. Users can also create custom Python connectors. Data updates are performed incrementally, and Ascend can automatically manage scheduling and prioritization of transformations and workflows.

Once a connection is made, data transformations can be applied using SQL, Spark SQL, PySpark, Scala, and Java. Ascend’s flex-code (flexible coding) framework allows users to choose between low-code and higher-code options depending on their level of technical expertise and the use case involved. An interactive query editor provides a suggestion-assisted experience for writing SQL queries; users can also make use of a “raw builder” to write queries directly. Ascend automatically infers schema when running a transformation, and it automatically detects and adjusts for schema drift. Transformations are executed in parallel, and a write connector is used to write the transformation results to the destination.

Ascend offers a number of distinctive features that highlight the automated nature of its solution and simplify pipeline creation and maintenance for its customers. Ascend’s platform provides a unified location from which customers can build and manage the data pipeline lifecycle from start to finish. All data feeds, connections, and transformations that a user can access are shared and monitored via a centralized dashboard. Ascend’s change detection capabilities, which it calls fingerprinting, and its robust data lineage, together allow its platform to respond in real time to any changes made at any point within the pipeline. Wherever a change is made, the platform updates items downstream of the point of change automatically, without needing the full pipeline to be rerun or users to manually trace and update each item that is impacted by the change.
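The general idea of combining change detection with lineage can be sketched as follows. This is not Ascend’s implementation; it is a minimal illustration that assumes each pipeline item has a text definition and a list of upstream items, and that a fingerprint is a hash of an item’s definition plus the fingerprints of everything upstream. The item names and SQL strings are hypothetical.

```python
import hashlib
from graphlib import TopologicalSorter


def fingerprints(definitions, upstream):
    """Return {item: fingerprint}. A fingerprint covers the item's own
    definition and the fingerprints of all of its upstream items."""
    result = {}
    # static_order() yields every item after all of its upstream items.
    for item in TopologicalSorter(upstream).static_order():
        digest = hashlib.sha256(definitions[item].encode("utf-8"))
        for parent in sorted(upstream.get(item, ())):
            digest.update(result[parent].encode("utf-8"))
        result[item] = digest.hexdigest()
    return result


def stale_items(old, new):
    """Items whose fingerprint changed and therefore need to be recomputed."""
    return sorted(item for item in new if old.get(item) != new[item])


# Hypothetical pipeline: two sources, one cleaning step, one report.
upstream = {
    "raw_orders": [],
    "raw_customers": [],
    "clean_orders": ["raw_orders"],
    "report": ["clean_orders", "raw_customers"],
}
definitions = {
    "raw_orders": "read orders",
    "raw_customers": "read customers",
    "clean_orders": "SELECT * FROM raw_orders",
    "report": "SELECT * FROM clean_orders JOIN raw_customers USING (customer_id)",
}

before = fingerprints(definitions, upstream)
edited = dict(definitions, clean_orders="SELECT * FROM raw_orders WHERE amount > 0")
after = fingerprints(edited, upstream)
to_rerun = stale_items(before, after)
```

Editing the cleaning step changes its fingerprint and that of the report built on it, while the two source items keep their fingerprints and do not need to be rerun.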

What the vendor calls its “live ops view” of pipelines allows users to view the precise, real-time status of each object within the pipeline, such as connectors and transformations. Pipelines can be stopped, inspected, queried, and resumed from any point within the pipeline. Ascend allows users to apply predefined data quality assertions or write their own using SQL. Users can stop a pipeline at any point where data quality assertions are not met to prevent downstream items from being affected.
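As an illustration of how a data quality assertion can halt a pipeline before downstream items are affected, consider the sketch below. It is a generic pattern, not Ascend’s API: the stage names, the `run_stages` helper, and the use of Python callables (rather than SQL) for assertions are all assumptions made for the example.

```python
def run_stages(rows, stages):
    """Run (name, transform, assertion) stages in order.
    Stop at the first stage whose assertion fails, so later stages never run."""
    completed = []
    for name, transform, assertion in stages:
        rows = transform(rows)
        if not assertion(rows):
            return {"status": "halted", "halted_at": name, "completed": completed}
        completed.append(name)
    return {"status": "ok", "halted_at": None, "completed": completed}


published = []  # stands in for a downstream destination


def drop_cancelled(rows):
    return [row for row in rows if row["status"] != "cancelled"]


def publish(rows):
    published.extend(rows)
    return rows


stages = [
    ("ingest", lambda rows: rows, lambda rows: len(rows) > 0),
    ("clean", drop_cancelled, lambda rows: all(r["amount"] is not None for r in rows)),
    ("publish", publish, lambda rows: True),
]

rows = [
    {"id": 1, "status": "paid", "amount": 10.0},
    {"id": 2, "status": "paid", "amount": None},  # violates the "clean" assertion
    {"id": 3, "status": "cancelled", "amount": 5.0},
]

result = run_stages(rows, stages)
```

Because the second row has a missing amount, the run halts at the "clean" stage and nothing reaches the destination.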

Additional items of note include a feature called Data Feeds, which provides a collaborative approach to building pipelines. Users can share live access to pipelines to build different branches of the same project and then join them in a final dataflow, and they can share queries and resulting datasets with each other. A configurable notification system alerts users to important events. Users can connect Ascend directly to third-party tools and applications through a single API.

Ascend is a SaaS solution that can be deployed either in the customer’s cloud account as a private and fully isolated (single-tenant) deployment managed remotely by Ascend, or as a SaaS offering hosted in an Ascend-owned cloud account with dedicated compute for the customer. Other security and access control features include role-based access controls, encryption at rest, and OAuth integrations for application authentication.

Strengths: Ascend’s strengths include its fully managed, unified platform, interactive query editor and flexible coding framework, robust data lineage and change detection capabilities, ability to pause and resume the pipeline and inspect data quality at any stage, and collaboration through its data feeds feature. It scored high on user-friendliness, connectivity, and security.

Challenges: Since Ascend is geared toward data engineers who are looking to create data pipelines via low-code SQL and Python, it does require a critical mass of such personnel with a basic level of coding experience within an organization.

AWS

AWS offers multiple services, designed for different use cases, that provide customers with ways to build and manage pipelines to move and transform their data. These include AWS Data Pipeline, AWS Glue, and AWS Glue DataBrew.

AWS Data Pipeline is a service customers can use to automate the movement and transformation of their data. A visual drag-and-drop pipeline creation interface and a library of prebuilt configurations help streamline the pipeline creation process. Once the pipeline is created, it can schedule and run tasks through a feature called Task Runner. Supported destinations for storing data include Amazon DynamoDB, Amazon Relational Database Service, Amazon Redshift, and Amazon S3. Fault-tolerant execution is an important part of AWS Data Pipeline. If a failure occurs, AWS Data Pipeline automatically retries the activity and, if the failure persists, sends a notification to the user via Amazon Simple Notification Service (SNS). Users can configure their notifications for successful runs, delays in planned activities, or failures.
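The retry-then-notify behavior described above follows a common fault-tolerance pattern, sketched here in plain Python. The sketch does not call any AWS API: `run_with_retries`, `make_flaky`, and the `notify` callback (a stand-in for a notification service such as SNS) are hypothetical names chosen for illustration.

```python
def run_with_retries(task, max_attempts, notify):
    """Run task(), retrying on failure. Call notify(message) only if
    every attempt fails. Returns (succeeded, result_or_None)."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return True, task()
        except Exception as exc:  # a real system would catch narrower errors
            last_error = exc
    notify(f"task failed after {max_attempts} attempts: {last_error}")
    return False, None


def make_flaky(failures_before_success):
    """Build a task that fails a fixed number of times before succeeding."""
    state = {"calls": 0}

    def task():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise RuntimeError(f"transient error on call {state['calls']}")
        return "done"

    return task


notifications = []
recovered = run_with_retries(make_flaky(2), 3, notifications.append)
notifications_after_recovery = list(notifications)
gave_up = run_with_retries(make_flaky(5), 3, notifications.append)
```

A task that fails twice and then succeeds completes silently, while a task that keeps failing produces exactly one notification once the attempts are exhausted.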

AWS Glue is a serverless offering that consolidates multiple data integration services, including data discovery, data transformation, and a centralized data catalog, into one offering. As the vendor’s documentation explains, “AWS Glue runs your ETL jobs in an Apache Spark serverless environment. AWS runs these jobs on virtual resources that it provisions and manages in its own service account.” The AWS Glue data catalog can be accessed and leveraged by other AWS Services. The AWS Glue Studio provides a graphical interface from which users can create, run, and monitor their jobs and workflows. Frameworks supported by AWS Glue include ETL, ELT, and streaming. Connections to over 70 data sources are provided.

AWS Glue DataBrew is a no-code data preparation tool that provides a visual interface for data discovery, visualization, cleaning, and transformation. Users can choose from a library of point-and-click transformations to apply to their data in a series of actions and steps the vendor calls a “recipe.” Users can preview a portion of the data before and after the transformations, to help them decide whether to modify a recipe before applying it to the entire data set. DataBrew integrates with other AWS data services, such as Amazon RDS, S3, and AWS Glue Data Catalog, for the output of DataBrew recipes. Data lineage tracks the way data flows from its origin, through transformations, and into its destination. DataBrew integrates with other AWS services for data management including Amazon CloudFront, AWS CloudFormation, AWS CloudTrail, Amazon CloudWatch, and AWS Step Functions.
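The recipe concept, an ordered series of transformation steps that can be previewed on a sample before being applied to the full data set, can be sketched in a few lines. This is a generic illustration rather than the DataBrew API; the step functions, the `apply_recipe` and `preview` helpers, and the sample records are all hypothetical.

```python
def apply_recipe(rows, recipe):
    """Apply each step of the recipe, in order, to every row."""
    for step in recipe:
        rows = [step(dict(row)) for row in rows]  # copy rows so input is untouched
    return rows


def preview(rows, recipe, sample_size):
    """Return (before, after) for the first sample_size rows only."""
    sample = rows[:sample_size]
    return sample, apply_recipe(sample, recipe)


def trim_name(row):
    row["name"] = row["name"].strip()
    return row


def upper_country(row):
    row["country"] = row["country"].upper()
    return row


data = [
    {"name": "  Ada ", "country": "uk"},
    {"name": "Grace", "country": "us"},
    {"name": " Linus", "country": "fi"},
]
recipe = [trim_name, upper_country]

before, after = preview(data, recipe, sample_size=2)
full = apply_recipe(data, recipe)
```

The preview touches only the sample, so a user can adjust the recipe before committing to the full run, and the original records are left unchanged in either case.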

Both AWS Glue and AWS Data Pipeline are integrated into the larger AWS stack. This is especially true of AWS Glue, whose data catalog can be accessed by, and provides cataloging to, multiple other AWS services. AWS Data Pipeline is not serverless, which gives its users more control over the resources used. AWS Glue and Glue DataBrew are both serverless, allowing a hands-off approach to resource management. AWS Glue and Glue DataBrew offer prebuilt transformations, and AWS Data Pipeline allows users to create their own transformations. Data Pipeline also offers prebuilt configurations for setting up entire data pipelines.

Strengths: AWS strengths include multimodal data pipelines, low-maintenance operation provided by the serverless offerings, no-code visual interfaces for pipeline building, a large library of prebuilt transformations, and smart suggestions for transformations.

Challenges: Data pipelines on AWS are spread over multiple different but overlapping services, which can sometimes make for difficult architectural and design choices and carry the risk of customers selecting and investing in a platform that might not be the preferred one for future investment by AWS.

dbt Labs

The creators of the data build tool (dbt) built it on the belief that data analysts can and should work like software engineers; in other words, analytics teams should be able to benefit from the same concepts and techniques that allow software engineering teams to collaborate and rapidly create quality applications. Therefore, it is an unabashedly code-first platform that does not follow the visual designer approach that many of its competitors embrace. This is quite intentional on the company’s part and will both attract and repel certain customers and users.

dbt provides an approach to data transformation aimed at helping data analysts automate the manual and repetitive tasks of their jobs. It promotes collaboration, offers version control to track changes, supports automated quality assurance to test data health and accuracy, encourages teams to include documentation within their code to explain their models and analyses, and emphasizes maintainability when writing code to make changes to data and schema.

dbt offers two solutions: an open-source option, known as dbt Core, and a fully managed SaaS offering, called dbt Cloud, that is a superset of dbt Core. dbt Cloud provides an integrated development environment (IDE), pipeline job scheduling, continuous integration, alerts, documentation, data freshness testing, metadata and job APIs, access controls and audit logging. It also features automated dependency management through a visual directed acyclic graph (DAG), which shows how data is transformed over time, and is automatically built as the user codes. The larger dbt community brings strong documentation and support, as well as an extensive library of common code snippets relevant to common organizational use cases. dbt Cloud is available in AWS or Azure or in the customer’s virtual private cloud.

SQL code can be reused through the creation of macros, which are comparable to functions in other programming languages. The “ref” statement, used in place of hard-coding table references in dbt, allows users to change deployment schema via configuration and is used to automatically build the DAG, enabling dbt to deploy models in the correct order. Connections to databases, data warehouses, data lakes, or query engines are enabled through adapters, which are Python modules that adapt dbt’s standard functionality to a particular database. Users can choose from a list of prebuilt adapters or custom-build their own.
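To show how references between models can yield a deployment order, the sketch below extracts `{{ ref('...') }}` calls from model SQL with a regular expression and sorts the models topologically. It is an illustration of the idea only: dbt itself resolves `ref` through its templating engine rather than a regex, and the model names and SQL here are hypothetical.

```python
import re
from graphlib import TopologicalSorter

# Matches {{ ref('model_name') }} with single or double quotes.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def dependencies(models):
    """Map each model name to the set of models it references."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}


def build_order(models):
    """Return model names so that every model comes after the models it refs."""
    return list(TopologicalSorter(dependencies(models)).static_order())


models = {
    "revenue_report": "select region, sum(amount) from {{ ref('customer_orders') }} group by 1",
    "customer_orders": (
        "select * from {{ ref('stg_orders') }} o "
        "join {{ ref(\"stg_customers\") }} c on o.customer_id = c.id"
    ),
    "stg_orders": "select * from raw.orders",
    "stg_customers": "select * from raw.customers",
}

deps = dependencies(models)
order = build_order(models)
```

Because the dependency graph is derived from the references themselves, adding or removing a `ref` in a model changes the build order without any separate configuration.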

Administrators can set license-based or role-based access controls. With the former, permissions can be granted to features in dbt Cloud based on the type of license the user is assigned, either a Developer license or a Read Only license. With the latter, users can be assigned varying permissions to different projects within their organization, based on the groups to which the user is assigned.

Strengths: dbt’s strengths lie in its extensibility and ecosystem, which includes strong documentation, community, and support; automated dependency management; and customizability. It scored high on user-friendliness, security, and reusability.

Challenges: dbt, in and of itself, does not address data movement; dbt’s focus is on data transformation, and it is often integrated with or used by other pipeline tools to provide data transformation capabilities in those pipelines.

Fivetran

Fivetran provides a fully managed, automated data movement platform that offers connectivity to a wide array of data sources, extracts data from them, and loads this data into one or more destinations. In addition to straightforward extract and load, it provides multiple types of data replication stemming from its 2021 acquisition of HVR, including log-based change data capture (CDC) and what the vendor calls “teleport sync,” which makes use of snapshots for log-free database replication.

An ever-expanding list of connectors includes various types: SaaS applications, databases, streaming events, files, and functions. In addition to its standard connectors, which the vendor says are designed to meet the needs of the majority of its customers and connect to the most widely used and popular data sources, Fivetran has introduced a framework for building connectors to API-based SaaS applications by request. These connectors, called “Lite connectors,” differ from its standard connectors in that they are built on an accelerated basis, which the vendor says is a move designed to provide quicker time-to-value for users, often for more specific use cases. Fivetran also offers a “Function” connector, through which users can write a function in AWS Lambda, Azure Functions, or Google Cloud Functions to extract data from the source, in situations where no prebuilt connector exists.

Once data is extracted and loaded, Fivetran helps customers orchestrate transformations on their data in a number of ways. At the time of the writing of this report, Fivetran had recently launched into beta its new Quickstart data models, which are designed to complement Fivetran’s most popular SaaS connectors and automate the process of applying transformations so that the customer needs to write no code. Through the Fivetran REST API, Fivetran also provides the ability to connect Fivetran to existing transformation tools that a customer may already be using. Fivetran’s integration with dbt Core, an open-source transformation tool by dbt Labs, enables users to set up dbt projects in a Git repository to transform their data using SQL statements. Fivetran then connects to the Git repository and runs the dbt models in the destination. Fivetran data models are prebuilt and dbt Core-compatible, so they can transform data for a wide array of supported destinations. Users can schedule transformations to run either periodically or when new data is loaded into the destination. Transformations can be scheduled through the Fivetran dashboard, or directly in a dbt project.

Fivetran offers many capabilities that simplify maintenance and automate the data integration process for customers. The Fivetran user interface allows pipelines to be created and configured without needing to write any code. Log connectors push log data into the destination platform, to facilitate analysis there. This analysis gives users information about the pipeline and enables them to monitor platform activity and track usage. The Fivetran centralized dashboard lets users manage connectors, transformations, destinations, logs, integration with BI tools, alerts, notifications, and other users (depending on their role and assigned permissions within their organization). Upon receiving data from its connected source, Fivetran preliminarily normalizes, cleans, sorts, and de-duplicates data before loading it into the destination. Updates to data are loaded incrementally, and can be scheduled to occur at user-defined intervals. Fivetran provides automated schema drift handling, in which any changes to the source schema (adding/removing columns, changing data types, adding/removing objects) are automatically detected and replicated accordingly in the target.
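Detecting schema drift amounts to comparing the current source schema with the schema last replicated to the target. The sketch below shows that comparison in its simplest form. It is not Fivetran’s implementation; the `diff_schema` helper, the column names, and the type labels are hypothetical, and how each kind of change is then applied to the target is left out.

```python
def diff_schema(source, target):
    """Compare {column: type} mappings and report the drift between them."""
    return {
        # Columns present in the source but not yet in the target.
        "added": {col: typ for col, typ in source.items() if col not in target},
        # Columns still in the target that the source no longer has.
        "removed": sorted(col for col in target if col not in source),
        # Columns in both whose type differs: (target_type, source_type).
        "retyped": {
            col: (target[col], typ)
            for col, typ in source.items()
            if col in target and target[col] != typ
        },
    }


# Schema as last replicated to the target.
target_schema = {"id": "INTEGER", "email": "TEXT", "age": "INTEGER", "fax": "TEXT"}
# Schema as currently observed at the source.
source_schema = {"id": "INTEGER", "email": "TEXT", "age": "TEXT", "signup_date": "DATE"}

drift = diff_schema(source_schema, target_schema)
no_drift = diff_schema(target_schema, target_schema)
```

In this example the source has gained a `signup_date` column, dropped `fax`, and changed the type of `age`, which covers the three kinds of change mentioned above.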

Security and access control features include automated masking of sensitive data, secure shell (SSH) tunnels, granular role-based access controls, and detailed logging. Fivetran does not persist data after it is loaded; data is purged from Fivetran once it successfully reaches the destination. Additionally, Fivetran provides the option for customers to deploy in their own virtual private clouds or stay purely on-premises if needed. In addition to these deployment models, Fivetran is available on any of the three public clouds, or in a hybrid cloud-and-on-premises combination.

Strengths: Fivetran’s strengths include its fully managed offering, automatic schema drift handling, automatic normalization of data, hybrid and multicloud deployment options, a focus on security, platform extensibility and ecosystem, and connectivity, including its new approach to building its Lite connectors.

Challenges: Fivetran’s main focus is on data movement (the EL in ELT and CDC/data replication). It integrates with other providers such as dbt for transformations, and through the Fivetran Quickstart data models, even provides assets for that platform. Fivetran also says it is pursuing integrations with other transformation providers. Its perspective is that it performs extract and load, and after data is loaded, it can orchestrate transformations on the data in the destination. It is open to partnerships and does not want to limit customers or force them to use a particular transformation tool. For some customers, this will be an advantage; we simply note it here as an item for readers to consider when evaluating.

Google

Google Cloud Data Fusion is a cloud-native serverless data integration tool built on the open source CDAP project. It enables customers to build ELT and ETL data pipelines, either through a code-free web UI or by using command-line tools. Cloud Data Fusion’s graphical interface, Pipeline Studio, allows users to visually build, design, preview, and schedule data pipelines. Users can also monitor and view metrics related to the pipeline, including historical runs, execution times, and logs. Pipelines in Cloud Data Fusion are composed of plugins, which are customizable modules for pipeline functionality, including prebuilt connections to data sources and targets, templates for transformations, alerts, actions, and more. Users can also build their own custom plugins. The Cloud Data Fusion Hub is a centralized dashboard from which users can browse plugins, sample pipelines, and other integrations.

Cloud Data Fusion also allows users to create a library of custom-built connections and transformations that can be shared and reused throughout an organization. Add-ons, called Accelerators, allow users to add special features to their pipeline, including enabling replication for a specific pipeline, or allowing users to work with specific types of data. Users can also create reusable pipelines. Cloud Data Fusion provides the ability to replicate transactional and operational databases, such as SQL Server, Oracle, and MySQL, directly into BigQuery using the replication feature.

Cloud Data Fusion provides lineage at the dataset level and field level. The former shows the relationship between datasets and pipelines in a selected time interval, and the latter shows the operations that were performed on a set of fields in the source dataset to produce a different set of fields in the target dataset. Audit logs are provided by Google Cloud Audit Logs and can show information about activity and data access. Users can also set pipeline alerts when they create a pipeline or set up log-based alerts through Google Cloud Monitoring.

Google Cloud Data Fusion enables role-based access controls through identity and access management (IAM). Access control is managed at the project level. Administrators can grant roles to users and grant permissions for access to resources based on the user’s role. Customers can also deploy Cloud Data Fusion in private instances and use VPC (virtual private cloud) Service Controls for additional security.

Open source CDAP can be run in multiple clouds as well as on-premises, and can, in turn, run pipelines authored on Data Fusion in these deployment scenarios, in addition to pure cloud deployment.

Strengths: Google Cloud Data Fusion’s strengths include reusability of pipelines and portions of pipelines, hybrid and multicloud deployment, and support for ELT, ETL, and CDC frameworks.

Challenges: At the current time, the Cloud Data Fusion replication capability allows replication only into BigQuery.

Hitachi Vantara Pentaho Data Integration

Hitachi Vantara’s Lumada DataOps combines the capabilities of Lumada and Pentaho into a single platform. From Pentaho Data Integration, the platform provides ETL capabilities that allow users to capture, process, and store data. The platform blends these capabilities from Pentaho Data Integration together with functionality from Lumada DataOps, including access to the Lumada Data Catalog for automated discovery and classification using ML and customizable business glossaries.

A drag-and-drop pipeline designer provides a workflow interface that helps simplify the creation of ETL jobs and transformations. Jobs and transformations can also be scheduled to run at predetermined times or repeatedly at set time intervals. Users can inspect and preview data at any step of the pipeline workflow, as well as generate visualizations from any step. If any step in the transformation is processing slowly, a graphic will display around it to highlight the bottleneck. Users can create “template transformations” to automate their repetitive tasks.

Users can choose from a library of default prebuilt connectors to a wide variety of data sources, including relational database management systems (RDBMSs), data warehouses, object storage, SaaS applications, and files. The platform also provides native connectivity to Amazon Redshift, MongoDB, and Snowflake. Users can also build database-specific plugins or explore the Pentaho Data Integration Marketplace to download and share additional plugins developed by Pentaho and other members of the community.

The platform encourages collaboration through Pentaho Repositories, which store and manage jobs and transformations, and provide full revision history: track changes, compare revisions, and revert to previous versions if necessary. Data lineage and impact analysis allow users to view the origins of and trace a data item across transformations and workflows, as well as view potential downstream effects of updates or changes. Permissions can be set based on user roles, and include the ability to manage data sources (view, create, edit, delete), execute, preview, debug, and schedule pipelines, create and manage pipeline jobs and transformations, and publish and store them to Pentaho Repositories.

Support for the ELT framework is provided with pushdown optimization features that make it possible to offload some transformation operations to the target database. Native execution steps enable remote execution in databases including Hadoop, Redshift, MongoDB, BigQuery, Azure SQL DB, and Snowflake.

Integration with the Lumada Data Catalog provides metadata management, data quality and remediation, a business glossary, and AI-driven data discovery and classification. Pentaho Data Integration can access metadata stored in the Data Catalog for use in transformations.

Strengths: The strengths of Hitachi Vantara’s platform include native data source connectivity, platform extensibility and ecosystem, a marketplace for additional plugins, features to encourage collaboration, and the extensive data management functionality of Lumada Data Catalog.

Challenges: Hitachi Vantara provides a strong offering for building ETL pipelines; however, support for other data pipeline frameworks such as ELT requires SQL knowledge and creation of parameterized transformations.

IBM

IBM has a number of products that provide data movement and transformation solutions for customers. IBM DataStage is an AI-powered data integration tool with an underlying parallel processing engine that helps users design, develop, and execute both ETL and ELT workflows for moving and transforming their data. IBM DataStage is deployable on-premises or in the cloud as a highly scalable, containerized service on IBM Cloud Pak for Data.

IBM DataStage provides a drag-and-drop interface, dubbed a “graphical builder,” for building pipeline flows. Users can schedule, monitor, update, rerun, and view logs for these flows. Components of a workflow include data sources that read data, data stages that transform the data, data targets that write data, and links that connect the sources, stages, and targets. Users can choose from a library of preconfigured stages for transformations. In addition to standard ODBC/JDBC connectors, DataStage also provides native connectors that are dedicated and use a data source API to directly call client-side libraries to the individual database system.

Users can create subflows, which are groups of transformations and connections, and save them to make portions of pipeline flows reusable. Parameters, parameter sets, and environment variables can be used to specify information for jobs at runtime instead of hardcoding values. A QualityStage consists of a library of data quality-specific stages that are useful to help check on and manage data quality within a pipeline flow.
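The practice of supplying job values at runtime rather than hardcoding them can be sketched in a few lines of Python. This is a generic illustration, not DataStage syntax: the function `resolve_parameters`, the `JOB_` prefix, and the precedence order (defaults, then environment variables, then explicit overrides) are all assumptions chosen for the example.

```python
def resolve_parameters(defaults, environ, overrides, prefix="JOB_"):
    """Merge job parameters; later sources win: defaults < environment < overrides."""
    resolved = dict(defaults)
    for name in defaults:
        env_key = prefix + name.upper()
        if env_key in environ:
            resolved[name] = environ[env_key]
    resolved.update(overrides)
    return resolved


# A parameter set with defaults, so nothing is hardcoded inside the job itself.
defaults = {"source_table": "orders", "target_schema": "analytics", "batch_size": "500"}

# A plain dict stands in for os.environ so the example is deterministic.
environ = {"JOB_TARGET_SCHEMA": "analytics_dev", "UNRELATED": "ignored"}

params = resolve_parameters(defaults, environ, overrides={"batch_size": "50"})
print(params)
```

The same job definition can then run against development and production targets by changing only the environment or the runtime overrides.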

In December 2022, IBM announced its IBM Data Replication for IBM Cloud Pak for Data, a managed service that is also integrated into the IBM Cloud Pak for Data platform and provides change data capture for on-premises and cloud sources with a simplified user interface.

IBM Cloud Pak for Integration is an AI-powered platform that provides users with no-code tools to move and transform data across cloud or on-premises sources. IBM Cloud Pak for Integration can be deployed to Red Hat OpenShift on public clouds or in private data centers. Users can choose from a library of prebuilt templates and connectors to build workflows. A mapping assist capability provides AI-powered suggested mappings of data to the target system. Workflows and portions of workflows can be reused as well. One difficulty potential customers may encounter is that IBM does not make it clear which products in its suite to use in which instances.

Strengths: The strengths of the IBM solutions include reusability, AI-powered transformations, native data source connectivity, a library of prebuilt transformations, and data quality and data management capabilities.

Challenges: Since the cloud version of IBM DataStage is one component of the IBM Cloud Pak for Data platform, it would best suit organizations already invested in IBM infrastructure.

Informatica

Data management and integration giant Informatica provides a number of solutions to help customers with their data integration needs. Here, we discuss Informatica’s Intelligent Data Management Cloud platform and Informatica Cloud Data Integration Elastic (CDI-E).

The Informatica Intelligent Data Management Cloud (IDMC) platform is a fully managed, integration-platform-as-a-service (iPaaS) that provides a comprehensive suite of data integration capabilities. It can be deployed as a fully managed service in the cloud, with a serverless option available as well. The platform provides connectivity to a wide range of data sources, including relational and non-relational DBMSs, cloud data warehouses, SaaS applications, files, NoSQL databases, IoT data sources, other CDC sources, and many others. It also offers a toolkit for building custom connectors. A graphical interface and wizard-based design tools enable simplified pipeline creation.

In addition to ETL, IDMC supports an ELT framework with pushdown optimization, which converts data mappings into SQL queries or dispatches the transformation logic to be executed on the target database, often a data warehouse. This ultimately allows the computing power of the database or data warehouse to be leveraged for faster processing. Beyond ETL and ELT, the platform provides change data capture and supports CI/CD with Jenkins, Azure DevOps, GitHub, and more.
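The general idea behind pushdown optimization can be illustrated with a short Python sketch in which a column mapping is rendered as a single SQL statement and executed inside the database. This is not Informatica's implementation: the `mapping_to_sql` helper, the table names, and the mapping format are hypothetical, and SQLite stands in for the target data warehouse.

```python
import sqlite3

# Hypothetical mapping: target column -> SQL expression over the staged table.
mapping = {
    "customer_id": "id",
    "full_name": "first_name || ' ' || last_name",
    "revenue_usd": "ROUND(revenue_cents / 100.0, 2)",
}


def mapping_to_sql(mapping, source_table, target_table):
    """Render a column mapping as one INSERT ... SELECT statement.

    Table names and expressions are trusted inputs in this sketch.
    """
    columns = ", ".join(mapping)
    expressions = ", ".join(mapping.values())
    return (
        f"INSERT INTO {target_table} ({columns}) "
        f"SELECT {expressions} FROM {source_table}"
    )


conn = sqlite3.connect(":memory:")  # SQLite stands in for the target database
conn.execute(
    "CREATE TABLE staging (id INTEGER, first_name TEXT, last_name TEXT, revenue_cents INTEGER)"
)
conn.execute("CREATE TABLE customers (customer_id INTEGER, full_name TEXT, revenue_usd REAL)")
conn.executemany(
    "INSERT INTO staging VALUES (?, ?, ?, ?)",
    [(1, "Ada", "Lovelace", 12550), (2, "Alan", "Turing", 9900)],
)

# The transformation runs inside the database engine, not in the Python process.
conn.execute(mapping_to_sql(mapping, "staging", "customers"))
rows = conn.execute("SELECT * FROM customers ORDER BY customer_id").fetchall()
print(rows)
```

Because the rows never leave the database, the work scales with the database engine rather than with the integration tool, which is the benefit the paragraph above describes.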

For data transformations, users can choose from an extensive library of predefined templates for both basic and complex data transformations, or use an included SDK or Custom Transformation API to develop their own custom transformations. The platform also includes intelligent recommendations for transformations, which are powered by Informatica’s AI engine, called “CLAIRE.” Intelligent Data Management Cloud has access to the metadata management capabilities of the Informatica Enterprise Data Catalog. Role-based access controls allow administrators to create custom roles and grant permissions to users based on roles assigned. Platform extensibility and connection to third-party applications is enabled through ODBC/JDBC drivers and REST APIs.

Informatica Cloud Data Integration Elastic (CDI-E) is a serverless data integration offering that leverages the capabilities of the IDMC platform and is deployed in containers on Kubernetes/Spark-based compute clusters. Its serverless nature is designed for hands-free administration and management. It features a drag-and-drop interface, a comprehensive set of connectors, a library of prebuilt data transformation templates, AI-powered transformation suggestions and files, schema drift handling, and governance through other elements of Informatica’s data management stack.

At the time of this writing, Informatica had just announced the general availability of a free (CDI-Free) and a pay-as-you-go (CDI-PayGo) data integration offering with a subset of its IDMC platform capabilities.

Strengths: Informatica’s platform and solutions provide a well-rounded, comprehensive set of capabilities for data integration. Connectivity, multimodal pipelines, extended data transformations and AI-powered transformations, security, and data management are some of its many strengths.

Challenges: As a major player and a data integration powerhouse, Informatica’s offerings are designed to handle enterprise-level workloads and are geared more toward these customers, so they may be out of range for smaller businesses.

Matillion

Matillion provides multiple solutions for data pipelines, including Matillion ETL and Matillion Data Loader, which are, respectively, an ETL/ELT tool built specifically for major cloud database platforms and a SaaS-based data integration tool. Matillion also recently released its Matillion Data Productivity Cloud platform, which is designed to streamline data pipeline processes, with support for dbt and a self-service approach to data source connectors.

Matillion ETL natively integrates with Snowflake, Amazon Redshift, Delta Lake on Databricks, Google BigQuery, and Microsoft Azure Synapse Analytics. It possesses a browser-based UI with a drag-and-drop interface for building data pipelines, as well as built-in collaboration (multiple users can build or work on the same job simultaneously from different locations), version control, validation, and data preview. All jobs, once created, can be scheduled to run automatically. Matillion ETL supports connectivity to the most popular on-premises and SaaS data sources with over 100 prebuilt data connectors. Users can also build their own custom connectors and export them to Matillion ETL to be used in other instances. Both prebuilt and custom connectors can be used in Matillion ETL orchestration jobs. Matillion provides an enterprise mode option for users running on instances over a certain defined size. This option provides data lineage, concurrent connections, access controls, audit logs, and job layout documentation. It also provides high-availability clustering, which allows jobs to run on clusters that can survive node failures, automatically resubmitting jobs that fail (available on Snowflake, Redshift, Delta Lake, and BigQuery).

Matillion Data Loader is a SaaS platform for extracting data from SaaS-based applications, on-premises and cloud databases, or file storage, and loading it into cloud data platform destinations through one of two methods: batch pipelines and change data capture pipelines. Popular sources include Salesforce, Facebook, Google Analytics, PostgreSQL, Oracle, MySQL, Excel, and Google Sheets. Destinations include Amazon Redshift, Snowflake, Google BigQuery and Cloud Data Storage, Delta Lake on Databricks, Amazon S3, and Azure Blob Storage. Matillion Data Loader offers automatic schema drift handling and no-code pipeline authoring, as well as batch data loading and change data capture within the same platform. It integrates with Matillion ETL for data transformation. Users can select the frequency for batch pipeline execution and schedule pipeline runs.
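The difference between batch loading and change data capture can be shown with a small Python sketch: a batch load rebuilds the target from a full snapshot, while a CDC load applies an ordered stream of change events to a target that already exists. The event format (`op`, `id`, `data`) and both function names are assumptions for illustration and do not reflect Matillion's internal formats.

```python
def load_batch(snapshot):
    """Batch load: rebuild the target from a full snapshot of the source."""
    return {row["id"]: dict(row) for row in snapshot}


def apply_changes(target, events):
    """CDC load: apply ordered change events to an existing target in place."""
    for event in events:
        key = event["id"]
        if event["op"] == "delete":
            target.pop(key, None)
        else:  # "insert" and "update" are both treated as upserts
            target[key] = {"id": key, **event["data"]}
    return target


# Initial full load of the source table.
snapshot = [{"id": 1, "status": "trial"}, {"id": 2, "status": "paid"}]
target = load_batch(snapshot)

# Later, only the changes are shipped instead of a new snapshot.
events = [
    {"op": "update", "id": 1, "data": {"status": "paid"}},
    {"op": "insert", "id": 3, "data": {"status": "trial"}},
    {"op": "delete", "id": 2},
]
apply_changes(target, events)
print(target)
```

In this sketch the CDC path touches three records regardless of table size, whereas repeating the batch load would reread every row in the source.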

Matillion ETL and Matillion Data Loader are part of the Matillion Data Productivity Cloud, the vendor’s newly announced platform designed to streamline the pipeline process for customers. Data Productivity Cloud offers both highly technical users and data novices features that suit their needs. Matillion’s recently announced dbt integration is one example of a feature geared toward technical users, while its low-code/no-code pipeline building capabilities make Matillion accessible to non-technical users. The platform also provides an easy, wizard-like interface for building custom connectors, enabling what the vendor calls “universal connectivity.”

Strengths: Matillion’s offerings are strong in overall ease-of-use, automated schema drift handling, collaborative pipeline building, support for multiple pipeline frameworks, and an easy interface for building data pipelines and custom connectors.

Challenges: Matillion’s solutions are currently only deployable in the cloud, which would not be the best fit for an organization that prefers to run data pipelines within its own data centers.

Microsoft

Microsoft provides multiple solutions for data integration and building data pipelines. Here, we discuss the details and capabilities of Microsoft Azure Data Factory, Microsoft Power Query, and Microsoft Power Query Dataflows.

Azure Data Factory is a fully managed, serverless data integration service that enables data movement and transformation. In addition to offering “code-free ETL as a service,” as Microsoft itself describes it, Azure Data Factory can support native change data capture for SQL Server, Azure SQL DB, and Azure SQL Managed Instance. Azure Data Factory can connect to over 100 on-premises and cloud data sources, including non-Microsoft sources, and allows users to move data from these sources into the destination platform for further analysis.

Pipelines can be authored using a visual pipeline creation interface or built using the SDK. Processing steps in a pipeline are represented by “activities.” Activities within Azure Data Factory pipelines can be either movement (for example, the copy activity moves data from one data store to another), data transformations, or control activities. Data transformation steps can be built visually through Azure Data Factory mapping data flows. Users can build a library of reusable data transformation routines and execute them from their Azure Data Factory pipelines. Azure Data Factory provides built-in support for pipeline monitoring through Azure Monitor, PowerShell, Azure Monitor logs, and health panels on the Azure portal.

Continuous integration/continuous deployment using Azure DevOps and GitHub is supported. An integration runtime enables pipeline execution in different environments (such as on-premises and cloud).

Microsoft Azure Synapse Analytics also offers its own pipeline capability that consists of embedded Azure Data Factory pipelines that users can set up and access within Synapse Studio.

Microsoft Power Query is an engine for powering data transformation and data preparation that’s available in multiple products and services, including as an add-on in Microsoft Excel, Microsoft Power BI, and other components in the Power Platform. Users access Microsoft Power Query and connect to data sources through a graphical interface. Users apply no-code transformations using the Power Query editor, which is a set of ribbons, menus, and buttons. They can preview data and select transformations, which are common across all supported data sources. The M language is the data transformation scripting language in which all Power Query transformations are implemented. The Advanced Editor can be used to access the query script and modify and fine-tune functions and transformations. Data engineers can even write M scripts from scratch.

Power Query Dataflows are the self-service, cloud-based version of the Power Query data preparation offering. Users can move and transform data in the same way as Power Query does, but instead of sending the output only to the specific application being used, such as Power BI or Excel, they can send it to other target systems, such as Dataverse or Azure Data Lake Storage. Dataflows can also be used as Power BI data sources. Power Query Dataflows can be embedded within Data Factory pipelines. Users can trigger dataflows to run on demand or on a predetermined schedule. Dataflows support over 80 data sources, and transform the data using the same engine as Power Query. Specific use cases the vendor says Dataflows supports include data migration from legacy systems and using dataflows as a replacement for other ETL tools to build dimensional models and/or data warehouses.

Strengths: The strengths of Microsoft’s data pipeline solutions include ease of use, reusability, and multiple modes of pipelines. Microsoft’s data pipeline solutions provide both no-code pipeline authoring capabilities and the ability to customize and fine-tune transformations with code. Microsoft Power Query supports self-service data integration and Azure Data Factory provides developers with refined tools for building data pipelines.

Challenges: Azure Data Factory would best suit organizations with existing Microsoft products and infrastructure because when it runs in the cloud, it is available exclusively on Azure (although it can connect to non-Microsoft data sources).

Qlik / Talend

Qlik’s portfolio has expanded significantly over the last several years, beyond its original focus on business intelligence offerings, both through development and through multiple acquisitions. As a result, it currently offers a number of products that are relevant to users seeking data pipeline and data integration solutions. In January 2023, Qlik announced its plans to acquire data integration and data governance company Talend, which is owned by Thoma Bravo, the same private equity firm that owns Qlik. At the time of the writing of this report, that acquisition is targeted for completion in the summer of 2023. Here, we discuss the relevant solutions Qlik and Talend each offer that provide value to users in the realm of data pipelines.

The first of these, Qlik Cloud Data Integration, is one module of the vendor’s data integration and analytics cloud platform, Qlik Cloud. Qlik Cloud Data Integration allows users to build data pipelines to move data from one or more on-premises or cloud sources, either through bulk loading or CDC, depending on the data source; transform the data through push-down transformations; and load it into one or more supported destinations, including Microsoft Azure Synapse, Google BigQuery, Snowflake, Databricks, and the Qlik Cloud platform.

Users can transform data as it is being extracted (“onboarded”), or transform it after first landing or transferring the data. Users can also create reusable transformations in the form of “data tasks.” The datasets created by Qlik Cloud Data Integration can also be used in Qlik Cloud Analytics, another module of the Qlik Cloud platform, as well as in on-demand apps and third-party applications. Limitations of this solution include the fact that it does not support updates to the source schema; if the source schema changes, affected pipelines must be updated.

Qlik also offers other relevant solutions, Qlik Replicate and Qlik Compose, stemming from its acquisition of Attunity. Qlik Replicate, formerly Attunity Replicate, provides data replication and streaming data ingestion. Qlik Replicate consists of a web-based console and replication server for replication of data across both heterogeneous and homogeneous data sources. Qlik Replicate possesses an intuitive graphical interface for building replication tasks. Qlik Compose, formerly Attunity Compose, provides a graphical interface through which users can build and execute data pipelines that ingest data from multiple sources and move it to a storage system for further transformation and analytics. These solutions mainly focus on moving data and do not provide transformation capabilities.

Qlik also offers Qlik Application Automation, resulting from its acquisition of SaaS integration platform company Blendr.io. This platform allows users to build integrations and automation workflows between SaaS applications. Read and write connectors to a wide range of cloud applications are provided, and this product is mostly geared toward a more technical audience.

As mentioned, in January 2023, Qlik announced plans to acquire data integration and data governance company Talend, with completion targeted for the summer of 2023. It remains to be seen how Talend’s offerings will be integrated into the wider suite of Qlik products. Talend’s flagship offering is its Talend Data Fabric, a platform that provides data integration, data preparation, change data capture, a pipeline designer, and the capabilities of ETL vendor Stitch, itself acquired in 2018 by Talend. Talend provides a huge library of connectors (over 1000), includes a drag-and-drop interface for creating reusable pipelines, supports batch and streaming data as well as hybrid- and multicloud deployments, and provides debugging capabilities. Live preview capabilities allow users to troubleshoot their data. Users can choose from a library of built-in data transformation capabilities. The platform includes role-based access controls and data quality proofing.

Strengths: Qlik provides an abundance of well-rounded, comprehensive data integration options to users in this category. The strengths of Qlik’s portfolio include multimodal pipelines (ETL, CDC), reusability, a huge library of connectors, and hybrid and multicloud capabilities.

Challenges: Qlik has a number of pipeline products and technologies in its platform, with significant overlap between them, each with its own limitations. The acquisition of Talend, if and when completed, will only compound the situation. This complexity can leave customers with a burden of too many choices and some risk that the particular technology they select may not be one of Qlik’s favored ones going forward.

SAP Data Intelligence Cloud

SAP Data Intelligence Cloud is a data management platform that enables data integration, orchestration, governance, and streaming across different data sources. It also provides ML tools to aid data science workflows and projects. Altogether, SAP Data Intelligence Cloud functions as a single streamlined platform for data integration and ML, with tight integration with SAP HANA and other SAP enterprise platforms.

Through the SAP Data Intelligence Launchpad, users can access the multiple functionalities of the platform. In the connection manager, users can build data pipelines through a drag-and-drop visual interface. Pipelines can be built using a library of more than 250 predefined operators (for example, data ingestion, connections to data sources, integration, processing, and ML) and users can also build custom operators. Connectivity is available to numerous data sources such as SAP HANA, S/4HANA, and SAP Datasphere (formerly SAP Data Warehouse Cloud), as well as non-SAP sources. Data processed in the pipeline can be analyzed and visualized in SAP Analytics Cloud.

SAP Data Intelligence Cloud also offers robust support for metadata management and governance. Within the platform’s Metadata Explorer, users can access a data catalog, view and track data quality, access a business glossary, and handle monitoring and administration. From the Data Catalog, users can view data properties, column types, profiling trends, and other details. Users can also define and categorize data rules, create rulebooks to organize their rules, and build a rules dashboard to track data quality through scorecards. The business glossary provides a centralized, shared repository of terms and definitions to help make them understandable across the organization. A monitoring dashboard allows users to view metrics on connections and usage, and monitor the status of tasks. From the administration portal, users can browse connections, publish and enrich datasets with lineage and profiling information, manage hierarchies and tags, and schedule tasks.
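The relationship between data rules, rulebooks, and scorecards can be sketched generically in Python: a rulebook is a named collection of rules, each rule is a check applied to a record, and a scorecard reports the share of records that pass each rule. This is not the SAP Metadata Explorer API; the `build_scorecard` function, the rule names, and the sample records are hypothetical.

```python
def build_scorecard(rows, rulebook):
    """Return the fraction of rows passing each rule (assumes rows is non-empty)."""
    total = len(rows)
    return {
        name: sum(1 for row in rows if rule(row)) / total
        for name, rule in rulebook.items()
    }


# Hypothetical rulebook: rule name -> predicate over a single record.
rulebook = {
    "email_present": lambda row: bool(row.get("email")),
    "age_in_range": lambda row: row.get("age") is not None and 0 <= row["age"] <= 120,
}

rows = [
    {"email": "a@example.com", "age": 34},
    {"email": "", "age": 51},
    {"email": "c@example.com", "age": 190},
    {"email": "d@example.com", "age": None},
]

scorecard = build_scorecard(rows, rulebook)
print(scorecard)
```

Tracking these pass rates over successive runs is what turns individual rules into the kind of quality trend a dashboard can display.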

SAP Data Intelligence Cloud also offers data science tools to build, tune, and manage ML models. Predefined Python and R scripts with basic data science steps are available, as is integration with JupyterLab.

Strengths: The SAP platform’s strong points include data source connectivity to both SAP and non-SAP data sources on-premises and in the cloud, ML capabilities, and extensive data management and data governance functionality.

Challenges: The SAP Data Intelligence Cloud platform also enables ML and data science workloads, in addition to providing data pipeline capabilities. This grouping can cause confusion for customers, as it is not clear whether the data pipeline capabilities are available on their own.

6. Analyst’s Take

In this report, we’ve examined the current offerings in the data pipeline landscape, explored their different capabilities and the areas in which they excel, and classified these solutions against key criteria and evaluation metrics defined in the companion Key Criteria report. This report provides a decision-making framework to help assess the impact that specific data pipeline options can have on your organization and the value they can provide.

Data pipelines are a critical component of an enterprise’s data stack, and the current vendor landscape contains many different offerings to suit many different needs, skill sets, and use cases. A company can choose a product designed to bring users with little technical experience into the data pipeline process and thus free up its data professionals for more advanced analysis tasks, or it can opt for a solution aimed toward data engineers and data professionals and take advantage of more advanced configurations.

Prospective customers should keep the following in mind as they’re comparing offerings:

  • Flexibility of choice: No matter the specific approach taken by any particular vendor, the good news for the potential customer is that, industry-wide, the overall trend is toward providing flexibility of choice for the consumer. Examples of this flexibility include vendors branching out from purely on-premises deployment to offering cloud-based deployment as well, and some vendors doing the reverse, branching out from purely cloud-based deployment to offering on-premises options as well. Some vendors are moving toward purely no-code interfaces and some vendors are offering the choice of higher-code or lower-code options to suit customers’ needs. For the customer, this means that, within the landscape of options, there is likely to be an option that particularly suits their use cases and needs.
  • Automation: There is an overall movement within the data pipeline landscape toward adding automation to solution platforms. This includes automated change detection, schema drift handling, and lineage tracking, in addition to AI-powered transformations and smart suggestions. For the potential customer, this means that platforms are becoming more adaptable in real time (for example, to changes in the source schema, to changes made to the pipeline, and in the ability to learn from common transformations and actions and suggest future steps accordingly).
  • Multiple frameworks: From their beginnings in traditional ETL tools, data pipeline platforms have evolved and advanced to encompass a wide range of tools and processes for moving and transforming data. Many current data pipeline offerings include different frameworks such as ELT and CDC in addition to traditional ETL. Some data pipeline offerings have doubled down on ETL, however, offering strong, specialized, robust solutions for this method of processing data.
  • Democratization: Many platforms offer features designed to include non-technical users in the pipeline process. These features include drag-and-drop visual interfaces for pipeline creation, suggestion-based transformations, and prebuilt templates for transformations. Some solutions are completely no-code; users do not write custom transformations at all, and instead choose from libraries of prebuilt transformation templates. Others provide more flexible frameworks through which users can choose higher- or lower-code functionality depending on their level of expertise.
  • Consolidation: Some vendors are seeking to position their platforms as a one-stop-shop for managing all the stages of the pipeline lifecycle, helping users build end-to-end pipelines from within their platform.
  • Multiple deployment options: There is an overall trend toward providing more choice in deployment options to customers. Vendors who, in the past, provided only on-premises solutions are now looking to offer SaaS deployment options—and even favor such deployments going forward—for their products, either through acquisition or internal development. And vendors who have been cloud-only are now seeking to provide on-premises solutions as well, offering more flexibility in the options they present to customers.
  • Downstream extensions: A growing number of vendors are extending the pipeline further downstream by offering “reverse ETL” capabilities, which send data from the pipeline’s target system (often a data warehouse) into third-party APIs, so that the pipeline reaches past the target system to downstream SaaS applications.
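
To make the schema drift handling mentioned under Automation more concrete, the following is a minimal Python sketch of how a pipeline might compare the schema it expects from a source with the schema it actually observes. It does not represent any profiled vendor's implementation. The names `SchemaDrift`, `detect_schema_drift`, `expected_schema`, and `observed_schema`, as well as the sample columns and type labels, are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaDrift:
    """Differences between an expected schema and an observed one (hypothetical type)."""
    added: dict = field(default_factory=dict)    # column -> type seen only in the source
    removed: dict = field(default_factory=dict)  # column -> type no longer in the source
    retyped: dict = field(default_factory=dict)  # column -> (expected type, observed type)

    @property
    def has_drift(self) -> bool:
        # True when any column was added, removed, or changed type.
        return bool(self.added or self.removed or self.retyped)


def detect_schema_drift(expected: dict, observed: dict) -> SchemaDrift:
    """Compare two {column: type} mappings and report how they differ."""
    drift = SchemaDrift()
    for column, new_type in observed.items():
        if column not in expected:
            drift.added[column] = new_type
        elif expected[column] != new_type:
            drift.retyped[column] = (expected[column], new_type)
    for column, old_type in expected.items():
        if column not in observed:
            drift.removed[column] = old_type
    return drift


# Hypothetical schemas: the source has changed a column type and added a column.
expected_schema = {"id": "int", "email": "str", "signup_date": "date"}
observed_schema = {"id": "int", "email": "str", "signup_date": "str", "plan": "str"}

drift = detect_schema_drift(expected_schema, observed_schema)
if drift.has_drift:
    print("added:", drift.added)
    print("removed:", drift.removed)
    print("retyped:", drift.retyped)
```

In a real platform, a report like this would typically feed a policy decision, such as propagating the new column to the target, pausing the pipeline, or alerting an operator. Which of those a given product supports is something to confirm with the vendor.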

All the vendors profiled in this report provide robust solutions, and all have their differentiating characteristics and areas in which they excel. This Radar report, along with its companion Key Criteria report, is intended as a guide you can use to inform yourself about the solutions available so that you can ultimately select the offering that best matches your organization’s needs.

7. Methodology

For more information about our research process for Key Criteria and Radar reports, please see our Methodology page.

7. About Andrew Brust

Andrew Brust has held developer, CTO, analyst, research director, and market strategist positions at organizations ranging from the City of New York and Cap Gemini to GigaOm and Datameer. He has worked with small, medium, and Fortune 1000 clients in numerous industries and with software companies ranging from small ISVs to large clients like Microsoft. This experience has given him an understanding of technology, and of the way customers use it, that makes his market and product analyses relevant, credible, and empathetic.

Andrew has tracked the Big Data and Analytics industry since its inception, as GigaOm’s Research Director and as ZDNet’s original blogger for Big Data and Analytics. Andrew co-chairs Visual Studio Live!, one of the nation’s longest-running developer conferences, and currently covers data and analytics for The New Stack and VentureBeat. As a seasoned technical author and speaker in the database field, Andrew understands today’s market in the context of its extensive enterprise underpinnings.

8. About GigaOm

GigaOm provides technical, operational, and business advice for IT’s strategic digital enterprise and business initiatives. Enterprise business leaders, CIOs, and technology organizations partner with GigaOm for practical, actionable, strategic, and visionary advice for modernizing and transforming their business. GigaOm’s advice empowers enterprises to successfully compete in an increasingly complicated business atmosphere that requires a solid understanding of constantly changing customer demands.

GigaOm works directly with enterprises both inside and outside of the IT organization to apply proven research and methodologies designed to avoid pitfalls and roadblocks while balancing risk and innovation. Research methodologies include but are not limited to adoption and benchmarking surveys, use cases, interviews, ROI/TCO, market landscapes, strategic trends, and technical benchmarks. Our analysts possess 20+ years of experience advising a spectrum of clients from early adopters to mainstream enterprises.

GigaOm’s perspective is that of the unbiased enterprise practitioner. Through this perspective, GigaOm connects with engaged and loyal subscribers on a deep and meaningful level.

9. Copyright

© Knowingly, Inc. 2023. "GigaOm Radar for Data Pipelines" is a trademark of Knowingly, Inc. For permission to reproduce this report, please contact sales@gigaom.com.