What it is: Data Pipeline Automation is the practice of automating the creation of the virtual infrastructure that transports data between systems. This differs from the traditional approach, in which data pipelines are built either with code that must be rewritten as the data landscape changes, or with cloud-based services that need continual reconfiguration.
What it does: With Data Pipeline Automation, engineers can create a system of data transportation and transformation that dynamically adapts to changing circumstances. Without writing new code or reconfiguring services, administrators can alter the pipeline significantly, for example by adding new data sources or changing how data is transformed before it enters a central warehouse.
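To make the idea concrete, here is a minimal Python sketch of the declarative style described above. This is an illustration only, not any vendor's actual API: the function and field names (`pipeline_run`, `sources`, `transform`) are hypothetical. The point is that the pipeline is defined as data, so adding a source is a change to the declaration rather than a code rewrite.

```python
# Sketch of a declaratively defined pipeline (hypothetical interface,
# not a real product's API). Sources and the transform are declared as
# data; the runner simply follows whatever the declaration says.

def pipeline_run(spec, inputs):
    """Pull rows from every declared source, apply the declared
    transform, and return what would be loaded into the warehouse."""
    rows = []
    for source in spec["sources"]:
        rows.extend(inputs[source])              # ingest each declared source
    return [spec["transform"](r) for r in rows]  # apply declared transformation

# Initial declaration: one source, uppercase normalization.
spec = {"sources": ["crm"], "transform": str.upper}
inputs = {"crm": ["alice", "bob"], "erp": ["widget-1"]}

print(pipeline_run(spec, inputs))   # ['ALICE', 'BOB']

# Adding a data source is a declaration change, not a code rewrite.
spec["sources"].append("erp")
print(pipeline_run(spec, inputs))   # ['ALICE', 'BOB', 'WIDGET-1']
```

Real automation platforms operate at far greater scale, but the contrast with hand-coded pipelines is the same: the "shape" of the pipeline lives in configuration, and the system adapts when that configuration changes.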
Why it matters: At this point, managing diverse data sources presents an engineering challenge for many large firms. If your firm’s data pipeline isn’t operating smoothly, it can negatively affect everything from sales management, to M&A activity, to regulatory compliance. But with Data Pipeline Automation, firms can set up a robust pipeline that adapts as their data ecosystems and business requirements change. Additionally, automation makes it easier to set up data analysis and storage solutions that take advantage of multiple cloud environments.
What to do about it: If your firm depends on data pipelines that are subject to frequent updates, consider Data Pipeline Automation. Other good candidates include firms preparing for a cloud migration, or gearing up for any other circumstance that will require unusually demanding and complex data transportation.
- Removes the need for hand-coding as data pipelines change
- Can lead to easier regulatory compliance through data transparency
- Makes major data shifts easier, such as those involved in cloud migration or M&A activity
- Creates a more future-proof platform for data-driven businesses
Enterprises can no longer afford to be reckless about how data is transported, consumed and managed. Regulations like GDPR require companies to be able to account for how and where they store sensitive information, and in the multi-cloud era, this can be a difficult task. Data Pipeline Automation can track data across its journey, and limit unnecessary duplication by orchestrating data flow at a high level.
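As a rough illustration of the tracking idea in the paragraph above, the sketch below tags each record with the stages of its journey so the pipeline can later account for where sensitive data has travelled. All names here (`track`, the stage labels) are hypothetical, not drawn from any real compliance tool.

```python
# Illustrative sketch only: recording a record's journey through the
# pipeline so its handling can be accounted for later (e.g. for GDPR
# audits). Names and stage labels are hypothetical.

def track(record, stage, lineage):
    """Append a pipeline stage to the lineage log for this record."""
    lineage.setdefault(record["id"], []).append(stage)
    return record

lineage = {}
record = {"id": "cust-42", "email": "a@example.com"}

# As the record moves through the pipeline, each stage logs itself.
for stage in ["ingest:crm", "transform:mask-pii", "load:warehouse-eu"]:
    track(record, stage, lineage)

print(lineage["cust-42"])
# ['ingest:crm', 'transform:mask-pii', 'load:warehouse-eu']
```

A lineage log like this is also what lets an orchestrator spot unnecessary duplication: if the same record is loaded to two stores, both destinations appear in its journey.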
Currently, tools for Data Pipeline Automation are largely limited to cloud-based orchestration systems such as AWS Data Pipeline, which require some engineering skill to implement and maintain. However, as discussed in a recent GigaOm webinar, AI has come to Data Pipeline Automation, for example in automated tools from a firm called Ascend, which allow companies to create pipelines by simply declaring their desired shape.