Amazon preps Data Pipeline service to automate big data workflows

Amazon’s newly launched Data Pipeline will help Amazon Web Services customers get a better grip on handling their data scattered throughout the various AWS data repositories and third-party databases, Amazon (s amzn) CTO Werner Vogels said Thursday.

The tool will let AWS customers create automated, scheduled workflows that move data among services such as DynamoDB, S3 storage and Elastic MapReduce, wherever it is needed. “It’s pre-integrated with AWS data sources and easily connected to third-party and on-premise data sources,” Vogels said.

The proliferation of data — machine logs, sensor data and plain old database data — is driving the need for automating the flow of that data from databases to storage to applications and back. “You have to put everything in logs which creates even more data … in AWS,” Vogels said.

Users build their workflows with a drag-and-drop interface and schedule them to run periodically. Because the service makes it easy to consolidate data in one place, customers will be better able to run big batch analytics on their logs and other information.

There were not many details beyond that, but judging from AWS's track record, the service should be available soon. Stay tuned for updates.

4 Responses to “Amazon preps Data Pipeline service to automate big data workflows”

  1. This is a great step forward for those who store Big Data sets in the Cloud. It will reduce the administration required to do periodic batch runs of data sets, for sure.

    But is Amazon looking at ways to offer in-memory services that would reduce the need for batch schedules? That would be an interesting proposition.

    • Derrick Harris

      Actually, workflows don’t need to touch EMR. If you have in-memory DBs in EC2, or are using Redshift, for example, you can move data there too. Or they could go from EMR to an in-memory system. It’s really whatever flow you want, I’ve been told.