
Summary:

Amazon CTO Werner Vogels unveiled the company’s new Data Pipeline service at the AWS re:Invent conference in Las Vegas. The service aims to make it easier for AWS customers to automate data workflows between various AWS and third-party repositories.

AWS re:Invent (photo: Barb Darrow)

Amazon’s newly launched Data Pipeline will help Amazon Web Services customers get a better grip on data scattered across the various AWS data repositories and third-party databases, Amazon CTO Werner Vogels said Thursday.

The tool will make it easy for AWS customers to create automated, scheduled workflows that move data wherever it’s needed, from DynamoDB and S3 storage to Elastic MapReduce. “It’s pre-integrated with AWS data sources and easily connected to third-party and on-premise data sources,” Vogels said.

The proliferation of data — machine logs, sensor data and plain old database data — is driving the need for automating the flow of that data from databases to storage to applications and back. “You have to put everything in logs which creates even more data … in AWS,” Vogels said.

Users build their workflows with a drag-and-drop interface and schedule them to run periodically. Because the service makes it easy to consolidate data in one place, customers will be better able to run big batch analytics on their logs and other information.
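To make that concrete, here is a minimal sketch of how such a scheduled workflow might be defined through the API rather than the drag-and-drop console. It uses the boto3 Python SDK; the pipeline name, bucket paths and daily schedule are illustrative assumptions rather than details from the announcement, and the objects follow the Data Pipeline definition model (a Schedule, two S3DataNodes and a CopyActivity).

```python
# A minimal sketch of defining a scheduled pipeline programmatically.
# All names and paths below are illustrative, not from the announcement.
import boto3

dp = boto3.client("datapipeline", region_name="us-east-1")

# Create an empty pipeline; uniqueId makes the call idempotent.
pipeline_id = dp.create_pipeline(
    name="daily-log-copy",          # hypothetical pipeline name
    uniqueId="daily-log-copy-v1",
)["pipelineId"]

# Pipeline objects: a Default object carrying the schedule, a daily
# Schedule, S3 input/output data nodes, and a CopyActivity wiring
# them together. Other objects inherit fields from Default.
objects = [
    {"id": "Default", "name": "Default", "fields": [
        {"key": "scheduleType", "stringValue": "cron"},
        {"key": "schedule", "refValue": "DailySchedule"},
        {"key": "workerGroup", "stringValue": "default"},
    ]},
    {"id": "DailySchedule", "name": "DailySchedule", "fields": [
        {"key": "type", "stringValue": "Schedule"},
        {"key": "period", "stringValue": "1 day"},
        {"key": "startDateTime", "stringValue": "2012-11-30T00:00:00"},
    ]},
    {"id": "LogsIn", "name": "LogsIn", "fields": [
        {"key": "type", "stringValue": "S3DataNode"},
        {"key": "directoryPath", "stringValue": "s3://my-bucket/raw-logs/"},
    ]},
    {"id": "LogsOut", "name": "LogsOut", "fields": [
        {"key": "type", "stringValue": "S3DataNode"},
        {"key": "directoryPath", "stringValue": "s3://my-bucket/consolidated/"},
    ]},
    {"id": "CopyLogs", "name": "CopyLogs", "fields": [
        {"key": "type", "stringValue": "CopyActivity"},
        {"key": "input", "refValue": "LogsIn"},
        {"key": "output", "refValue": "LogsOut"},
    ]},
]

# Upload the definition, then activate so the scheduler starts running it.
dp.put_pipeline_definition(pipelineId=pipeline_id, pipelineObjects=objects)
dp.activate_pipeline(pipelineId=pipeline_id)
```

The console’s visual builder produces an equivalent definition of objects and fields behind the scenes; the API route simply makes that definition explicit and scriptable.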

There weren’t many details beyond that, but given AWS’s track record, the service should be available soon. Stay tuned for updates.

Comments:

  1. This is a great step forward for those who store Big Data sets in the Cloud. It will reduce the administration required to do periodic batch runs of data sets, for sure.

    But is Amazon looking at ways to offer in-memory services that would reduce the need for batch schedules? That would be an interesting proposition.

    1. Actually, workflows don’t need to touch EMR. If you have in-memory DBs in EC2, or are using Redshift, for example, you can move data there too. Or they could go from EMR to an in-memory system. It’s really whatever flow you want, I’ve been told.

  2. Mark Chmarny, Thursday, November 29, 2012

    As in-memory data grids become the backbone of next-generation on-line applications, their dependency on any specific data storage technologies will become less relevant. AWS’s Data Pipeline Service could cross the divide between local/cloud data and allow HDFS to become the consolidated data storage platform of choice.

    More on this here: http://mark.chmarny.com/2012/11/hdfs-has-won-now-de-facto-standard-for.html

  3. Whoa, now that’s something I can use – data pipeline, great idea.
