In the new world of “data lakes,” where raw data is collected for subsequent discovery and analysis, managed data ingestion becomes a central task. While data lakes may dispense with the ultra-formality of an Enterprise Data Warehouse (EDW), data quality is nonetheless crucial. Users may like informal access to data, but they don’t want data that’s dirty or inaccurate. If the data lake is polluted, it will also stagnate from disuse.
The ability to organize and prepare data on the fly, as it is ingested into Hadoop Distributed File System (HDFS) storage, is therefore of utmost importance. Capabilities such as tracking operational metadata, business metadata, data lineage, and dataset quality matter in a data lake world, because they increase confidence in the Hadoop platform overall. This isn’t just Enterprise Information Management (EIM) for Hadoop; it’s agile rigor for data lakes. And it may be make-or-break for data lake adoption.
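To make the idea of capturing operational metadata and lineage at ingest time concrete, here is a minimal sketch in Python. It is a hypothetical illustration, not Zaloni’s product or any Hadoop API: a local landing directory stands in for HDFS, and the function, field names, and the simple non-empty-file quality check are all assumptions made for the example.

```python
import hashlib
import json
import time
from pathlib import Path

def ingest_file(src: Path, landing_dir: Path, source_system: str) -> dict:
    """Copy a raw file into a landing zone and record operational
    metadata (size, checksum, timestamp) plus simple lineage info."""
    landing_dir.mkdir(parents=True, exist_ok=True)
    data = src.read_bytes()
    dest = landing_dir / src.name
    dest.write_bytes(data)

    metadata = {
        "source_system": source_system,        # business/lineage metadata
        "source_path": str(src),               # where the data came from
        "landing_path": str(dest),             # where it now lives
        "ingested_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "size_bytes": len(data),               # operational metadata
        "sha256": hashlib.sha256(data).hexdigest(),
        "quality_ok": len(data) > 0,           # toy quality gate: reject empty files
    }
    # Write a sidecar metadata file so downstream users can audit lineage
    sidecar = dest.parent / (dest.name + ".meta.json")
    sidecar.write_text(json.dumps(metadata, indent=2))
    return metadata
```

A real managed-ingestion pipeline would push these records to a metadata catalog and run far richer quality rules, but the principle is the same: every dataset lands alongside the evidence of where it came from and whether it can be trusted.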
Join Gigaom Research and our sponsor Zaloni for “Data Management for Production Hadoop Data Lakes,” a free analyst webinar on Tuesday, February 24, 2015, at 10 a.m. PT.
What Will Be Discussed:
Why is the data lake architecture growing in popularity?
What problems do data lakes solve?
Can we have schema-on-read and cleansing-on-ingest?
Is managed data ingestion compatible with real-time performance?
Can managed data ingestion be applied to the EDW, or just the data lake?
Who Should Attend:
Hadoop developers, Hadoop administrators