When we’re talking about conventional IT systems, we rarely question the idea of geo-distributed systems and redundancy. And we don’t usually challenge the notion that load balancing among servers and farms is a smart thing to do. So why don’t we routinely think this way about Hadoop?
Customers can set up multiple Hadoop clusters and use each one for a different workload. Companies can then site these clusters in different geographies for redundancy, load balancing and/or content distribution. The data can be segregated or, using replication technology, synchronized between sites to create a “logical data lake.” Is utilizing multiple Hadoop clusters in this way folly, or is it just pragmatism?
In this webinar, our panel will discuss:
- Does Apache YARN put all workloads on an equal footing, or does dedicating clusters to specific workloads make more sense?
- Is the data lake concept best for all, or is partitioning data between clusters right for some customers?
- Can inter-cluster replication of Hadoop data work?
- How do public and private cloud architectures impact the multi-cluster question?
- Can multiple clusters be a vector of parallelism and elasticity?
Speakers include:
- David S. Linthicum, SVP, Cloud Technology Partners
- Paul Miller, Founder, The Cloud of Data
- Lynn Langit, Founder & Consultant, Lynn Langit Consulting
- Randy DeFauw, Senior Product Manager, WANdisco
Register to join Gigaom Research and our sponsor WANdisco for a free analyst webinar, “Apache Hadoop: Is one cluster enough?”, on Wednesday, October 15, 2014 at 10 a.m. PT.