Structure 09: How to Scale Up With Distributed Data Storage

The emergence of cloud computing has forced companies to rethink where their data is stored, as well as which technologies best support a distributed data storage model. The ideal scenario is storage that’s cheap and quickly scalable: just keep adding storage in the cloud as your data needs grow. Facebook’s Avinash Lakshman says that over the past year the social networking company has tripled its storage nodes. Cloudera’s Chief Scientist Jeff Hammerbacher says he’s seen systems as large as 4,000 nodes in the cloud.

The ability to scale web services to massive proportions is actually driving innovation around data storage systems, pointed out moderator Jason Hoffman, CTO and founder of Joyent. But there’s no single system that can solve all the distributed data problems in the modern Internet company, said Tasso Argyros, CTO and co-founder of Aster Data Systems. One thing Argyros is looking for is platforms that can process massive amounts of data to surface insights that are non-relational in nature, something that’s much harder to do at that scale. Geir Magnusson, a consulting architect for the Gilt Groupe, said that while these distributed data storage technologies might seem to be only for massive web companies like Amazon or LinkedIn, smaller web services can also benefit from distributed storage.

Ultimately, offering storage for such massively scalable systems runs into the bottleneck of the network right now, which is why companies are increasingly investing in software until truly abundant bandwidth arrives. And the hardware is changing, too. Are solid-state drives the future? Facebook’s Lakshman and the Gilt Groupe’s Magnusson aren’t so sure: they’re just not scalable and robust enough yet.

Video of the panel is here:

Photo by James Duncan Davidson.