Collectively, Yahoo, Facebook, Amazon and Google are rewriting the handbook for big data. Startups intending to reach these proportions must also change their thinking about data, and enterprises need this model for internal deployments as a way to retain an economic edge. The four leading web giants have designed systems from scratch, evidence that workloads have altered, business models are different, and economies have changed, all of which demand a new approach.

Yahoo revealed a few weeks ago how it approaches unstructured data on an Internet scale with MObStor, the technology that “grew out of Yahoo Photos” but now serves the unstructured storage needs across the company. Earlier this year, Facebook unveiled Haystack, its solution to managing its growing photo collection (which could reach 100 billion photos in 2009 if it continues with current growth rates). In 2007, Amazon outlined Dynamo, an “incrementally scalable, highly available key-value storage system.” All of these were predated by The Google File System, presented as a research paper in October 2003.
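To make "incrementally scalable" concrete: the Dynamo paper describes partitioning keys with consistent hashing, so that adding a server moves only a fraction of the keys. The sketch below is a toy illustration of that general technique, not Amazon's implementation. The `HashRing` class, the server names, the choice of SHA-256, and the virtual-node count are all assumptions made for the example.

```python
import bisect
import hashlib


def _position(value: str) -> int:
    """Map a string to a position on the ring (hash choice is arbitrary here)."""
    return int(hashlib.sha256(value.encode()).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes (hypothetical, for illustration)."""

    def __init__(self, nodes, vnodes=64):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (position, node name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each server owns several points on the ring to even out the load.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_position(f"{node}#{i}"), node))

    def node_for(self, key):
        # Walk clockwise to the first point at or after the key's position,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_right(self._ring, (_position(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["server-a", "server-b", "server-c"])
keys = [f"photo-{i}" for i in range(10_000)]

before = {k: ring.node_for(k) for k in keys}
ring.add_node("server-d")  # grow the cluster by one commodity server
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved to the new server")
```

Only keys that now fall on the new server's points change owner; everything else stays put, which is what lets capacity grow one machine at a time.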

While no two of these systems are exactly alike, together they represent a complete change from traditional file systems and data stores. The GFS authors note that their design “reflects a marked departure from some earlier file system assumptions,” causing them to “re-examine traditional choices and explore radically different design points.” These are not the systems we once knew.

Since MObStor is the most recently disclosed of the four, let’s take a look at some of its standout characteristics:

  • Designed for petabyte-scale content that is site-generated, partner-generated, or user-generated
  • Handles tens of thousands of page views every second
  • Stores unstructured objects that are mostly images, videos, CSS, and JavaScript libraries
  • Serves a read-dominated workload (most data is WORM: write-once read-many)
  • Requires only a low level of consistency
  • Designed to scale quickly and efficiently

These capabilities let Yahoo store and monetize content effectively, and they are a far cry from solutions developed just five to ten years ago. The scale, load, file types, read/write pattern, and consistency requirements represent another world compared with conventional enterprise solutions.
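The write-once read-many pattern in the list above also helps explain why a low level of consistency is enough: if an object never changes after it is written, any replica or cache that holds it can serve reads without coordination. Here is a minimal sketch of that idea. The `WormStore` class and its content-addressed keys are assumptions made for illustration and say nothing about how MObStor is actually built.

```python
import hashlib


class WormStore:
    """Toy write-once read-many object store (hypothetical, for illustration)."""

    def __init__(self):
        self._objects = {}  # key -> immutable bytes
        self.reads = 0
        self.writes = 0

    def put(self, data: bytes) -> str:
        # Assumption: the key is derived from the content, so identical bytes
        # always map to the same key and an object is never overwritten.
        key = hashlib.sha256(data).hexdigest()
        if key not in self._objects:
            self._objects[key] = bytes(data)
            self.writes += 1
        return key

    def get(self, key: str) -> bytes:
        # Reads never need a lock or a version check: the value cannot change.
        self.reads += 1
        return self._objects[key]


store = WormStore()
key = store.put(b"fake image bytes")

# Simulate a read-heavy workload: one upload, many page views.
for _ in range(1000):
    store.get(key)

# Uploading the same bytes again is a no-op rather than an update.
same_key = store.put(b"fake image bytes")
print(store.writes, store.reads, same_key == key)
```

Because stored values are immutable, the same key can be copied to many inexpensive servers or caches and read from whichever is closest.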

Perhaps as part of a migration effort, Yahoo’s MObStor incorporates existing storage systems, like NAS filers. This makes sense for Yahoo, which over the years has been one of NetApp’s largest customers. Facebook has jettisoned any attachment to storage devices other than commodity servers with internal drives, at least according to the description of Haystack in the Facebook engineering blog post. And Amazon and Google appear to have made this all-commodity move long ago.

The telling shift is the overwhelming focus on smart software running on inexpensive servers. This is not how storage industry giants like EMC, IBM, HDS and NetApp were born. But if the advance of Internet computing continues, the Goliath web properties will provide a crystal ball into how we will more broadly handle unstructured data at Internet scale. Startups reliant on big data for their business have little choice but to innovate as well, finding ways to accelerate time to market and maintain outstanding service. Enterprises handling big data will need to modify their approach, too. Otherwise, they leave the door open to competitors that will take advantage of these cloud infrastructure economics.

  5. Great article! This applies to our business at http://www.binfire.com perfectly. We have already passed a few terabytes of storage and our storage needs are growing rapidly. We have decided to look into Amazon EC2 and S3 for future expansion.

  6. Hugo Angelmar Sunday, August 16, 2009

    I was quite impressed when I took a look at Amazon’s Web Services (which implements Google’s MapReduce for some of its operations) and Haystack. The companies above also do a great job sharing the constraints and requirements they address and how they go about solving them. Great brief on the topic, Gary.
