Summary:

How much of your data is Facebook collecting every day? Some new stats from the company reveal just how large its user base is, and what big data means to a company with 950 million users.

[Image: Open Compute tray at Facebook’s Prineville data center]

With more than 950 million users, Facebook is collecting a lot of data. Every time you click a notification, visit a page, upload a photo, or check out a friend’s link, you’re generating data for the company to track. Multiply that by 950 million people, who spend on average more than 6.5 hours on the site every month, and you have a lot of information to deal with.

Here are some of the stats the company provided Wednesday to demonstrate just how big Facebook’s data really is:

  • 2.5 billion content items shared per day (status updates + wall posts + photos + videos + comments)
  • 2.7 billion Likes per day
  • 300 million photos uploaded per day
  • 100+ petabytes of disk space in one of FB’s largest Hadoop (HDFS) clusters
  • 105 terabytes of data scanned via Hive, the SQL-like query layer Facebook built on Hadoop, every 30 minutes
  • 70,000 queries executed on these databases per day
  • 500+ terabytes of new data ingested into the databases every day
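
To put the daily figures above on a more intuitive scale, here is a rough back-of-the-envelope sketch that converts them to per-second rates. It assumes the load is spread evenly across the day and that the units are decimal (1 terabyte = 1,000 gigabytes); neither assumption comes from Facebook, and the "500+" figure is treated as exactly 500. The variable and function names are illustrative only.

```python
# Figures as quoted in the list above (Facebook, August 2012).
# Assumptions (not from the source): load is uniform over the day,
# units are decimal, and "500+" is treated as exactly 500.
SECONDS_PER_DAY = 24 * 60 * 60
HALF_HOURS_PER_DAY = 48

new_data_tb_per_day = 500
hive_scan_tb_per_30_min = 105
queries_per_day = 70_000
likes_per_day = 2_700_000_000
photos_per_day = 300_000_000


def per_second(amount_per_day: float) -> float:
    """Convert a daily total into an average per-second rate."""
    return amount_per_day / SECONDS_PER_DAY


# 1 TB = 1,000 GB under the decimal-unit assumption.
ingest_gb_per_second = per_second(new_data_tb_per_day * 1000)

# Hive scans are quoted per 30 minutes, so scale up to a full day.
hive_scan_tb_per_day = hive_scan_tb_per_30_min * HALF_HOURS_PER_DAY

# How many times more data is scanned than newly ingested each day.
scan_to_ingest_ratio = hive_scan_tb_per_day / new_data_tb_per_day

likes_per_second = per_second(likes_per_day)
photos_per_second = per_second(photos_per_day)
queries_per_second = per_second(queries_per_day)

if __name__ == "__main__":
    print(f"New data ingested: {ingest_gb_per_second:.2f} GB/s")
    print(f"Hive data scanned: {hive_scan_tb_per_day} TB/day")
    print(f"Scan-to-ingest ratio: {scan_to_ingest_ratio:.2f}")
    print(f"Likes: {likes_per_second:.0f}/s")
    print(f"Photos uploaded: {photos_per_second:.0f}/s")
    print(f"Queries: {queries_per_second:.2f}/s")
```

Under those assumptions, the quoted numbers work out to several gigabytes of new data every second, and roughly ten times as much data scanned by Hive each day as is newly ingested.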

“If you aren’t taking advantage of big data, then you don’t have big data, you have just a pile of data,” Jay Parikh, VP of infrastructure at Facebook, said on Wednesday. “Everything is interesting to us.”

Parikh said the company is constantly trying to figure out how to better analyze and make sense of the data, including doing extensive A/B testing on all potential updates to the site, and making sure it responds in real time to user input.

“We’re growing fast, but everyone else is growing faster,” he said.

  1. That is a tremendous amount of data. There are environmental implications from generating, storing and transferring all this data. I wonder if this is a concern which hasn’t really been highlighted but deserves close attention nonetheless. As cliche as it may sound to a generation defined by consumerism and social networking, we have only one environment, and once it is damaged forever, there isn’t much we will be able to do. I’m not a fanatical environmentalist, just concerned about the collection of huge amounts of data for purposes which may or may not be entirely useful, and the impact of this on the environment.

    1. bharathchandrab Sunday, August 26, 2012

      Environmental implications, of all things?? I strongly believe that FB heavily optimizes its data and storage and reduces redundancy, which a million other companies don’t bother to do.

  2. Alternative headline: “Will Facebook be the first company to collapse under the huge weight of its own data?”

    “Store everything” is seen as a low-risk strategy because you don’t know what value you might get from this data in the future.

    However, we’re now seeing emerging evidence of a fallacy: the myth of the value of old data.

    We might just see that companies which maintain focus, storing only what they need, can function lean and mean and compete with the big data beasts at their own game.

  3. can you at least spell terabyte correctly?

    1. This is fixed.

  4. Gary E. Zimmerman Friday, August 24, 2012

    It may not strike us that our daily Facebook interactions are bits worth analyzing, but those 500 terabytes a day add up. Looked at with the right intelligence, organization and forward-thinking outlook, Facebook’s pile of data could mean big benefits for multiple businesses. It will be interesting to see what Facebook is able to do with all of this data.

  5. I wonder if that data includes a record of every page we visit on the web that happens to contain a Facebook “like” button? I’ve seen the calls a browser makes to Facebook (even if you don’t click the button): they include your Facebook cookie, and the containing page’s URL is passed in the referrer header.
    Tracking our every move?

    1. I would be pretty sure about that. Even if it’s just the access logs.

      1. As the Facebook VP above says, “everything is of interest to us”

  6. I can’t imagine Facebook not selling off data, given the huge number of corporations who could potentially profit from it and Facebook’s current financial state.
