Facebook is collecting your data — 500 terabytes a day

With more than 950 million users, Facebook is collecting a lot of data. Every time you click a notification, visit a page, upload a photo, or check out a friend’s link, you’re generating data for the company to track. Multiply that by 950 million people, who spend on average more than 6.5 hours on the site every month, and you have a lot of information to deal with.

Here are some of the stats the company provided Wednesday to demonstrate just how big Facebook’s data really is:

  • 2.5 billion content items shared per day (status updates + wall posts + photos + videos + comments)
  • 2.7 billion Likes per day
  • 300 million photos uploaded per day
  • 100+ petabytes of disk space in one of FB’s largest Hadoop (HDFS) clusters
  • 105 terabytes of data scanned via Hive, Facebook’s Hadoop query language, every 30 minutes
  • 70,000 queries executed on these databases per day
  • 500+ terabytes of new data ingested into the databases every day
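
To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It is not anything Facebook published. It takes the "500+" ingest figure at its floor of 500, spreads it evenly across the 950 million users, treats the Hive scan rate as constant around the clock, and uses decimal units (1 TB = 10^12 bytes). All of those are simplifying assumptions, and the function names are made up for illustration.

```python
# Figures quoted in the article.
USERS = 950_000_000
NEW_DATA_TB_PER_DAY = 500        # "500+" in the article; 500 is used as a floor
HIVE_SCAN_TB_PER_30_MIN = 105

# Assumption: decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes).
BYTES_PER_TB = 10**12
BYTES_PER_MB = 10**6


def new_mb_per_user_per_day() -> float:
    """Average new data per user per day, assuming an even spread over all users."""
    total_bytes = NEW_DATA_TB_PER_DAY * BYTES_PER_TB
    return total_bytes / USERS / BYTES_PER_MB


def hive_scan_tb_per_day() -> int:
    """Daily Hive scan volume, assuming the 30-minute rate holds for 24 hours."""
    half_hours_per_day = 24 * 2
    return HIVE_SCAN_TB_PER_30_MIN * half_hours_per_day


def scan_to_ingest_ratio() -> float:
    """How many terabytes are scanned per terabyte of newly ingested data."""
    return hive_scan_tb_per_day() / NEW_DATA_TB_PER_DAY


if __name__ == "__main__":
    print(f"New data per user per day: {new_mb_per_user_per_day():.2f} MB")
    print(f"Hive scan volume per day:  {hive_scan_tb_per_day()} TB")
    print(f"Scanned vs. ingested:      {scan_to_ingest_ratio():.2f}x")
```

Under those assumptions, the daily intake works out to roughly half a megabyte per user, and Hive scans about ten times as much data each day as the company takes in.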

“If you aren’t taking advantage of big data, then you don’t have big data, you have just a pile of data,” said Jay Parikh, VP of infrastructure at Facebook on Wednesday. “Everything is interesting to us.”

Parikh said the company is constantly trying to figure out how to better analyze and make sense of the data, including doing extensive A/B testing on all potential updates to the site, and making sure it responds in real time to user input.

“We’re growing fast, but everyone else is growing faster,” he said.

10 Responses to “Facebook is collecting your data — 500 terabytes a day”

  1. I wonder if that data includes a record of every page we visit on the web that happens to contain a Facebook “like” button? I’ve seen the calls a browser makes to Facebook (even if you don’t click the button), and they include your Facebook cookie, with the containing page’s URL passed in the referrer header.
    Tracking our every move?

  2. Gary E. Zimmerman

    It may not strike us that our daily Facebook interactions are bits worth analyzing, but those 500 terabytes a day add up. Looking at this pile of data with the right intelligence, organization and forward-thinking outlook, Facebook’s data collection could mean big benefits for multiple businesses. It will be interesting to see what Facebook is able to do with all of this data.

  3. UK Data Pro

    Alternative headline: “Will Facebook be the first company to collapse under the huge weight of its own data”

    Storing everything is seen as a low-risk strategy because you don’t know what value you might get from this data in the future.

    However, we’re now seeing emerging evidence of a fallacy: the myth of the value of old data.

    We might just see that companies that maintain focus, storing only what they need, can function lean and mean and compete with the big data beasts at their own game.

  4. Shama Kazmi

    That is a tremendous amount of data. There are environmental implications from generating, storing and transferring all this data. I wonder if this is a concern which hasn’t really been highlighted but deserves close attention nonetheless. As cliche as it may sound to a generation defined by consumerism and social networking, we have only one environment, one space, and once it is damaged forever, there isn’t much we will be able to do. I’m not a fanatical environmentalist, just concerned about the collection of huge amounts of data for purposes which may or may not be entirely useful, and the impact of this on the environment.

    • bharathchandrab

      Environmental implications of all this? I strongly believe that FB does heavy optimization of its data and storage and reduces redundancy, which a million other companies don’t bother to do.