Think you’re unique? Let Yahoo’s data trove be the judge

It’s no secret that Yahoo analyzes a lot of user data, but today it’s giving the world a striking visual sample of how all that data is used. A new tool called the Yahoo! C.O.R.E. Data Visualization lets visitors work their way through demographic data to see which news stories are the most popular among different groups in real time.

C.O.R.E. stands for Content Optimization and Relevance Engine, and it’s the system Yahoo uses to personalize the homepages for Yahoo users. (Psst … it also personalizes advertising. You know, like Google is currently being taken to task for.) Yahoo is touting some interesting data points on C.O.R.E.:

  • Yahoo’s homepage clickthrough rate has increased 300 percent since the company implemented C.O.R.E.
  • Every hour, C.O.R.E. processes 1.2 terabytes of user data. According to Yahoo, that’s the equivalent of 644,245,094 printed pages.
  • Every day, C.O.R.E. personalizes 2.2 billion pieces of content.
  • As a result of all this, more than 300 articles a month on Yahoo’s homepage receive more than 1 million clicks.
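
Yahoo doesn’t say how it arrived at that page count, but the figure can be reproduced with a little arithmetic. The sketch below assumes a terabyte of 2^40 bytes and a printed page holding 2,048 bytes of text; both values, along with the `pages_for` helper, are assumptions made for illustration, not numbers Yahoo has published.

```python
# Assumptions (not stated by Yahoo): 1 terabyte = 2**40 bytes,
# and one printed page holds 2,048 bytes (2 KiB) of text.
BYTES_PER_TERABYTE = 2**40
BYTES_PER_PAGE = 2048  # hypothetical page size


def pages_for(terabytes: float, bytes_per_page: int = BYTES_PER_PAGE) -> int:
    """Return how many whole pages the given volume of data would fill."""
    return int(terabytes * BYTES_PER_TERABYTE // bytes_per_page)


# 1.2 terabytes is the hourly volume Yahoo cites for C.O.R.E.
hourly_pages = pages_for(1.2)
print(f"{hourly_pages:,} pages per hour")
```

Under those assumptions the result matches Yahoo’s figure of 644,245,094 pages, which suggests (but doesn’t prove) that this is roughly how the comparison was made.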

But if you’re not into big data porn, perhaps the C.O.R.E. visualization tool will just let you see how you stack up against the rest of your demographic (or at least the part of it that happens to sign into Yahoo accounts). As a male between the ages of 25 and 34, I should be reading a lot more sports stories than I am. Women in the same age range are most concerned right now with celebrities, weddings and beauty treatments. Only 16 percent of those concerned about Peyton Manning’s arm are women, but the exact same percentage of men are reading about Scarlett Johansson’s look-alike twin.

Women, it appears, also spend more time in each story. Looking at men’s top stories, very few readers spend more than 2 minutes in any given piece.

C.O.R.E. relies heavily on the Hadoop framework, which lets companies like Yahoo (which actually played a big role in Hadoop’s development) store and process the terabytes of unstructured data that users generate as they click their way across the web. According to a Yahoo spokesperson, C.O.R.E. “leverages Hadoop to distill this information into a content personalization model, which is updated every 5 minutes based on the most recent data.” Yahoo researchers often tout that Hadoop is “behind every click at Yahoo.”

To learn more about Hadoop, check out this post explaining it in more detail. To learn more about how some companies are pushing the bounds of analytics on the web and elsewhere, check out this story. Or, just attend our Structure: Data conference in New York next month, where veterans from Yahoo, Google and other data-centric companies will talk about where the big data movement is heading.