
GigaOM Data Challenge: Predict which stories get read, win $10K

In publishing, analytics matter a lot. There’s a constant struggle to determine who will read which posts or articles, what the ideal headline might be and when publishing makes the most sense. That’s why GigaOM is teaming with Splunk (s splk) to help find those answers.

We’re hosting a competition on Kaggle’s data science platform to find the best models for predicting likely readership across the WordPress (see disclosure) ecosystem of blogs. Here are the details:

The challenge is to predict whether a particular user will like a particular WordPress blog post. The data consists of eight weeks of posts collected by WordPress, along with anonymized user responses to each post. This challenge is an interesting mix of natural language processing (the raw blog posts) and metadata on the blogs and users. Contestants can download the data and submit predictions through the Kaggle platform, but a new feature for this competition is that they will also have free access to a Splunk server containing all the data, which they can employ for data exploration, visualization, feature extraction and modeling.

Aside from offering resources to work on the data, Splunk is also putting up $25,000 in prize money. The winning model will receive $10,000, second place $5,000, third place $3,000 and fourth place $2,000.

There’s also a $5,000 Splunk Innovation Prize for the most innovative use of data science, whether that comes in the form of a visualization, app, business model, you name it. Entries for the Innovation Prize can be submitted through Kaggle’s new Prospect platform. Winners of both competitions will be announced at GigaOM Mobilize in September.

You can find out more about the competition here. Good luck!

Disclosure: Automattic, maker of WordPress.com, is backed by True Ventures, a venture capital firm that is an investor in the parent company of this blog, GigaOM. Om Malik, founder of GigaOM, is also a venture partner at True.

Image courtesy of Shutterstock user sukiyaki.

One Response to “GigaOM Data Challenge: Predict which stories get read, win $10K”

  1. Of course this is a fool’s errand. The prediction meme once again rears its head and leads a publisher to concentrate on the wrong thing. Actually, this is a silly pursuit.