
Summary:

Putting the right brains on the right problems is the goal of Kaggle Connect.

Here are two certainties about big data. One is that companies need good data scientists. The other is that identifying good data scientists ain’t easy. That’s why Kaggle, the data science competition platform, is launching Kaggle Connect to link proven data science performers with companies willing to pay for their expertise.

Everyone calls himself a data scientist now — and there’s a reason for that. The title “gets you 40 percent more money,” says Kaggle CEO Anthony Goldbloom. “The problem is that it’s hard to know how good someone really is until six months down the road when you realize they haven’t done anything.”

His argument is that folks who have done well in Kaggle competitions over the past two years — insurance actuaries, mathematicians, students, chemists — have proven they have what it takes.

And Kaggle bona fides are becoming currency. This job posting for a New York Times data scientist lists participation in a Kaggle competition as a key criterion.

Connecting the right data scientists with the right problems

With Kaggle Connect, the company is making its two top tiers of competitors — it’s an invitation-only list — available to companies on an individual basis. “If Pfizer comes to us with a problem that is maybe not well specified enough and needs more iteration than a competition would allow, we can provide a data scientist that suits that problem,” Goldbloom said.


The customer pays a subscription cost of somewhere between $30,000 and $100,000 per month to gain access to appropriate data science resources. Kaggle gets a cut of that money and the data scientist gets the rest — although Kaggle is not breaking out the percentages.

In the interactive chart below, click on the map to bring up the name, picture and profile of the Kaggle Connect member.

What Kaggle brings to the table is a roster of people who have performed well in its competitions. What the companies provide is a juicy problem to solve and data to use in that quest. In some ways this is an extension of what Kaggle has already done with EMC’s Greenplum division, although that project required the use of Greenplum’s Chorus toolset.

The top two of Kaggle’s eight tiers, out of 80,000 contestants in total, will initially serve as the invitation-only talent pool for Kaggle Connect. That’s about 1,500 Kagglers (if that’s a word). Kaggle began running data science competitions in early 2010 and started publishing its leaderboard of top big data problem solvers last September.

We’ll see how this all proves out, but if Kaggle success is really a predictor of big data chops writ large, expect to see a lot more Kaggle boasts on resumes going forward.

Feature photo courtesy of Shutterstock user Dirk Ercken.

  1. Having participated in a number of Kaggle competitions, I can tell you there’s a lot more to being an outstanding data scientist than topping the leaderboard on a comp with a pre-digested dataset. In fact, I know a number of top-tier data scientists who have never participated in Kaggle because they’re way too busy being outstanding data scientists in their day jobs. So it’s kind of short-sighted to restrict a pool of candidates in this manner.

    1. Even so, there is a good chance this will become another way to separate candidates.

      HR has a huge problem evaluating candidates for highly specialized positions. Time management is another issue that affects how those individuals perform.

