Here are two certainties about big data. One is that companies need good data scientists. The other is that identifying good data scientists ain’t easy. That’s why Kaggle, the data science competition platform, is launching Kaggle Connect to link proven data science performers with companies willing to pay for their expertise.
Everyone calls himself a data scientist now, and there’s a reason for that. The title “gets you 40 percent more money,” says Kaggle CEO Anthony Goldbloom. “The problem is that it’s hard to know how good someone really is until six months down the road when you realize they haven’t done anything.”
His argument is that folks who have done well in Kaggle competitions over the past two years — insurance actuaries, mathematicians, students, chemists — have proven they have what it takes.
And Kaggle bona fides are becoming currency. A job posting for a New York Times data scientist, for example, lists participation in a Kaggle competition as a key criterion.
Connecting the right data scientists with the right problems
With Kaggle Connect, the company is making its two top tiers of competitors — it’s an invitation-only list — available to companies on an individual basis. “If Pfizer comes to us with a problem that is maybe not well specified enough and needs more iteration than a competition would allow, we can provide a data scientist that suits that problem,” Goldbloom said.
The customer pays a subscription cost of somewhere between $30,000 and $100,000 per month to gain access to appropriate data science resources. Kaggle gets a cut of that money and the data scientist gets the rest — although Kaggle is not breaking out the percentages.
What Kaggle brings to the table is a roster of people who have performed well in its competitions. What the companies provide is a juicy problem to solve and data to use in that quest. In some ways this is an extension of what Kaggle has already done with EMC’s Greenplum division, although that project required the use of Greenplum’s Chorus toolset.
Kaggle Connect’s invitation-only talent pool will initially draw from the top two of the eight tiers into which Kaggle sorts its 80,000 contestants. That’s about 1,500 Kagglers (if that’s a word). Kaggle began running data science competitions in 2010 and started publishing its leaderboard of top big data problem solvers last September.
We’ll see how this all proves out, but if Kaggle success is really a predictor of big data chops writ large, expect to see a lot more Kaggle boasts on resumes going forward.
Feature photo courtesy of Shutterstock user Dirk Ercken.