Most companies hiring data scientists are in California, Washington and New York. But would it surprise you to find out that Massachusetts has almost as many data science openings as Texas? And that Illinois, Virginia and North Carolina all have more job openings for data scientists than New Jersey, which lives in the shadow of the New York City financial sector, one of the largest employers of data scientists?
These are among a handful of eye-opening findings from a new analysis of where data scientists are being hired and what types of companies are hiring them, conducted by the crowdsourced data enrichment platform CrowdFlower.
The data science meme has been trending for several years now. Nearly everyone wants to be a data scientist, talk to a data scientist, hire a data scientist or invest in a data science startup. But where are all those data scientists working? And what are they working on? A good place to get a handle on the data science sector is Data Science Central, one of the industry’s leading data science community sites and blogs. It’s the online watering hole for data scientists, and it’s edited and run by Vincent Granville.
Granville himself may be the most connected person in data science. He has over 10,000 connections on LinkedIn and has links into thousands of companies that have data scientists. Granville analyzed his network and found the 6,000 companies with the highest concentration of data scientists on LinkedIn. The list, however, was a simple text list of all those companies. The sampling is clearly not scientific, but it’s big enough to pull out some interesting trend data.
Here’s where CrowdFlower came in. CrowdFlower is a data enrichment platform that is used by data scientists to clean up messy and incomplete data using an online workforce of millions of people. The platform automates the management of the workforce to give users high-quality, structured data in return.
CrowdFlower ran Granville’s long list of companies through its workflow to identify company locations and the industry categories of each business on the list. Here are some of the results, as visualized in Silk.co’s data publishing platform and visualization engine.
Geography: California rules data science, but there’s a very long tail
As the tech capital of the planet, California dominates the U.S. hiring market for data scientists. Just over 28 percent of all data science job postings were put up by companies located in the Golden State, according to the CrowdFlower numbers.
New York is next in line but far behind, with only 13 percent of the postings. This is a bit surprising considering the Wall Street juggernaut, but it’s important to remember that New York has only one technology center while California has several, including San Francisco, San Diego and Los Angeles.
Washington, powered by Microsoft and Amazon, was a distant third, and Texas was fourth, driven by energy sector jobs. But the data science hiring market has a long tail, with Illinois, Florida, North Carolina, Massachusetts, New Jersey and Virginia all showing significant data science hiring markets.
Washington, D.C., for its part, had a very small data scientist hiring pool. A number of southern, western and midwestern states barely made a blip on the survey. This could indicate a bicoastal bias in Granville’s network, but in some cases it likely reflects the composition of industry in those states.
Sector: Mostly tech, but some surprises
Analytics, consulting, software and financial services top the list of sectors hiring most aggressively in data science, according to the CrowdFlower findings. These four sectors account for over 50 percent of all the data science jobs found. Education and research, which includes universities and educational institutions, came in next at 9.3 percent, followed by advertising, publishing and media at 7.5 percent. This is not surprising, given that click-stream analysis is hugely data-intensive and online marketing is essentially an extension of data science.
Recruitment and construction came in after that, both at 6.8 percent. Construction, which includes construction services, was interesting, but the real surprise was recruitment, which is not really known as a data-intensive industry. Another surprise was the low percentage of life sciences hiring, given that life sciences is quite data-intensive. We’re not entirely sure why it turned up so low, but possible explanations are that the job postings do not tend to live on LinkedIn or that the job titles (bioinformatics, for example) are quite distinct from the more popular data science titles.
CrowdFlower will continue to do research around data science and figure out new ways to pull in interesting data, so stay tuned for more data sets in the near future.
Alex Salkever is head of marketing at Silk, a data publishing company that simplifies data visualization and analysis. Follow him on Twitter @AlexSalkever. The graphics in this post are made on Silk.