
Summary:

While some are hoping for better software to reduce the need for data scientists, WibiData’s Omer Trajman thinks we need more of them. Better software, he argues, is actually just a tool to make it easier for data scientists to do world-changing work.

Data scientists are changing the way decisions happen by making better use of big data. Rather than finding ways around them, we need to make data science more accessible as a profession and give data scientists tools that are easier to use.

Kevin Kelly, in “Better Than Human,” tells us how the future is going to go down. As we increasingly automate existing occupations, we create new jobs in order to instruct and direct those robots. We build robots to take over the instructional positions, and create new jobs that set parameters and develop feedback loops. We build new systems that are flexible and dynamic and create more new jobs — such as data scientists — to analyze and build models for these new systems. It is obvious that in such a world, where static models cannot keep up, data scientists will be indispensable.

Given this future, the argument presented in a recent GigaOM post, “We don’t need more data scientists – just make big data easier to use,” misses some key points. Its premise — that we need simple ways to deliver big data to business decision makers — is correct in that we need better tools, yet it misses an important distinction about who will use these tools. To complete author Scott Brave’s analogy of web content systems, data scientists are the designers and the content creators of today, not the software engineers or the IT bottleneck.

Every organization will need someone wearing the data scientist hat, just as every organization has people responsible for product, sales, marketing and support. Unfortunately, to date, the tools available to data scientists have been rudimentary. Data scientists have had to learn diverse and complex computer languages for working with data. That world is changing as we create simpler ways for data scientists to use big data.

Even recommendations are more than just recommendations

Retail recommendations are a basic example of why we need data scientists today. A retailer sees a return on investment by showing any suggestions to its customers, whether or not the recommendations are relevant. That has been true ever since magazines and gum first flanked the checkout line. Data scientists can do better, but not by creating more sophisticated mechanisms for recommending the same gum and magazines. Recommendations are a very crude use of data.

As veteran retailers know, the art of cross-selling and upselling is not as simple as showing a customer what other customers bought or a list of related items. Data scientists are now building models that anticipate our needs. They are finding new ways to delight us and define experiences that create natural guides toward ideal outcomes. Soon, we will not need recommendations; with well-defined models, the subsequent decisions and actions will be obvious and then they will be automated.

Consider a retail experience that goes beyond recommendations. The elders among us recall once having delightful interactions with a local shopkeeper. The teller was the owner and knew every item in the store and every customer in town. The owner was able to personalize in his head, matching each customer to precisely what they needed. We lost this experience when we started to optimize for price and diversity over experience.

Using data analysis, we no longer need to choose between efficiencies and experiences. Instead, we expand the definition of efficiency to include customer experience. In order to stay competitive, a retailer can no longer focus just on the efficiency of moving products to the right locations. Competitive retailers are building a deep understanding of how customers use their products. Efficiency means matching the right product to the right customer at the right time, even as both evolve.

And that’s the easy part

Retail is a tangible example because we are all consumers, but retail is not the most instructive example of why data science is becoming such an important field. The fields of health care, education and energy are all evolving at a rapid pace:

  • We are discovering new drugs and new techniques, and finding new ways to measure how to best care for patients.
  • The ubiquity of laptops and tablets, the emergence of online courses and the thirst for ongoing education are creating new opportunities to measure (not test) how students learn and refine how we teach.
  • Energy is on the cusp of a radical shift — from oil to gas, from fossil fuels to renewable energy, from manual heating, cooling, driving and flying to automated, fine-tuned and efficient use of the energy we produce.

We cannot begin to conceive of static models that will track and analyze these advances. We cannot replace data scientists with vertically defined, highly segmented solutions delivering slightly better analytics to existing decision makers. Instead, we need to develop new creative thinkers and give them high-level tools to help them apply detailed data-driven models across a range of challenges.

Like storytellers, data scientists embody the heart and soul of an organization and find ways to make it better. Every organization is going to employ someone whose responsibility is to use data to drive automated decision systems. With time, the decisions these data scientists make will become obvious and can be automated. Today’s decision makers get to spend time on more important jobs we haven’t even thought of yet.

We need data scientists, and we need hundreds of thousands of them. They will do their magic, create new ways of experiencing life, products and services and, as Kevin Kelly says, “dream up new work that matters.”

Omer joined WibiData in 2012 having worked for three years at Cloudera, bringing Hadoop to the enterprise market. He was previously at Vertica (now HP) where he led the Field Engineering team. Omer holds a B.Sc. in Computer Engineering from Tufts University and was a visiting scholar at Oxford University reading in Computation and Engineering, focusing on architectures for large scale distributed systems.

Feature image courtesy of Shutterstock user Minerva Studio.

  1. While reading I was thinking that the customer doesn’t even need to visit the store; the PC will buy what is needed, when it is needed, at least until 3D printers change how the world works.
    Then I realized that, sadly, the theory is way ahead of the world. Very few get how important navigation is, and that has been true for quite some time. If you go to Amazon and want to look at all 23-inch full HD monitors, good luck with that. (Amazon also has terrible product specs and descriptions, so it’s so far from perfect that it’s incredible how well it is doing.) The latest changes to YouTube are terrible for video discovery, and YouTube was already lacking in that department.
    Maybe we need better decision makers first, since we don’t even utilize the knowledge we already have.

  2. Martin Bergstrom Sunday, January 6, 2013

    I think it is vital that we make data science more accessible. I work with large data sets and I’ve found that, because of the high level of technical training necessary to work with existing tools, many data scientists are caught in data-driven models and existing modes of thought. If more people and personality types could operate large-data tools, we would get better research and conclusions. I use my data to provide important insights into behavior and decision making but try to take a step outside of pure numbers when drawing my conclusions, and I think more data scientists must do the same in the future. I have my liberal arts education to thank, and I think that as these tools become more accessible, there will be more people like me working with these numbers while not being too numerically blinded.

  3. Isn’t it a bit idealistic to suggest that every organization needs a data scientist? Doesn’t it ignore the reality that there aren’t enough data scientists to go around? Isn’t it more likely that your point of “every organization will employ someone wearing the data scientist hat” will result in companies seeking business-friendly tools that capture the 50-80% – and STILL likely have a major impact? I can appreciate that data scientists need better tools. But given the labor shortage and high costs, combined with the high value of the output to companies, it is a sweet spot for software companies to empower business leaders through the solutions discussed in the article that you take issue with.

    But, I feel Shawn Dolley’s post says it best… “If you can afford a data scientist, great. If you can’t, you will need and want these data scientist in a box solutions. These firms give those business people the choice between doing some Big Data Science as opposed to doing nothing, which is what happens today.” Read it here:
    http://shawndolley.wordpress.com/2012/07/18/data-scientist-in-a-box-or-go-hungry/

  4. I love the example of an experienced local shopkeeper. The online analogy would be e-commerce stores having fantastic type forward search results as you use their search boxes. Let’s say you visit homedepot.com. Start typing “paint products” into their search box. See what happens. They have a major category for paint and related products, but they have zero results in type forward search results.

    There is a reason for this. Most search boxes are powered by dated technology. Google itself does type forward (what they call Instant Search). But they do it using query logs, and massive server farms. Meaning that even a giant e-commerce retailer like Home Depot can’t offer it.

    Great, personalized recommendations require good software tools to work.

  5. Agreed, yet how will these data scientists be trained? In my own experience it took me ten years to unlearn the frequentist/econometric methods I studied in grad school and embrace classification algorithms, machine learning, clustering analysis, & Bayesian methods in general. I worry (a little) that the more you package data mining tools, the more misuse and lack of understanding you will get – even “model-free” analysis requires a mental model around what constitutes data and how it should be structured.

    A lot of “big data” conversations today remind me a little too much of the “rocket scientist” conversations on Wall Street in 1990. And we know how that (eventually) turned out (read Mandelbrot and Taleb if it’s slipped your mind…).

  6. I do agree with Omer but would be happy if web sites did not ask for my full address just to download a self-serving white paper. Sadly, I’m afraid too many will add the latest technology, which is truly great, without thinking about the user experience.

  7. I was thinking basically the same thing when I read the linked article. I have seen it over and over again: we “put the power in the hands of the people” and they end up misusing the tool or process, not truly understanding what they are doing or when they should do something else. Analysis is not a repeatable process. It contains repeatable processes. But those processes can be changed or improved … almost immediately.

