Data is playing a bigger role in our work, prompting the creation of the data scientist role (not as scary as it sounds) and requiring that organizations and individuals learn to work with Big Data to stay competitive. Think of all the text, images, video streams, and transaction logs added to the Internet and intranets every minute via social media, online shopping, and everyday work. A wealth of information sits in this largely unstructured data, and much of it is valuable only for a limited time. The role of data scientist is emerging in organizations wanting to take advantage of this data flow.
In the broadest definition, a data scientist is someone who enables the exploration and discovery of what this massive data flow is telling us.
I spoke with Anjul Bhambhri, IBM’s Vice President of Big Data Products and the 2009 recipient of the YWCA of Silicon Valley’s “Tribute to Women in Technology” Award. She has 23 years of experience in the database industry, with engineering and management positions at IBM, Informix, and Sybase, and a very broad view of what a data scientist can be:
What we are seeing here is that this [data flow] has created a role that needs a discipline — data scientists who explore what is happening outside the organization and gain insights to the business and pass it on to decision makers and other interested parties. We need to make this a part of our regular exploration.
In the old days, and for many organizations today, business analysts would ask a question and IT would provide the answer after figuring out how to structure the queries. Bhambhri says:
Now the IT group has to make sure that their data platform is all inclusive (not just internal databases and repositories); they must integrate data from all sorts of sources — but in this case they don’t know what the questions will be. IT has to provide data without knowing what the business folks are going to ask. And the business folks need the ability to explore, play around, ask ad hoc questions, and then see trends — maybe then they go back to IT with set questions for formal reports.
The new role of data scientist is helped by a background in statistics and math, but Bhambhri does not think it is mandatory. Advancements in available tools that expose the data and allow for visualization of the data have opened the process such that people can focus on their own business domain expertise as they formulate their questions. (See Ryan Kim’s coverage on some of these big data tools.)
Are you ready to be a data scientist? What does it take?
There is no widely accepted boundary for what’s inside and outside of data science’s scope. Is it just a faddish rebranding of statistics? I don’t think so, but I also don’t have a full definition. I believe that the recent abundance of data has sparked something new in the world, and when I look around I see people with shared characteristics who don’t fit into traditional categories. These people tend to work beyond the narrow specialties that dominate the corporate and institutional world, handling everything from finding the data, processing it at scale, visualizing it and writing it up as a story. They also seem to start by looking at what the data can tell them, and then picking interesting threads to follow, rather than the traditional scientist’s approach of choosing the problem first and then finding data to shed light on it. I don’t know what the eventual consensus will be on the limits of data science, but we’re starting to see some outlines emerge.
These are exciting times as advances in technology are opening up new roles for people in organizations. Are you one of the many who are asking for more and better data to do your job?