
Summary:

Sure, more data scientists would be great. But Scott Brave, of Baynote, says the better solution is to create analytics products that are so easy to use that you don’t even need a data scientist.

photo: Sergey Nivens/Shutterstock.com

Virtually every article about big data today turns to the notion that the country is suffering from a critical shortage of data scientists. A much-talked-about 2011 McKinsey & Co. report pointed out that many organizations lack both the skilled personnel needed to mine big data for insights and the structures and incentives required to use big data to make informed decisions and act on them.

What seems to be missing from all of these discussions, though, is a dialogue about how to steer around this bottleneck and make big data directly accessible to business leaders. We have done it before in the software industry, and we can do it again.

To accomplish this goal, it’s helpful to understand the data scientist’s role in big data. Currently, big data is a melting pot of distributed data architectures and tools like Hadoop, NoSQL, Hive and R. In this highly technical environment, data scientists serve as the gatekeepers and mediators between these systems and the people who run the business – the domain experts.

Though it is difficult to generalize, the data scientist serves three main roles: data architecture, machine learning, and analytics. While these roles are important, the fact is that not every company actually needs a highly specialized data team of the sort you'd find at Google or Facebook. The solution, then, lies in creating fit-to-purpose products and solutions that abstract away as much of the technical complexity as possible, so that the power of big data can be put into the hands of business users.

By way of example, think back to the web content management revolution at the turn of the century. Websites were all the rage, but the domain experts were continually banging their heads against the wall – we had an IT bottleneck. Every new piece of content had to be scheduled and sometimes hard-coded by the IT elite. So how was it resolved? We generalized and abstracted the basic needs into web content management systems and made them easy for non-techies to use. As long as you didn’t need anything too crazy, the problem was solved easily, and the bottleneck averted.

Let’s dig a little deeper into the three main roles of today’s data scientist, using online commerce as a backdrop.

Data Architecture

The key to reducing complexity is to limit scope. Nearly every ecommerce business is interested in capturing user behavior – engagements, purchases, offline transactions and social data – and almost every one of them has a catalog and customer profiles.

Limiting scope to this basic functionality would allow us to create templates for the standard data inputs, making both data capture and connecting the pipes much simpler. We'd also need to find meaningful ways to package the different data architectures and tools, which currently include Hadoop, HBase, Hive, Pig, Cassandra and Mahout. These packages should be fit for purpose. It comes down to the 80/20 rule: 80 percent of big data use cases (which is all most ecommerce businesses need) can be achieved with 20 percent of the effort and technology.
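As a rough sketch of what such a standard-input template might look like, consider the following; the field names and structure here are invented for illustration, not any particular product's schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Templates for the standard ecommerce inputs: a catalog item and a
# captured user-behavior event. A real product would add matching
# templates for offline transactions and social data.

@dataclass
class CatalogItem:
    sku: str
    title: str
    category: str
    price: float

@dataclass
class UserEvent:
    user_id: str
    event_type: str  # e.g. "view", "add_to_cart", "purchase"
    sku: str
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

# With inputs standardized like this, "connecting the pipes" reduces
# to mapping each site's raw logs onto the templates once.
event = UserEvent(user_id="u42", event_type="purchase", sku="sku-1001")
```

The point is not the particular fields but that, once the scope is limited, most sites' data fits a small shared shape.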

Machine Learning

Surely we need data scientists in machine learning, right? Well, if you have very customized needs, perhaps. But most of the standard challenges that require big data, like recommendation engines and personalization systems, can be abstracted out. For example, a large part of the job of a data scientist is crafting “features,” which are meaningful combinations of input data that make machine learning effective. As much as we’d like to think that all data scientists have to do is plug data into the machine and hit “go,” the reality is people need to help the machine by giving it useful ways of looking at the world.

On a per-domain basis, however, feature creation could be templatized, too. Every commerce site has a notion of buy flow and user segmentation, for example. What if domain experts could directly encode their ideas and representations of their domains into the system, bypassing the data scientist as middleman and translator?
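A minimal sketch of that idea, assuming a hypothetical feature registry in which the domain expert writes the business rule as a plain predicate and the system turns it into a model input (all names here are invented):

```python
# A tiny feature-template registry: the domain expert supplies the
# business rule as a plain function of a user's events; the system
# assembles the feature vector, with no data scientist translating.

FEATURE_TEMPLATES = {}

def feature(name):
    """Register a function of (events) as a named feature."""
    def register(fn):
        FEATURE_TEMPLATES[name] = fn
        return fn
    return register

@feature("completed_buy_flow")
def completed_buy_flow(events):
    # Domain knowledge: a user who viewed, carted, and purchased
    # has completed the canonical buy flow.
    kinds = {e["event_type"] for e in events}
    return {"view", "add_to_cart", "purchase"} <= kinds

@feature("is_repeat_buyer")
def is_repeat_buyer(events):
    return sum(e["event_type"] == "purchase" for e in events) >= 2

def featurize(events):
    """Turn one user's raw events into a feature vector for a model."""
    return {name: fn(events) for name, fn in FEATURE_TEMPLATES.items()}

events = [
    {"event_type": "view"},
    {"event_type": "add_to_cart"},
    {"event_type": "purchase"},
]
features = featurize(events)
```

The machine learning pipeline downstream stays generic; only the small, readable predicates encode the domain.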

Analytics

It’s never easy to automatically surface the most valuable insights from data. There are ways to provide domain-specific lenses, however, that allow business experts to experiment – much like a data scientist. This seems to be the easiest problem to solve, as there are a variety of domain-specific analytics products already on the market.

But these products are still more constrained and less accessible to domain experts than they could be. There is definitely room for a friendlier interface. We also need to consider how the machine learns from the results that analytics deliver. This is the critical feedback loop, and business experts want to feed modifications into that loop. This is another opportunity to provide a templatized interface.
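A toy sketch of such a templatized feedback loop, with an invented interface: the system scores items, analytics show how they perform, and the business expert turns a knob without touching the model internals.

```python
# A deliberately tiny sketch of the analytics feedback loop: expert
# feedback is captured as score adjustments, applied on top of the
# model's base scores.

class RecommenderLoop:
    def __init__(self):
        self.boosts = {}  # expert-supplied modifications, keyed by item id

    def score(self, item_id, base_score):
        """Model score, adjusted by any expert feedback."""
        return base_score * self.boosts.get(item_id, 1.0)

    def apply_feedback(self, item_id, boost):
        """The templatized knob a domain expert can turn, e.g. after
        analytics show a seasonal category is underexposed."""
        self.boosts[item_id] = boost

loop = RecommenderLoop()
loop.apply_feedback("winter-coats", 2.0)  # expert: push seasonal items
boosted = loop.score("winter-coats", 0.5)   # 1.0
unchanged = loop.score("sandals", 0.5)      # 0.5
```

Real personalization systems are far richer, of course; the sketch only illustrates where a friendly, constrained interface could sit in the loop.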

As we learned in the CMS space, these solutions won’t solve every problem every time. But applying a technology solution to the broader set of data issues will relieve the data scientist bottleneck. Once domain experts are able to work directly with machine learning systems, we may enter a new age of big data where we learn from each other. Maybe then, big data will actually solve more problems than it creates.

Scott Brave is co-founder and CTO of Baynote, an e-tail and e-commerce advisory business. He is also an editor of the “International Journal of Human-Computer Studies” (Amsterdam: Elsevier) and co-author of “Wired for Speech: How Voice Activates and Advances the Human-Computer Relationship” (Cambridge, MA: MIT Press).


  1. thx for composing but this is axiomatic !

  2. Palantir Technologies. The solution is already out there, just not applied in this market.

    1. Cory, Palantir Technologies is impressive, but please don’t shill; or perhaps you’re just an enthusiastic forward deployed engineer ;)

      1. I shill you not ;). Take it as you will.

  3. exactly

  4. Interesting insights!

    In addition to developing data scientists, we need to make analyzing data and deriving insights a way of life. It starts with whoever is in charge asking a simple question:

    “Where is the data that supports what you propose and say?”

    If all leaders, from board members and senior management to line managers, persistently ask the above simple question all the time, we will very quickly build a culture of analytics and fact-based decision making, similar to what great scientists have done in the past and still do. : )

  5. Exactly, Scott – while I’m a huge fan of Russell Jurney’s ‘agile data’ concept (http://www.slideshare.net/russell_jurney/hortonworks-roadshow), how many companies have the capacity/knowledge/finances to even consider hiring a team of Data Scientists to answer what are likely relatively ‘basic’ business intelligence questions?

    What’s much more likely to happen is for a company’s existing Business Analysts to be trained up on a set of tools, such as Platfora, Datameer, Karmasphere or Pentaho’s Instaview, or even better – continue using the tools they know (SAS, R) through transparent connectors to Hadoop (RHADOOP). Odds are they will be able to deliver insight that in 99% of the cases will be ‘good enough’.

    The 1% left likely has custom needs, domain-specific questions, requirements for specialized hardware/software and are likely to ‘roll their own’ regardless.

    1. True. But why not also package together some best practices on a per-domain basis?

  6. Dhiraj Kumar (MBA, PMP, MIS, TOGAF, ITIL) Sunday, December 23, 2012

    Agree !

  7. There are two ideas that have been around for years and I think they apply here. One is that we are drowning in data, i.e. producing data, information and reports that no one wants or reads. We are actually using more copy and printer paper than ever before. The other is that if you torture the data long enough, it will confess. People often take data or information and study and twist it to mean what they want, while conveniently leaving out parts of it to make their case.

    1. The David Ogilvy analogy comes to mind: the drunken sailor who uses research (we now call it data) as a lamppost to lean against so as not to fall, rather than to look ahead and move confidently forward.

  8. Obviously there are enough motivations and benefits to make big data easy to use. However, using web content management as an analogy for big data is problematic: the quality of web content can be assessed by almost everyone, but the quality of a big data product must be explained properly by a scientific mind. An untrained person is usually confused or misled by the metrics, which leads to costly wrong decisions.

    Well, don’t get me wrong. I am not saying that data scientists are the smartest and others are stupid. The world of big data is like the parable of “the blind men and the elephant”. We are all blind, and the only way we learn is to read the numbers. If what you read is the tail, you may think the elephant is like a rope. That’s the cruel nature of data science, and it thus requires scientific minds.

  9. I do not think that data scientists are domain experts. Data Scientists and domain experts need to work together to build realistic models that are self-learning and constantly changing as and when business reality changes.

    1. I agree. But one of the issues with big data is that, in a perverse way, it sometimes is bundled in ways that make it too linear, so a lot of the insights that could have been derived simply aren’t.

    2. You do realize that the majority of machine learning techniques can be applied with no knowledge of the domain? In fact, if you analyze the output of Kaggle competitions you will see that domain knowledge is not required, as discussed here in October: http://gigaom.com/data/why-becoming-a-data-scientist-might-be-easier-than-you-think/

      1. No. Expressing your real-world situation as a machine learning problem requires both domain specific knowledge and an understanding of machine learning. Here’s a simple example: you’re making a dating website, and you want to decide what people to suggest as matches. What is an instance? Is each historical pairing an instance? Is each person an instance? How do you select your labels? Is there a way to view this as a classification problem?

        In practice, there is more to Machine Learning than taking a list of instances in a standardized format and applying black-box algorithms.

  10. Well presented.

  11. Exactly my thoughts.

    Doesn’t really bring anything new to the table.

  12. Nice. Thanks. Big data is still just data, though, not magic. ie Big Garbage In –> Big Garbage Out

  13. Disagree! Depending on the domain, the data, and the application the feature extraction process as well as the task at hand require expert analysis & cannot be done automatically. If you don’t know the learning algorithms or the domain well enough, you cannot extract effective features & modify the algorithms for your own needs.

    Just check the machine learning, data mining, information retrieval papers in the literature. All of those papers focus on improving the state-of-the-art techniques tailored for the specific task on a specific dataset.

    1. Keyser,

      I agree with some of what you are saying. Yes, if you take a cross-domain, cross-application perspective, then very little can be done to generalize effectively.

      However, when you limit scope to a domain (like ecommerce) and a set of applications (like customer segmentation, product recommendations, etc.), the data doesn’t necessarily look as different as you are suggesting. Sure, data is always somewhat different, and so you would need an expert to get the optimal benefit. But what I’m suggesting is that, with the right generalized approach, a domain expert (ecommerce expert in this example) could effectively work within a “friendly” system to derive significant benefit.

  14. In my opinion, data scientists will be in demand for a long time.

    1) Who will monitor the systems that make big data easy to use for the domain experts? How do you know that the models being generated are correct and that the product is working properly? At minimum, a data scientist will be needed to develop such products, but also to deploy and/or monitor them. Who will check that the domain expert is using the system properly and interpreting the results correctly? Who will be there to answer questions when the domain expert has them?

    2) Products are fine for fast followers, but for the market leaders, 80% is not good enough. To use online commerce as an example, as the author does: Amazon will not use Baynote for product recommendations. They need their own data scientists to build custom solutions that get 100% of the job done. For smaller companies that compete with Amazon and are trying to narrow the tech/science gap, using something like Baynote makes complete sense, and I agree there is less need for data scientists in these smaller companies. But I don’t foresee a time when Amazon will not employ data scientists.

    3) Yesterday’s innovations become products (product recommendations, personalization, etc). But, the time between when data-driven innovations happen (by humans) to when they become products will provide the opportunity and demand for human data scientists. And so, unless we think ecommerce is done innovating, then I think human data scientists will continue being in demand even in companies that prefer to purchase productized solutions since they don’t want to be left behind.

    4) Online commerce was used as an example of a space where data scientists can be replaced with better products that allow domain experts to use big data without them. I think this breaks down when you look at other areas. Facebook, LinkedIn, Twitter… Google, Yahoo… Zynga, Playdom… etc. All these companies hire data scientists. If they were to replace their data scientists with a product, I don’t see what product that would be or what vendor would fill that gap. In search there are three big engines; what vendor will make a system that replaces what Yahoo and Google have built themselves? For all these companies, data science is fundamental to their success and differentiation, so I don’t see how they can use some product instead. If the products are going to be built internally by these companies and not purchased from a vendor, then these companies still need to hire data scientists to build them… probably a lot more if they want the system to be run by a domain expert with very little interaction with data scientists. That seems like a lot of work just so a domain expert doesn’t have to talk to a data scientist. It would be a lot easier to build a system for data scientists and have them work together with domain experts… which is what happens right now.

  15. One can create products to drill down into the data, but is the drilling really pointing you toward the optimal result? An analytical product can simplify the data for you, but a data scientist will infer and surface what would otherwise go unnoticed.

  16. If you look at Big Data as a science, then it appears that the science is undergoing a Kuhnian paradigm shift. If so, there will continue to be a need for data scientists until there is a consensus on what the problems are and how to go about solving them. This field is so new that we do not yet know what we can do or how we can use the possibilities the technology offers. Until we do, there will be a continued need for people to cook up bespoke analytical tools or modify products to do what the business wants.

  17. Here we go again. Business people whining, “it’s too hard! It should just be easy!” But didn’t we just give you your shiny iPad to play with?!? Didn’t we just give you your cloud?!? Stupid nerds had to show off their little hadoop and now the MBAs think distributed number crunching is the panacea for all business problems. But I guess it gets VCs salivating, they throw some money around, people are employed, it’s all good. Yay, Big Data!!!! But off-the-shelf, template-based big data analysis will be old hat in short order, because once it’s in everyone’s hands, it’s table stakes. The only “insights” to be derived will come once again from custom analysis. Your bottleneck never really goes away. The goal post has moved. It’s the march of business and technology. Congratulations, you’ve graduated to a bigger treadmill.

    Yong Sheng has probably made the best point here; just because you put the tools in their hands doesn’t mean they will know how to use them, no matter how pretty and easy-to-use the interface is. Business people are notoriously bad at numbers (see “bottom line” mantra).

    Bah, maybe the Christmas season is making me cranky :-)

  18. Interesting thoughts

  19. Don’t fix the traffic problem, have fewer cars on the road. Seriously, data has gotten more complex. Would be ideal to simplify, not always easy, but definitely possible. Data storage is easier. Data delivery has gotten easier in form of reports, visualizations, dashboards. ETL is the elephant in the room, very challenging to connect the dots between disparate data sets, and then find insight. And gap between IT and Biz remains, connecting business rules to technology. Either way you slice it, data to information is a HOT topic!

  20. Great insights into how to improve Big Data and make the insights more accessible to more people

  21. This solution is available today. Isn’t this exactly what Derrick Harris from GigaOm wrote about in http://gigaom.com/data/a-startup-asks-what-if-you-didnt-have-to-analyze-data-at-all/? The goal has to be automating the solution to such an extent that Big Data analysis is as simple as using Google.

  22. Interesting article, but debatable. Big data requires special talent and breed to work with and analyze to conclude and make final business decisions. It won’t be in the very near future that we’ll see a tool as such that can take care of this ambiguity even though there could be solutions implemented to enhance the process and make it easier to understand.

  23. I don’t even have to read past the title to know this idea is flawed. It is the same idea that people had when object-oriented programming was supposed to make everything in software so easy that even end users could just build their own. That never happened, and it couldn’t because it was an instance of silver bullet mentality. And big data today is suffering from the same mentality in some circles. It is a mentality that mistakes a technology for an absolute solution. There have been many instances of this mistaken kind of thinking. The companies that don’t get it eventually meet their demise if they don’t change their point of view. And saying you don’t need more data scientists for data is like saying you don’t need more programmers for software, it just won’t work.

    1. Interesting counterpoint. I don’t think what I’m proposing is as extreme as that though. Trying to create an abstraction layer for all big data analysis and machine learning clearly isn’t feasible. But, like I responded to Keyser above, the trick is limiting scope. What gives me hope is that there are examples of where it’s worked already: like in the recommendations space.

      My analogy is closer to replacing programmers with applications. There have been plenty of needs in the past that required programmers and now only require technology (like ad servers and social networking features). As others in the comments have suggested though, perhaps it’s just a moving target: there will always be new needs for data science. Whether the bottleneck truly loosens up then will depend on how quickly we can keep ahead of the curve with useful product.

  24. Many people thought that programmers wouldn’t be needed if someone was able to abstract generic business needs and make a tool that generates code for us. That hasn’t come true. The same is likely to happen with data scientists: we might think that with suitable tools we don’t need them, but big data and scalable architectures are a technical challenge, and any customisation (always needed) has to be done by qualified staff. Data scientists will always be necessary, just as programmers are even in the most standard industries.

    1. marianudo nailed it

      Let’s write the “one program” that writes all the other programs. It’s a template. It’s a wizard… It’s snake oil. But now repackaged to make sense of unstructured effluence? Right. Are decision makers really this desperate? Enough to buy this nonsense?

  25. It seems to me that this boils down to a case of reconsidering and refactoring the problem scope. Any problem can be broken down into inputs, processes and outputs; the processes generally take you from inputs to outputs. Big Data is being treated differently at present for reasons I can only guess at: it’s new, it’s exciting, it’s misunderstood?
    The point being made here is spot on in my opinion: we just need to break the problem down into achievable chunks, re-evaluate what we are trying to achieve overall, and see where big data fits into that in order to get the most out of it.
    Ultimately data is data. How we store it, use it, manipulate it is just processes. It is still data no matter how much of it there is or what label we give it.
    Thanks for a great article.

  26. Great article. What I particularly like about it is the thought that the “Data Scientist” is no panacea in the face of the scale of the need (the 80% of companies who can’t access/afford a Data Scientist).

    In my personal opinion, focusing so much on the Data Scientist could end up recreating much of the same type of analytic (process and people) bottlenecks that we already have today.

  27. Amen, somebody needed to write this. Even startups and SMBs have more data than they’re able to derive value from; hiring a data scientist is out of the question – only better tools and technology can help.

  28. So, I guess we need more (or better) data scientists to make “easier to use” analytics products.

  29. Big data, little data – the key to good analysis is knowing what question(s) one wants to answer and then identifying what combination of information sources needs to be examined to get answers. Along the way, one needs to use appropriate data collection and analysis methods for answering the question(s). Collecting or getting access to large amounts of data without knowing what the questions are is not very useful. The most frustrating thing in statistical consulting is when someone comes to you with their data already collected (or in the case of big data, their data streams already procured), asking you to process it to get the answer they want; but on examination, the data is not the correct type, was not collected in the right way, etc., to answer their question regardless of how it is processed. Dashboards and other easy-to-understand displays can be very misleading if there is no understanding of what is behind them. Templates are fine if you understand all of the assumptions behind them and what they do (besides create pretty pictures).

  31. Totally agree. Tools like this: http://www.big-data-science.com/#
    are how we solve the problems. Work smarter.

  32. Have a look at SAP’s new HANA applications: these are business apps that analysts can use without involving any data scientist, in scenarios like segmentation and flexible analytical queries.
    I found them very interesting when presented at SAPPHIRE.

  33. This type of post worries me. Data scientists are skilled and properly trained in, first, identifying the correct variables and tests and, second, interpreting the findings properly, in addition to cleansing the data set when required (with large data, almost always). The concern I have with making it ‘easier’ to run analyses by making the programmes easier (and by the way, many are pretty user friendly already if you skill up…) is that, just like students, new novice users will start trying out combinations of many variables (data mining) just to find interesting results. This is not a good thing; as mentioned by a previous poster, you can make your data tell you almost anything if you squeeze it enough. Perhaps a better approach than a one-size-fits-all, let’s-all-play-expert approach is to actually train up your staff so they can correctly use what is already on the market and correctly interpret the data. This wouldn’t be too difficult, and I’m sure there would be many champing at the bit to receive additional training…

  34. Scott is missing the fact that a large portion of a day in the life of the data scientists is exploratory data analysis (EDA), feature construction and feature validation. This is the art of data science. The modeling / machine learning is the fun stuff at the end of the cycle. Once features and models have been identified and validated, models can then be actioned in-line via real-time systems and widely opened to all kinds of business leaders/analysts etc; but data science is required up front to identify and validate the features.

    1. Michael,

      The question in my mind is whether features can be shared cross-instance within the same domain. For example, if certain features (or feature templates) work well for ecommerce site A, might they also be worth a shot for ecommerce site B? In other words, can we leverage the learnings (that data scientists figure out) from one data set to another similar data set within the same domain, or do we have to rediscover from scratch every time?

      1. Scott
        We try to leverage features cross-instance within the same domain; sometimes it works out, sometimes it doesn’t. In some cases, even within the same instance the features evolve, e.g. in fraud applications new features are required to keep up. The point is that manual EDA is needed for feature definition/validation. Once features are in place, a self-service, interactive, collaborative environment can be made available to all kinds of end users, including data scientists who may carry on with modeling, simulation, etc. Congrats on getting such an active response to your article!
        best
        Michael

  35. Always good to have more data analysis tools for users! I see a problem though with the data validation and reliability estimation.

    It’s great to have QuickBooks, but for a company, you still need an accountant.

  36. Why stop at big data scientists? Why don’t we abstract away entire businesses, so that our CEO wanna be can just buy some turn key program that will run the whole shop for him?

  37. Anonymous Coward Saturday, December 29, 2012

    We don’t need more heavy lifters, just make heavy weights easier to lift! Right …

    And of course, the fact that the author works for a company interested in providing data mining tools is of no importance, in the context of this article. (Actually he’s one of the founders and CTO.)

    Yet another issue is that the author seems to not have a proper understanding of big data. Not only is big data big, but it is also highly non-uniform in its structure. It’s not like a very big relational database, it’s more like a huge Christmas tree on which all sorts of different decorations were hung, plus a huge amount of boxes, toys and whatnot placed beneath the tree, plus all cats from the neighborhood climbing around in the tree. This is why professionals are needed to mine it. A non-professional will derive anything he wants from it, be confident in his findings, and not even be able to think about why his findings may be wrong. Letting a non-professional mine big data without the aid of a professional is like letting a politician with Alzheimer’s use statistics – you can’t tell anything about the result, other than it’s most probably useless, if not plain wrong.

  38. I am not directly involved in Big Data, but I’ve been closely tied to technology in general for the past 30 years. This discussion reminds me of how Esther Dyson once described artificial intelligence; she said “that’s what we call it until we can do it.”

    I believe that Big Data is following the same trajectory as many other important technologies. Fifty years ago, computers were giants that lived in special rooms, and you needed the intervention of a data-processing wizard to submit your stack of punch cards and then tell you whether or not your program ran successfully. Now most of us carry more computing power in our shirt pockets, and we don’t need to know how to write a single line of code.

    Sure, at present we still require wizards to wade through our Big Data tasks (at least for the most part), but we do have examples where the technology is maturing and becoming more directly accessible. Netflix does a pretty good job of guessing what movies I might like to watch. Google and YouTube are just two examples of pretty sophisticated site analytics that I can access just by clicking on some menus.

    Humans are tool builders and pattern-recognizing machines. That’s what we do best. And if the gems hidden in Big Data are valuable, we will build tools that make it easier and more efficient to find those gems, and that will be more effective at screening out the garbage results. We have always benefited from standing on the shoulders of the giants who went before us, and I expect that the development of Big Data will be no different.

    Alfred Poor
    http://www.alfredpoor.com

  39. We don’t need more people that can read – just more books with pictures. Scott, I can’t describe how disgusted and conflicted I was with such a flippant and destructive headline mixed with such an insightful analysis. On one hand I am relieved to see such an educated man in common publication on such an important topic, and on the other I am dismayed that a 10-year-old could read and analyse this piece and be completely and utterly disappointed by the content. I wish you well on your next contribution. And I hope you take my headline alteration in good spirit. I think that, more than anything else, it shows that what is really required to improve the field is not dumbing it down but appreciating its complexity and making it accessible in a responsible way. We are looking into an abyss of data and hoping that our best minds can show us meaning. Let us not belittle the next great feat of human endeavour by suggesting that http://mr.data.miner.org can offer insights for 3.95 a month. Please do continue to inspire business on the potential of data through your publications, but we respectfully ask: don’t sell short the great minds of our generation that work on it with cheap headlines. Good luck to us all in the future. The next few years will be fun. I.

  40. Jefferson Braswell Monday, December 31, 2012

    Both sides of the coin are valid observations in an (over-simplified) “tools versus skills” comparison. The purpose of having tools is, of course, to make the people using them more productive. At the same time, tools in the hands of people who do not understand them can produce counterproductive results. (Imagine, if you will – for a brief moment – a power saw with a blade meant for wood being used by someone on a steel pipe!)

    Similar discussions have taken place in the less rarefied air of such simple things as the ‘wizards’ Microsoft was fond of adding to make its programming tools easier to ‘use’. I have seen cases where business executives argued that the arrival of such programming ‘wizards’ was tantamount to the Yellow Brick Road stretching out before the organization, and it became a central assumption of the organization’s information technology strategy. Putting more control of ‘business logic’ in the hands of business users, and decreasing the need to rely on support (and budget) from the technology and engineering side of the house, was a strategy that, however useful in appropriate doses, often left an organization foundering in the mud when the tide went out – if no one had bothered to vet the claims of the wizard makers (and sellers).

    Tools of all kinds are useful, and required. But knowledge and skills pertaining to the tasks and challenges that a particular tool has been pressed into service to address will produce far better results in the end than when there is a conceptual disconnect between the user of the tool and the user’s understanding and knowledge of the subject to which the tool is applied.

  41. Eval: (Data Scientist == Gatekeeper) ? (sack him/her) : (sack the status quo)

  42. I thought data scientists are exactly the people who will make big data easier to use.

  43. Having been one of the literally hundreds of web content management software providers in the early 2000s, I really like the analogy between that market and understanding data. There are different challenges, to be sure, but companies like Windsor Circle http://windsorcircle.com/ in the ecommerce space – and too many to name in the social media space – have done a good job of making this transformation a reality.

  44. Absolutely agree with Scott’s point – we don’t need more data scientists. SaaS offerings in cloud computing already realize this, and IoT development is producing new intelligent interfaces that can handle most operations; operators just need training to understand the operation, and then they can manage through the interface. We already have many hands-on cases of using this to manage water supply networks, including asset management. IT investment in water supply is now more focused on new business development and ROI, so the investment comes from the operations department rather than, as traditionally, the information department – a big difference from before, since it is used for operations rather than for the information department’s reports.
    With big data’s variety under management, more external constituents are included in the system: environment, meteorology, public safety and consumers.

  45. Generally, dashboards and visualizations have been the primary tools for communicating insights from data. But they are one-size-fits-all, and when it comes to data analysis there is a big gap between the “data experts” (those who can understand visualizations) and the “data novices” (those who can’t).

    Dashboards are a tool to help someone “do” analysis, but they are not good for communicating the results of that analysis. In many cases you can automate the data analyst’s job completely and let software both “do” the analysis and communicate the results.

    I recently wrote about how we need to move away from Dashboards as the primary tool for communicating analysis and move to automated analysis:
    http://blog.automatedinsights.com/post/39923823985/dashboards-arent-the-answer

    The ideal scenario is that you provide the right tool for the job. Data analysts and scientists may always want to navigate the data on their own, and dashboards are fine for them. But for the vast majority of users, providing insights in plain English is the better option, and technology now exists to do just that (see http://automatedinsights.com).
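    The “plain English instead of a dashboard” idea can be sketched in a few lines – this is a hypothetical illustration, not Automated Insights’ actual product, and the metric names and numbers are made up:

    ```python
    # Minimal sketch of automated analysis: instead of a chart, software
    # summarizes a metric's movement as one plain-English sentence.

    def describe_change(metric, current, previous):
        """Return a one-sentence, plain-English summary of a metric's change."""
        if previous == 0:
            return f"{metric} is now {current} (no prior baseline)."
        pct = (current - previous) / previous * 100
        direction = "up" if pct >= 0 else "down"
        return (f"{metric} is {direction} {abs(pct):.0f}% week over week "
                f"({previous} -> {current}).")

    print(describe_change("Signups", 460, 400))
    # -> Signups is up 15% week over week (400 -> 460).
    ```

    A real system would add templates per metric type and thresholds for what counts as “noteworthy”, but the core move is the same: computation plus narration in one step.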

  46. Tools like http://www.splunk.com and http://logscape.com go some way toward addressing this. They certainly put big data analytics into the hands of the masses.

  47. Wibidata (www.wibidata.com) is trying to make the job of a data scientist easier (from a model/develop/deploy point of view). They recently open-sourced the lowest layer in their stack, an entity-centric database that sits on top of HBase (kiji.org).

  48. Scott,
    Great post! I couldn’t agree more. We have just launched a program to close the gap between the data scientist and the BI analyst by putting analytics in the hands of the data analyst and BI analyst. We provide a number of analytic functions embedded in our database, easily called from SQL. This opens up text analytics, social media analytics, pattern matching, time series analysis, path analysis, fraud analytics and more to ordinary analysts. It also comes with tools to do things like sessionize data or parse JSON files. This is the future of analytics – not just the data scientist.
    Cheers,
    John

    Here is the announcement: http://www.paraccel.com/news/press-releases.php?acc=022713
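    The pattern described here – analytic functions living inside the database and invoked from plain SQL – can be illustrated with Python’s standard-library sqlite3 module, which lets you register a function as SQL-callable. This is a generic sketch, not ParAccel’s API; the `json_get` helper and the `events` table are hypothetical:

    ```python
    import json
    import sqlite3

    # Illustrative only: register a Python function so ordinary SQL can call
    # it, mimicking "analytics embedded in the database".
    def json_get(doc, key):
        """Extract one field from a JSON document stored in a text column."""
        return json.loads(doc).get(key)

    conn = sqlite3.connect(":memory:")
    conn.create_function("json_get", 2, json_get)

    conn.execute("CREATE TABLE events (payload TEXT)")
    conn.execute("""INSERT INTO events VALUES ('{"user": "alice", "clicks": 3}')""")

    # An analyst who only knows SQL can now parse JSON without leaving the query.
    row = conn.execute("SELECT json_get(payload, 'user') FROM events").fetchone()
    print(row[0])  # -> alice
    ```

    The appeal of the approach is exactly what the comment says: the analytic logic ships with the database, so the analyst’s interface stays plain SQL.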

