Analytics is all the rage, with Hadoop and big data leading the hype. But the new technologies are not yet mature, a market full of startups has yet to shake out, and most enterprises have a mishmash of solutions, from legacy warehouses to marketing-department SaaS subscriptions to early development projects.
I talked this week with Andrew Brust, research director for big data and analytics at Gigaom Research, about the state of the market and any recommendations he might have for CIOs grappling with the optimal deployment of the technology. Among the management-oriented suggestions Andrew offered are the following key points:
- Don’t overregulate all use of analytics. Andrew likens the current acquisition of SaaS data sources throughout the enterprise to the adoption of PCs in the 1980s. There was pent-up demand for ready access to computing power that traditional IT wasn’t providing. Eventually PCs became so pervasive and so central to the business that centralized management was required and became the norm. Through those Wild West early years of PC use, however, companies not only gained immediate value and, sometimes, an early-adopter advantage, but through trial and error many of those renegade PC users also discovered and honed valuable new applications that were subsequently adopted more broadly in their organizations. Andrew sees the same dynamic at work today, and he expects departmental experimentation will likewise lead to valuable new applications of analytics and big, streaming data flows.
- Leave room for ad hoc data use. Just as in the early days of end users exploiting VisiCalc, Lotus 1-2-3, and Microsoft Excel, many applications of new analytics are of a one-off or experimental nature. They are uniquely suited to the needs of an individual employee or a small workgroup, and are best developed by individual workers who aren’t encumbered with heavy and unnecessary data governance requirements. This casual use of analytics is fundamental to a healthy organization. A number of new self-service tools are already making casual analysis viable, and as the technology matures, more nontechnical users will gain access to an entirely new level of data and analysis. That will be a good thing.
- Recognize the threshold for when more data governance is required. Undoubtedly, there are data governance requirements for sensitive data and for data that must be integrated for broader use within the organization. And, as one recent study points out, a CEO-led interest in the innovative use of analytics is correlated with greater use of the capability throughout an enterprise. Andrew says there is a recognizable gradation and threshold as to when informal data use needs to be regulated: “you’ll know it when you see it.” IT organizations must be proactive in identifying and handling such situations, although Andrew is skeptical of such heavy-handed techniques as naming a Chief Data Officer to oversee all data use.
- The best metrics often bubble up democratically, rather than being imposed from above. A corollary of allowing casual data use throughout the organization is that individuals and small departments often know best how to do their jobs. In some ways, they are thus also the ones who best know how to measure their efforts. Although, as a recent Zendesk analysis in customer service confirmed, companies that measure performance get better performance, there is a risk that imposing too many metrics from above may stifle and limit individual contributions. Andrew points to another historical trend—the rise and fall of the ‘balanced scorecard’—as an example of the impracticality of too heavy a hand in imposing top-down-derived metrics on too many aspects of a company’s operations. This is therefore an area where allowing bottom-up data experimentation can lead to better organizational practices. The best individual data findings are often adopted at the workgroup or departmental level, and some of the very best of those may percolate up for use corporate-wide. Although line-of-business managers may be the best at identifying these more broadly applicable uses of data, Andrew points out that IT departments may sometimes spot them as well, based on patterns of data use that can be tracked within an analytics system.
- Know your organization’s appetite for experimental technology. Andrew notes that the Hadoop environment is rapidly maturing. However, we are still in the early days of the technology. Set against the promise of open source as a defense against vendor lock-in is the usual trade-off: proprietary, or at least vendor-dependent, enhancements that provide greater functionality than the current open standard. The immediate payoff of those vendor enhancements may justify the risk that the solution does not survive in the longer term. However, Andrew offers a couple of suggestions for IT departments wary of going down that path. Apache Hive has a widely used SQL-like language that works on Hadoop. It provides only traditional batch queries, but may be a match for the ready skill set in some organizations. Apache Spark is another open source enhancement to Hadoop that provides in-memory analysis, which is appropriate for some applications (e.g., market analytics), but not all. Spark is being widely adopted by leading Hadoop vendors (e.g., Cloudera, Hortonworks), and so offers a degree of safety in a fragmented market. Finally, enterprises with large data warehouse operations that are hesitant to inch too far out on the early-adopter limb can probably turn to their data warehouse vendors for Hadoop tools, rather than opting for more advanced capabilities from less stable startup suppliers.