So, if it’s safe to assume that big data is real and that you should be investing in it, where do you start, and what should you expect as you go through the adoption process? Big data today is what the web was in 1993. We knew the web was something and that it might get big, but few of us really understood what “big” meant. Today, I believe we aren’t even scratching the surface of the big data opportunity.
A good example of potential big data use models can be found here.
Current issues with adoption.
There are a number of issues that will affect your ability to successfully adopt and make best use of a big data solution, but the three I believe are most critical are:
- Useable enterprise tools: The tools that will allow any business to fully utilize big data aren’t ready.
- Lack of staff expertise: The availability of data scientists, or people with a similar background, is limited at best.
- Data gravity: As Dave McCrory pointed out in his post on data gravity, data tends to be used where it is created or sent. As I explained, the applications and people need to come to the data.
How will these adoption issues affect big data as a business opportunity?
Useable enterprise tools: The current suite of products includes Greenplum, Cloudera’s Hadoop and others, which are making headway in many large enterprises. However, these tools are still new and generally require large technical teams, the kind that companies like eBay and Sears can put on the problem. A smaller company would be less likely to see an appropriate return on investment, because the complexity of implementation is high and its overall data volume is low.
Lack of staff expertise: This area is similar to enterprise tools. Even if you’ve got 10 people working on refining the system, it’s likely to come down to the one wizard/expert who can work magic with your data. Putting a large number of people on the problem won’t guarantee success.
Data gravity: Given the strong possibility that most organizations will struggle to fulfill the promise of their big data strategy with internal resources, we are likely to see a proliferation of services from various cloud providers. My concern here is that the use characteristics of Big-Data-as-a-Service aren’t being thoroughly examined.
The questions and big picture concerns.
I see big data quickly becoming a competitive advantage, which is the good news. However, I see significant parallels between the ability to pay for and adopt big data and the first decades of the mainframe. Only a few companies could afford mainframes, and those that could were able to develop a real advantage. With the introduction of the internet and cloud computing we have moved to a much more democratic model of IT availability, but big data has the potential to reopen that gap between the haves and the have-nots.
When thinking about democratizing the use of data, the following questions come to mind. They apply to your own implementations, but they are also worth thinking about in general:
- Where will your data reside?
- How will you get your data to the service?
- Will tools be delivered across the wide area network (WAN) to be run locally against your in-house data?
- How will you collect and capture your own data?
- If you store your data with a service, how often will you use it? Or will you likely be paying to keep it handy for rare uses? (I call this the problem of “Data in Waiting.”)
- If you store your data on the service provider’s storage, such as S3, but you don’t want to pay for it when it’s not in use, will you delete it? How will you know it has been deleted?
- If your big data is running in a public cloud, what tools and strategies will you use to make that data available to customers and other applications (integration)?
- Will big data cause you to buy more WAN capacity?
- Will big data cause you to rethink your enterprise application strategy?
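The “Data in Waiting” question above comes down to simple arithmetic, which can be sketched in a few lines of Python. This is only an illustration: the function names, the data set size, and the price per gigabyte-month are hypothetical placeholders rather than any provider’s actual rates, and the sketch ignores transfer, request, and retrieval charges.

```python
def monthly_storage_cost(size_gb: float, price_per_gb_month: float) -> float:
    """Cost of keeping a data set parked with a storage service for one month."""
    return size_gb * price_per_gb_month


def storage_cost_per_use(size_gb: float, price_per_gb_month: float,
                         uses_per_year: int) -> float:
    """Annual storage cost divided by the number of times the data is actually used."""
    if uses_per_year <= 0:
        # Data that is never used has no meaningful per-use cost; it is pure overhead.
        raise ValueError("uses_per_year must be a positive number")
    annual_cost = monthly_storage_cost(size_gb, price_per_gb_month) * 12
    return annual_cost / uses_per_year


if __name__ == "__main__":
    # Hypothetical inputs: a 1,000 GB data set at an assumed 0.02 per GB-month,
    # queried once per quarter.
    size_gb = 1000
    price = 0.02
    print("Monthly cost:", monthly_storage_cost(size_gb, price))
    print("Cost per use:", storage_cost_per_use(size_gb, price, uses_per_year=4))
```

Plugging in your own volumes and your provider’s published rates gives a rough sense of whether rarely used data is worth keeping handy or is better deleted and re-uploaded when needed.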
So what’s the solution to bring data to as many businesses as possible?
To make big data available to everyone, we need quite a few things to happen. We need to figure out simple use cases in which data solves common problem sets. Then we must make those use cases available to developers so they can build tools that make solving those problem sets easy. We need to continue to push the boundaries of cost-effective disk storage and network capacity, or provide ecosystem environments that allow for direct access over a private network. In an ideal world we will do both.
We’ll know big data has arrived when the use of the service is integrated into the common business software tools used by the majority of a business’s employees. Also key will be the ability of any knowledge worker to run their own questions/queries against internal and external data sources. The average business won’t be able to call big data truly successful or accessible as long as its usability is being defined and managed by a small, disconnected team of IT scientists.
Mark Thiele is executive VP of Data Center Tech at Switch, the operator of the SuperNAP data center in Las Vegas. Thiele blogs at SwitchScribe and at Data Center Pulse, where he is also president and founder. He can be found on Twitter at @mthiele10.