
Summary:

The big data revolution is in its early innings, and it is about to get even more exciting. Ravi Mhatre, managing director at Lightspeed Venture Partners, discusses the disruptive innovations that are helping make data fast, intuitive and easy to analyze.


In the past year, big data has gone from a hot topic in the enterprise to one of the most buzzed-about, and potentially overhyped, phrases in the industry. Big data has huge disruptive potential, so the flood of attention should be no surprise. A recent IDC report stated that the business analytics software market grew by 14.1 percent in 2011 and will continue to grow to reach $50.7 billion in 2016, all driven by the focus on big data.

As one of the managing directors at Lightspeed Venture Partners, I spend a lot of time talking to companies about how they are using technology, such as big data, and meeting with entrepreneurs who are developing the next big disruptions in technology. I believe that to harness the power of this data revolution and gain a competitive edge, companies need to be able to do more than create and query their big data stores. They need to focus on making this big data fast, intuitive and easy to manipulate in new and interesting ways.

So what does that mean?

Companies need to do more than just store the data. Recent innovations in scale-out storage technology make it relatively inexpensive for companies to capture massive amounts of data. In many ways, that is how the big data conversation started. Companies such as Cloudera, MapR, Vertica (acquired by HP) and DataStax are doing a great job of delivering the infrastructure required to hold and manage big data in a typical enterprise. (Full disclosure: MapR and DataStax are both Lightspeed Venture Partners portfolio companies.)

Holding the data is step one of the process. The next challenge is how to use that data to help your business make better decisions. Right now, most companies are relying on data scientists to mine these raw stores of information. That’s a start, and we’ve seen some leading-edge companies make significant early revenue gains and cost savings with the help of data scientists. But data scientists are expensive, extremely scarce and far from real-time, and they don’t scale.

A new generation of startups is looking to democratize data science by building on top of the basic big data platforms to turbo-charge the speed, intuitiveness and collaborative methods by which businesses can extract value from the new flood of information.

The first challenge is to make this data fast. Today, it can take minutes or hours to get a response or glean a new insight buried in the typical enterprise’s mountain of data. As a result, all questions must be carefully scripted and planned in advance, which limits the flexibility and agility of the business questions that can be posed.

But in a world where big data can perform instantaneously, or “at the speed of thought,” the results are dramatically different. When a user can maintain an unbroken train of thought, a fluid interplay starts to occur between asking an initial question, getting a response, refining it and asking additional questions, and ultimately getting to a new, unanticipated “Eureka!” moment. Think Google Instant for the enterprise. There are a number of startups attacking this problem, including Qubole, Boundary, DataDog and several other stealth companies. (Full disclosure: Qubole and Boundary are both Lightspeed Venture Partners portfolio companies.)
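
To make that interaction pattern concrete, here is a toy sketch in Python using pandas over a tiny invented orders table. The products above run the same loop over vastly larger stores; the point is the unbroken question-refine-question cycle.

    import pandas as pd

    # A tiny invented table standing in for a multi-billion-row store.
    orders = pd.DataFrame({
        "region":  ["EMEA", "EMEA", "APAC", "AMER", "APAC"],
        "product": ["basic", "pro", "pro", "basic", "pro"],
        "revenue": [120, 450, 380, 90, 510],
    })

    # Question 1: which region drives the most revenue?
    print(orders.groupby("region")["revenue"].sum().sort_values(ascending=False))

    # The answer (APAC) prompts a follow-up seconds later:
    # within that region, which product line is responsible?
    apac = orders[orders["region"] == "APAC"]
    print(apac.groupby("product")["revenue"].sum())

When each step returns instantly, the analyst never loses the thread; when each step takes an hour, the second question is rarely asked.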

The second layer of disruptive innovation relates to delivering dramatically improved experiences for navigating and manipulating data that has become so large that traditional spreadsheets, reports and charts would need millions of rows and pages to represent it (in simple terms, making data intuitive and easy to analyze).

New companies are focusing on a combination of AI (artificial intelligence), visualization, faceted search and social collaboration tools to empower hundreds or even thousands of ordinary business users to collectively mine, share and evaluate big data sets and gain insight without the need for a data scientist in the middle.
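
None of those companies’ products are shown here, but the faceted-search piece is easy to make concrete. Below is a minimal Python sketch, with invented records and facet names, of the count-then-narrow loop such tools put in front of business users.

    from collections import Counter

    # Invented records standing in for a large operational data set.
    records = [
        {"dept": "sales",   "region": "EMEA", "status": "open"},
        {"dept": "sales",   "region": "APAC", "status": "won"},
        {"dept": "support", "region": "EMEA", "status": "open"},
        {"dept": "sales",   "region": "EMEA", "status": "won"},
    ]

    def facet_counts(rows, facet):
        # Count how many rows fall under each value of a facet.
        return Counter(row[facet] for row in rows)

    # Show the user the available drill-downs...
    print(facet_counts(records, "region"))  # Counter({'EMEA': 3, 'APAC': 1})

    # ...then narrow the working set when they click a facet value.
    emea = [r for r in records if r["region"] == "EMEA"]
    print(facet_counts(emea, "status"))     # Counter({'open': 2, 'won': 1})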

The emergence of self-service BI is allowing ordinary business users to drive the data warehouse for the first time and thereby eliminate expensive IT and data-scientist intermediaries. Historically, these intermediaries have been necessary to process requests and program reports, which ultimately constrained the business analysis process. Some of the most exciting companies innovating in this space include Tableau, Qliktech and Edgespring. (Full disclosure: Edgespring is a Lightspeed Venture Partners portfolio company.)

The big data revolution is in its early innings, and it is about to get even more exciting. So while the term may be overhyped, the massive potential for companies to take advantage of these new innovations makes it worth all of the extra attention.

Ravi Mhatre is a managing director of Lightspeed Venture Partners (@lightspeedvp), where he focuses on investments in enterprise IT, mobility, and Internet and cloud-based services and applications. You can follow him on Twitter at @RMTacct.

We will be discussing the challenges of big data and scalable analytics at GigaOM’s Structure: Europe conference in Amsterdam, October 16 and 17.

Image courtesy of Flickr user altemark.

  1. Dear Ravi Mhatre,

    I believe you meant Qliktech and not Cliktech. Sorry, I have OCD about details.

    Kind Regards,
    Nayeem Syed.

    1. Nayeem
      Thanks for pointing this out. You’re correct; I intended to reference Qliktech. Sorry about the editorial slip-up. Ravi

    2. Clearly you do not have OCD for detail in what you write yourself. :o)

  2. Actually, step one is figuring out how to capture the data and how to represent what is captured. Finding the “signal” in the raw data using custom algorithms is the real value-add that startups bring to the table. Scaling NoSQL data stores and manipulating the data for reporting purposes will be commodities in a few years. The front-end data acquisition through custom algorithms requires innovative solutions. Taking a raw Tweet stream and classifying Tweets at a fine granularity requires new and innovative algorithms that are not in textbooks (a toy baseline is sketched below). Reverse engineering online databases stored in websites and normalizing the data across different sites is another significant problem.
    Step two is storing the data.
    Step three is processing the data from different sources to find such things as relationships and clusters.
    Step four is visualization and reporting.
    Companies that supply tools do not command high exit multiples; companies that supply must-have solutions do.
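
    A deliberately naive sketch of that classification step in Python: a keyword baseline like this is exactly the textbook approach that fine-grained Tweet classification has to beat (the categories and vocabulary are invented for illustration).

      # Toy keyword baseline; the point is how low the textbook bar is.
      KEYWORDS = {
          "complaint": {"broken", "refund", "terrible"},
          "praise": {"love", "great", "awesome"},
      }

      def classify(tweet):
          # Lowercase, strip simple punctuation, then match each vocabulary.
          words = set(tweet.lower().replace(",", " ").replace(".", " ").split())
          for label, vocab in KEYWORDS.items():
              if words & vocab:
                  return label
          return "other"

      print(classify("I love this product, great support"))        # praise
      print(classify("My order arrived broken, I want a refund"))  # complaint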

    1. Derek,
      These are good points. I agree that there will be a host of unanticipated problems (read: opportunities for startups :) related to the ingestion of big data. It is inherently unwieldy, both because of its size and because of the heterogeneity of its origin sources.

      With respect to mining big data repositories for business insight, we believe algorithmic techniques (i.e., artificial intelligence, machine learning, statistical clustering) will be valuable sources of signal. However, we also believe in the criticality of next-gen BI technologies (i.e., visualization, faceted search, iterative exploration and tagging, rapid on-the-fly aggregations, structured collaboration) with humans in the loop. These capabilities inform which algorithms to use and provide an explanation of the “cause-and-effect drivers” behind machine-generated insights, which is critical to determining the appropriate business actions to take.
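
      As a simplified illustration of the clustering piece, here is scikit-learn’s KMeans run over invented customer features. In the human-in-the-loop model, an analyst inspects, names and sanity-checks the resulting segments rather than acting on them blindly.

        import numpy as np
        from sklearn.cluster import KMeans

        # Invented features per customer: [monthly_spend, support_tickets]
        customers = np.array([
            [520, 1], [480, 0], [60, 7], [75, 9], [500, 2], [55, 8],
        ])

        model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
        print(model.labels_)           # cluster assignment per customer
        print(model.cluster_centers_)  # centroids an analyst can inspect and name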

    2. Great points, Derek. Solving the first step is critical for an optimized big-data platform. Can a CDN solution help to identify customer tags/business identifiers/hints and make multi-dimensional data (instead of raw logs) available for customers to consume?

  3. A man smarter than me once said: don’t skate to where the puck is or was, skate to where the puck will be. Now if we look at a professional player, does he calculate the probabilities of every player on the ice moving at a probable velocity in a probable direction, his team and his opponents, taking into account the energy already consumed by each player, the time left in the game, and a hundred other variables, and process it all in a split second on a system running on double-digit energy consumption? [1]
    After floating a paper to get feedback, the company with the large R&D organization that responded immediately was IBM. I wouldn’t count them out: Watson, Data finds Data… They know the problem inside out and understand that it requires something other than just “faster.”

    1. http://www.scientificamerican.com/article.cfm?id=thinking-hard-calories

  4. Hi Ravi. Great points. I am wearing my PR hat here and wanted you and your readers to check out Chiliad and the recent update to its product, Discovery/Alert 7.0. They nail it when it comes to saving the hours, manpower and smarts needed to use a big data search solution: http://www.chiliad.com. Their tagline explains it well: Find meaning in Big Data. Every bit of information at once. Usable by everyone. With economics you’ll love. Real-world big data is rarely a tidy package. It lives in diverse formats in many locations. Chiliad Discovery/Alert 7.0 tames it, showing what matters, with tools to help you decide what it means. You’re no data scientist? Discovery/Alert 7.0 speaks your language. Tell it what you need and enjoy the results. Doing more with less? Discovery/Alert is frugal. Save millions ordinarily spent on data consolidation or legions of data scientists and software engineers. With a simple subscription and cost-effective deployment, meaning is no longer beyond your reach.

    1. Thanks. Will check it out!

  5. This raises the question: just how valuable or important are the types of information held in structured sources compared with unstructured repositories?

    1. I think this is a great question! Both types of information have inherent business value, and while it is early innings, it’s becoming increasingly clear that substantial additional value can be derived from intersecting the two sources in a typical enterprise. The ability to dynamically “mash up” a selection of traditional structured sources (i.e., an operational data warehouse) with newly generated information from an unstructured system (i.e., the Twitter API, the Facebook API, log-generated event data) can yield insights that neither data source provides on its own. I plan to write more on this subject in the future.
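
      As a toy illustration in Python (table and column names invented), the “mash-up” is often just a join between a warehouse extract and a signal distilled from the unstructured stream:

        import pandas as pd

        # Structured: revenue by product, pulled from the warehouse.
        warehouse = pd.DataFrame({
            "product": ["alpha", "beta"],
            "q3_revenue": [1.2e6, 0.8e6],
        })

        # Unstructured: per-product sentiment distilled from a Twitter
        # stream by some upstream classifier (hypothetical numbers).
        social = pd.DataFrame({
            "product": ["alpha", "beta"],
            "avg_sentiment": [-0.4, 0.6],
        })

        # The intersection is where the insight lives: e.g. a product whose
        # revenue is still strong but whose sentiment is turning negative.
        print(warehouse.merge(social, on="product"))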

  6. Thanks for the post. I am learning Big Data now and MUST set up my own company in Big Data.

  7. The one thing I don’t see in your writeup is the use of big data without first landing it in a data warehouse (data at rest). To make the use of data fluid, put it into cache memory using one of the many in-memory products available today. Batch processing of large data sets is interesting, but real-time analytics is more interesting.

    Talking to healthcare and other industries, there is an increasing amount of data that is very volatile… it only has value for a short time. It has to be used in real time or it doesn’t ‘work’. (A minimal sketch of the idea follows below.)

    I wrote up this concept here: http://successfulworkplace.com/2012/09/17/as-data-gets-bigger-time-value-gets-smaller/
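
    A minimal sketch of that idea in Python (the window size and the vitals feed are invented): keep only a short sliding window in memory and answer queries from it, instead of landing everything at rest first.

      import time
      from collections import deque

      WINDOW_SECONDS = 60
      window = deque()  # (timestamp, reading) pairs, oldest first

      def ingest(reading):
          now = time.time()
          window.append((now, reading))
          # Evict readings older than the window; they have lost their value.
          while window and window[0][0] < now - WINDOW_SECONDS:
              window.popleft()

      def current_average():
          return sum(r for _, r in window) / len(window)

      for heart_rate in (72, 75, 71, 140):  # a hypothetical vitals feed
          ingest(heart_rate)
      print(current_average())  # 89.5, computed entirely in memory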

  8. The article puts its finger on the issue but fails to capture how it needs to be addressed.

    For data to pay off, two things need to happen: a) generate insights and actions from the data that add to the bottom line or top line; b) push the action from the insight out to the customer action points (channels, sales, servicing, internet, dealer network, marketing, etc.) at a customer level, so that the recommended action can be taken when the customer interaction happens. Honestly, I don’t see anybody doing it. Big data will turn out to be a big fad.

    That’s not to say it does not matter. Banking and insurance companies have been doing this for over 15 years now through the use of risk scores, propensity models applied to lead lists, price elasticity and so on, although not in real time.

    Big data will pay off when it combines the volume and speed of data with pushing intelligent actions to the channels.

  9. How do companies like SAS and SAP hold up in this conversation? There seems to be lots of buzz.

  10. Hi Ravi. Thanks for the post; I am learning more about big data and will look forward to your future posts on this topic as well. – Srini

