With the web and cloud computing generating new data sources and consumption patterns, a fresh crop of software solutions and companies have emerged to tackle big data. And while there’s been rapid adoption of big data tools on the part of large web properties, the business models around building, delivering and monetizing big data solutions are still taking shape. In order to better understand the trends, let’s take a look at some of the popular solutions.

Many of these solutions are focused on a NoSQL approach, which relaxes restrictions of earlier database models in order to achieve a significantly higher degree of scalability. And while these solutions may not be ideal for bank transactions, they’re often perfect for dealing with voluminous amounts of web application data.

There are currently three broad categories for commercializing big data. The first is to sell professional services for open-source software, in which companies rely on existing software distribution mechanisms, such as the Apache Foundation, and provide commercial support on a per-node or per-site basis. The second is to sell software licenses for a product, or to sell support for a custom distribution. The third is to sell software as a service, accessible in the cloud.

In no particular order, here are several companies that have formed a commercial entity around a software product. Most have also successfully raised venture financing.

  • Couch.io, which under parent company Relaxed focuses on Apache CouchDB. It raised $2 million from Redpoint Ventures in December 2009.
  • 10gen, which offers commercial support, training and services for NoSQL document database MongoDB and has raised $3.4 million in a Series B round from Union Square Ventures and Flybridge Capital.
  • Basho Technologies, provider of Riak, a distributed data store, in both an open-source and paid commercial version. The company recently filed for a $2 million debt and options offering after raising $2 million from Harbor Island Equity Partners and the Wilmington Investor Network.
  • Cloudera, an early entrant offering professional services for Hadoop, now positions itself as an enterprise platform for the popular big data engine. The company has secured $11 million in two financing rounds.
  • Neo Technology, developer of Neo4j, an open-source graph database, raised $2.5 million from Sunstone Capital and Conor Venture Partners. Graph databases are particularly useful when it comes to storing models of network-connected information, including everything from social networks to cellular tower networks.
  • Loggly, which provides log management as a service, recently raised $4.2 million, following a small seed investment late last year.
  • Hypertable, provider of commercial support for the C++ implementation of a scalable key-value store similar to BigTable from Google. The software is in use by several large properties overseas, including Baidu and Rediff.com, India’s largest web property.
  • CitrusLeaf, whose elastic, fast-transaction, distributed database targets web, mobile and social networking applications.

And some are finalists in the LaunchPad competition at GigaOM’s upcoming Structure conference:

  • Riptano, which recently emerged to provide commercial support and services for Cassandra, a popular distributed database that originated at Facebook and is now used by Twitter, Digg, SimpleGeo and others.
  • Cloudant, a Y Combinator company that offers CouchDB in the cloud. CouchDB is a scalable document-oriented database written in Erlang, a programming language used at Ericsson and now enjoying renewed popularity.
  • Datameer, which brings spreadsheet intuitiveness to solving big data problems with Hadoop. The company recently secured $2.5 million in seed financing from Redpoint Ventures.
  • NorthScale, provider of a distribution of memcached, the popular open-source caching framework. It also has its own proprietary Membase server in the works, which will offer tunable persistence, pluggable storage engines and configurable replication — all important for handling big data. NorthScale has raised $15 million in two rounds from Mayfield Fund, Accel Partners and North Bridge Venture Partners.

Of course, there are popular big data projects not listed here, such as HBase, which has yet to attract a commercial entity providing support. There are also semi-corporate-sponsored projects like Project Voldemort at LinkedIn or Redis, now sponsored by VMware. There are undoubtedly others as well. If you know of a new company that has formed to provide support or software solving big data issues, please leave it in the comments.

Few doubt the impact big data is now having on the design and implementation of web and cloud applications. But the opportunity to monetize the solutions to those problems is still open and the leaders are still emerging. Be sure to check out the two big data panels at Structure 2010 — “Scaling the Database in the Cloud” and “Dealing with the Data Tsunami” — to get the latest scoop.

Gary Orenstein is host of The Cloud Computing Show.

By Gary Orenstein

  1. williamweber Monday, May 31, 2010

    The guys over at Drawn To Scale are a start-up developing a platform for solving big data problems. http://www.drawntoscalehq.com/

    1. Thanks William. Yes, I am eagerly awaiting the details of their solution.

  2. Charles Zedlewski Monday, May 31, 2010

    Hi Gary,

    Thanks for the Cloudera mention in the article. While it’s not part of our archive just yet, Cloudera is supporting HBase with a controlled number of customers today. We see HBase as a natural complement to Hadoop and believe we’re close to getting HBase to the level of quality & stability that will merit its inclusion in an upcoming Cloudera distribution.

    Stay tuned for an update to the archive and a blog post about HBase.

    Thanks,

    Charles

    1. Charles, Thanks for letting me know. Looking forward to hearing more about Cloudera’s take on Hadoop/HBase.

  3. BigData is about more than the actual engine that processes the data – it’s about effectively designing, deploying and debugging BigData jobs.

    I suggest you look at http://www.karmasphere.com (the web site does not do the product justice)

  4. Couchio, Cloudera, Northscale – Page One PR – Public Relations and Social Media in Silicon Valley Thursday, June 3, 2010

    [...] GigaOM Commercializing Big Data [...]

  5. NoSQL Pioneers Are Driving the Web’s Manifest Destiny Monday, July 12, 2010

    [...] 1800s. However, as a movement, NoSQL adherents are blazing a similar path in both importance and an opportunity for economic gains. Below, I’ve included one of many charts in the report, which offers a lay of the land for [...]

  6. Hadoop Gets Commercial Cred as Cloudera and Netezza Connect Thursday, July 15, 2010

    [...] integrating Hadoop support (via a Cloudera partnership or otherwise) and further evidence that Hadoop has legs as a commercial technology for big data [...]

  7. John Terrill Monday, August 2, 2010

    I love that HBase is getting so much interest these days!

    GOTO Metrics (http://www.gotometrics.com/) is also using HBase really heavily. It has proven to handle a wide variety of use cases for our clients including some very interesting ones with lots of non-ascii data.

    I’m really interested to see some of the future advancements in HBase, especially with respect to real time analysis (possibly using co-processors).

  8. Takeaways From the Facebook and Foursquare Outages : Cloud « Saturday, October 9, 2010

    [...] uses MongoDB, a “document database” that also falls into the NoSQL category. One of the themes behind the NoSQL movement is scale, and while this kind of event should not be [...]

  9. Thanks for the Cloudera mention in the article. While it’s not part of our archive just yet, Cloudera is supporting HBase with a controlled number of customers today.

  10. Big Data and NoSQL March to the Enterprise: Cloud « Saturday, October 30, 2010

    [...] looked at who is commercializing big data over the summer, but just recently, we’ve seen both large and small amounts of money raised [...]
