Qubole isn’t the biggest Hadoop service running in the cloud (that honor almost certainly goes to Amazon (s amzn) Elastic MapReduce), but a year after launching, the company appears to be doing reasonably well. On Wednesday, it released some interesting and possibly insightful statistics on usage of its Qubole Data Service for the month of July.
All told, the company’s customers (which number more than 50) used more than 100,000 nodes to run more than 350,000 jobs and process more than 1 petabyte of data. However, it’s difficult to tell exactly how meaningful those numbers are without revenue figures attached (Qubole’s pricing tiers don’t account for nodes, jobs or data volumes) or comparable numbers from other Hadoop services.
Still, that seems like a fair amount of activity for such a young platform (although, by comparison, Facebook (s fb) alone runs about 2 million Hive queries a month), and the types of customers hint at what’s possible. There’s a whole world of web companies, small- and medium-sized businesses, and other companies that aren’t Facebook or Yahoo (s yhoo) that want to use Hadoop but don’t want to run it in-house. And as I explained in a recent post about best practices for big data startups, offering a cloud service makes it easier for these users to get started with the platform and for Qubole to keep improving it.
Qubole’s founders, Ashish Thusoo (pictured above) and Joydeep Sen Sarma, were the creators of Hive at Facebook, and they raised a $7 million series A round of venture capital in April.
Despite their pedigree, though, the Qubole team can’t take anything for granted. Whatever the real demand for Hadoop in the cloud turns out to be, plenty of other companies, large and small, are fighting for it. Aside from Amazon Elastic MapReduce, the 800-pound gorilla in the space, the competition includes IBM, Joyent, Mortar Data, Altiscale, Infochimps and Treasure Data.