One of the themes of our upcoming Structure:Data conference is “putting big data to work,” and there’s no easier way to get started doing so than with a cloud service. You don’t have to buy hardware, you don’t have to manage systems and, in some cases, you don’t need to know the first thing about Hadoop.
In September, I profiled six companies doing big data in the cloud, and here are nine more. They’re not providing cloud-based applications (e.g., anti-malware services or site-optimization) that happen to use big data techniques on the backend; these companies provide access to big data resources or analytics engines from which customers can draw their own conclusions.
Cetas: Cetas (see disclosure) is a stealth-mode startup focused on providing an entire analytics stack in the cloud (or on-premise, if a customer prefers). The driving theory is to let companies running web applications get the types of user analytics that Facebook and Google (s goog) are able to get, only without the teams of expensive engineers and data scientists, Cetas VP of Products Karthik Kannan told me. While most of that functionality is prepackaged now into core capabilities, Kannan said Cetas plans to let power-users build their own custom models and tie Cetas into existing analytic platforms.
DNAnexus: DNAnexus stores mountains of genomic data in the Amazon Web Services (s amzn) and Google (s goog) clouds so that researchers, doctors and others interested in DNA have a centralized place to access and analyze that data. With genome sequencing getting cheaper by the year but still producing the same amount of data per genome, we’re facing a possible deluge of DNA data. Whereas high-end research facilities might have access to the high-performance computing and storage necessary to perform DNA sequencing, the hospitals that will now be doing those analyses on a regular basis certainly will not, DNAnexus CEO Andreas Sundquist told me in October.
Google: Google has a multi-pronged strategy on cloud-based big data services, but the two services that stand out most are BigQuery and the Google Prediction API. Google describes BigQuery, available in limited preview now, as a service that “allows you to run SQL-like queries against very large datasets, with potentially billions of rows. … BigQuery works best for interactive analysis of very large datasets, typically using a small number of very large, append-only tables.” The Google Prediction API is just what it sounds like, a service that puts machine-learning and pattern-detection capabilities in developers’ hands so they can analyze application data for things such as sentiment analysis, system analytics and recommendation engines.
Infochimps: Once a startup focused on its data marketplace, Infochimps has morphed into a provider of big data infrastructure as a service that provides its marketplace data as a value-added feature. Describing the new Infochimps Platform in February, I wrote, “The platform is hosted in the AWS cloud and supports Hadoop, various analytical tools on top of that — including Apache Pig and Infochimps’ own Wukong (a Ruby framework for Hadoop) — and a variety of relational and NoSQL databases.” But the key is the platform’s automated nature, which CEO Joe Kelly hopes “will help answer the question of ‘what does a Heroku for big data look like?’”
Kognitio: Kognitio is an analytic database vendor that offers a wholly cloud-based version of its flagship WX2 database, called Kognitio Cloud. It has attracted customers such as Orbitz (s oww) with its in-memory technology that, as I explained in October, is popular because it “takes less time to process information stored in a system’s memory than it does information stored on a hard disk. That means companies can get closer to real-time analysis as data streams in, or far faster results when running less-timely queries.”
Medio: Medio offers an end-to-end suite of cloud-based capabilities for analyzing user information, under the inGenius platform banner. Its inSight product lets developers tune applications to capture certain data from users, but inTeract is the predictive analytics engine. With inTeract, Medio says, “you can compare and evaluate the effectiveness of specific targeted messages, personalized content and tailored offers and quickly calculate the impact each will have on customer ROI, engagement and loyalty.” Its inCent service automates the process of acting upon whatever insights customers have discovered. Under the covers of Medio, of course, is Hadoop.
Metamarkets: Metamarkets (see disclosure) is a cloud-based analytics engine designed to help users understand behavior on their websites. Here’s how I described it in a January post: “The Metamarkets product is a cloud-based big data application previously tuned for helping online media companies analyze the streams of data they generate everyday as customers click their way through the sites. The company uses a specialized version of Hadoop for parallel processing, but it goes much further by adding a custom-built in-memory database for real-time queries and by providing visualization and predictive modeling capabilities.”
Microsoft: Microsoft (s msft) was late to the Hadoop game, but has been making up for lost time since October. I recently described its progress on the Hadoop on Windows Azure offering: “The company opened a preview of the service to 400 developers in December and on [March 6], … opened it up to 2,000 developers. According to Doug Leland, GM of product management for SQL Server, … Microsoft is trying ‘to provide a service that is very easy to consume for customers of any size,’ which means an intuitive interface and methods for analyzing data. Already, he said, Webtrends and the University of Dundee are among the early testers of Hadoop on Windows Azure, with the latter using it for genome analysis.”
Xignite: Xignite offers market data to financial services clients, all from a centralized store in the AWS cloud. As my colleague Stacey Higginbotham described its API-based service when covering the company’s $10 million investment round in September, “Xignite stores ‘petabytes of data’ on Amazon’s cloud, and the goal is to keep that data up there so folks can analyze it on cloud-based servers and just get the information they need in the format they want when they ask for it.”
Disclosure: Cetas and Metamarkets are portfolio companies of True Ventures, which is also an investor in GigaOM. Om Malik is also a venture partner at True.