STRUCTURE 08: Overclocking and Analytics

Now we have a panel, The Race to the Next Database: Overclocking and Analytics Augment Your Data Layer. Our friend Nitin Borwankar is moderating.

Panelists (pictured in reverse order):

  • Mayank Bawa, Aster Data Systems
  • Doug Judd, Zvents
  • Luke Lonergan, Greenplum
  • Damian Black, SQLstream
  • Dave Schrader, Teradata
  • Scott Wiener, Cloud9 Analytics

To start off, each panelist is explaining his company:

Bawa, Aster: We are a scalable database for warehousing and analytics that runs on a cluster of commodity nodes. Founded in 2005 out of Stanford University, with investment from Sequoia Capital, Cambrian Ventures and First Round Capital. Customers include MySpace and Aggregate Knowledge.

MySpace has a huge amount of data: 1 billion impressions per day, loaded into Aster on an hourly basis, so new data should show up within an hour of it happening. Over 1 terabyte of data is loaded into Aster each day. Very different from traditional data warehousing, because it must be completely automated, scalable and fast.
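To put those figures in perspective, here is a rough back-of-the-envelope sketch. It assumes traffic is spread evenly over the day and that a terabyte means 10^12 bytes; neither assumption comes from the panel, and since Bawa said "over" 1 terabyte, the byte figures are lower bounds.

```python
# Back-of-the-envelope load rates from the figures quoted on the panel.
# Assumptions (not from the panel): load is uniform over the day,
# and 1 terabyte = 10**12 bytes.
SECONDS_PER_DAY = 24 * 60 * 60

impressions_per_day = 1_000_000_000
bytes_per_day = 10**12

impressions_per_second = impressions_per_day / SECONDS_PER_DAY
impressions_per_hourly_batch = impressions_per_day / 24
megabytes_per_second = bytes_per_day / SECONDS_PER_DAY / 10**6
bytes_per_impression = bytes_per_day / impressions_per_day

print(f"{impressions_per_second:,.0f} impressions per second")
print(f"{impressions_per_hourly_batch:,.0f} impressions per hourly load")
print(f"{megabytes_per_second:.1f} MB per second sustained")
print(f"{bytes_per_impression:,.0f} bytes per impression on average")
```

Under these assumptions, each hourly load carries roughly 42 million impressions, which is why a manual loading process would not keep up.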

Judd, Zvents: Every time you read a news article or watch a YouTube video or click on an advertisement, that info gets logged somewhere. Companies that win will be those that capture that metadata, analyze it and shape their products with it. Examples: Amazon recommendations, iTunes, Pandora, Netflix, StumbleUpon.

Google is arguably the king of data-driven web companies. Three key pieces of their scalable computing infrastructure: Google File System, MapReduce, BigTable. Hypertable (this is his thing) is an open-source implementation of BigTable. It pulls the common scaling logic, such as data distribution and load balancing, into a general-purpose infrastructure layer, so applications don't have to focus on scaling.
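For readers who haven't seen the MapReduce model, here is a minimal single-process sketch of the idea, applied to the kind of click log Judd describes. The log format and the function names are made up for illustration. This is not Google's or Hypertable's code, and a real system would run the map and reduce phases across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit (item, 1) for every click; record is a hypothetical 'user item action' line."""
    user, item, action = record.split()
    if action == "click":
        yield item, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single total."""
    return key, sum(values)


def map_reduce(records):
    pairs = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


log = [
    "alice video42 click",
    "bob video42 click",
    "alice ad7 view",
    "carol ad7 click",
]
counts = map_reduce(log)
print(counts)
```

The application author writes only the map and reduce functions; distribution, grouping and load balancing belong to the shared layer, which is the separation Judd is pointing at.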

Lonergan, Greenplum: Smart databases to process information faster. When people want to understand their data they use the languages they know best. By using SQL, we reach lots of customers; some 40 percent of business came from Asia last quarter. Ubiquity based on open source.

Black, SQLstream: We query the future. A growing number of applications need real-time responses. Gives the ability to find dynamically relevant content, change pricing, etc. Traditional data warehouse has cascading infrastructure; SQLstream eliminates latency. Overlapping high-latency processing stages (collect, query, deliver) are made into a single process. This is patented. Very simple, very powerful way to turn a warehouse, a repository of information, into a real-time processing engine.
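The general idea of a continuous query can be sketched as a sliding-window count that emits an updated answer the moment each event arrives, rather than collecting first and querying later. This is a generic illustration with made-up names and data, not SQLstream's API or its patented method.

```python
from collections import deque


def sliding_window_counts(events, window_seconds):
    """Yield (timestamp, key, count of key in the trailing window) per event.

    `events` is an iterable of (timestamp, key) pairs in timestamp order.
    """
    window = deque()  # events currently inside the window, oldest first
    counts = {}       # key -> number of events for that key in the window
    for ts, key in events:
        window.append((ts, key))
        counts[key] = counts.get(key, 0) + 1
        # Evict events that have fallen out of the trailing window.
        while window[0][0] <= ts - window_seconds:
            _, old_key = window.popleft()
            counts[old_key] -= 1
            if counts[old_key] == 0:
                del counts[old_key]
        yield ts, key, counts[key]


# Hypothetical (timestamp in seconds, item) click stream.
stream = [(0, "a"), (1, "a"), (2, "b"), (5, "a"), (11, "a")]
results = list(sliding_window_counts(stream, window_seconds=10))
print(results)
```

Because the result is updated per event, there is no separate collect, query and deliver cycle to wait for; the trade-off is that the query has to be known before the data arrives.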

Schrader, Teradata: $1.7 billion per year in enterprise data warehousing. Founded in 1979. Puts up very complicated and detailed slide about “active enterprise intelligence.” We’re shifting gears to support the rest of this picture in much more real time. To interpret an event, you probably need the history along with the context.

Wiener, Cloud9 Analytics: Focused on using the Internet to deliver business intelligence and analytics to the unserved. Business intelligence is worth $50B in 2008, but in the largest enterprises, where they spend the most, only about 20 percent of the decision makers get this data; at small and medium organizations, it’s even less. We use the SaaS model to bring business intelligence to the unserved. Spent 3 years and over $20 million to build the product. No databases, no reporting, just “differential analytics,” and all of that is hidden from customers, who see only applications in a web browser on their end.

Nitin: How do you see the web disrupting databases?

Bawa: Data is growing tremendously fast on the web, and with interactivity on the web, the feedback loop has to finish very fast.

Lonergan: Our customers see value of underlying data as tremendously important for their business, so they are looking at non-traditional approaches.

Wiener: Web is speeding up pace of business, so for a lot of things, human beings can’t be in the loop. Now technology runs the business and people respond to crises.

Audience member: Are you providing SQL databases in the cloud, or is that data too valuable?

Lonergan: It’s exploratory at this point. Customers are trying a federated approach with elastic computing; we have people investigating using Amazon as experiments. So far no customers are using it exclusively for data analysis.

Wiener: Our entire business is in the cloud. We have a web interface, but for quants who live in Excel, it also tunnels through HTTPS and lets you query the database in the cloud.

Bawa: Now that infrastructure is flexible, the opportunity is there for applications that mash up publicly available data with your own data.