Boston has some pretty good database DNA dating back to Digital Equipment Corp.’s venerable Rdb and a raft of small object-oriented database firms that popped up 20 years ago. Those good genes are showing up again in a fresh crop of database companies clustered around the Boston-Cambridge nexus.
The fact that database pioneer Michael Stonebraker (famous for Ingres, Postgres, StreamBase and Vertica) is in the area, as is Jim Starkey (instrumental in Rdb, InterBase and Netfrastructure), helps draw database talent from elsewhere and keeps local prospects coming out of MIT and other area schools close to home.
In the traditional relational database world, the center of power remains in Silicon Valley, home to Oracle, but even that company has been drawn to Boston, where it beefed up its database portfolio with acquisitions like Endeca. Hewlett-Packard, another Valley giant, bought Tewksbury, Mass.-based Vertica and is moving its “big data” operations to Cambridge. Clearly, the area has become a center of gravity for a lot of NewSQL, NoSQL and other new-age database startups.
Here are five up-and-coming database companies — most with a “big data” flavor — to watch.
1: Akiban Technologies: This startup’s goal is to speed up database queries by putting data tables into logical groups so that users can perform database joins much faster. (Joins are common database operations that combine records from two or more tables.) The basic idea of table-grouping technology, which will work with standard MySQL databases, is to store the information likely to be involved in common queries in one place, which streamlines the process.
Akiban co-founder and CTO Ori Herrnstadt uses an online dating site to illustrate. “Say I’m interested in all the people in my area with red hair and a fascination with dogs. When you put that in a relational system, it explodes into many tables — that’s the process of normalization. You need all users joined with are-they-online-now, joined with regions, joined with hobbies — dogs. That is a four-way join. [Our] technology … captures all the tables that belong to a given object or profile, and then any queries of those pieces [are] very efficient,” he said. In this example, speed is of the essence: the idea is to see who’s online right this minute and meets all these other criteria.
Herrnstadt and co-founder and CEO David McFarlane, formerly with IMLogic and Nexaweb, started the company (then called Akiba Technologies) in July 2009 and that year netted $6.5 million in Series A funding from Northbridge Venture Partners and Foundation Capital. It has 20 employees, all but one in Boston.
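The article doesn’t say how Akiban implements table-grouping, so what follows is only a minimal Python sketch of the general idea, using the dating-site example. Rows from the normalized user, online-status, region and hobby tables are gathered under their parent profile once, so the “red hair, likes dogs, online now” query becomes a single pass over grouped profiles rather than a four-way join at query time. All names and data (`ProfileGroup`, `build_groups`, `find_matches`) are hypothetical, and a real engine would do this grouping in its storage layer rather than in application code.

```python
from dataclasses import dataclass, field


@dataclass
class ProfileGroup:
    """All rows that belong to one user profile, stored side by side (hypothetical)."""
    user_id: int
    hair: str
    region: str
    online: bool
    hobbies: list = field(default_factory=list)


def build_groups(users, online, regions, hobbies):
    """Co-locate each user's rows from the separate (normalized) tables.

    users:   list of (user_id, hair, region_id) rows
    online:  dict of user_id -> bool ("are they online now")
    regions: dict of region_id -> region name
    hobbies: list of (user_id, hobby) rows
    """
    groups = {}
    for user_id, hair, region_id in users:
        groups[user_id] = ProfileGroup(
            user_id, hair, regions[region_id], online.get(user_id, False)
        )
    for user_id, hobby in hobbies:
        groups[user_id].hobbies.append(hobby)
    return groups


def find_matches(groups, region, hair, hobby):
    # One pass over the grouped profiles answers the query;
    # no four-way join is performed at query time.
    return [
        g.user_id
        for g in groups.values()
        if g.online and g.region == region and g.hair == hair and hobby in g.hobbies
    ]


users = [(1, "red", 10), (2, "red", 10), (3, "brown", 10)]
online = {1: True, 2: False, 3: True}
regions = {10: "Boston"}
hobbies = [(1, "dogs"), (1, "chess"), (2, "dogs"), (3, "dogs")]

groups = build_groups(users, online, regions, hobbies)
print(find_matches(groups, "Boston", "red", "dogs"))  # [1]
```

User 2 also has red hair and likes dogs but is offline, so only user 1 matches.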
2: Paradigm4: Here’s a company that seeks to analyze “highly dimensional” data such as satellite, geospatial and genomic data, which tends to be structured more as arrays than as rows and columns. Paradigm4’s CTO is the peripatetic Stonebraker. CEO Marilyn Matz likes to say that Paradigm4 targets “multistructured” data, where the key is being able to look not just at the data itself but at the associated metadata.
“When you have all this machine-generated or other data, what makes it more interesting is you also have other information about the data. Information about when it was generated, what machine it was produced on, was there a bad data point and did someone go back in and correct it? All that meta-data is valuable, and you want to keep it tied in to the data itself,” she said in an interview.
The result is often matrices and 2-D arrays, data structures that, in Matz’s view, need a better framework for complex analytics. “In operations on matrices and arrays, what we’ve proven over time is [that] if you store all the data natively, the analytics you use will give you high performance,” she said.
Paradigm4, unlike IBM’s Netezza analytics appliance, with which it will compete, is a software-only solution, now in early beta. It will be made available in both free open-source and commercial versions. The Waltham, Mass.-based company, with 18 employees, is backed by Sigma Partners and Kepha Partners. Matz will talk more about the big data analytics question at GigaOM’s Structure:Data conference later this month.
3: Cloudant: Historically, if a cable company or a telco wanted to know what’s going on in its far-flung network (how customers are being serviced, whether they’re paying the right amount for what they get), it would collect all that data at the zillions of local endpoints, replicate it back to a central location and then push it through some analytics.
With Cloudant, a lot of that shlepping goes away, said Derek Schoettle, CEO of the three-year-old company. Cloudant lets such a customer instead put an instance of Cloudant at the network edge, where it collects and crunches the data and then sends back the processed information rather than the unwieldy original data set.
Schoettle describes the product as a scalable “data layer as a service” built atop the open-source Apache CouchDB database, JSON (a common data exchange format) and MapReduce, a popular framework for processing distributed data.
Cloudant’s data layer can be deployed around the world to collect, store, analyze and distribute application data, placing its processing and aggregation power where it’s needed. The company was founded by three MIT physicists and late last year brought in Schoettle, a former VP at Vertica Systems, as CEO. (Cloudant named Andy Palmer, Stonebraker’s co-founder at Vertica, to its board this week.)
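To make the edge-processing idea concrete, here is a minimal Python sketch of a map/reduce step run where the data is collected, with only per-customer totals shipped back to a central site. It is not Cloudant’s API; the JSON field names (`customer_id`, `bytes_used`) and the functions `process_at_edge` and `merge_at_center` are hypothetical and chosen for illustration.

```python
import json
from collections import defaultdict


def map_doc(doc):
    # Map step: emit a (key, value) pair for one raw JSON document.
    yield doc["customer_id"], doc["bytes_used"]


def process_at_edge(raw_lines):
    """Run map and reduce over raw JSON records where they are collected,
    returning a small per-customer summary instead of the raw data."""
    grouped = defaultdict(list)
    for line in raw_lines:
        for key, value in map_doc(json.loads(line)):
            grouped[key].append(value)
    # Reduce step: collapse each key's values into one total.
    return {key: sum(values) for key, values in grouped.items()}


def merge_at_center(edge_summaries):
    # Summation is associative, so the central site can simply
    # re-reduce the partial totals that each edge sent back.
    merged = defaultdict(int)
    for summary in edge_summaries:
        for key, total in summary.items():
            merged[key] += total
    return dict(merged)


edge_a = process_at_edge([
    '{"customer_id": "c1", "bytes_used": 100}',
    '{"customer_id": "c1", "bytes_used": 50}',
    '{"customer_id": "c2", "bytes_used": 10}',
])
edge_b = process_at_edge(['{"customer_id": "c1", "bytes_used": 5}'])
print(merge_at_center([edge_a, edge_b]))  # {'c1': 155, 'c2': 10}
```

The design only works cleanly because the reduce function (a sum) can be applied again to partial results; aggregates that can’t be re-reduced, such as an exact median, would need more data sent back from each edge.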
4: Parelastic: How many times have we heard about an e-commerce site brought to its knees by an unexpected spike in demand? Usually that means the relational database powering the site simply can’t scale up to process all those orders. Parelastic wants to crack that problem of database inelasticity while extending the useful life of existing SQL (specifically MySQL) databases.
To that end, the Waltham, Mass., company built middleware that sits atop existing MySQL databases and distributes the workload across them as it arises. The advantage is that the thousands of people who already have MySQL databases (and skills) can keep using them.
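The article doesn’t describe how Parelastic’s middleware decides where work goes. One common way a layer like this spreads load over several MySQL servers is hash-based sharding, sketched minimally below in Python. The class `ShardRouter`, the backend names and the keys are hypothetical, and no real database connections are made.

```python
import zlib


class ShardRouter:
    """Hypothetical middleware that spreads rows across several MySQL backends."""

    def __init__(self, backends):
        self.backends = list(backends)

    def backend_for(self, shard_key):
        # crc32 is stable across processes, unlike Python's salted hash() for str.
        index = zlib.crc32(shard_key.encode("utf-8")) % len(self.backends)
        return self.backends[index]

    def route(self, shard_key=None):
        # A query that names a shard key touches one backend;
        # a query without one must fan out to every backend.
        if shard_key is None:
            return list(self.backends)
        return [self.backend_for(shard_key)]


router = ShardRouter(["mysql-a", "mysql-b", "mysql-c"])
print(router.route("order-42"))  # a single backend, always the same one
print(router.route())            # ['mysql-a', 'mysql-b', 'mysql-c']
```

Plain modulo hashing is the simplest option, but it remaps most keys whenever a backend is added, which is why systems meant to grow elastically often use consistent hashing or a lookup directory instead.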
“The problem with SQL systems was that everything went virtual except for the databases. You had all these processors, but everyone kept banging against the same database, which is why people started to abandon SQL for NoSQL. You rebuilt your database atop NoSQL to replicate what you had in SQL, but then you had to find people to deal with databases no one had heard of,” said John Landry, of Lead Dog Ventures, a Parelastic board member and investor. Parelastic seeks to make SQL databases elastic and able to run in parallel. “Hence the name Parelastic,” Landry said.
The company was founded in 2010 by Amrith Kumar, formerly a VP at Dataupia, which built the Satori Data Warehousing platform, and Ken Rugg, formerly an SVP at Progress Software.
5: NuoDB: Barry Morris, founder and CEO of NuoDB, said his company takes a different, NewSQL approach to distributing workloads. It uses peer-to-peer messaging to route tasks to as many nodes as needed to get the job done. “We work almost more like BitTorrent than traditional databases,” he said.
“We use the example of a flock of birds that takes off together, flies at the same time, but there’s no one in charge. It’s peer to peer. Lots of players doing very simple things, but the overall effect is impressive,” he said.
Older SQL databases are great if you can pre-provision for your load, he said, but not if there’s an unforeseen spike. If your product is mentioned in The Wall Street Journal and your orders spike 1,000-fold, that is the type of workload traditional databases struggle to handle. NuoDB’s messaging infrastructure quickly divvies up and provisions loads across multiple nodes in a peer-to-peer fashion. NuoDB is in beta now and is slated to ship in a few weeks, he said.
NuoDB’s co-founder and CTO is Starkey. The company is backed by Hummer Winblad Venture Partners, with Mitchell Kertzman, the former CEO of Sybase, taking an active role, and by Longworth Ventures.
Note: This is by no means a comprehensive list of hot database companies in or around Boston. Just a few worth checking out.