Infobright wants to make big data faster … way faster

Infobright announced version 4.0 of its analytic database, designed for machine-generated data, on Tuesday, complete with a handful of innovations that should significantly improve the speed and accuracy of poring through piles of big data.

Although it’s technically similar in its column-based approach to data management, Infobright tends to get lost among bigger-name analytic databases such as Vertica (now part of HP), Greenplum (now part of EMC) and ParAccel. That might have something to do with Infobright’s sharp focus on machine-generated data such as log files, call records, web transactions and other information created as the devices we rely upon go about their business. Infobright CEO Don DeLoach explained that many of the other analytic database vendors take a broader approach and end up looking a lot more like traditional data warehouses.

In order to address its machine-generated data target, Infobright’s core technological difference is something it calls the Knowledge Grid. As DeLoach explained it, Infobright’s database cuts data columns into “data packs” of 64K values each and, in addition to creating pointers to the data on disk, also creates metadata pointers to the data packs in memory. By leveraging metadata and the faster performance of accessing data stored in memory, DeLoach said, Infobright can improve query times, compression rates and ingestion time.
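To make the Knowledge Grid idea concrete, here is a minimal sketch in Python. It assumes only what the paragraph above describes: column values are split into packs of 64K values, and lightweight per-pack metadata (here just min/max, kept in memory) lets a query skip packs that cannot possibly match. The names and details are illustrative, not Infobright’s actual implementation.

```python
PACK_SIZE = 65_536  # 64K values per data pack

def build_knowledge_grid(column):
    """Return in-memory metadata: (start_index, min, max) for each pack."""
    grid = []
    for start in range(0, len(column), PACK_SIZE):
        pack = column[start:start + PACK_SIZE]
        grid.append((start, min(pack), max(pack)))
    return grid

def packs_to_scan(grid, lo, hi):
    """Only packs whose [min, max] overlaps [lo, hi] must be read from disk."""
    return [start for start, pmin, pmax in grid if pmax >= lo and pmin <= hi]

# Example: one million increasing values fit in 16 packs, and a range
# query needs to touch only the two packs that overlap the range.
column = list(range(1_000_000))
grid = build_knowledge_grid(column)
print(len(grid))                              # 16
print(packs_to_scan(grid, 100_000, 150_000))  # [65536, 131072]
```

Because the metadata lives in memory and is tiny compared with the packs themselves, this pruning step costs almost nothing relative to the disk reads it avoids.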

In its 4.0 release, the company has implemented features to make the experience even faster. These include DomainExpert, specific intelligence about particular domains (e.g., URLs, IP addresses and e-mail address for web data); Rough Query, a tool for quickly finding appropriate data ranges; Distributed Load Processor, a parallel-processing engine for compute-intensive data compression; and a connector to quickly load data into Infobright from Hadoop clusters.

The most interesting of the group might be Rough Query, which speeds the process of finding the needle in a multi-terabyte haystack by quickly pointing users to a relevant range of data, at which point they can drill down with more-complex queries. So, in theory, a query that might have taken 20 minutes before could now take just a few minutes: Rough Query finishes in seconds because it uses only the in-memory metadata, and the subsequent exact search runs against a much smaller data set. The Distributed Load Processor, explained Infobright VP of Product Management Susan Davis, works its magic by performing the compression in parallel on remote servers, thus saving load on the database server. The Hadoop connector is a feature within the load-processing engine.
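The Rough Query behavior described above can be sketched as a metadata-only classification pass. In this hedged example, each pack is marked irrelevant (cannot match the range), relevant (matches entirely) or suspect (may partially match), so a rough answer comes back in seconds and the exact follow-up query only scans the suspect packs. The function name and three-way classification are assumptions for illustration, not Infobright’s actual API.

```python
def rough_query(grid, lo, hi):
    """Classify packs for the predicate lo <= value <= hi using metadata only."""
    result = {"irrelevant": [], "relevant": [], "suspect": []}
    for start, pmin, pmax in grid:
        if pmax < lo or pmin > hi:
            result["irrelevant"].append(start)  # cannot contain any matches
        elif lo <= pmin and pmax <= hi:
            result["relevant"].append(start)    # every value in the pack matches
        else:
            result["suspect"].append(start)     # must be scanned exactly
    return result

# Hypothetical metadata for four packs: (start_index, min_value, max_value)
grid = [(0, 5, 90), (1, 100, 200), (2, 150, 400), (3, 125, 155)]
print(rough_query(grid, 120, 160))
# {'irrelevant': [0], 'relevant': [3], 'suspect': [1, 2]}
```

No disk is touched during classification, which is why the rough pass can run orders of magnitude faster than the exact query that follows it.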

In the next 12 months, Davis added, she thinks Infobright can achieve a 50:1 data compression rate, which has a lot to do with its focus on metadata and domain-specific knowledge.

Infobright might be a small player among its peers, but no one should fear for its chances of competing against mega-vendors like EMC and HP. Machine-generated data is proliferating faster than any other type because of the sheer number of devices in existence, and companies are learning that there’s a lot of operational and business-level knowledge to be derived from those records. Small vendors with a sharp focus have long been able to hold their own against large vendors, so Infobright’s place in the big data space should be pretty secure.

One Response to “Infobright wants to make big data faster … way faster”

  1. Very good article, but it includes a few typos.

    In the first sentence, “innovations that should significantly the speed and accuracy” seems to be missing a verb for the adverb “significantly” to modify.

    Infobright data packs contain 64K values, not “64KB.” Since each value typically contains more than one byte, data packs typically contain more than 64KB when they are uncompressed.

    Finally, the Infobright VP of Product Management’s name is spelled with an “s” not a “d” at the end, making her Susan Davis, not “Susan David.”