Retrevo Relies on Machine Learning to Challenge Amazon

Consumer electronics recommendation engine Retrevo (see disclosure) launched a new feature this morning that, much like the Amazon (s amzn) Marketplace, lets visitors buy electronics straight from Retrevo without having to visit the third-party merchant’s site. However, for Retrevo to meet its lofty goal of dethroning the king of e-commerce even in this single category, it will have to rely on the accuracy of its machine-learning algorithms. Retrevo’s entire operation depends upon using big data to its advantage.

One of Retrevo’s biggest selling points is its rating system, which lets users know which products are the best deal for whom, and whether a product is in danger of becoming obsolete. According to Retrevo co-founder Manish Rathi, the site is able to make such determinations by analyzing myriad specifications for each product and comparing its price to the average price of similar products in that category. Essentially, he explained, you train the system to learn what a particular set of features is worth, and then determine what’s a fair value.
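To make Rathi’s description concrete, here is a minimal sketch of the general idea: fit a model that maps a feature to price, then compare a listed price against the model’s estimate. It is an illustration only, not Retrevo’s actual method. The single feature (screen size), the sample prices, the 10 percent tolerance and every function name are assumptions made up for this example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for price = intercept + slope * feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def fair_value(model, feature):
    """Estimated price for a product with the given feature value."""
    intercept, slope = model
    return intercept + slope * feature


def rate_deal(listed_price, estimate, tolerance=0.10):
    """Label a listing by how far it sits from the estimate (tolerance is an assumption)."""
    if listed_price < estimate * (1 - tolerance):
        return "good deal"
    if listed_price > estimate * (1 + tolerance):
        return "overpriced"
    return "fair"


# Hypothetical TVs in one category: screen size in inches and price in dollars.
screen_sizes = [32, 40, 46, 55]
prices = [300.0, 460.0, 580.0, 760.0]

model = fit_line(screen_sizes, prices)
estimate = fair_value(model, 50)      # estimated fair price for a 50-inch set
verdict = rate_deal(500.0, estimate)  # a 50-inch set listed at $500
```

A real system would use many specifications at once rather than one, but the shape is the same: learn what features are worth from comparable products, then judge each listing against that estimate.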

But value can be a fleeting concept, especially in consumer electronics, where technology evolves quickly. To keep up, Rathi says Retrevo also analyzes things like the velocity of the sales channel (i.e., how many people are selling it), the average time between new versions of a product, and user sentiment to help users figure out whether now is the right time to buy. A product that falls into Retrevo’s “Over the Hill” category, for example, isn’t yet obsolete, but likely will be replaced by a newer version pretty soon.
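One of those signals, the average time between new versions, is easy to sketch. The snippet below is a rough illustration under stated assumptions, not Retrevo’s logic: it uses release cadence alone (ignoring sales-channel velocity and user sentiment), and the 0.8 threshold, the "Current" label and the function names are hypothetical. Only the “Over the Hill” label comes from Retrevo.

```python
from datetime import date


def average_release_gap_days(release_dates):
    """Mean number of days between consecutive releases (needs at least two dates)."""
    ordered = sorted(release_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


def lifecycle_label(release_dates, today, threshold=0.8):
    """Flag a product once its age reaches a chosen fraction of the usual release cycle."""
    cycle = average_release_gap_days(release_dates)
    age = (today - max(release_dates)).days
    return "Over the Hill" if age >= threshold * cycle else "Current"


# Hypothetical product line refreshed every January.
releases = [date(2009, 1, 1), date(2010, 1, 1), date(2011, 1, 1)]

early = lifecycle_label(releases, today=date(2011, 3, 1))
late = lifecycle_label(releases, today=date(2011, 11, 15))
```

With these made-up dates, the latest model is labeled "Current" in March and "Over the Hill" by mid-November, when a refresh would be expected within weeks.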

In order to give such a thorough view of each product’s relative value and life expectancy, Rathi says Retrevo analyzes about 100 million different data points each day across the approximately 1 million products in its system at any given time. Every time a new product enters the market, an existing one is discontinued, or even when a vendor just changes a price, Retrevo’s system must recalculate the value of every other comparable product accordingly. In order to determine which products belong in which categories, Retrevo parses unstructured data from around the web, such as a product’s description, pulls out key words, then turns that into structured data. Users are then able to search by product, color, brand or any number of other attributes.
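The step from unstructured description to structured data can be sketched with simple keyword and pattern matching. This is a toy illustration, not Retrevo’s parser: the brand and color lists, the attribute names and the regular expressions are all assumptions chosen for the example.

```python
import re

# Hypothetical vocabularies; a real system would need far larger ones.
BRANDS = ("Sony", "Samsung", "Canon")
COLORS = ("black", "silver", "white")


def extract_attributes(description):
    """Pull a few structured attributes out of free-text product copy."""
    text = description.lower()
    attrs = {}
    for brand in BRANDS:
        if re.search(rf"\b{brand.lower()}\b", text):
            attrs["brand"] = brand
            break
    for color in COLORS:
        if re.search(rf"\b{color}\b", text):
            attrs["color"] = color
            break
    # Matches forms such as "46-inch", "46 inch" or "46inch".
    inches = re.search(r"(\d+(?:\.\d+)?)[\s-]*inch", text)
    if inches:
        attrs["screen_inches"] = float(inches.group(1))
    # Matches forms such as "12.1 megapixel" or "12.1-megapixel".
    megapixels = re.search(r"(\d+(?:\.\d+)?)[\s-]*megapixel", text)
    if megapixels:
        attrs["megapixels"] = float(megapixels.group(1))
    return attrs


def search(products, **criteria):
    """Return the products whose extracted attributes match every criterion."""
    return [p for p in products if all(p.get(k) == v for k, v in criteria.items())]


catalog = [
    extract_attributes("Samsung 46-inch LED HDTV in glossy black"),
    extract_attributes("Canon 12.1 megapixel compact camera, silver"),
]
black_items = search(catalog, color="black")
```

Once descriptions are reduced to attribute dictionaries like these, filtering by brand, color or any other extracted field becomes a simple lookup.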

While this all might sound like high computer science, Retrevo is still a small company in terms of both size and IT infrastructure. Retrevo’s algorithms had better produce accurate results, or users will stop using the service in a hurry, likely going back to one of the 800-pound gorillas in e-commerce, such as Amazon or Best Buy (s bby). Both often have some of the lowest prices on goods, users know what they’re getting in terms of service, and customers can take user reviews with a grain of salt, whereas Retrevo is a relatively new service (the direct buy feature certainly is brand new) that relies on the accuracy of web data to offer its unique service. Of course, for users concerned about making sure they’re getting expert reviews and not having the comments of an electronics Luddite skew the results, there’s always Consumer Reports.

Underlying Retrevo’s analytic algorithms, many of which are machine-learning algorithms that Retrevo tweaked for its own purposes, is a proprietary MapReduce-like engine to process the data that runs atop a cluster of approximately 50 desktop computers. It’s like a throwback to the early days of grid computing and cycle scavenging (using the aggregate power of employees’ desktops for large jobs during off hours), and it’s a far cry from the massive infrastructure likely powering Amazon’s personalized recommendation engine. When talking about big data, though, more data can mean better results, and there’s something to be said for a massive data warehouse. In fact, Amazon definitely has the resources to get into the recommendation game if it wants, but it might not behoove a direct retailer to suggest that its products are overpriced.
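For readers unfamiliar with the MapReduce pattern, the sketch below shows its three stages (map, shuffle, reduce) computing the average price per category on a single machine. Retrevo’s engine is proprietary, so nothing here reflects its design; the listing fields and function names are assumptions for illustration.

```python
from collections import defaultdict


def map_listing(listing):
    """Map: emit a (category, price) pair for each listing."""
    yield listing["category"], listing["price"]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_prices(category, prices):
    """Reduce: collapse one category's prices into their average."""
    return category, sum(prices) / len(prices)


def average_price_by_category(listings):
    """Run the three stages in sequence on one machine."""
    pairs = (pair for listing in listings for pair in map_listing(listing))
    return dict(reduce_prices(key, values) for key, values in shuffle(pairs).items())


# Hypothetical listings.
listings = [
    {"category": "tv", "price": 500.0},
    {"category": "tv", "price": 700.0},
    {"category": "camera", "price": 250.0},
]
averages = average_price_by_category(listings)
```

In a distributed engine the map and reduce calls would be spread across many machines, which is what makes the pattern a natural fit for a cluster of ordinary desktops.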

With the advent of Hadoop and other big data tools, though (including the machine-learning-focused Hadoop subproject Mahout), the possibility of other startups nosing in on Retrevo’s space isn’t inconceivable. We’ll be covering the gamut of advanced analytics techniques at next week’s Structure Big Data event, any number of which could help spawn a new competitor. For the time being, though, Retrevo appears to be doing its own thing and doing it well, despite its small stature and the always-looming shadow of Amazon.

Disclosure: Retrevo is backed by Alloy Ventures, a venture capital firm that is an investor in the parent company of this blog.

Boxing image courtesy of Flickr user maxintosh.