Paul Doscher, CEO of Lucid Imagination, wants you to know that when it comes to enterprise search — and search that can handle the big data wave — open-source Lucene is a contender.
Of course, as head of the company that offers both open source and commercial versions of Lucene, Doscher is no neutral observer. At the company’s Lucene Revolution conference Tuesday in Cambridge, Mass., Doscher announced an application development stack that knits together Hadoop, Mahout, R and Lucene/Solr for handling search, machine learning, recommendation engines and analytics as a platform for enterprise search. That stack, called LucidWorks Big Data, is in beta and aims to make it faster and easier for developers to deploy enterprise-scale search.
“Most Hadoop instances now are one-off — they’re not scalable and not repeatable,” Doscher told me in an interview. “With our stack, all of which is available via APIs, you can use your own user interface and algorithms, and get productive much faster.”
Lucene, the product of an Apache Software Foundation project, is already used by a ton of e-commerce sites. Zappos, the big online shoe store, for example, uses Lucene for 63 million customer searches, according to Computerworld. That’s interesting since Amazon, which bought Zappos three years ago, is now transitioning from Lucene to its own A9 search. A9 is the technology underlying Amazon’s CloudSearch service announced a few weeks back.
But other big users include EMC, which is replacing Microsoft’s FAST Search technology in EMC’s Documentum document management system with Lucene.
Searching for the right search
As structured and unstructured data proliferate, the need to index and search that data efficiently will only grow.
To be sure, Lucene (which is the core engine) and Solr (which is the more developer-friendly wrapper around that engine) are not the only dogs in this fight. Other players include the Google Search Appliance and HP Autonomy, which Doscher called the “800-lb gorilla.” And there’s Microsoft with FAST and now Amazon, which is continually building up its cloud-based services. Lucid Imagination, based in Redwood City, Calif., offers both on-premises and cloud-based versions of its Lucene/Solr-based search.
Lucid, which employs 9 of the 36 contributors to Lucene, seems to be the fan favorite of the open-source contingent, although there is a rival in Elasticsearch, said Lou Romm, senior program manager for Search Technologies, a consulting firm that helps businesses evaluate the best search for their needs.
Granted, it was a biased group at the conference, but several attendees, including one from the M.D. Anderson Cancer Research Center, said there really is no alternative to Lucene for their purposes. Lucene is able to handle all the data (images, text, structured, unstructured) that choke other solutions. “That’s a big deal when you’re trying to save lives,” the M.D. Anderson attendee said.