Summary:

The social network’s new search architecture, dubbed Galene, is supposedly faster and easier to maintain than the company’s previous search architecture.

LinkedIn has overhauled its search infrastructure, replacing it with a new homegrown engine dubbed Galene that is designed to improve search results and ease maintenance problems, the company plans to announce Thursday.

The new architecture delivers better-tailored, heavily personalized results: what one user sees in search results can differ from what another user sees, based on each person's own personal information. While this was somewhat possible with LinkedIn's previous search engine, the new system is clearly faster, explained LinkedIn principal staff engineer Sriram Sankar, who wrote the blog post detailing Galene along with Asif Makhani, a LinkedIn director of engineering for search.
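The article does not say which personal signals Galene uses. As a rough illustration of how the same query can produce different orderings for different users, here is a minimal Python sketch that re-ranks results with two hypothetical signals: whether a result is a first-degree connection of the searcher, and whether it shares the searcher's industry. The `Result` type, the `personalize` function and the boost weights are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Result:
    member_id: str
    base_score: float  # query relevance, the same for every searcher


def personalize(results, connections, searcher_industry, industries,
                connection_boost=0.5, industry_boost=0.2):
    """Re-rank results using hypothetical signals about who is searching."""

    def score(result):
        s = result.base_score
        if result.member_id in connections:
            s += connection_boost  # hypothetical boost for first-degree connections
        if industries.get(result.member_id) == searcher_industry:
            s += industry_boost  # hypothetical boost for a shared industry
        return s

    return sorted(results, key=score, reverse=True)


results = [Result("alice", 1.0), Result("bob", 0.8)]
industries = {"alice": "finance", "bob": "software"}
# A software professional connected to bob sees bob first: ['bob', 'alice']
print([r.member_id for r in personalize(results, {"bob"}, "software", industries)])
# A finance professional with no connections sees alice first: ['alice', 'bob']
print([r.member_id for r in personalize(results, set(), "finance", industries)])
```

The base relevance score is computed once per query; only the cheap, per-user boosts differ, which is one common way to keep personalization from multiplying the cost of retrieval.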

Search is the heart of LinkedIn, said Makhani: people use LinkedIn as a professional search engine to find jobs, and hiring managers use it to scout for people with specialized skills.

Because the old system was hard to maintain, Sankar said, it was difficult for the search engineering team to innovate and improve search quality.

LinkedIn's prior search engine was built around the open source Lucene library, with numerous plugins to tweak performance. Lucene provides basic search functions: it stores information such as keywords in indexes, searches those indexes when a user queries for a given word, and ranks the results by relevance score.
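The index-then-score idea behind Lucene can be illustrated with a toy inverted index. The sketch below is plain Python, not Lucene's API: it maps each term to the documents containing it and ranks matches with a simple TF-IDF score, chosen here for brevity. Lucene's actual scoring formulas are more elaborate, and the class and method names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.num_docs = 0

    def add(self, doc_id, text):
        # Assumes each doc_id is added only once.
        self.num_docs += 1
        for term, tf in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = tf

    def search(self, query, k=10):
        scores = defaultdict(float)
        for term in tokenize(query):
            docs = self.postings.get(term)
            if not docs:
                continue
            # Rarer terms get a larger inverse document frequency weight.
            idf = math.log(1 + self.num_docs / len(docs))
            for doc_id, tf in docs.items():
                scores[doc_id] += tf * idf
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


idx = InvertedIndex()
idx.add("d1", "Software engineer at LinkedIn")
idx.add("d2", "Java software engineer, search team")
idx.add("d3", "Hiring manager")
print(idx.search("search engineer"))  # d2 ranks above d1; d3 never matches
```

Only documents that share at least one term with the query are ever touched, which is what makes an inverted index fast compared with scanning every document.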

As the company pushed to build what CEO Jeff Weiner has termed the economic graph, a map of the relationships between jobs, companies, talent and other professional descriptors, LinkedIn engineers added more plugins and extensions to the old search engine to handle more complex tasks, said Sankar.

Eventually, LinkedIn engineers decided they could no longer keep the search engine up to their standards, as the many extensions, including Bobo, Cleo and Norbert, bogged the team down with maintenance. On top of that, if the developer who had set up one of the plugins left, the knowledge of which plugin was responsible for which task left with them.

“We had to go through unnatural steps to get the existing system to scale the extra mile,” said Sankar.

[Figure: LinkedIn chart, a diagram of how Galene is built]

LinkedIn decided to scrap the extra extensions but keep Lucene as the indexing layer that handles queries and retrieves results. In essence, the Galene architecture the company built does the work of the old plugins without needing constant maintenance, and does it faster.

In the new system, a user's search query is passed from the web front end to the back-end servers, where Galene does the heavy lifting and sends the results back to the user.

According to the blog post, the search engine's Federator and Broker services receive the user's query and associated metadata and pass it on to other services such as query rewriters, which expand the query with variations the user may not have thought to include (plurals of words and alternate spellings, for example). The Searcher then takes the rewritten query from the Federator and Broker and, as its name implies, retrieves the matching results from the index, ranked by relevance score.
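The query path described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Galene's code: a rewriter expands each term with naive plural forms and a hypothetical spelling-variant table, a searcher scores documents in one index shard by how many query terms they match, and a single `broker` function (standing in for both the Federator and the Broker) fans the rewritten query out to every shard and merges the per-shard top results. The sharding, the scoring rule and all names are assumptions.

```python
import heapq

# Hypothetical spelling-variant table; a real rewriter would learn these from data.
SPELLING_VARIANTS = {"organisation": "organization", "organization": "organisation"}


def rewrite(query):
    """Query rewriter: expand each query term into a set of acceptable variants."""
    expanded = []
    for term in query.lower().split():
        variants = {term}
        # Naive plural handling: match both "engineer" and "engineers".
        variants.add(term[:-1] if term.endswith("s") else term + "s")
        if term in SPELLING_VARIANTS:
            variants.add(SPELLING_VARIANTS[term])
        expanded.append(variants)
    return expanded


def search_shard(shard, expanded, k):
    """Searcher: score documents in one shard by how many query terms they match."""
    hits = []
    for doc_id, text in shard.items():
        words = set(text.lower().split())
        score = sum(1 for variants in expanded if variants & words)
        if score:
            hits.append((score, doc_id))
    return heapq.nlargest(k, hits)


def broker(query, shards, k=3):
    """Rewrite the query, fan it out to every shard, then merge the per-shard top-k."""
    expanded = rewrite(query)
    partial = [hit for shard in shards for hit in search_shard(shard, expanded, k)]
    return [doc_id for score, doc_id in heapq.nlargest(k, partial)]


shards = [
    {"d1": "Search engineers at LinkedIn", "d2": "Sales organisation lead"},
    {"d3": "search engineer", "d4": "Recruiter"},
]
print(broker("search engineer", shards))  # ['d3', 'd1']; d1 matches via the plural
print(broker("organization", shards))  # ['d2'], via the spelling variant
```

Because each shard returns only its own top k, the broker merges at most k results per shard instead of every match, a common design for keeping the fan-out step cheap.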

Galene builds its index with help from Hadoop, where the raw data is stored and progressively refined into the search index through a series of map-reduce jobs.

From the blog post:

Indexing on Hadoop takes the form of multiple map-reduce operations that progressively refine the data into the data models and search index that ultimately serve live queries. HDFS contains raw data containing all the information we need to build the index. We first run map reduce jobs with relevance algorithms embedded that enrich the raw data – resulting in the derived data. Some examples of relevance algorithms that may be applied here are spell correction, standardization of concepts (for example, unifying “software engineer” and “computer programmer”), and graph analysis.
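The two stages in the passage above, enriching raw data into derived data and then turning the derived data into a search index, can be mimicked with plain Python map and reduce functions. This is only a sketch: there is no Hadoop here, the function names are hypothetical, and a small lookup table stands in for the concept-standardization algorithm the blog post mentions (unifying “software engineer” and “computer programmer”).

```python
from collections import defaultdict
from itertools import chain

# Hypothetical standardization table, standing in for a learned relevance algorithm.
CANONICAL_TITLES = {"computer programmer": "software engineer"}


def map_standardize(record):
    """Map stage 1: enrich a raw (member_id, title) record with a standardized title."""
    member_id, title = record
    title = title.lower().strip()
    yield member_id, CANONICAL_TITLES.get(title, title)


def map_index(record):
    """Map stage 2: emit (term, member_id) pairs from the derived data."""
    member_id, title = record
    for term in set(title.split()):
        yield term, member_id


def reduce_postings(pairs):
    """Shuffle + reduce: group emitted pairs by term into sorted posting lists."""
    grouped = defaultdict(set)
    for term, member_id in pairs:
        grouped[term].add(member_id)
    return {term: sorted(ids) for term, ids in grouped.items()}


def build_index(raw_records):
    # Raw data -> derived data (enrichment), then derived data -> index.
    derived = list(chain.from_iterable(map(map_standardize, raw_records)))
    return reduce_postings(chain.from_iterable(map(map_index, derived)))


raw = [("m1", "Software Engineer"), ("m2", "Computer Programmer"), ("m3", "Product Manager")]
index = build_index(raw)
print(index["engineer"])  # ['m1', 'm2']: the programmer was standardized at index time
```

Doing the standardization offline means the live query path never has to know that the two titles are equivalent; the index already groups them.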

Galene also lets developers in other LinkedIn groups, such as the advertising department, build custom searches through APIs without having to consult the search engineering team, said Makhani.

For LinkedIn, a search engine that can map out relationships, rather than just perform simple searches, is important, and the architecture must be able to change constantly without creating bottlenecks. With the old system at the limits of its scalability, both Sankar and Makhani are confident that Galene can get the job done.

Post and thumbnail images courtesy of Shutterstock user Gil C.

3 Comments

  1. Carter Foxgrover Thursday, June 5, 2014

    They are addressing significant challenges in modern search engineering with this update! The way they’ve inserted the query rewriters into Galene’s stack and pushed more of the not-so-performant batch processing to index time are big improvements.

    I hope they consider open sourcing it… It would be tremendously helpful for search use cases that require frequent, real-time updates to the index (especially with graph/network data). The graph-search capabilities this enables mean we should probably expect to see Facebook-Graph-style search capabilities on LinkedIn within a year.

  2. Lucene seems useful only for organizations without the resources to develop their own alternative.

    1. “Lucene seems useful only for organizations without the resources to develop their own alternative.”
      What? They KEPT Lucene.

      Many companies with the resources to use something else still use Lucene. What many do is use something that wraps Lucene, like Elasticsearch or Solr.

      I’ve used Lucene a few times. In my experience, unless you have custom needs and need the raw API (like LinkedIn), you should use one of the things I mentioned.

      I think you would be surprised by how many companies and products (closed and open source) use Lucene.

      I doubt that many companies with resources could build something better. Why should they? Lucene is open source.