How TripAdvisor engineers helped its business people find and analyze its Hadoop data

Business analysts working at TripAdvisor kept coming up to developers in recent months and saying they’d heard valuable data was sitting in Hadoop clusters, but they had no clue how to query it all.

That’s what made senior software developer Stephen Scaffidi spend some free time coming up with a method for analysts to ask questions of data in Hadoop with Hive without knowing the SQL-like HQL query language that’s ordinarily required.

The result of Scaffidi’s efforts was rolled out to employees in November, and people from many departments have used it. Last Friday, Scaffidi made his Hive Query Tool available under an open-source Apache license.

After a business user logs in and enters a request for a Hive query through the simple user interface, the tool sends an email when the job has started. The email contains a link to a website showing the status of the job and offers a way to download the data. It’s intended to be simple, to enable employees other than data scientists to use Hive.
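To make that flow concrete, here is a minimal Python sketch of the notification step: building the "job started" email with a link to a status page. It is an illustration only, not code from the Hive Query Tool. The `QueryJob` class, the `STATUS_BASE_URL` value, the sender address, and both function names are hypothetical, and the sketch only composes the message rather than sending it.

```python
from dataclasses import dataclass
from email.message import EmailMessage
from urllib.parse import quote

# Hypothetical base URL for the job status site (an assumption, not the tool's real URL).
STATUS_BASE_URL = "https://hqt.example.com/jobs"


@dataclass
class QueryJob:
    """Hypothetical record of one submitted Hive query request."""
    job_id: str
    user_email: str
    description: str


def status_url(job: QueryJob) -> str:
    """Return the link where the user can check status and download results."""
    # Percent-encode the job id so it is safe to use as a URL path segment.
    return f"{STATUS_BASE_URL}/{quote(job.job_id, safe='')}"


def build_started_email(job: QueryJob) -> EmailMessage:
    """Compose (but do not send) the 'job started' notification."""
    msg = EmailMessage()
    msg["To"] = job.user_email
    msg["From"] = "hive-query-tool@example.com"  # assumed sender address
    msg["Subject"] = f"Your Hive query has started: {job.description}"
    msg.set_content(
        "Your query is now running.\n"
        "Check its status and download the results here:\n"
        f"{status_url(job)}\n"
    )
    return msg


if __name__ == "__main__":
    job = QueryJob("job 42", "analyst@example.com", "Bookings by month")
    print(build_started_email(job))
```

In a real deployment the message would be handed to a mail server (for example with Python's `smtplib`), and the status page would be served by whatever web application tracks the Hive jobs.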

Hive Query Tool

“We needed something that (users) would adopt quickly, not something that they would follow the steps on a website or the vendor that we got it with, you know, 23.5 steps for installing, configuring, and then it still doesn’t work,” Scaffidi said at the Hadoop Summit in San Jose on Wednesday. “They liked how it worked. They requested more features.”

Next steps include improving the back-end code and creating a system for allowing users to request recurring jobs, Scaffidi said.

To be sure, IT admins could have brought in proprietary software or gone with open-source options for simplifying Hive queries, but that would have required engineers to learn how everything worked and then teach everyone else. Scaffidi found it far more convenient to just roll his own version. And that meant more people inside the company were able to start running Hive queries sooner.

The project certainly made sense to carry out internally. It allows for democratization of data within the company. Now other companies have the option of trying out the tool.

Feature image courtesy of Flickr user Orin Zebest.