More news from the Google-data-scientists-conduct-the-coolest-research desk: YouTube has created an algorithm for determining which videos are funniest based on the intensity of viewer comments. It sounds trivial, if entertaining, but YouTube’s work actually speaks volumes about the potential of social-media sentiment analysis.
As Google Research’s Sanketh Shetty explained in a blog post Thursday morning, YouTube wanted to conduct a Comedy Slam competition similar to its recent Music Slam competition, but it needed a way to pare down the list of competitors. Shetty and his team began by classifying videos using machine learning: characteristics such as shaky cameras, audible laughter, certain tags and comments suggesting laughter (e.g., lol, rofl and hehehe) all point to videos being humorous.
But YouTube still needed to narrow down the competition by what is actually funny and worthy of a spot in the Comedy Slam. As Shetty explains:
Raw viewcount on its own is insufficient as a ranking metric since it is biased by video age and exposure. We noticed that viewers emphasize their reaction to funny videos in several ways: e.g. capitalization (LOL), elongation (loooooool), repetition (lolololol), exclamation (lolllll!!!!!), and combinations thereof. If a user uses an “loooooool” vs an “loool”, does it mean they were more amused? We designed features to quantify the degree of emphasis on words associated with amusement in viewer comments.
When all was said and done, the team had created a ranking algorithm that identified comedic videos and then determined which ones might be the funniest.
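To make the idea concrete, here is a minimal Python sketch of how the four kinds of emphasis Shetty lists (capitalization, elongation, repetition and exclamation) might be turned into numbers. It is not YouTube's code: the function names `emphasis_features` and `emphasis_score`, the regular expressions, the restriction to "lol" variants and the equal weighting of every feature are all assumptions made for illustration.

```python
import re

# Assumed pattern: "lol" with any letter stretched or the "lo" unit repeated.
LOL_RE = re.compile(r"(?:l+o+)+l+", re.IGNORECASE)
# A run of letters followed by zero or more exclamation marks.
TOKEN_RE = re.compile(r"([A-Za-z]+)(!*)")


def emphasis_features(comment):
    """Return one feature dict per lol-like token found in the comment."""
    features = []
    for match in TOKEN_RE.finditer(comment):
        word, bangs = match.group(1), match.group(2)
        if not LOL_RE.fullmatch(word):
            continue
        lower = word.lower()
        # Collapse runs of a repeated letter: "loooool" becomes "lol".
        collapsed = re.sub(r"(.)\1+", r"\1", lower)
        features.append({
            "token": word + bangs,
            "capitalized": word.isupper(),               # LOL
            "elongation": len(lower) - len(collapsed),   # extra stretched letters
            "repetition": collapsed.count("lo") - 1,     # extra "lo" units
            "exclamation": len(bangs),                   # trailing "!" count
        })
    return features


def emphasis_score(comment):
    """Sum an illustrative, equally weighted score over all lol-like tokens."""
    total = 0
    for f in emphasis_features(comment):
        total += (1 + int(f["capitalized"]) + f["elongation"]
                  + f["repetition"] + f["exclamation"])
    return total


if __name__ == "__main__":
    for text in ["nice video", "lol", "LOL", "lolololol", "lolllll!!!!!"]:
        print(text, emphasis_score(text))
```

Under this scoring, a comment with a longer "loooooool" outranks one with a shorter "loool", which is the intuition in the quote. A real ranking system would presumably learn the weights from data and cover other laughter words such as rofl and hehehe.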
But as anyone following the big data movement knows, the impacts of this type of research spread beyond the world of ranking YouTube videos — something we’ll address over two days at our Structure: Data conference next month in New York. The text-based aspects of Google’s work could change the way publishers view web analytics — perhaps shifting the focus from page views to reader reaction — and video classification could help in analyzing surveillance video or crawling the web for illegal content.
It looks like fun and games when Google does it, but this type of data analysis should become very important as data science and broader big data skills make their way into the wider world and decision-makers across industries start realizing what’s possible.