Summary:

A group of researchers from Stanford has been working on deep learning models that can make sense of whole sentences at a time, and has recently trained its models on a large collection of online movie reviews.

Stanford Ph.D. student Richard Socher appreciates the work Google and others are doing to build neural networks that can understand human language. He just thinks his work is more useful — and he’s going to share his code with anyone who wants to see it.

Along with a team of Stanford researchers that includes machine learning expert and Coursera co-founder Andrew Ng, Socher has developed a computer model that can accurately classify the sentiment of a sentence 85 percent of the time. The previous state of the art for this task — essentially, discerning whether the overall tone of a sentence is positive or negative — peaked at about 80 percent accuracy. In a field where improvements usually come fractions of a percent at a time, that five-point jump is a big deal.

It’s also a big deal to businesses, which are trying harder than ever to automate the task of figuring out what people are saying about them online. Almost every tweet, review, blog post or other piece of content expresses an opinion, but employing a human being to scan every one and instigate some sort of response or enter them into a database isn’t exactly efficient. Early approaches to sentiment analysis or social media monitoring have been kind of crude, often focusing on individual words that don’t account for context at all.

Socher’s team pulled off its accomplishment by focusing not just on single words, but on entire sentences. It took nearly 11,000 sentences from online movie reviews (from a research dataset culled from Rotten Tomatoes, specifically) and created what the team has dubbed the Sentiment Treebank. What makes the Sentiment Treebank so novel is that the team split those nearly 11,000 sentences into more than 215,000 individual phrases and then used human workers — via Amazon Mechanical Turk — to classify each phrase on a scale from “very negative” to “very positive.”

The team then built a new model it calls a Recursive Neural Tensor Network (it’s an evolution of existing models called Recursive Neural Networks), which is what actually processes all the words and phrases to create numeric representations for them and calculate how they interact with one another. When you’re dealing with text like movie reviews that contain linguistic intricacies, Socher explained, you need a model that can really understand how words play off each other to alter the meaning of sentences. The order in which they come, and what connects them, matters a lot.
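The paper gives the exact formulation; as a rough, non-authoritative sketch of the idea, a recursive network concatenates two child vectors and maps them to a parent vector through a weight matrix, and the tensor variant adds a bilinear term on top. The tiny dimensions and weight values below are invented for illustration, not learned parameters:

```python
import math

def compose(left, right, W, V):
    """One tensor-style composition step (a sketch, not Socher's trained model):
    parent_k = tanh( c^T V[k] c  +  W[k] . c ), where c = [left; right]."""
    c = left + right                       # vector concatenation
    d = len(left)
    parent = []
    for k in range(d):
        # standard recursive-net term: row k of W times the concatenation
        linear = sum(W[k][j] * c[j] for j in range(2 * d))
        # tensor term: a bilinear form over the concatenation, one slice per output dim
        bilinear = sum(c[i] * V[k][i][j] * c[j]
                       for i in range(2 * d) for j in range(2 * d))
        parent.append(math.tanh(linear + bilinear))
    return parent

def tree_vector(tree, emb, W, V):
    """Recursively compose a binary parse tree into one phrase vector."""
    if isinstance(tree, str):              # leaf: look up the word vector
        return emb[tree]
    left, right = tree
    return compose(tree_vector(left, emb, W, V),
                   tree_vector(right, emb, W, V), W, V)

# Toy 2-dimensional setup; the weights here are illustrative, not learned.
emb = {"not": [0.1, -0.9], "bad": [-0.8, 0.2]}
d = 2
W = [[0.5] * (2 * d) for _ in range(d)]
V = [[[0.1] * (2 * d) for _ in range(2 * d)] for _ in range(d)]
vec = tree_vector(("not", "bad"), emb, W, V)
```

In the real model, W, V and the word vectors are all learned jointly, and a classifier on each node’s vector predicts that phrase’s sentiment label, which is how the treebank’s phrase-level labels get used.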

A simple example of what Socher means would be a sentence like “There are slow and repetitive parts, but it has just enough spice to keep it interesting.” “Usually,” he said, “what comes after the ‘but’ dominates what comes before the ‘but,'” and that’s something a model focusing on single words or even single phrases might not be able to pick up.

A visual representation of how Socher’s model breaks down sentences.

That sample sentence and the visual representation actually come from a website Socher’s team built to show off and help train its model. The site includes a link to the research paper, as well as a live demonstration of the model on whatever sentences people enter, and a tool for exploring the Sentiment Treebank to see how it has classified sentences containing specific words. The code for the model will be available for download on the site in late October.

Over time and with more sample sentences, Socher thinks his model could reach upward of 95 percent accuracy, but it will never be completely perfect. This is because there are always certain word combinations, sentence structures and jargon that don’t appear enough to let the model effectively determine patterns in how they’re used. The movie review training set, for example, didn’t include many emoticons, so Socher’s team is working on adding them to its system.

The team also had to develop algorithms to analyze the morphology of words. For example, Socher noted, the word “absurdly” is used infrequently, but an algorithm is able to figure out that adding “ly” to a word doesn’t create a wholly new word with different sentiment.
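The article doesn’t describe the team’s morphology algorithm in detail; a deliberately naive sketch of the underlying idea — backing off from a rare “-ly” adverb to its base form — might look like this (the lexicon and scores are invented for illustration):

```python
# Toy sentiment lexicon; the scores are invented, not from any real dataset.
SENTIMENT = {"absurd": -0.6, "delightful": 0.8}

def word_sentiment(word):
    """Score a word, falling back on morphology for rare '-ly' adverbs."""
    if word in SENTIMENT:
        return SENTIMENT[word]
    # Adding "-ly" rarely changes a word's sentiment, so back off to the
    # base form: "absurdly" -> "absurd".
    if word.endswith("ly") and word[:-2] in SENTIMENT:
        return SENTIMENT[word[:-2]]
    return 0.0  # unknown words treated as neutral

score = word_sentiment("absurdly")
```

A real system would learn such morphological regularities from data rather than hard-code a suffix rule, but the back-off intuition is the same.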

The new model and Sentiment Treebank by Socher and his team come as deep learning is catching on more broadly, thanks in part to research that companies such as Google, Facebook and Microsoft (Socher is actually a Microsoft Research Ph.D. fellow) have been publicizing in fields such as image recognition (or computer vision), speech recognition and even language understanding. Earlier this week, IBM announced a research partnership with four high-profile universities that focuses in part on deep learning.

Socher acknowledged the impressive work done elsewhere, but he’s not convinced there’s much commercial utility in focusing too much on image recognition (at least right now) or on single words. (Google and others would probably disagree, maybe quite strongly, and could probably raise some very good points.) So he and his Stanford colleagues have been focusing on phrases and sentences, and aside from sentiment analysis, he says their models are pushing the state of the art in areas such as machine translation, grammatical analysis and logical reasoning.

“You’ll never care about translating a single word to another single word,” he said. “We’re actually able to put whole sentences and longer phrases into vector spaces without ignoring the order of the words.”

  1. Reblogged this on ChangSu's tech blog.

  2. Using Mechanical Turk in intelligent ways can make many more things possible. For instance, there was an idea on Firespotting to sell anything online with a single picture –

    http://firespotting.com/item?id=2462

  3. Yawn…. 80% to 85% and that’s progress or actionable?! It’s still positive, negative, neutral! Move beyond traditional sentiment.

    1. Actually, if you read the paper, the system can do fine-grained sentiment: “somewhat negative,” “extremely positive” and the like. For comparison with the original data set, they operated in a positive/negative/neutral mode, because that is how the original data set was labeled. At the fine-grained level they got it right 80 percent of the time, though I didn’t see whether that was lower because something extremely negative was categorized as somewhat negative, and the like. In terms of use, both styles of output have value.

  4. Lots of careless mistakes in this article… and just as I thought, no copy editor or proofreader on the staff. It’s a shame.

    1. The three extremely minor mistakes I found on a second read are fixed.

      1. All you have to do now is get off his lawn.
        And no soup for you!

        :)

  5. Amrudesh Santhanam Thursday, October 3, 2013

    Very interesting development. I understand that it is primarily ‘Sentiment’ Analysis. Wanted to know how close is this to having a machine read an essay and determine whether it makes the same sense and covers all or most of the points from a reference essay? I have also seen some developments in this front and Edx also sharing a similar tool as open source. Certainly has a great future. :)

  6. Daniele Roganti Friday, October 4, 2013

    I hope it works with sarchasm, too.

    1. if people learn to spell right im sure it will get the hang of more than just sarcasm.

  7. Vivek Narayanan Friday, October 4, 2013

    While 95% accuracy would be a really phenomenal achievement, an accuracy in the range of 85-90% is achievable using methods simpler than deep neural nets. I have done some work on sentiment analysis in the past. I used a Naive Bayes model with some enhancements like n-grams, negation handling and information filtering and was able to get more than 88% accuracy on a similar dataset based on movie reviews.
    You can find more details here: http://arxiv.org/ftp/arxiv/papers/1305/1305.6143.pdf and the code here: https://github.com/vivekn/sentiment/blob/master/info.py
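    A minimal sketch of the negation-handling enhancement described above — prefixing tokens inside a negation scope so the classifier learns separate statistics for negated words (illustrative only; see the linked repo for the actual implementation):

    ```python
    # Words that open a negation scope; a real list would be longer.
    NEGATORS = {"not", "no", "never", "n't"}

    def mark_negation(tokens):
        """Prefix tokens after a negator with 'not_' until punctuation."""
        out, negating = [], False
        for tok in tokens:
            if tok in NEGATORS:
                negating = True           # start a negation scope
                out.append(tok)
            elif tok in {".", ",", "!", "?"}:
                negating = False          # negation scope ends at punctuation
                out.append(tok)
            else:
                out.append("not_" + tok if negating else tok)
        return out

    marked = mark_negation("this is not a good movie .".split())
    ```

    A Naive Bayes model trained on the marked tokens then counts “good” and “not_good” as distinct features, which is how a bag-of-words model recovers some negation context.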

    1. Even though you wrote a paper in the field, I think you don’t have any idea about sentiment analysis. Analyzing sentiment at the document level compared to the sentence level is like comparing apples and pears.

  8. I’m guessing that Socher et al.’s sentiment classifier can also be used to tell whether a given sentence is “plausible” or “implausible”, assuming the sentence fed to it is grammatically correct. If this is the case, then perhaps it can be combined with the following neural net-based text-generation algorithm

    (Skip to 48 minutes, 58 seconds in the video.)

    to produce an advanced chatbot: you could train the text-generation network on a very large number of two-person dialogues (there are surely petabytes of training data on the web), so that, given what the other person wrote, it produces a response one character at a time. And if it starts writing nonsense, the plausibility/sentiment classifier can detect that; some characters can then be deleted from the output stream, and the text-generator can try again until it satisfies the classifier. Think of it as pairing a mental filter with a creative mind.

    One could also imagine having the text-generator know more than just the previous line in a conversation — maybe there is some way for it to remember the previous several lines; or at least the gist of what was said.
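    The generate-then-filter loop proposed here can be sketched with stand-in functions (generate_reply, plausibility_score and the canned drafts are hypothetical placeholders, not real APIs or models):

    ```python
    def make_generator():
        # Stand-in for a character-level neural text generator: cycles
        # through canned drafts instead of sampling characters.
        drafts = ["colorless green ideas", "sure, sounds good", "ok"]
        it = iter(drafts)
        return lambda prompt: next(it)

    def plausibility_score(sentence):
        # Stand-in for a classifier scoring how sensible a sentence is.
        return 0.1 if "colorless" in sentence else 0.9

    def filtered_reply(prompt, generate, threshold=0.5, max_tries=10):
        """Keep drafting responses until one passes the 'mental filter'."""
        for _ in range(max_tries):
            reply = generate(prompt)                  # draft a candidate
            if plausibility_score(reply) >= threshold:
                return reply                          # filter accepts it
            # otherwise discard the draft and try again
        return ""

    reply = filtered_reply("hi there", make_generator())
    ```

    The first draft fails the filter and is discarded; the second passes and is returned, which is the reject-and-retry behavior the comment describes.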

  9. I think there is still a long way to go, as it is just a classifier and does not truly understand the meaning of the sentence. If we can use DNNs for storing and refreshing knowledge, that could be called truly big progress.

  10. Dean Malmgren Friday, October 11, 2013

    Really great and spectacular that the authors are going to open source their work. This is why the data space is moving so quickly now and why it is so exciting to see what will happen next.

