Researchers from the Allen Institute for Artificial Intelligence have built a computer system capable of teaching itself many facets of broad concepts by scouring the web and analyzing the results using natural language processing and computer vision techniques.


The most recent advances in artificial intelligence research are pretty staggering, thanks in part to the abundance of data available on the web. We’ve covered how deep learning is helping create self-teaching and highly accurate systems for tasks such as sentiment analysis and facial recognition, but there are also models that can solve geometry and algebra problems, predict whether a stack of dishes is likely to fall over and (from the team behind Google’s word2vec) understand entire paragraphs of text.

(Hat tip to frequent commenter Oneasum for pointing out all these projects.)

One of the more interesting projects is a system called LEVAN, which is short for Learn EVerything about ANything and was created by a group of researchers out of the Allen Institute for Artificial Intelligence and the University of Washington. One of them, Carlos Guestrin, is also co-founder and CEO of a data science startup called GraphLab. What’s really interesting about LEVAN is that it’s neither human-supervised nor unsupervised (like many deep learning systems), but what its creators call “webly supervised.”


What that means, essentially, is that LEVAN uses the web to learn everything it needs to know. It scours Google Books Ngrams to learn common phrases associated with a particular concept, then searches for those phrases in web image repositories such as Google Images, Bing and Flickr. For example, LEVAN now knows that “heavyweight boxing,” “boxing ring” and “ali boxing” are all part of the larger concept of “boxing,” and it knows what each one looks like.
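The discovery step described above can be sketched in a few lines. This is a toy illustration, not LEVAN's actual code: the n-gram counts below are made-up stand-ins for the Google Books Ngrams data, and the frequency threshold is an arbitrary assumption.

```python
# Toy sketch of LEVAN's discovery step: mine n-gram frequency data for
# phrases that contain a concept word and occur often enough to be
# treated as sub-concepts. The counts are invented illustrative data.

def discover_subconcepts(concept, ngram_counts, min_count=100):
    """Return phrases containing `concept` whose frequency clears a
    (hypothetical) threshold, sorted alphabetically."""
    return sorted(
        phrase for phrase, count in ngram_counts.items()
        if concept in phrase.split()
        and phrase != concept
        and count >= min_count
    )

ngram_counts = {
    "boxing": 90000,
    "heavyweight boxing": 4200,
    "boxing ring": 3100,
    "ali boxing": 850,
    "boxing day": 50,  # below the toy threshold, so it is dropped
}

print(discover_subconcepts("boxing", ngram_counts))
# ['ali boxing', 'boxing ring', 'heavyweight boxing']
```

The real system would then query image repositories for each surviving phrase to build a visual model of it; that retrieval step is omitted here.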

More impressive still is that because LEVAN uses text and image references to teach itself concepts, it’s also able to learn when words or phrases mean the same thing. So while it might learn, for example, that “Mohandas Gandhi” and “Mahatma Gandhi” are both sub-concepts of “Gandhi,” it will also learn after analyzing enough images that they’re the same person.
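One way to picture that merging idea: if the average image features of two sub-concepts are nearly identical, treat them as the same thing. The three-dimensional "feature vectors" and the similarity threshold below are invented for illustration; LEVAN's actual visual models are far richer.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def merge_similar(features, threshold=0.95):
    """Greedily group sub-concepts whose mean image features are
    nearly identical (hypothetical threshold)."""
    groups = []
    for name in features:
        for group in groups:
            if any(cosine(features[name], features[m]) >= threshold
                   for m in group):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups

# Made-up 3-d "visual features" for illustration only.
features = {
    "mahatma gandhi":  [0.90, 0.10, 0.20],
    "mohandas gandhi": [0.88, 0.12, 0.21],
    "indira gandhi":   [0.10, 0.90, 0.30],
}
print(merge_similar(features))
# [['mahatma gandhi', 'mohandas gandhi'], ['indira gandhi']]
```

Note how the two names for the same person collapse into one group while "indira gandhi" stays separate, mirroring the distinction the article describes.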


So far, LEVAN has modeled 150 different concepts and more than 50,000 sub-concepts, and has annotated more than 10 million images with information about what’s in them and what’s happening in them. The project website lets you examine its findings for each concept and download the models.

According to a recent presentation by one of its creators, LEVAN was designed to run nicely on the Amazon Web Services cloud — yet another sign of how fast the AI space is moving. A lack of computer science skills and math knowledge is one impediment to broadly accessible AI, but that can be addressed by SDKs, APIs and other methods of abstracting complexity. Training AI models, however, can require a lot of computing power, something readily available to the likes of Facebook and Google but that everyday users might need to offload to the cloud.

  1. Steve Ardire Friday, May 23, 2014

    LEVAN looks interesting and appears ‘similar’ to NELL: Never-Ending Language Learning at CMU http://www.cmu.edu/homepage/computing/2010/fall/nell-computer-that-learns.shtml
    @cmunell I am a machine reading research project at Carnegie Mellon rtw.ml.cmu.edu

    >However, training AI models can require a lot of computing power, something that is easily available to the likes of Facebook and Google

    NELL is supported by DARPA, Google and Yahoo.

  2. Is this intelligence or just plain smart?

    1. Fernando Olmos Sunday, May 25, 2014

      It’s intelligence using only one facet of our human abilities – to recognise patterns. We still do it much better than LEVAN or any other AI ever will.

      1. That would be my bet.

      2. Derrick Harris Tuesday, May 27, 2014

        True in many respects, but speed and scale are what make AI so compelling for certain things, especially classification. People can only remember so many names, faces, facts, etc.

  3. Lots of great computer vision projects going on! Can’t wait to see where we’ll be in a few years…

    Here’s another interesting paper to look at:


    (See the results in section 5.) Looks like the kind of result that will have loads of amazing applications (improving things across the board) — only time will tell.

    1. Interesting paper indeed! But there was a very good reason why Martens came up with Hessian-free optimization… Scaling down MNIST??? Well, interesting nevertheless, and it might be possible to apply these ideas to methods that use an approximation of the Hessian. Hopefully.

      Come join the Google+ Deep Learning community – a much better place to discuss such things :)

  4. Satish Sharma Saturday, May 24, 2014

    Well, I am not impressed with the Gandhi example.

    Indira Gandhi wasn’t inspired by anything in her life — particularly anything written, because she was fairly illiterate despite her father’s best efforts to send her to the best schools.

    And of course she WAS related to the other Gandhi, having been quasi-adopted by him — her father refused to walk her down the marriage plank, and the older man, who had a penchant for young girls, did. Well, a computer will never know all this nuanced stuff for at least another 100 years, so I am not impressed!

    1. abhineshwartomar Wednesday, July 9, 2014

      What is this nonsense!?

  5. Satish Sharma Saturday, May 24, 2014

    Well, I am not impressed … the example is rather poor.

    Mrs. Gandhi was never impressed with anyone’s writings — being illiterate despite her father’s best efforts to send her to good schools.

    And she was kind of related to the other Gandhi — he adopted her, as her father refused to walk down the plank with her to marry her off — and the old man had a penchant for hanging around with young girls … a computer will never know all this, would it? Oh well, not for another 100 years …

  6. It is impressive to know that computers are now made to learn by themselves.

    But do those people ever think about where the limits of such a project lie?

    You know, I hate to say it, but … we would not want to end up in the world of The Matrix…

  7. Vineeth Pulipati Sunday, May 25, 2014

    Isn’t this just like Watson?

  8. Rob Mac Hugh Sunday, May 25, 2014

    Pretty soon, the machine will say: “Who’s that idiot who wants to borrow my circuits?
    Go upload yourself to a light switch, pinhead!”

  9. My main question is how do you teach it what is canon? There is so much non-authoritative data out there that humans can’t tell what’s true anymore.

    Still, for basic hierarchical organizing of subjects/objects, it’s an innovative start.

  10. Indira Gandhi is not the wife of Mahatma Gandhi. LOL.

    1. No one is saying she was. You saw the pictures side by side under “Mrs. Gandhi” and came to an incorrect conclusion. You, as a faulty human, saw a pattern that does not exist. Likely a computer would not make the same mistake. :)

      1. Patrick McCormack, SPHR Tuesday, May 27, 2014

        Depends upon how it is programmed. Even when a human has every bit of information needed to make a sane and rational decision, including a matrix of results for each possible decision, wrong decisions continue to be made. One must also consider that not all bad decisions are wrong, nor are all wrong decisions bad. All it takes is one stray gamma ray or one heartstring plucked to make a decision that is not the optimal choice.


Comments have been disabled for this post