
How deep learning can teach computers Spanish without a tutor


In August, Google open-sourced a tool called word2vec that lets developers and data scientists experiment with language-based deep learning models. Now the company has published a research paper showing off another use for the technology: automatically detecting the similarities between different languages in order to build, for example, more accurate dictionaries.

The method works by analyzing how words are used in different languages and representing each word as a vector; those vectors can then be projected onto a two-dimensional graph. Obviously, a computer doesn’t need a visualization to understand the results of the computations, but this one from the paper is instructive in showing the general idea of what the technique does.
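To make that projection step concrete, here’s a minimal sketch using made-up five-dimensional “word vectors” (real models use hundreds of dimensions) and a hand-rolled PCA via NumPy:

```python
import numpy as np

# Toy "word vectors" -- invented for illustration, two obvious clusters
words = ["one", "two", "three", "cat", "dog", "horse"]
vecs = np.array([
    [0.9, 0.8, 0.1, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.1, 0.1],
    [0.7, 0.9, 0.1, 0.0, 0.3],
    [0.1, 0.0, 0.9, 0.8, 0.7],
    [0.2, 0.1, 0.8, 0.9, 0.6],
    [0.0, 0.2, 0.9, 0.7, 0.8],
])

# PCA by hand: center the data, then project onto the top-2 principal components
centered = vecs - vecs.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
coords2d = centered @ vt[:2].T   # shape (6, 2): one (x, y) point per word

for w, (x, y) in zip(words, coords2d):
    print(f"{w:>6}: ({x:+.2f}, {y:+.2f})")
```

In the printed coordinates, the number words land near each other and far from the animal words, which is the clustering the paper’s figure shows.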


Here’s how authors Tomas Mikolov, Quoc V. Le and Ilya Sutskever describe the concept and the chart:

“In Figure 1, we visualize the vectors for numbers and animals in English and Spanish, and it can be easily seen that these concepts have similar geometric arrangements. The reason is that as all common languages share concepts that are grounded in the real world (such as that cat is an animal smaller than a dog), there is often a strong similarity between the vector spaces. The similarity of geometric arrangements in vector spaces is the key reason why our method works well.”
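That similar geometry is what makes the method work: given some known word pairs, you can learn a linear map from one language’s vector space into the other’s by least squares, then translate unseen words through it. A minimal sketch, where the two-dimensional vectors, the hidden ground-truth map and the noise level are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend Spanish vectors are a rotated/scaled copy of the English ones --
# the "similar geometric arrangement" the authors describe.
true_map = np.array([[0.0, -1.0], [1.0, 0.0]]) * 1.5      # hidden ground truth
en = rng.normal(size=(50, 2))                              # English word vectors
es = en @ true_map.T + rng.normal(scale=0.01, size=(50, 2))  # noisy Spanish side

# Learn the translation matrix from known (English, Spanish) pairs by
# least squares: minimize sum ||en_i @ W - es_i||^2.
W, *_ = np.linalg.lstsq(en, es, rcond=None)   # solves en @ W ≈ es

# Translate a held-out English vector; it should land near its Spanish twin.
x_new = rng.normal(size=2)
z_pred = x_new @ W
z_true = x_new @ true_map.T
print(np.linalg.norm(z_pred - z_true))  # small residual
```

Because the mapping is linear, fifty noisy example pairs are plenty to recover it almost exactly here; real vocabularies need more dimensions and more pairs, but the recipe is the same.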

The actual techniques they used were the Skip-gram and Continuous Bag of Words (CBOW) models, which are the same ones exposed by word2vec. The authors describe them as follows:

“The training objective of the CBOW model is to combine the representations of surrounding words to predict the word in the middle. … Similarly, in the Skip-gram model, the training objective is to learn word vector representations that are good at predicting its context in the same sentence. … It is unlike traditional neural network based language models … where the objective is to predict the next word given the context of several preceding words.”
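In other words, the two models generate different (input, target) training pairs from the same sentence. A small sketch of just that pair-generation step, with a context window of 2 (no actual neural network here):

```python
# One sentence, context window of 2 on each side
sentence = "the quick brown fox jumps over the lazy dog".split()
window = 2

cbow_examples, skipgram_examples = [], []
for i, word in enumerate(sentence):
    context = [sentence[j] for j in range(max(0, i - window),
                                          min(len(sentence), i + window + 1))
               if j != i]
    # CBOW: surrounding words combined -> predict the word in the middle
    cbow_examples.append((context, word))
    # Skip-gram: the word itself -> predict each surrounding word
    skipgram_examples.extend((word, c) for c in context)

print(cbow_examples[3])       # (['quick', 'brown', 'jumps', 'over'], 'fox')
print(skipgram_examples[:2])  # [('the', 'quick'), ('the', 'brown')]
```

Training then amounts to adjusting the word vectors so these predictions succeed, which is what forces related words toward nearby points in the space.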

Here’s how I explained their general functionality when covering the word2vec release:

“Its creators have shown how it can recognize the similarities among words (e.g., the countries in Europe) as well as how they’re related to other words (e.g., countries and capitals). It’s able to decipher analogical relationships (e.g., short is to shortest as big is to biggest), word classes (e.g., carnivore and cormorant both relate to animals) and “linguistic regularities” (e.g., “vector(‘king’) – vector(‘man’) + vector(‘woman’) is close to vector(‘queen’)).”
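That last arithmetic trick is easy to demonstrate with toy vectors. The three-dimensional vectors below are made up so that the man-to-woman offset matches the king-to-queen offset; real models learn this structure from data:

```python
import numpy as np

# Invented vectors: the second axis is roughly "maleness", the third "femaleness"
vec = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.2, 0.8]),
    "apple": np.array([0.5, 0.5, 0.1]),   # unrelated distractor
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# vector('king') - vector('man') + vector('woman') ...
target = vec["king"] - vec["man"] + vec["woman"]

# ... is closest to vector('queen') among the remaining words
best = max((w for w in vec if w not in ("king", "man", "woman")),
           key=lambda w: cosine(vec[w], target))
print(best)  # queen
```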

You can see the power of the translation application of these models even when they’re not entirely accurate. One example they note in translating words from Spanish to English is “imperio.” The dictionary entry is “empire,” but the Google system suggested conceptually similar words: “dictatorship,” “imperialism” and “tyranny.” Even if the model can’t replace a dictionary (in fact, the authors note that for English-to-Czech translations, dictionary entries were at least as accurate 85 percent of the time), it could certainly act as a thesaurus or capture the general theme of a foreign text.
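That thesaurus-like behavior falls out of how translation works in vector space: the mapped query is compared against the whole target vocabulary by cosine similarity, so the near-misses in the ranking are conceptually related words. A toy sketch of the ranking step, with all vectors (including the already-mapped “imperio” query) invented for illustration:

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy English-side vocabulary; vectors are made up for illustration
vocab = {
    "empire":       np.array([0.9, 0.8, 0.1]),
    "imperialism":  np.array([0.8, 0.9, 0.2]),
    "dictatorship": np.array([0.7, 0.7, 0.3]),
    "banana":       np.array([0.1, 0.0, 0.9]),
}

# Pretend this is the Spanish vector for "imperio" after mapping into
# the English space (i.e., W @ vector("imperio"))
mapped_query = np.array([0.85, 0.8, 0.15])

# Translation = ranked nearest-neighbor search over the target vocabulary
ranked = sorted(vocab, key=lambda w: cosine(vocab[w], mapped_query),
                reverse=True)
print(ranked)  # related words first, unrelated ones last
```

Even if the top hit were wrong, the rest of the list would still tell you the query is about empires rather than, say, fruit.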


This type of research has clear implications for Google, which wants to make searchable and understandable the vast amounts of data (search queries, web pages, photos, YouTube videos, etc.) it’s collecting, and which is also banking on speech recognition as a major point of distinction for its mobile device business. I think you can see some of this work paying off in the new search algorithms and features Google announced on Thursday. AlchemyAPI founder and CEO Elliot Turner noted to me recently that the same vector representations Google is using on text could also be applied to photos and videos, theoretically categorizing them based on the similarity of their content.

Google isn’t the only company working on new deep learning techniques or applications, either. Companies such as Ersatz and the aforementioned AlchemyAPI are exposing the technology as commercial products, and web companies like Baidu and Microsoft are hard at work on their own research efforts.

2 Responses to “How deep learning can teach computers Spanish without a tutor”

  1. Oneasasum

    And here’s something really neat: there are some new papers to be presented at the upcoming NIPS conference that will take things much further, not about language translation per se but, more generally, about what you can get out of vector-space embeddings of entities.


    This paper appears to be an extension of this one:

    The old record was about 76%; the new record is near 90%! The kind of thing this can be used to do is fill in missing relations in Knowledge Graphs, or verify existing relations, now with much greater accuracy. Imagine: when IBM’s Watson answered a question incorrectly, thinking Toronto was a U.S. city, what if it had been given common sense and allowed to ascertain how likely it is that (Toronto, iscityof, US) is a valid relation? It would not have missed that question (answer). Now imagine what it would mean if question-answering systems in general were given common-sense reasoning capability. This new result goes some way towards addressing that.
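One simple member of that family of models can be sketched briefly (a TransE-style scorer, not necessarily the exact method in those papers, with entity and relation vectors invented for illustration): treat each relation as a translation in vector space, so a plausible triple satisfies head + relation ≈ tail.

```python
import numpy as np

# Made-up 2-D embeddings for entities and one relation
entity = {
    "Toronto": np.array([1.0, 2.0]),
    "Canada":  np.array([2.0, 2.5]),
    "US":      np.array([5.0, 0.5]),
}
relation = {"iscityof": np.array([1.0, 0.5])}

def plausibility(head, rel, tail):
    # Higher (closer to 0) means more plausible:
    # a valid triple should have head + relation land near tail
    return -np.linalg.norm(entity[head] + relation[rel] - entity[tail])

print(plausibility("Toronto", "iscityof", "Canada"))  # near 0: plausible
print(plausibility("Toronto", "iscityof", "US"))      # very negative: implausible
```

A question-answering system with access to such scores could have rejected “Toronto is a U.S. city” before committing to the answer.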


    This paper explains how they have greatly improved the continuous skip-gram algorithm, so that it gives much better vector representations, and can even handle phrases like “Air Canada” (whose meaning can’t be discerned just from the words “air” and “Canada”).
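The phrase-handling step in that line of work is refreshingly simple: bigrams whose words co-occur far more often than chance get joined into a single token, so a phrase like “Air Canada” learns its own vector. A sketch of that co-occurrence scoring, with the tiny corpus and discount value made up for illustration:

```python
from collections import Counter

# Tiny invented corpus, lowercased and pre-tokenized
corpus = ("air canada flies to new york . air canada serves canada . "
          "new york is in the us . fresh air is nice .").split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
delta = 1.0  # discount so very rare bigrams don't dominate

def phrase_score(w1, w2):
    # High when the pair co-occurs much more than its parts' frequencies predict
    return (bigrams[(w1, w2)] - delta) / (unigrams[w1] * unigrams[w2])

print(phrase_score("air", "canada"))  # high: likely a phrase
print(phrase_score("is", "in"))       # zero: just adjacent words
```

Pairs scoring above a threshold get merged (e.g., into an "air_canada" token) and the vector training is rerun over the rewritten corpus.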

  2. Dave Sullivan

    The work that’s being done with language models right now really is a breakthrough of its own. We’ve known that we can build models where the words “one” and “two” automatically learn that they are pretty close to each other in “vector space”; but now it turns out that you can use other languages to flesh out those similarities and differences even more.

    There are a lot of directions this can go: people might take learned object representations (say, from watching YouTube videos) and find that they can be combined with these language models to improve language processing, and vice versa (text data helps improve vision performance).

    In 2013, deep learning started entering the limelight, but it’s still a very abstract thing for most people. The “killer app” of deep learning doesn’t exist yet, but I’d be very surprised if it didn’t make use of these types of concepts and involve some non-trivial NLP.