Bye bye Codecademy, MIT shows off a way to program using natural language

What if you could learn to code just by learning a few commands that match the way we speak or write?

Researchers at the Massachusetts Institute of Technology have shown that for a few tasks, such as tweaking word processing documents and spreadsheets, people could use natural language instead of specific programming languages. As we spend more time in our digital worlds, making the manipulation of those worlds easier for everyone is the goal behind several startups such as IFTTT, Codecademy or even ARB Labs, and is an essential ingredient for further breakthroughs.

The researchers in the MIT Computer Science and Artificial Intelligence Laboratory demonstrated their findings using productivity software, but their methods might also work for other programming tasks. While it’s not exactly clear from the MIT release how this will work in practice, it’s awesome that such research is even happening. Giving more people the power to code would be a huge source of innovation and disruption, because it would let even more people build things online and possibly in the real world.

While many people credit Amazon Web Services for lowering the cost of building a startup, I think they also underestimate the benefits of easier programming languages and frameworks such as Ruby, PHP and Python, which have grown in popularity and allowed more people to build apps they once would have struggled with in C or Java. Part of the reason these languages are so popular is that they’re easier to learn, and the more coders there are, the more apps get developed.

So how did MIT work its magic? Regina Barzilay, an associate professor of computer science and electrical engineering, explains the two primary insights. One is essentially translating computational tasks into a set, formalized language. Yet because people might use many variations to describe a task, the researchers used a graph structure to map out the relationships between those natural-language requests, so the computer could understand the many ways it might be asked to perform the same task. From the MIT release:

What [Nate] Kushman and Barzilay determined, however, is that any regular expression has an equivalent that does map nicely to natural language — although it may not be very succinct or, for a programmer, very intuitive. Moreover, using a mathematical construct known as a graph, it’s possible to represent all equivalent versions of a regular expression at once. Kushman and Barzilay’s system thus has to learn only one straightforward way of mapping natural language to symbols; then it can use the graph to find a more succinct version of the same expression.
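To make the idea of "equivalent versions" in the quoted passage concrete, here is a minimal Python sketch. The request and both patterns are hypothetical examples chosen for illustration; they are not taken from the MIT system. One pattern follows the English phrasing clause by clause, the other is the shorter form a programmer would more likely write.

```python
import re

# Hypothetical request: "three-letter words starting with X".

# Literal form: mirrors the phrasing clause by clause.
# "starting with X" becomes a lookahead, "three letters" a counted class.
LITERAL = re.compile(r"\b(?=X)[A-Za-z]{3}\b")

# Succinct form: the same constraint folded into one expression.
SUCCINCT = re.compile(r"\bX[A-Za-z]{2}\b")


def same_matches(text):
    """Return True if both patterns find the same words in text."""
    return LITERAL.findall(text) == SUCCINCT.findall(text)


# A few spot checks. Agreement on samples is evidence, not a proof
# of equivalence.
SAMPLES = ["Xen and Xyz but not Xenon", "fax Xa X12 XML", ""]
AGREE = all(same_matches(s) for s in SAMPLES)
```

Checking a handful of strings only suggests the two patterns are interchangeable. According to the release, the researchers' graph represents all equivalent versions of an expression at once, so the system does not have to rely on spot checks like these.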

The second is a bit more complicated. The research team built a system that automatically learned how to handle data stored in different file formats, such as .pdf or .doc files, based on specifications prepared for a popular programming competition. Essentially, the team built a system that can use natural language to build input parsers. Input parsers figure out which parts of a file contain which types of data: without an input parser, a file is just a random string of zeroes and ones.
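For readers who have never written one, here is roughly what a hand-written input parser looks like in Python. The specification in the comment is a made-up example in the style of a programming-competition problem statement, and the function name is hypothetical. The MIT work aims to produce this kind of code from the English description; the sketch below is written by hand and says nothing about how their system does it.

```python
# Hypothetical specification, in the style of a contest problem:
#   "The first line contains an integer N. Each of the next N lines
#    contains a name and an integer score, separated by a space."


def parse_input(text):
    """Turn raw input text into a list of (name, score) tuples."""
    lines = text.strip().splitlines()
    count = int(lines[0])  # first line: how many records follow
    records = []
    for line in lines[1:1 + count]:
        name, score = line.split()  # each record: name, then score
        records.append((name, int(score)))
    return records


RAW = "2\nalice 90\nbob 72\n"
RECORDS = parse_input(RAW)
```

Until `parse_input` runs, `RAW` is just a string of characters. Afterward the program knows which pieces are names and which are numbers, which is the job the article describes.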

So while people won’t be writing apps anytime soon using natural language, the research at MIT and efforts of startups such as IFTTT are crucial to helping us get more people manipulating the digital morass that we interact with daily. And that’s only going to empower more people to innovate.

6 Responses to “Bye bye Codecademy, MIT shows off a way to program using natural language”

  1. anisotropic

    I can’t help but think a natural language programming language would help make things easy for simple tasks. Like writing a script that moves every even numbered .jpg into a new folder and resaves it or something… But let’s say you have a problem like programming whip physics in a 3d video game, or using footage from a camera feed to navigate an ornithopter flying around an environment….

    It sounds like trying to do anything complicated, the natural language part would just get in the way… and also make for code that’s harder to understand.

    I hate to say it, but sometimes less obfuscation is better when programming. Also, human language has quite a bit of entropy, there’s so many ways to phrase things…. the more you set a rigid structure of grammatical conventions, the further away you go from natural language….

    I applaud the effort to make programming easier to the layman, and the positive intentions behind that… I just don’t envision people approaching difficult programming tasks in something like this any time soon.

  2. Terrence Andrew Davis

    I understood 286 DOS interrupts with the PICs. Good luck with PCI for a complete USB implementation handling OHCI, EHCI, UHCI, ICH4, 5, 6, 7, 8, 9, 10, 11. It should handle sharing on the interrupt — there are ABCD for each bus that must be shared by 32 functions.

  3. Terrence Andrew Davis

    Object C is supposed to eliminate Malloc Free. It never works. You just double the amount of stuff you have to learn. It’s up to 8 times now. Pointers are simple. We all learned 6502 asm. We all knew pointers. A language comes along and tries to hide pointers. It doesn’t. They double it. Double it again.