When Word Lens — a futuristic augmented-reality translation app — launched for the iPhone just before Christmas, there was plenty at stake: co-founder Otavio Good had poured considerable money and time into development.
“I started to write the program from scratch two and a half years ago,” he said in an interview with me. “It was just another idea when I had it [but] I told it to my friends — and as I did, I started thinking it through more. I put together a prototype in about 3 weeks, and that made me decide it was time to quit my job and work on it full-time.”
Good and his colleague John DeWeese didn’t really know how their app — which instantly translates words it detects using your iPhone’s camera — would be received. In the end, they needn’t have worried; prompted in no small part by a superb demo video, the program turned out to be a bumper Christmas gift for the duo.
“The response has been overwhelming,” says Good. “Within about a day and a half of releasing it, we had a million views on our YouTube video. I couldn’t believe how quickly people saw it and told other people about it.”
While Good is cagey about sales, he’s clear that it’s given the team a significant return in just a few days. “I can say that within the first few days, I had made back the money I put into the company,” he says. “So I think I can safely call it a success so far.”
So how has a tiny development team been able to do something that requires serious commitment from technology giants like Google? Interestingly, Good built everything from the bottom up, using his experiences as a graphics wizard at San Francisco games studio Secret Level to influence his code. That background doesn’t mean it was easy, though.
“Word Lens was the most technically challenging project I’ve ever worked on,” he says. “It needs to be able to read things in the real world, in real-time, on an underpowered device; most optical character recognition programs are meant to read scanned documents, not in real-time, on computers that are more powerful than phones.”
So how does it work? It turns out to be a three-step process: first the app tries to discern where words might be in an image, then it tries to read them (much as a scanner reads text).
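The shape of that pipeline can be sketched in highly simplified form. Everything below is a toy stand-in — the character-grid “image” and all function names are illustrative assumptions, not Word Lens internals:

```python
# Toy sketch of the three-stage pipeline: locate words in an "image",
# read (recognize) them, then translate them word for word.
# The character grid and every function here are illustrative
# stand-ins, not how Word Lens actually works.

def detect_words(image_rows):
    """Stage 1: find runs of non-space characters (a stand-in for
    locating word-shaped regions in a camera frame)."""
    regions = []
    for y, row in enumerate(image_rows):
        x = 0
        while x < len(row):
            if row[x] != " ":
                start = x
                while x < len(row) and row[x] != " ":
                    x += 1
                regions.append((y, start, x))
            else:
                x += 1
    return regions

def recognize(image_rows, region):
    """Stage 2: 'OCR' a region (here, just slicing out the characters)."""
    y, start, end = region
    return image_rows[y][start:end]

def translate(word, dictionary):
    """Stage 3: word-for-word dictionary lookup; unknown words pass through."""
    return dictionary.get(word.lower(), word)

SPANISH_TO_ENGLISH = {"peligro": "danger", "salida": "exit"}

frame = ["peligro    ", "   salida  "]
words = [recognize(frame, r) for r in detect_words(frame)]
print([translate(w, SPANISH_TO_ENGLISH) for w in words])  # → ['danger', 'exit']
```

The hard part Good alludes to is, of course, stage one and two on live video frames, not the lookup at the end.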
Once that’s done, translation takes place: the part of the process that, perhaps surprisingly, has brought the app a lot of criticism. GigaOM’s own Ryan Kim called it “hit and miss.” Good suggests that people expecting complete grammatical conversions are getting the wrong impression.
“For the most part, Word Lens does a word-for-word translation, with some exceptions for things like ‘por favor’ translating to ‘please,’” says Good. “Word-for-word translations don’t usually look very pretty, but they can get the point across. The goal for Word Lens was not to be perfect; the goal was to make a useful tool for tourists. Sometimes it’s important to make that distinction, because a lot of people have the expectation that the translation will be perfect, and they won’t even have to do any thinking.”
There have been other criticisms, too: notably, the pricing model. The app itself is free, but it simply demonstrates the system. Doing any actual translation requires buying a separate one-way translation dictionary for each language, priced at an introductory $4.99 (regularly $9.99). Good understands that some users are confused or irritated by this model, but he stands by it.
“We did that so that people can try out the technology before they spend money on it. I think that’s important, because the technology isn’t perfect, and we want people to understand what it is. Unfortunately, some people think of this as a ‘bait and switch.’”
So what’s next? Interface fixes, improved translations, more devices and — crucially — more languages.
Users waiting for Mandarin may have to wait a while, though: “The Latin alphabet,” he suggests, “tends to be easier for the computer to read than things like Arabic or Chinese.”