Stay on Top of Enterprise Technology Trends
Get updates impacting your industry from our GigaOm Research Community
A Sunnyvale, California, startup called Orbeus has developed what could be the best application yet for letting everyday consumers benefit from advances in deep learning. It’s called PhotoTime and, yes, it’s yet another photo-tagging app. But it looks really promising and, more importantly, it isn’t focused on business uses like so many other recent deep-learning-based services, nor has it been acquired and dissolved into Dropbox or Twitter or Pinterest or Yahoo.
Deep learning, to anyone unfamiliar with the term, is essentially a term for a class of artificial intelligence algorithms that excel at learning the latent features of the data they analyze. The more data that deep learning systems have to train on, the better they perform. The field has made big strides in recent years, largely with regard to machine-perception workloads such as computer vision, speech recognition and language understanding.
(If you want to get a crash course in what deep learning is and why web companies are investing billion of dollars into it, come to Structure Data in March and watch my interview with Rob Fergus of Facebook Artificial Intelligence Research, as well as several other sessions.)
I am admittedly late to the game in writing about PhotoTime (it was released in November) because, well, I don’t often write about mobile apps. The people who follow this space for a living, though, also seemed impressed with it when they reviewed it back then. Orbeus, the company behind PhotoTime, launched in 2012 and its first product is a computer vision API called ReKognition. According to CEO Yi Li, it has already raised nearly $5 million in venture capital.
But I ran into the Orbeus team at a recent deep learning conference and was impressed with what they were demonstrating. As an app for tagging and searching photos, it appears very rich. It tags smartphone photos using dozens of different categories, including place, date, object and scene. It also recognizes faces — either by connecting to your social networks and matching contacts with people in the photos, or by building collections of photos including the same face and letting users label them manually.
You might search your smartphone, for example, for pictures of flowers you snapped in San Diego, or for pictures of John Smith at a wedding in Las Vegas in October 2013. I can’t vouch for its accuracy personally because the PhotoTime app for Android isn’t yet available, but I’ll give it the benefit of the doubt.
More impressive than the tagging features, though — and the thing that could really set it apart from other deep-learning-powered photo-tagging applications, including well-heeled ones such as Google+, Facebook and Flickr — is that PhotoTime actually indexes the album locally on users’ phones. Images are sent to the cloud, ran through Orbeus’s deep learning models, and then the metadata is sent back to your phone so you can search existing photos even without a network connection.
The company does have a fair amount of experience in the deep learning field, with several members, including research scientist Wei Xia, winning a couple categories at last year’s ImageNet object-recognition competition as part of a team from the National University of Singapore. Xia told me that while PhotoTime’s application servers run largely on Amazon Web Services, the company’s deep learning system resides on a homemade, liquid-cooled GPU cluster in the company’s headquarters.
Here’s what that looks like.
As I’ve written before, though, tagging photos is only part of the ideal photo-app experience, and there’s still work to do there no matter how nice the product functions. I’m still waiting for some photo application to perfect the curated photo album, something Disney Research is working on using another machine learning approach.
And while accuracy continues to improve for recognizing objects and faces, researchers are already hard at work applying deep learning to everything from recognizing the positions of our bodies to the sentiment implied by our photos.