The text-based web of the last 25 years is evolving into an ever more visual web where pictures and videos dominate. Just look at how many apps there are to make movies or beautiful slideshows of our personal photos, or at the success of sites like Pinterest or Snapchat. After all, we have the means to consume richer files thanks to abundant bandwidth, and to create them now that so many people own a video camera in the form of a smartphone.
But unlike the text-based web, the visual web is tough to search. That’s one reason why so many companies are working on computer vision. In the meantime, there’s an excellent way to categorize videos and make them searchable: use the audio. That’s what startup Op3nvoice is trying to do with an API it plans to launch in two weeks.
The Austin, Texas-based company is trying to become the Google of searching audio files and videos by allowing companies to access its platform, which first ingests video and audio files and then makes those files searchable. Google has similar technology, which it uses to offer closed-captioning of YouTube videos. It’s pretty powerful stuff, allowing sites that host videos to do away with tags and directories and just let people search for the terms spoken in the video. You can see the results at a site called Mobento, which has implemented the Op3nvoice product for its educational video service. Not only can you search for terms, but the results show where those terms occur in the video playback (see below).
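To make the idea of searching by spoken term concrete, here is a minimal sketch in Python of a time-stamped word index. It is not Op3nvoice's implementation, which the company has not described in detail. The file names, the sample words and the functions `build_index` and `search` are all hypothetical, and the sketch assumes some speech recognizer has already produced a list of words with their offsets in seconds.

```python
from collections import defaultdict


def build_index(transcripts):
    """Map each lowercased word to the (file_id, seconds) spots where it was heard.

    `transcripts` is assumed to be a dict of file_id -> list of (word, seconds)
    pairs, as a speech recognizer might produce during ingestion.
    """
    index = defaultdict(list)
    for file_id, words in transcripts.items():
        for word, seconds in words:
            index[word.lower()].append((file_id, seconds))
    return index


def search(index, term):
    """Return every (file_id, seconds) hit for a term, sorted by file and time."""
    return sorted(index.get(term.lower(), []))


# Hypothetical recognizer output for two ingested files.
transcripts = {
    "lecture.mp4": [("Photosynthesis", 12.5), ("converts", 13.1), ("light", 13.6)],
    "call.wav": [("the", 1.0), ("light", 2.2), ("bill", 2.6)],
}

index = build_index(transcripts)
hits = search(index, "Light")
print(hits)  # [('call.wav', 2.2), ('lecture.mp4', 13.6)]
```

Because each hit carries a time offset, a player could jump straight to the moment a term is spoken, which is the behavior described for the Mobento results.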
That’s good news for a video site, but it’s also useful for podcasts or even for folks who archive telephone calls. In fact, that’s how the company got started back in 2011 in London: as a consumer-facing service for people who wanted to record and store their phone calls. “One of the obvious things people wanted to do with those calls was search,” said Paul Murphy, Op3nvoice CEO and co-founder. So the company started building a search function on top of its own natural language processing engine and custom database.
The product sold to financial institutions such as banks that wanted to track compliance issues, and to insurance firms that wanted to spot potential problems in the interviews recorded by claims adjusters. But every new client meant hiring new engineers, which limited growth and scale.
So last year, after joining the initial Techstars London class, Op3nvoice realized that it needed to switch up its business if it wanted to build something that could scale, and that it needed to move to the U.S. to get the investment required to build that scalable product.
So now Op3nvoice is in Austin, all set to open its API and seeking to raise $1.7 million. Op3nvoice will charge between $2 and $3 per hour for the digital signal processing that has to occur when the video or audio files are ingested. The search function is free up to a certain number of searches.
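For a rough sense of what that pricing means, the arithmetic can be sketched in a few lines of Python. This assumes the $2 to $3 rate applies to each hour of audio or video ingested, which the company has not spelled out here, and the 500-hour archive is a made-up example. The function `ingestion_cost_range` is hypothetical.

```python
def ingestion_cost_range(hours, low_rate=2.0, high_rate=3.0):
    """Return the (low, high) dollar cost of ingesting `hours` of media.

    Assumes the quoted $2 to $3 rate is charged per hour of content processed.
    """
    return hours * low_rate, hours * high_rate


# Hypothetical example: a podcast archive with 500 hours of recordings.
low, high = ingestion_cost_range(500)
print(f"${low:,.0f} to ${high:,.0f}")  # $1,000 to $1,500
```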
As a journalist who records all kinds of audio, I found this powerful, but not powerful enough. Could auto-generated written transcriptions be far behind decent audio search? Murphy disabused me of my hopes, noting that transcription is still only about 75 percent accurate for a phone call and 95 percent accurate for a lecture, which isn’t good enough yet for professional transcription. However, it’s plenty good enough for search, which should help developers build an amazing array of products, from better ad matching on videos to tagging for podcasts.
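One way to see why imperfect transcription can still serve search: a search term only has to be recognized once to surface a file, while a transcript has to get every word right. The Python sketch below works through that intuition. It is a simplification of my own, not Murphy's math. It treats the quoted accuracy figures as the chance of recognizing any single word and assumes each mention of a term is recognized independently, neither of which is guaranteed in practice. The function `chance_term_is_found` is hypothetical.

```python
def chance_term_is_found(word_accuracy, mentions):
    """Probability that at least one of `mentions` occurrences is recognized.

    Assumes each occurrence is recognized independently with probability
    `word_accuracy`, which is a simplification.
    """
    return 1 - (1 - word_accuracy) ** mentions


# Phone call at roughly 75 percent accuracy, term spoken three times.
phone = chance_term_is_found(0.75, 3)
# Lecture at roughly 95 percent accuracy, term spoken twice.
lecture = chance_term_is_found(0.95, 2)

print(f"phone call: {phone:.1%}, lecture: {lecture:.2%}")
```

Under those assumptions, a term repeated a few times in a phone call is very likely to be found even though one word in four is wrong.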
Murphy says the company is also working with a German lab to detect emotion in audio via the voice waveform, which could then be applied to everything from identifying movie genres to determining which call center workers tend to lose their cool. For those of us who tend to speak our voicemails, it might also someday offer clues to how the text should be read. But that’s getting ahead of ourselves.