Gigaom

Blog Post

Microsoft Azure now uses speech recognition to help users search videos

Derrick Harris Sep 10, 2014 - 11:50 AM CDT

Microsoft is putting its speech-recognition expertise to work on its Azure cloud platform with a new service that lets users index and search their audio and video files based on the words spoken in them. The new service, called the Microsoft Azure Media Services Indexer, grew out of a Microsoft Research project called MAVIS.

The indexer works by listening to a user’s content and extracting keywords as metadata, which can then be used in a variety of ways. Search is probably the most obvious one, but the metadata could also be used to categorize content or, Microsoft claims, add descriptions or captions to it. This will help people discover content and get a sense of what’s in it, but it will also help content creators bring some order to their digital libraries and possibly make more money from them as they start matching ads to keywords and concepts.
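To make the idea concrete, here is a minimal sketch of keyword metadata driving search. It is not the Azure Media Services Indexer API: the `build_index` and `search` functions, the stop-word list and the sample transcript segments are all hypothetical, and the sketch assumes a speech recognizer has already produced timestamped text.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "on", "for"}


def tokenize(text):
    """Lowercase the text and return its non-stop-word tokens."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def build_index(segments):
    """Map each keyword to the sorted start times (seconds) where it is spoken.

    `segments` is a list of (start_seconds, transcript_text) pairs, standing in
    for the output of a speech recognizer (an assumption of this sketch).
    """
    index = defaultdict(set)
    for start, text in segments:
        for word in tokenize(text):
            index[word].add(start)
    return {word: sorted(times) for word, times in index.items()}


def search(index, query):
    """Return start times of segments that contain every keyword in the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)


# Made-up transcript segments for illustration only.
segments = [
    (0.0, "Welcome to the cloud keynote"),
    (12.5, "Speech recognition is built on deep learning"),
    (31.0, "Deep learning also powers translation"),
]
index = build_index(segments)
print(search(index, "deep learning"))  # [12.5, 31.0]
```

Because each keyword maps back to a timestamp, the same metadata could in principle let a player jump to the moment a word is spoken, or let an ad system match keywords to a point in a clip.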

While the resulting indexes aren’t particularly high-tech as far as database applications go, the speech recognition capabilities are based on deep learning, the same set of techniques that powers the upcoming real-time translation feature in Microsoft’s Skype application. Assuming the Azure indexing service is English-only right now, Microsoft’s work in translating languages would seem to support the idea of it expanding across languages at some point.

An annotated screenshot of the old MAVIS interface. Credit: Microsoft

It also wouldn’t be surprising to see these capabilities come to Bing, if they’re not there in some capacity already. Google has applied speech recognition to video in similar, although more limited, ways in the past. In 2008, for example, it indexed politicians’ YouTube videos so viewers could search their speeches by keyword. Currently, YouTube users can also use the service to add automated captions to their videos.

It seems pretty clear, though, that commercial speech-recognition services are just the first step in the quest for companies such as Microsoft, Google, Facebook and, clearly, Baidu to help users navigate the rich media they’re creating and consuming. Computer vision has received a lot of attention recently as companies ramp up their efforts to recognize what appears in photos and videos (try, for example, searching your unlabeled Google+ photos by keyword, or using the product-recognition feature on the Amazon Fire phone) and, in some cases, to piece together video frames into short visual summaries or highlight reels.

When you consider how much audio and visual content we’re producing, it becomes really easy to understand why speech recognition, computer vision and language understanding, and the techniques for achieving them, are such hot topics right now. The web — and even corporate servers — isn’t just full of text pages anymore, and there’s only so much we can rely on manually created metadata as we’re swimming in YouTube clips, Dropcam footage, Netflix movies, Flickr photos, surveillance tapes and a whole sea of other unlabeled or poorly labeled content.

Companies in the business of delivering content, or even just information, stand to make a lot of money if they’re able to help consumers or businesses wade through it all and find what they need.
