Talk about your time savers. New research from Microsoft and the University of Toronto may make it possible for non-Chinese speakers to “speak” Chinese in their own voices without having to learn the language. Given the trade relationships between the US and China, this could be a really big deal if it works as advertised.
While great strides have been made in speech recognition over the past decades, current systems still carry word error rates of 20 percent to 25 percent when handling “arbitrary speech,” Microsoft’s Richard Rashid wrote in a blog post. (Do you hear that, Siri?)
But now, a new technology called Deep Neural Networks, which is modeled on the way the human brain operates, enables much more discriminating speech recognition, according to Rashid, Microsoft’s chief research officer.
Rashid, who demonstrated the technology at a Microsoft conference in Tianjin, China, in late October, said the process converts the speaker’s speech to text and runs that text through a translator, which first finds the Chinese (I’m assuming Mandarin) equivalents for his words and then rearranges them into an order that is appropriate to the new language.
In addition, a text-to-speech system uses samples from a native Chinese speaker, along with pre-recorded English data from the English speaker’s own voice, to model the sound of that speaker’s voice. A video of Rashid’s demo makes it clear that the technology impressed the audience of Tianjin students. The system is not perfect, Rashid said, but it does cut the word error rate by 30 percent compared with earlier systems.
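To put that number in perspective, here is a quick back-of-the-envelope sketch in Python. It assumes the 30 percent figure is a relative reduction applied to the 20 to 25 percent error rates quoted earlier (a 30-point drop would not be possible from that starting range). The function name is made up for illustration and is not anything from Microsoft.

```python
def apply_relative_reduction(baseline_wer: float, relative_cut: float) -> float:
    """Return the error rate left after cutting baseline_wer by a relative fraction.

    Assumption: the cut is relative, so a 30 percent cut keeps 70 percent
    of the original errors.
    """
    return baseline_wer * (1.0 - relative_cut)


# Baselines are the 20 and 25 percent word error rates quoted above.
for baseline in (0.20, 0.25):
    improved = apply_relative_reduction(baseline, 0.30)
    print(f"{baseline:.0%} baseline -> {improved:.1%} after a 30% relative cut")
```

Under that assumption, the error rate would land at roughly 14 to 17.5 percent. That is better, but still more than one word in ten wrong.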
Microsoft, by the way, last week inked a deal with 21Vianet, a Shanghai-based ISP, to bring both Windows Azure public cloud services and Office 365 to China.