Rumor has it Apple is in talks with Nuance to strike a deal allowing Apple to use Nuance’s speech recognition software. One of the uses mentioned was the possibility of using the software in the new NC data center, but I don’t think it’s likely that the data center is the intended target.
The main reason a voice recognition server in a data center is a bad idea is that we’re talking about mobile devices. You can’t guarantee a persistent data connection on a smartphone, so whenever you don’t have a signal, you can’t use voice recognition.
The current implementation of speech recognition on the iPhone, Voice Control, doesn’t have this problem. Since the processing is all done on-device, you can use Voice Control whenever and wherever you like; there’s no need to have any connection to anything. Of course, Voice Control doesn’t integrate with third-party apps, but that is possibly part of what the rumored deal is about – expanding current on-device features.
Even if you do have an internet connection, there are issues with housing the tech remotely. Firstly, there’s connection speed. A 3G or Wi-Fi connection would likely be fine, but sending your audio over a GPRS or EDGE connection will take some time. If you are using a voice command to do something simple, such as launching an app, it is likely to be faster to navigate to the app yourself and launch it than to wait for the speech to be sent, processed and returned. Even if you are using a hands-free system to send an SMS in the car, it might be faster to find somewhere to pull over and type the message yourself.
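To put rough numbers on the connection speed argument, here is a minimal sketch of the upload time alone. The link throughputs, the 5-second clip length and the uncompressed 16 kHz, 16-bit mono format are all my own illustrative assumptions, not measurements, and the function name `upload_seconds` is made up for this example. Network latency and server processing time are ignored, so real round trips would be longer.

```python
# Assumed, illustrative uplink throughputs in kbit/s (not measurements).
ASSUMED_LINKS_KBPS = {"GPRS": 40, "EDGE": 120, "3G": 1000}

# Assumed audio format: uncompressed 16 kHz, 16-bit mono = 256 kbit/s.
RAW_AUDIO_KBPS = 16000 * 16 / 1000


def upload_seconds(clip_seconds, audio_kbps, link_kbps):
    """Seconds needed to push a clip over a link, ignoring latency and processing."""
    return clip_seconds * audio_kbps / link_kbps


def estimate_all(clip_seconds=5, audio_kbps=RAW_AUDIO_KBPS):
    """Upload time for the same clip on each assumed link type."""
    return {
        name: upload_seconds(clip_seconds, audio_kbps, link_kbps)
        for name, link_kbps in ASSUMED_LINKS_KBPS.items()
    }


if __name__ == "__main__":
    for name, seconds in estimate_all().items():
        print(f"{name}: {seconds:.1f} s to upload a 5 s clip")
```

Under these assumptions, a 5-second command takes about 32 seconds to upload over GPRS and a little over 1 second over 3G, which is the gap the paragraph above is getting at.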
Another problem is data capping. If you are a frequent user of the speech recognition system, sending audio over a data connection too often will soon rack up overage charges. A possible solution to this could be compression of the audio before it gets sent to the server, but the more it gets compressed, the lower the quality, and the lower the success rate of the recognition software.
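The data cap concern can be sketched the same way. The usage pattern (50 requests a day, 5 seconds each), the two bitrates and the helper name `monthly_megabytes` are hypothetical values chosen for illustration. Actual figures would depend on the codec and on how heavily someone uses the feature.

```python
def monthly_megabytes(requests_per_day, clip_seconds, audio_kbps, days=30):
    """Approximate audio upload volume per month, in megabytes (10^6 bytes)."""
    total_bits = requests_per_day * clip_seconds * audio_kbps * 1000 * days
    return total_bits / 8 / 1_000_000


if __name__ == "__main__":
    # Assumed usage: 50 voice requests a day, 5 seconds each.
    raw = monthly_megabytes(50, 5, 256)        # uncompressed 16 kHz, 16-bit mono
    compressed = monthly_megabytes(50, 5, 16)  # hypothetical 16 kbit/s speech codec
    print(f"uncompressed: {raw:.0f} MB/month")
    print(f"compressed:   {compressed:.0f} MB/month")
```

With these made-up numbers, uncompressed audio comes to about 240 MB a month and the compressed stream to about 15 MB. Compression clearly helps with the cap, but as noted above, it comes at the cost of recognition accuracy.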
Finally, there’s the strain on mobile carriers, since all the sound files have to be sent over their data networks. This extra traffic would be on top of everything that’s being sent now, so the introduction of server-based speech recognition will likely impact data transfer speeds and/or bandwidth costs.
It seems to me that implementing the software in the data center isn’t the best way of tackling this. If Apple is indeed negotiating with Nuance, then I think it’s likely going to be either for improving the current Voice Control feature, or for adding support for app developers to integrate their apps with system-wide Nuance tech; imagine being able to control the Twitter app using voice commands, for example.
Some extremely advanced capabilities may not be possible on-device, but for the average user, basic control and dictation features would be enough. There would be no need for advanced features such as a vocabulary editor or word-by-word training, as offered in some of Nuance’s desktop voice recognition products.