Microsoft’s new Xbox One has many new features, but one in particular raised some eyebrows on Tuesday: The new game console will always be on, and users will be able to launch games, live TV or even a Skype call with simple voice commands, without ever picking up a controller or remote control. Does that mean, as The Verge mused, that Microsoft will always be listening to each and every word spoken in your living room?
The answer is yes, no, and better get used to it. Microsoft hasn’t actually said how many aspects of the Xbox One are going to work, but the demo it gave on Tuesday at its campus in Redmond, Wash., offered some solid hints about the particulars of its voice control. To wake up the device and launch live TV, play a game or do anything at all with it, users will first have to say “Xbox on.”
That’s what people who work on speech recognition call “hot words”: easily recognizable phrases that a system can detect without much computational effort. Once a user says that magic word or phrase, the actual speech recognition kicks into high gear.
That means the Xbox One continuously listens for someone to say “Xbox on,” and everything else that’s spoken is disregarded. Listening for these hot words is done locally and requires few system resources. There’s no need to record anything, for example, since all that matters is whether the hot words were spoken. But once those words are uttered, the Xbox One is going to use advanced speech recognition to figure out what users are actually talking about.
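The two-stage scheme described above, cheap local hot-word spotting followed by full recognition, can be sketched as a small state machine. This is a minimal illustration under simplifying assumptions, not how Microsoft actually implements it: the function `run_listener` and its parameters are hypothetical, utterances arrive as already-transcribed strings rather than raw audio, and the hot word is matched by exact string comparison, whereas real detectors continuously score the audio signal. The `recognize` callback is a stand-in for the heavier, possibly cloud-based recognition step.

```python
from enum import Enum, auto
from typing import Callable, Iterable, List


class Mode(Enum):
    PASSIVE = auto()  # low-level listening: only checks for the hot word
    ACTIVE = auto()   # full speech recognition is engaged


def run_listener(
    utterances: Iterable[str],
    recognize: Callable[[str], str],
    hot_word: str = "xbox on",
) -> List[str]:
    """Hypothetical two-stage listening loop over pre-transcribed utterances.

    Assumption: text strings stand in for audio, and the hot word is matched
    exactly. In PASSIVE mode each utterance is compared with the hot word and
    then dropped; nothing is stored or forwarded. Hearing the hot word switches
    to ACTIVE mode, where the next utterance goes to `recognize` (a stand-in
    for full recognition). After one command the listener reverts to PASSIVE.
    """
    mode = Mode.PASSIVE
    results: List[str] = []
    for text in utterances:
        normalized = text.strip().lower()
        if mode is Mode.PASSIVE:
            if normalized == hot_word:
                mode = Mode.ACTIVE
            # Anything that isn't the hot word is discarded, never recorded.
            continue
        # ACTIVE: hand the utterance to full recognition, then return to passive.
        results.append(recognize(normalized))
        mode = Mode.PASSIVE
    return results


if __name__ == "__main__":
    stream = ["what's for dinner", "Xbox on", "watch TV", "turn it up"]
    print(run_listener(stream, recognize=lambda cmd: f"command: {cmd}"))
    # prints ['command: watch tv']
```

Only the utterance following the hot word reaches `recognize`; the chatter before it, and the remark after the listener has dropped back to passive mode, are thrown away. Reverting after a single command is just one possible design; a real device might instead stay active until a timeout.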
Again, Microsoft hasn’t said exactly how this is going to work, but a spokesperson told me that some of the personalization offered by the device is “one of the benefits of Xbox One being connected to and powered by the cloud.” I’d expect the same to be true for speech recognition, much in the way that Google uploads your spoken queries to its servers when you use voice search on your Android phone.
Using hot words to wake a device from low-level listening into active speech recognition isn’t new. The technique is also at work in Google Glass, where users get the device’s attention by saying “okay glass.” Google Now simply uses “Google” as a hot word to launch voice input. And the Xbox 360 starts to accept voice commands once users yell “Xbox” at the device’s Kinect sensor.
The difference between how the Xbox 360 and the Xbox One approach voice recognition isn’t so much about technology, even though Xbox users probably hope that the new iteration is going to work better. What makes people uncomfortable is that the Xbox One, and with it its microphone, is meant to be always on.
However, the always-on microphone of the Xbox One is just a sign of things to come. Voice input is going to become a key component of a growing number of internet-connected devices and appliances in your home, car and office, and many of them will use hot words to switch from low-level listening to active speech recognition.
In fact, you are likely looking at one of those devices right now: Laptops, tablets and mobile phones all contain microphones, and they’re all waiting to become hot-word-aware any day now. At last week’s Google I/O conference, Google demonstrated how it is going to add hot wording to search on the desktop, allowing users to start a voice search query by simply saying “okay Google” without touching a single button.
Of course, none of this means there are no privacy issues around hot wording and always-on microphones. Companies should make clear exactly how they’re using the technology as it becomes more widespread, and there should always be a way to opt out and rely on alternative input methods. It may also be a good idea to indicate to users exactly when a device is reverting from active speech recognition to a state of passive listening. But I’d expect most consumers to quickly get used to the constantly running mic, always listening for those magic words.