We’ve been promised that in the future we will be able to simply tell our devices what we want, and that they will understand and comply. Voice recognition systems do exist, but have you noticed how poorly they often work? Now scientists from the University of Waterloo have found a new way to achieve the most natural speech-based interactions with TVs to date.
The situation is not too bad. We can talk to our phones, and they usually understand us pretty well. There are also home assistant devices with great voice recognition capabilities. And yet we cannot tell our TVs to change the channel. How come? There are attempts to fix this bizarre gap in our electronics. Comcast’s Xfinity X1 has a ‘voice remote’ that accepts spoken queries. It allows users to tell the TV to change the channel or to show the weather forecast, TV listings, free kids’ movies and so on. But scientists had to go the extra mile to make it work.
The scientists decided to employ the latest AI technology, specifically a technique known as hierarchical recurrent neural networks. The goal was to enable the system to interpret context better and so improve accuracy. Previous systems had trouble with about 8% of queries, but the new model understands nearly all of them and is able to answer appropriately. That will greatly enhance the user experience and will help those who struggle to use a regular remote. Jimmy Lin, one of the scientists on the project, explained: “If a viewer asks for ‘Chicago Fire,’ which refers to both a drama series and a soccer team, the system is able to decipher what you really want. What’s special about this approach is that we take advantage of context – such as previously watched shows and favourite channels – to personalize results, thereby increasing accuracy.”
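To make the idea of using context concrete, here is a deliberately simple Python sketch. It is not the hierarchical recurrent neural network the researchers built; it only illustrates how viewing history could tip the balance between two meanings of the same phrase. The catalogue, the genre labels and the `resolve` function are all hypothetical and invented for this illustration.

```python
from collections import Counter

# Hypothetical catalogue (an assumption for this sketch): each spoken title
# maps to its possible meanings, each tagged with a genre.
CATALOGUE = {
    "chicago fire": [
        {"name": "Chicago Fire (drama series)", "genre": "drama"},
        {"name": "Chicago Fire (soccer team)", "genre": "sports"},
    ],
}


def resolve(query, watch_history):
    """Pick the meaning whose genre appears most often in the viewer's history.

    watch_history is a list of genre labels for previously watched shows.
    Returns None when the query is not in the catalogue.
    """
    candidates = CATALOGUE.get(query.strip().lower(), [])
    if not candidates:
        return None
    genre_counts = Counter(watch_history)  # genres never watched count as 0
    # max() keeps the first best candidate, so ties fall back to catalogue order.
    best = max(candidates, key=lambda c: genre_counts[c["genre"]])
    return best["name"]


# A viewer who mostly watches sports gets the soccer team.
print(resolve("Chicago Fire", ["sports", "sports", "drama"]))
```

A real system would learn such preferences from data rather than count genres by hand, but the principle is the same: the same words can lead to different results for different viewers.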
It is always difficult for machines to understand what people are saying. There are several layers to this technology. The microphone has to pick up the sound, then the algorithm has to recognise the individual words, which are not always easy to make out because of our accents. Finally, the AI system has to find the meaning in those words and interpret it as a command that it has to fulfil. Now the scientists will begin work on an even more complex technology that will analyse the words from multiple perspectives, which should further improve the understanding of what the user might be saying.
In the future we will control our TVs just by saying what we want them to do. Hopefully, that future is near and frustration-free.
Source: University of Waterloo