Voice recognition is the future of video search,
but getting there won’t be easy.
By Nadine Krefetz
Engineering video playback for Alexa and her friends might sound like an obscure task, far from the core requirements of a successful video publisher, but in reality it’s an incredibly smart solution to the big search problem. Viewers now have so many video services to choose from, it’s hard to
find which service has the video in question,
and developers are finding it incredibly challenging to create an elegant, efficient navigation for the user interface (UI).
Solving the Search Problem
Voice control enables user navigation via
speech instead of via a graphical user interface,
so users don’t have to think about how to find
their content. “Voice remote is a great way to flatten the UI. It gives an
awesome experience and is a way to get access
to what otherwise is a dizzying array of content
choices,” says Jonathan Palmatier, VP product
management, voice control, Comcast Cable.
Comcast’s X1 TV box has a voice remote, which
just might be the invention that will make people stop hating their cable companies (at least if
their company is Comcast).
Audio control alone is only part of the story.
When paired with AI (artificial intelligence),
software should be able to learn viewer preferences,
tune to the correct channel or service,
and deliver increasingly appropriate search
results and recommendations over time. So in
the future, telling a device, “Play my favorite
TV show” should do just that, but we’re getting
a bit ahead of ourselves here. The road to
audio control will likely prove long and winding.
The cast of characters includes Amazon’s
Alexa, Apple’s Siri, and Microsoft’s Cortana,
as well as Google Assistant and Comcast’s X1.
Near- vs. Far-Field Communications
The X1 and any remote you can speak to use
near-field communication, a short-range connection standard for devices within a limited
distance. Alexa (via the Amazon Echo) and other always-on devices use far-field communication. “[Far-field devices] are always on and listening for a keyword to wake up and then start
recording and transmitting the voice command.
Our voice remote [X1] only works when a user
presses the microphone,” says Palmatier.
The difference between a voice remote search
and playing content via a far-field AI platform
can be a thin, moving line. Amazon’s Fire TV
remote is Alexa-enabled and can respond the
same way Alexa would on an Echo device, but
the vast majority of the Alexa controls are for
audio and connected home devices. One video playback control that’s available now is for
Plex, and Alexa can play Plex content if there
is a Plex server in a home media setup.
Many of the voice platforms work well with
their own content (e.g., Alexa works best with
Amazon content) or when using a voice remote to play a movie. The problem develops
when a viewer wants to seek content from another media source or app, or even make a
more complicated request. Media apps need
to be designed with voice control to benefit
from the audio navigation available from the