This Buyers’ Guide article is not about choosing encoders (we cover encoding appliances in another Buyers’ Guide in this year’s Streaming Media Sourcebook) but rather about understanding
a new breed of encoding solutions that leverage computer vision and machine learning both to reduce the
number of encodes needed for smooth playback and
to make the overall encoding process more efficient.
This approach is typically abbreviated CAE, which
can stand for either content-aware encoding or context-aware encoding. And while the concept of CAE
has been 20 years in the making, it finally seems to have gained enough traction, and demonstrated enough
real-world benefit (at least in initial testing), to warrant
closer attention from anyone weighing their encoding options for 2018.
What Does CAE Replace?
The short answer is tons of guesswork and numerous encoding cycles. Here’s why:
From the beginning of streaming, there has been
an understanding that not all content needs to be
encoded the same way. Whether it’s sports content
versus talking-head content, or high-action content
shot with a handheld camera versus serene scenery footage
captured from a locked-down camera on a tripod,
content types are as varied as the cameras used to capture them.
In the early days, these differences were addressed
by choosing among various codecs: Indeo was a good general-purpose video
codec, Apple and Intel both had dedicated animation
codecs, and MPEG-1 and MPEG-2 were good for full-motion action. Encoding experts had to choose
from a dozen or so codecs, deciding which struck the
best balance between quality and speed while also
keeping in mind the intended playback bandwidth
and the key software or hardware players.
Even after the dust settled and the industry moved past
the last few proprietary codecs from Microsoft and Real,
settling on the codec developed jointly by MPEG and the ITU-T through their Joint Video Team (JVT), known as H.264 or MPEG-4 Part 10 (AVC), a
single codec at a set bitrate still wasn’t necessarily the right choice for anyone who wanted to avoid buffering
on intermittent cellular data or even Wi-Fi networks.
Understanding the context of content for better encodes
By Tim Siglin