percentage of the file was downloaded, on the assumption that the file would download faster than it
was played back. This meant the initial wait for the
beginning of the file to download gave the patient
end user a “head start” in playback that would, ideally, be devoid of buffering.
The third way, true streaming of on-demand content, required hefty hardware assistance in further
compressing video content into data rates more
akin to dial-up modem speeds than to CD-ROM
data rates. To be honest, though, hardware assistance was needed for almost every early video codec, regardless of the data rate or resolution.
In fact, the only codec in the first year of streaming media (the 1997–1998 timeframe) that had a
software-only encoding option was MPEG-1, along with its
audio sidekick, the MP3 format. MP3, which has just
reached the out-of-patent-protection stage after 20
years, was so named because it was the third audio coding format (Layer III) in the MPEG-1 video and audio
standard ratified by the Moving Picture Experts Group (MPEG).
As more powerful general processors emerged, whether
general-purpose processors (GPPs) or central processing units (CPUs), they brought
an opportunity to move the next generation of video (MPEG-2) from hardware-only to software-only
encoding. But software-only encoding often demanded
a sacrifice: the final output reduced frame rates from 24–30 frames
per second (fps) down to 10–15 fps to shorten the overall encoding session.
Part of the reason these software-only options
didn't cut it was the additional need to interweave the MP3 audio and MPEG-2 video elementary streams into a single multiplexed stream suitable for transmission. This multiplexed, or muxed,
file was often referred to by the acronym M2TS, for
MPEG-2 transport stream.
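As a modern illustration of that multiplexing step, here is a minimal Python sketch that shells out to FFmpeg to mux separate video and audio elementary streams into a single transport stream. The file names are hypothetical placeholders, and FFmpeg itself postdates the era described here:

    import subprocess

    # Mux separate video and audio elementary streams into one
    # MPEG-2 transport stream, copying both streams rather than
    # re-encoding them. The input file names are placeholders.
    subprocess.run([
        "ffmpeg",
        "-i", "video.m2v",   # MPEG-2 video elementary stream
        "-i", "audio.mp3",   # MP3 audio elementary stream
        "-c", "copy",        # mux only; no re-encode
        "-f", "mpegts",      # force the MPEG-2 transport stream container
        "output.ts",
    ], check=True)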
This MPEG-2 transport stream technology, now
more than 20 years old, still forms the basis for the
majority of Apple HTTP Live Streaming (HLS) delivery, using a newer codec that replaced the MPEG-2 video
codec: H.264, also known as MPEG-4 Part 10 or
Advanced Video Coding (AVC).
Why Hardware Is Still Necessary
HLS allows “streaming” delivery of a series of small
files encoded at a predefined data rate. These files (also
known as “fragments,” “segments,” or “chunks” of
data) are listed in a playlist, downloaded, and played
back-to-back in sequence until no more files
are available to play.
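For illustration, a minimal HLS media playlist (an .m3u8 file) might look like the following; the segment names and the six-second duration are hypothetical:

    #EXTM3U
    #EXT-X-VERSION:3
    #EXT-X-TARGETDURATION:6
    #EXT-X-MEDIA-SEQUENCE:0
    #EXTINF:6.0,
    segment0.ts
    #EXTINF:6.0,
    segment1.ts
    #EXTINF:6.0,
    segment2.ts
    #EXT-X-ENDLIST

The #EXT-X-ENDLIST tag tells the player the list is complete; a live playlist simply omits it.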
The lack of additional files for playback either signifies the end of the video being played back, which
can be confirmed by comparing the segment number
with a predefined numbering sequence in a manifest
file, or it signals a need to send subsequent segments
at a lower data rate, since the player is not receiving
current-data-rate segments in a timely enough manner for proper playback.
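In player terms, that per-segment decision might look something like the following Python sketch; the function name, bitrate ladder, and thresholds are illustrative, not drawn from any real player:

    # Illustrative sketch of a player's per-segment decision logic.
    def next_action(segment_number, total_segments,
                    download_time_s, segment_duration_s,
                    ladder_kbps, rung):
        if segment_number + 1 >= total_segments:
            # The manifest says this was the last segment.
            return "end_of_stream"
        if download_time_s > segment_duration_s and rung > 0:
            # The segment arrived slower than real time, so ask for
            # subsequent segments one rung down the bitrate ladder.
            return f"switch_down_to_{ladder_kbps[rung - 1]}kbps"
        return "request_next_segment"

    print(next_action(5, 100, 7.2, 6.0, [400, 1200, 3500], 2))
    # -> switch_down_to_1200kbps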
In other words, this version of streaming, where the
segments are transmitted in a near-real-time fashion, is more akin to the early file download approach,
although in practice it acts more like a progressive
download, since HLS won’t start playback until at least
three segments have been downloaded as a “head
start” for continuous playback.
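That head start translates directly into startup delay. Assuming six-second segments (an assumption for illustration, not a fixed HLS value), the arithmetic is simple:

    # Hypothetical numbers: three-segment head start, six-second segments.
    segments_buffered = 3
    segment_duration_s = 6.0
    startup_delay_s = segments_buffered * segment_duration_s
    print(startup_delay_s)  # 18.0 seconds before playback can begin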
As a result, on-demand content needs to be transformed in faster-than-real-time encoding sessions,
where an hour of prerecorded content may need
to be converted in less than 30 minutes, or roughly
twice the speed of real time. This leaves time for various additional steps, such as setting
ad-insertion points and flags, as well as encoding
multiple channels of alternate audio (e.g., overdubs
in a foreign language).
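As a rough sketch of that throughput requirement, using the 2x figure above rather than any fixed industry rule:

    # An hour of content converted in under 30 minutes of wall-clock time.
    content_minutes = 60
    encode_minutes = 30
    speed_factor = content_minutes / encode_minutes
    print(f"required encoding speed: {speed_factor:.1f}x real time")  # 2.0x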
Real-time delivery of live content, though, has to
move beyond the multiple-segment scenarios of HLS
and its MPEG-ratified equivalent, MPEG-DASH, or
Dynamic Adaptive Streaming over HTTP.
Even 20 years into the streaming revolution, here
in 2018, streaming live video in a low-latency scenario
often requires hardware assistance, both for conformance to old-school real-time streaming protocols
(RTP and its companion control protocol, RTSP) and as a way to minimize delay, or latency.
Inherently, the average frame lasts 1/25 or 1/30 of a
second, or roughly 33.3–40 milliseconds (ms).
If several frames are needed to analyze changes between frames (aka temporal changes; see our Buyers’ Guide on content- and context-aware encoding),
then an additional inherent latency is introduced,
since the encoder will need approximately 100–300
ms worth of video content to accurately assess temporal changes.
In addition to the latency around the frames themselves, content needs to be encoded and then packetized for delivery. In some instances, where content is
already coming from an IP camera, the stream must
be repackaged before being pushed to the encoder or
transcoder, which introduces an additional one- to two-frame delay.
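Putting those pieces together, a rough encoder-side latency budget can be sketched as follows; the frame rate, lookahead depth, and repackaging delay are illustrative values drawn from the ranges above:

    # Rough encoder-side latency budget with illustrative values.
    FPS = 30
    FRAME_MS = 1000 / FPS                 # ~33.3 ms per frame

    lookahead_frames = 6                  # frames buffered for temporal analysis
    lookahead_ms = lookahead_frames * FRAME_MS   # ~200 ms, within the 100-300 ms range

    repackage_frames = 2                  # IP-camera repackaging delay (1-2 frames)
    repackage_ms = repackage_frames * FRAME_MS   # ~66.7 ms

    total_ms = FRAME_MS + lookahead_ms + repackage_ms
    print(f"approximate encoder-side latency: {total_ms:.0f} ms")  # ~300 ms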
CAST, which develops and sells semiconductor IP cores,
summed up the latency issue as both a human-perception and a machine-interaction dilemma.
“When humans interact with video in a live video
conference or when playing a game, latency lower
than 100ms is considered to be low, because most