When encoding an adaptive bitrate ladder, you often have to compare videos with different resolutions, which
raises multiple issues. For example, when measuring peak signal-to-noise ratio (PSNR) or video multimethod assessment fusion (VMAF) to
compare a 640x360 video against an 854x480 video, what resolution do you compare them at?
And how do you interpret the PSNR or VMAF
scoring, and which metric is best? In this column, I’ll tackle all of these issues.
Regarding the first issue, there’s a theoretically correct answer, and then there’s how it’s
generally done, and they don’t always correspond. The theoretically correct answer is to
compare at the resolution at which the video
will be viewed. For example, if you knew for
certain that the video was going to be watched
in a 480p window, you should scale the source
and output files to 480p as needed and run your
comparisons there. However, few publishers
have that degree of certainty, so most scale
the encoded files up to the resolution of the
source video and compare there. This certainly makes sense for over-the-top (OTT) providers whose videos are almost always watched at
full screen, and is a nice compromise position
for other publishers.
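Here’s a minimal sketch of that decision in Python. The function names and the sample ladder are my own illustrations, not part of any tool: compare at the viewing window if you know it, otherwise fall back to the source resolution, then work out which files need scaling.

```python
from typing import Optional, Tuple

Resolution = Tuple[int, int]  # (width, height)


def comparison_resolution(source: Resolution,
                          viewing_window: Optional[Resolution] = None) -> Resolution:
    """Use the viewing window when it is known; otherwise use the source resolution."""
    return viewing_window if viewing_window is not None else source


def scaling_needed(video: Resolution, target: Resolution) -> str:
    """Report whether a file must be upscaled, downscaled, or left alone."""
    if video == target:
        return "none"
    return "upscale" if video[1] < target[1] else "downscale"


# Hypothetical ladder: a 1080p source plus two lower rungs.
source = (1920, 1080)
ladder = [(1920, 1080), (854, 480), (640, 360)]

target = comparison_resolution(source)  # no known viewing window
for rung in ladder:
    print(rung, "->", target, scaling_needed(rung, target))
```

If you did know the video would play in a 480p window, passing `viewing_window=(854, 480)` would flag the source and the 1080p rung for downscaling instead.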
Some programs handle this scaling behind
the scenes; for most others, you have to scale
in FFmpeg, which is a royal pain from a time
and disk-space perspective. My one tip here is
to convert your encoded files to the Y4M container format, rather than YUV, because the
Y4M header contains resolution, frame rate,
and format information that simplifies comparisons in your quality control tool. If you use
the YUV container format, you’ll have to insert
resolution, frame rate, or format data into your
command line or input it into the program itself, which can be time-consuming.
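To make the Y4M tip concrete, here’s a rough Python sketch under a few assumptions: the file names are hypothetical, bicubic scaling and 4:2:0 output are my choices rather than requirements, and FFmpeg picks the Y4M format from the `.y4m` extension. The first function only builds the FFmpeg command; the second reads the one-line text header that starts every Y4M file, which is where your quality tool finds the resolution, frame rate, and format.

```python
from fractions import Fraction
from typing import Dict, List


def build_upscale_command(src: str, dst_y4m: str, width: int, height: int) -> List[str]:
    """Build (but do not run) an FFmpeg command that scales src and writes Y4M."""
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-vf", f"scale={width}:{height}:flags=bicubic",  # assumed scaling method
        "-pix_fmt", "yuv420p",                           # assumed 8-bit 4:2:0 output
        dst_y4m,
    ]


def parse_y4m_header(header_line: bytes) -> Dict[str, object]:
    """Extract width, height, frame rate, and chroma format from a Y4M header line."""
    fields = header_line.decode("ascii").strip().split(" ")
    if fields[0] != "YUV4MPEG2":
        raise ValueError("not a Y4M stream")
    info: Dict[str, object] = {}
    for field in fields[1:]:
        if not field:
            continue
        tag, value = field[0], field[1:]
        if tag == "W":
            info["width"] = int(value)
        elif tag == "H":
            info["height"] = int(value)
        elif tag == "F":
            num, den = value.split(":")
            info["frame_rate"] = Fraction(int(num), int(den))
        elif tag == "C":
            info["chroma"] = value
    return info


# Hypothetical file names for a 360p rung compared at 1080p.
print(" ".join(build_upscale_command("rung_360p.mp4", "rung_360p_1080.y4m", 1920, 1080)))

# A sample header as it might appear at the start of a 1080p, 29.97 fps file.
sample = b"YUV4MPEG2 W1920 H1080 F30000:1001 Ip A1:1 C420jpeg\n"
print(parse_y4m_header(sample))
```

With raw YUV, none of those header fields exist, so each one has to be supplied by hand for every file.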
The second question is how to interpret the
scores once you have them. If you’re comparing
cross-resolution files to the source, understand
that scores will drop at lower resolutions
because the smaller files contain more scaling
artifacts and loss of detail. This means files encoded
at the source resolution will have the
highest scores, with lower resolutions scoring lower.
For example, in another article I wrote for this
issue on per-title encoding, I compared technologies using an encoding ladder that started at
1080p and dropped to 180p. The typical PSNR
scores were 45–50 dB for the 1080p rung, and
dropped to around 30 dB for the lowest rung.
That’s not a lot of range. The rule of thumb for
PSNR is that quality gains above 45 dB are typically not
perceivable by the viewer, while scores below
35 dB typically presage visible artifacts. But that’s
only for the 1080p rung; the 180p rung will never get close to 45 dB, although the files might
look good at 32 dB. So you can’t predict how a
human would perceive a 360p file with a PSNR
score of 38 dB, although when you’re comparing
cross-resolution results, higher is always better.
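For readers who want to see where those decibel figures come from, here’s a minimal sketch of the standard PSNR formula, 10 · log10(MAX² / MSE), in plain Python. It assumes 8-bit samples (MAX = 255) and works on two equal-length lists of pixel values; real tools typically compute this per frame across the whole file, and the sample values below are made up for illustration.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[int], distorted: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length sample sequences."""
    if len(reference) != len(distorted) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error between the reference and the distorted samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)


# Made-up 8-bit luma samples; both lists must already be at the same resolution.
reference = [16, 64, 128, 200, 235, 90, 45, 180]
distorted = [17, 62, 129, 203, 233, 91, 44, 178]
print(f"{psnr(reference, distorted):.2f} dB")
```

Because the formula compares samples one for one, both files have to be scaled to the same resolution first, and any detail lost in that scaling counts against the lower rung.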
What’s great about VMAF is that it was designed for this type of cross-resolution analysis.
Specifically, a score of 100 is mapped to a 1080p
file encoded at a constant rate factor (CRF) of 22,
while a score of 20 is mapped to a file encoded at
240p at a CRF value of 28. In the same per-title
analysis, typical 1080p scores were in the mid-to-upper 90s, while the 180p files often scored
in the single digits.
This range made VMAF scores much easier
to interpret than PSNR, but you still can’t predict how a viewer will perceive the quality of a
clip in the middle, say a 480p clip with a VMAF
score of 42. However, you do know that six VMAF
points equal one just-noticeable difference
(JND). Technically, this means that 75% of viewers would notice a six-point swing, while closer
to 90% would notice a 12-point, two-JND swing.
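The six-points-per-JND rule is easy to apply in code. Here’s a small sketch; the function names and the ladder scores are hypothetical, and the only number taken from the discussion above is the six-point figure.

```python
VMAF_POINTS_PER_JND = 6.0  # rule of thumb: six VMAF points equal one JND


def jnd_difference(vmaf_a: float, vmaf_b: float) -> float:
    """Express the gap between two VMAF scores in just-noticeable differences."""
    return abs(vmaf_a - vmaf_b) / VMAF_POINTS_PER_JND


def is_noticeable(vmaf_a: float, vmaf_b: float, threshold_jnd: float = 1.0) -> bool:
    """True when the gap reaches the chosen JND threshold (one JND by default)."""
    return jnd_difference(vmaf_a, vmaf_b) >= threshold_jnd


# Hypothetical VMAF scores for four rungs, highest rung first.
ladder = [("1080p", 95.0), ("720p", 91.0), ("480p", 78.0), ("360p", 61.0)]

for (upper_name, upper), (lower_name, lower) in zip(ladder, ladder[1:]):
    gap = jnd_difference(upper, lower)
    verdict = "noticeable" if is_noticeable(upper, lower) else "likely not noticeable"
    print(f"{upper_name} vs {lower_name}: {gap:.2f} JND ({verdict})")
```

A check like this can flag adjacent rungs that are less than one JND apart, which may be candidates for consolidation.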
The ability to identify a JND is exceptionally
useful for a range of encoding decisions, from
configuring your encoding ladder to choosing
an encoder or a codec. If you haven’t already
started working with VMAF, it’s time to try it.
Quality Metrics Up and Down the Encoding Ladder
Jan Ozer (email@example.com) is a streaming
media producer and consultant, a frequent contributor to
industry magazines and websites on streaming-related topics,
and the author of Video Encoding by the Numbers. He blogs
frequently at streaminglearningcenter.com.