But AI needs human guidance. For example, a
“confidence level” is applied to image assessment, and anything that falls below a certain score is sent to a human
team for evaluation. “Accuracy is a function of many
factors, including (but not limited to) resolution of the
video (i.e., HD vs. SD), the noise level in the video (e.g.,
a lot of background noise can affect the accuracy of
transcription), and the amount of motion in the video (fast-moving videos are tougher if the resolution is
low),” says Milan Gada, principal program manager,
video AI cloud services, Microsoft. With the right combination of factors, he says, it’s possible to achieve accuracy in the high 90% range.
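The confidence-level handoff described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the 0.80 cutoff, the `Assessment` record, and the `route` function are all assumptions made up for the example, since the article does not say what score triggers human review.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the article does not state the actual score used.
REVIEW_THRESHOLD = 0.80


@dataclass
class Assessment:
    image_id: str
    label: str
    confidence: float  # model score between 0.0 and 1.0


def route(assessments, threshold=REVIEW_THRESHOLD):
    """Split results into auto-accepted items and items for human review."""
    accepted, needs_review = [], []
    for item in assessments:
        if item.confidence < threshold:
            needs_review.append(item)  # below the cutoff: send to a person
        else:
            accepted.append(item)
    return accepted, needs_review


results = [
    Assessment("frame-001", "tennis racket", 0.97),
    Assessment("frame-002", "crowd", 0.62),
    Assessment("frame-003", "scoreboard", 0.80),
]
accepted, needs_review = route(results)
```

In this sketch only `frame-002` goes to the human team. Raising the threshold sends more items to reviewers in exchange for fewer unchecked mistakes.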
Cloud scalability also plays a role in AI. Users are
now able to access speed and processing power that
wasn’t available a few years ago. “If you have 5,000 videos, each 1 hour long, and if it takes 1 hour to process
each video, you could use 5,000 machines and have all
your videos processed in an hour,” says Gada. Microsoft
has two main AI products—Video Indexer and Azure
Media Analytics. (More details on Microsoft’s solutions
are available at go2sm.com/microsoftai.)
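Gada's arithmetic can be written out as a short Python sketch. It assumes every video takes the same time to process, each machine handles one video at a time, and there is no upload or scheduling overhead; the function name `wall_clock_hours` is invented for the illustration.

```python
import math


def wall_clock_hours(num_videos, hours_per_video, num_machines):
    """Estimate elapsed time when each machine processes one video at a time."""
    if num_machines < 1:
        raise ValueError("need at least one machine")
    # Videos are processed in rounds; each round keeps every machine busy.
    rounds = math.ceil(num_videos / num_machines)
    return rounds * hours_per_video


print(wall_clock_hours(5000, 1, 5000))  # 1 (Gada's example)
print(wall_clock_hours(5000, 1, 100))   # 50
print(wall_clock_hours(5000, 1, 1))     # 5000
```

The total machine-hours stay the same in every case (5,000); only the elapsed time changes with the number of machines.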
Use Case: Entertainment Processing and Highlights
IBM Watson has more than 60 AI services, and at IBC
this past fall, the company showcased the work it did
for the U.S. Open using its first video-specific product
called Watson Video Enrichment, which includes services for scene detection, speech-to-text conversion,
natural-language processing, visual recognition, tone
analyzer, and personality insight.
“The USTA (US Tennis Assoc.) provided us with several
hundred tennis videos,” says David Kulczar, senior
product manager, Watson Video Analytics, IBM Watson
Media. This content was used to train Watson in
tennis-specific intelligence, including player names, game
scoring, sports terminology, and crowd sentiment
analysis [...]ment levels, including crowd cheering, gasps,
even player facial and physical expressions like fist pumps.
Watson does a full-text transcription of all audio
within a piece of content, plus all image and text
information, to create a detailed metadata file, complete
with timecode for each piece of content. Watson also
understands broader concepts like knowing tennis and
basketball are sports. “We did the full training process in
about a month and that was taken from 80% accuracy to
95%+ accuracy,” says Kulczar.
“A lot of time people tend to over-believe in [AI],”
says Kulczar. “Some people think of artificial intelligence
as sort of magic, but it’s not. A machine-learning-based
principle is going to make mistakes. It’s a much
more complex version of what you do with Pandora.
Somebody is actually thumbing up or thumbing down
a video to help the system get better and learn your [...]”
During the U.S. Open, IBM wanted to automatically
create clips based on the most compelling content.
Watson works at about three-quarters of real time for
full content assessment with normally complex images,
and can work with content as low as 256Kbps, although
IBM recommends 1Mbps. “If you take the images out
of it and you’re just doing audio and textual it’s
amazingly fast,” Kulczar says.
There were 320 hours of play coverage, and IBM’s
custom solution immediately created clips at the end
of each match and pushed these highlights out to social sites to drive more interest in the tournament.
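One simple way to picture automatic highlight selection is to rank scored segments of a match and keep the strongest few. The Python sketch below is illustrative only and is not IBM's method: the `Segment` record, the single `excitement` score, and the `top_n` and `min_score` parameters are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start_s: float     # segment start, in seconds from the start of the match
    end_s: float       # segment end, in seconds
    excitement: float  # hypothetical combined score between 0.0 and 1.0


def pick_highlights(segments, top_n=3, min_score=0.5):
    """Keep the top_n highest-scoring segments, returned in match order."""
    strong = [s for s in segments if s.excitement >= min_score]
    best = sorted(strong, key=lambda s: s.excitement, reverse=True)[:top_n]
    return sorted(best, key=lambda s: s.start_s)


match = [
    Segment(0, 20, 0.20),
    Segment(95, 110, 0.90),
    Segment(300, 330, 0.70),
    Segment(610, 625, 0.95),
    Segment(900, 915, 0.55),
]
highlights = pick_highlights(match)
```

Here the clips starting at 95, 300, and 610 seconds are kept. Sorting the result by start time keeps the reel in the order the match was played.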
IBM has an off-the-shelf product too, and this year
the company will be coming out with knowledge kits
for specific industries and live-streaming processing.
“We provide a corpus of knowledge, a body of knowledge out-of-the-box,” Kulczar says. “Generally, if you
want to increase the accuracy of the service, you want to
train on a specific domain.” After the detailed metadata
content has been created, it can be searched for specific instances of particular events or content. Users will
also be able to do custom training for business-specific
information. For example, an athletic company could
train Watson to identify its specific brands or products,
like running shoes.
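Searching timecoded metadata for a particular event can be sketched as a filter over a list of entries. The Python below is a minimal illustration under assumed field names (`timecode`, `kind`, `text`) and made-up sample entries; it does not reflect the actual format of Watson's metadata file.

```python
# Hypothetical metadata entries; field names and values are invented.
metadata = [
    {"timecode": "00:12:41", "kind": "speech", "text": "What a serve down the line"},
    {"timecode": "00:12:44", "kind": "audio", "text": "crowd cheering"},
    {"timecode": "00:47:03", "kind": "visual", "text": "player fist pump"},
    {"timecode": "01:05:19", "kind": "visual", "text": "running shoes logo"},
]


def search(entries, term, kind=None):
    """Return timecodes of entries whose text contains term (case-insensitive)."""
    term = term.lower()
    return [
        e["timecode"]
        for e in entries
        if term in e["text"].lower() and (kind is None or e["kind"] == kind)
    ]


fist_pumps = search(metadata, "fist pump")
audio_crowd = search(metadata, "crowd", kind="audio")
```

A brand-specific query such as `search(metadata, "running shoes", kind="visual")` would work the same way once the system has been trained to tag that product.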
One of the first uses for video AI was to generate transcripts and keywords. Microsoft’s Video Indexer (shown here) can achieve accuracy in
the high 90% range, according to Milan Gada, principal program manager, video AI for the company.