When it comes to streaming video, everyone has their own tastes. Some people want the highest possible definition for movies on their 4K television, while others just want a minimum of buffering as they watch on their phone during a long bus ride. Some might have live sports on in the background and pay close attention only when highlights occur; others might want to see every detail of every play. Even machines have preferences: the AI models monitoring video sensors may care about only certain pixels of a video frame, or need higher-quality footage only when rare events occur.
This diversity of opinion could help solve the growing traffic jam in systems and networking as data-intensive video streaming soars. That’s the hope of UChicago CS Assistant Professor Junchen Jiang, who received the NSF CAREER Award to study how individual preferences and machine learning can help automatically optimize video quality while also conserving valuable bandwidth. The award, the NSF’s most prestigious for early-career faculty, was one of six awarded to UChicago CS faculty during the 2021-22 cycle.
Jiang proposes a new approach called Perception-Driven Optimization (PDO) which would establish a two-way conversation between the video provider and its viewers, whether human or machine. Instead of always streaming video at the maximum quality possible, this system would collect data on end user satisfaction, then use that data as a crucial feedback signal to train video streaming models that meet those demands with the minimum use of resources.
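As a rough illustration of that feedback idea, consider a controller that picks the cheapest bitrate still keeping viewers satisfied. Everything here is invented for illustration (the function names, the satisfaction thresholds, the bitrate ladder); it is a minimal sketch of the concept, not the actual PDO system.

```python
# Hypothetical sketch of a perception-driven control loop: stream at the
# lowest bitrate that still satisfies the end user, using their reported
# (or inferred) satisfaction as the feedback signal. Names and thresholds
# are illustrative, not from Jiang's system.

def choose_bitrate(qoe_feedback, bitrate_ladder, current):
    """Pick the next bitrate from recent per-chunk satisfaction scores.

    qoe_feedback: satisfaction scores in [0, 1] fed back from the client.
    bitrate_ladder: available bitrates in kbps, ascending.
    """
    avg_qoe = sum(qoe_feedback) / len(qoe_feedback)
    idx = bitrate_ladder.index(current)
    if avg_qoe < 0.7 and idx + 1 < len(bitrate_ladder):
        return bitrate_ladder[idx + 1]  # viewer unhappy: spend more bandwidth
    if avg_qoe > 0.9 and idx > 0:
        return bitrate_ladder[idx - 1]  # viewer satisfied: save bandwidth
    return current

ladder = [500, 1200, 2500, 5000]  # kbps
print(choose_bitrate([0.95, 0.92, 0.96], ladder, 2500))  # steps down to 1200
```

The point of the sketch is the direction of information flow: quality decisions are driven by measured perception rather than by always maximizing resolution.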
“The key to solve this problem is, at least in my belief, that you need to really understand how users perceive the quality of these video streams: the resolution, how smoothly these videos are played,” Jiang said. “Once you have a better understanding of that, then it will go back and solve the real problem of how to build video streaming systems and resource sharing among users, because now you know when users are sensitive to the quality, and then you can allocate and use resources more wisely.”
Currently, most video providers use sample videos and crowdsourcing methods to measure the quality of viewers' experience, often generalizing to broad categories such as movies, TV shows, or sports. This data is used to train general models "offline," after the testing, that guide the codecs for streaming videos, but such models are a blunt instrument for taking advantage of the differences between users or within content types. These methods are also poorly equipped for live video streams, which by their nature cannot be pre-tested.
Jiang’s PDO system would enable a more individualized approach, allowing video providers to collect user experience data and tune their models “online” for each piece of content as it is streaming. For example, a model could learn that a viewer of a baseball game pays less attention between pitches, and thus prioritize the highest-quality video during game action while shifting any potential buffering or resolution loss to other times in the broadcast.
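The baseball example amounts to allocating a fixed bandwidth budget in proportion to predicted attention. The sketch below assumes invented attention scores and a minimum per-segment floor; it illustrates the allocation idea only, not the article's actual models.

```python
# Illustrative sketch: split a total bitrate budget across broadcast
# segments in proportion to predicted viewer attention, so game action
# gets the highest quality and lulls absorb the savings.

def allocate_bandwidth(attention, budget_kbps, floor_kbps=300):
    """Return a per-segment bitrate in kbps.

    attention: per-segment attention scores (e.g. high during a pitch,
    low between pitches); every segment gets at least floor_kbps.
    """
    spare = budget_kbps - floor_kbps * len(attention)
    total = sum(attention)
    return [floor_kbps + spare * a / total for a in attention]

# Four segments: pitch, between pitches, pitch, replay
rates = allocate_bandwidth([0.9, 0.1, 0.9, 0.5], budget_kbps=8000)
```

Under this scheme the high-attention segments receive most of the budget while the total stays within the provider's bandwidth cap.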
In the case of AI “watching” video streams collected by sensors to make our cities and streets safer, those models could communicate with the streaming models to focus resources on the most important data features. Rather than transmitting full-screen HD data at all times, a grid of cameras watching a forest for fires may only need to transmit high-quality footage when a certain luminescence is detected in a region of pixels, or may prioritize the resolution of trees in its view over the sky.
“The system can leverage the feedback from the AI model to stream a lower quality version to the server,” Jiang said. “The server will give you some feedback about how that model wants the video to be further encoded: which part you want to zoom in or zoom out, which part needs to have a higher frame rate. That’s a two-way real-time feedback loop.”
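That two-way loop can be sketched with invented message shapes: the server-side analytics model reports which regions of the frame matter, and the camera turns that feedback into a per-region encoding plan. The field names and quantization values below are assumptions for illustration, not the real protocol.

```python
# Hypothetical two-way feedback sketch: the server's AI model marks grid
# cells containing objects of interest, and the camera encodes those cells
# finely (low QP) while coarsely quantizing everything else.

def server_feedback(detections):
    """Server side: turn detection locations into encoding hints.

    detections: (row, col) grid cells where the model found something.
    """
    return {
        "high_quality_cells": sorted(set(detections)),
        "default_qp": 40,  # coarse quantization elsewhere (invented value)
        "roi_qp": 24,      # finer quantization inside regions of interest
    }

def encode_plan(feedback, grid=(4, 4)):
    """Camera side: map the feedback into a per-cell quantization plan."""
    hot = set(feedback["high_quality_cells"])
    return [[feedback["roi_qp"] if (r, c) in hot else feedback["default_qp"]
             for c in range(grid[1])] for r in range(grid[0])]

plan = encode_plan(server_feedback([(0, 1), (2, 3)]))
```

Only the cells the model cares about are sent at high quality, which is the bandwidth saving the forest-fire camera example describes.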
Jiang was already working on new approaches for improving video streaming when he joined UChicago CS in 2018, but he credits his collaborators in the department for expanding his research vision in new ways. Colleagues such as Nick Feamster, Hank Hoffmann, Shan Lu, Ben Zhao, and Heather Zheng have helped bring networking, adaptive systems, and edge computing expertise to Jiang’s work. His students, including Xu Zhang, Kuntai Du, and Zhengxu Xia, have also driven the research in surprising directions, developing new user studies and video analytics.
The project also serves as a valuable exercise for undergraduate students in Jiang’s Introduction to Computer Systems course, as it combines topics of high interest such as machine learning, networking, and computer vision. As an application-focused project, it lets students see with their own eyes how improvements in modeling and quality-of-experience data can improve streaming video.
“That’s probably one of the best ways to attract students who don’t have a lot of experience from a computer science background to get interested in how systems are built,” Jiang said. “You can directly see that, if the systems are built in a better way, it has a direct impact on users’ perception of the application, and then we’re going to naturally ask the question how to allocate and how to best utilize the resource we are going to get.”