TY - GEN
T1 - Understanding the Potential of Server-Driven Edge Video Analytics
AU - Zhang, Qizheng
AU - Du, Kuntai
AU - Agarwal, Neil
AU - Netravali, Ravi
AU - Jiang, Junchen
N1 - Publisher Copyright:
© 2022 ACM.
PY - 2022/3/9
Y1 - 2022/3/9
N2 - The proliferation of edge video analytics applications has given rise to a new breed of streaming protocols which stream aggressively compressed videos to remote servers for compute-intensive DNN inference. One popular design paradigm of such protocols is to leverage the server-side DNN to extract useful feedback (e.g., based on a low-quality-encoded stream sent to the server) and use the feedback to inform how the camera should encode and stream the video in the future. In this server-driven approach, an ideal form of feedback should (1) be derived from minimal information from the video sensor, (2) incur minimal bandwidth usage to obtain, and (3) indicate the optimal video streaming/encoding scheme (e.g., the minimum frames/regions that require high encoding quality). However, our preliminary study shows that these idealized requirements are far from being met. Using object detection as an example use case, we demonstrate significant yet untapped room for improvement by considering a broader design space, in terms of how the feedback should be derived from the DNN, how often it should be extracted, and how to determine the encoding quality of the video on which we extract the feedback.
AB - The proliferation of edge video analytics applications has given rise to a new breed of streaming protocols which stream aggressively compressed videos to remote servers for compute-intensive DNN inference. One popular design paradigm of such protocols is to leverage the server-side DNN to extract useful feedback (e.g., based on a low-quality-encoded stream sent to the server) and use the feedback to inform how the camera should encode and stream the video in the future. In this server-driven approach, an ideal form of feedback should (1) be derived from minimal information from the video sensor, (2) incur minimal bandwidth usage to obtain, and (3) indicate the optimal video streaming/encoding scheme (e.g., the minimum frames/regions that require high encoding quality). However, our preliminary study shows that these idealized requirements are far from being met. Using object detection as an example use case, we demonstrate significant yet untapped room for improvement by considering a broader design space, in terms of how the feedback should be derived from the DNN, how often it should be extracted, and how to determine the encoding quality of the video on which we extract the feedback.
KW - deep neural networks
KW - edge video analytics
KW - saliency
KW - server-driven
UR - http://www.scopus.com/inward/record.url?scp=85127598895&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85127598895&partnerID=8YFLogxK
U2 - 10.1145/3508396.3512872
DO - 10.1145/3508396.3512872
M3 - Conference contribution
AN - SCOPUS:85127598895
T3 - HotMobile 2022 - Proceedings of the 23rd Annual International Workshop on Mobile Computing Systems and Applications
SP - 8
EP - 14
BT - HotMobile 2022 - Proceedings of the 23rd Annual International Workshop on Mobile Computing Systems and Applications
PB - Association for Computing Machinery, Inc
T2 - 23rd Annual International Workshop on Mobile Computing Systems and Applications, HotMobile 2022
Y2 - 9 March 2022 through 10 March 2022
ER -