TY - GEN
T1 - Towards memory-efficient inference in edge video analytics
AU - Padmanabhan, Arthi
AU - Iyer, Anand Padmanabha
AU - Ananthanarayanan, Ganesh
AU - Shu, Yuanchao
AU - Karianakis, Nikolaos
AU - Xu, Guoqing Harry
AU - Netravali, Ravi
N1 - Publisher Copyright:
© 2021 Owner/Author.
PY - 2021/10/25
Y1 - 2021/10/25
N2 - Video analytics pipelines incorporate on-premise edge servers to lower analysis latency, ensure privacy, and reduce bandwidth requirements. However, compared to the cloud, edge servers typically have lower processing power and GPU memory, limiting the number of video streams that they can manage and analyze. Existing solutions for memory management, such as swapping models in and out of GPU memory, sharing a common model stem, or compressing and quantizing models to reduce their size, incur high overheads and often provide limited benefits. In this paper, we propose model merging as an approach to memory management at the edge. This proposal is based on our observation that models at the edge share common layers, and that merging these common layers across models can yield significant memory savings. Our preliminary evaluation indicates that such an approach could reduce memory requirements by up to 75%. We conclude by discussing several challenges involved in realizing the model merging vision.
AB - Video analytics pipelines incorporate on-premise edge servers to lower analysis latency, ensure privacy, and reduce bandwidth requirements. However, compared to the cloud, edge servers typically have lower processing power and GPU memory, limiting the number of video streams that they can manage and analyze. Existing solutions for memory management, such as swapping models in and out of GPU memory, sharing a common model stem, or compressing and quantizing models to reduce their size, incur high overheads and often provide limited benefits. In this paper, we propose model merging as an approach to memory management at the edge. This proposal is based on our observation that models at the edge share common layers, and that merging these common layers across models can yield significant memory savings. Our preliminary evaluation indicates that such an approach could reduce memory requirements by up to 75%. We conclude by discussing several challenges involved in realizing the model merging vision.
KW - deep neural networks
KW - video analytics
UR - http://www.scopus.com/inward/record.url?scp=85141307879&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85141307879&partnerID=8YFLogxK
U2 - 10.1145/3477083.3480150
DO - 10.1145/3477083.3480150
M3 - Conference contribution
AN - SCOPUS:85141307879
T3 - HotEdgeVideo 2021 - Proceedings of the 2021 3rd ACM Workshop on Hot Topics in Video Analytics and Intelligent Edges
SP - 31
EP - 37
BT - HotEdgeVideo 2021 - Proceedings of the 2021 3rd ACM Workshop on Hot Topics in Video Analytics and Intelligent Edges
PB - Association for Computing Machinery, Inc
T2 - 3rd ACM Workshop on Hot Topics in Video Analytics and Intelligent Edges, HotEdgeVideo 2021
Y2 - 25 October 2021
ER -