TY - GEN
T1 - New Directions in Automated Traffic Analysis
AU - Holland, Jordan
AU - Schmitt, Paul
AU - Feamster, Nicholas G.
AU - Mittal, Prateek
N1 - Funding Information:
We thank our shepherd Katharina Kohls and the anonymous reviewers for their helpful comments. We also thank Vitaly Shmatikov for guidance and feedback on early versions of this work, and Jesse London for collaborating on the development of nPrintML. This research was supported in part by the Center for Information Technology Policy at Princeton University and by NSF Award CPS-1739809 under a cooperative agreement with the Department of Homeland Security, DARPA and AFRL under Contract FA8750-19-C-0079, and NSF Awards CNS-1553437 and CNS-1704105.
Publisher Copyright:
© 2021 Owner/Author.
PY - 2021/11/12
Y1 - 2021/11/12
N2 - Machine learning is leveraged for many network traffic analysis tasks in security, from application identification to intrusion detection. Yet the aspects of the machine learning pipeline that ultimately determine the performance of the model (feature selection and representation, model selection, and parameter tuning) remain manual and painstaking. This paper presents a method to automate many aspects of traffic analysis, making it easier to apply machine learning techniques to a wider variety of traffic analysis tasks. We introduce nPrint, a tool that generates a unified packet representation that is amenable to representation learning and model training. We integrate nPrint with automated machine learning (AutoML), resulting in nPrintML, a public system that largely eliminates feature extraction and model tuning for a wide variety of traffic analysis tasks. We have evaluated nPrintML on eight separate traffic analysis tasks and released nPrint, nPrintML, and the corresponding datasets from our evaluation to enable future work to extend these methods.
AB - Machine learning is leveraged for many network traffic analysis tasks in security, from application identification to intrusion detection. Yet the aspects of the machine learning pipeline that ultimately determine the performance of the model (feature selection and representation, model selection, and parameter tuning) remain manual and painstaking. This paper presents a method to automate many aspects of traffic analysis, making it easier to apply machine learning techniques to a wider variety of traffic analysis tasks. We introduce nPrint, a tool that generates a unified packet representation that is amenable to representation learning and model training. We integrate nPrint with automated machine learning (AutoML), resulting in nPrintML, a public system that largely eliminates feature extraction and model tuning for a wide variety of traffic analysis tasks. We have evaluated nPrintML on eight separate traffic analysis tasks and released nPrint, nPrintML, and the corresponding datasets from our evaluation to enable future work to extend these methods.
KW - automated traffic analysis
KW - machine learning on network traffic
KW - network traffic analysis
UR - http://www.scopus.com/inward/record.url?scp=85119373011&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85119373011&partnerID=8YFLogxK
U2 - 10.1145/3460120.3484758
DO - 10.1145/3460120.3484758
M3 - Conference contribution
AN - SCOPUS:85119373011
T3 - Proceedings of the ACM Conference on Computer and Communications Security
SP - 3366
EP - 3383
BT - CCS 2021 - Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security
PB - Association for Computing Machinery
T2 - 27th ACM Annual Conference on Computer and Communication Security, CCS 2021
Y2 - 15 November 2021 through 19 November 2021
ER -