MalAlert: Detecting Malware in Large-Scale Network Traffic Using Statistical Features


In recent years, we witness the spreading of a significant variety of malware, which operate and propagate relying on network communications. Due to the staggering growth of traffic in the last years, detecting malicious software has become infeasible on a packet-by-packet basis. In this paper, we address this challenge by investigating malware behaviors and designing a method to detect them relying only on network flow-level data. In our analysis we identify malware types with regards to their impact on a network and the way they achieve their malicious purposes. Leveraging this knowledge, we propose a machine learning-based and privacy-preserving method to detect malware. We evaluate our results on two malware datasets (MalRec and CTU-13) containing traffic of over 65,000 malware samples, as well as one month of network traffic from the University of Oxford containing over 23 billion flows. We show that despite the coarse-grained information provided by network flows and the imbalance between legitimate and malicious traffic, MalAlert can distinguish between different types of malware with the F1 score of 90%.

ACM SIGMETRICS Performance Evaluation Review