Malicious Traffic Detection with Class Imbalanced Data Based on Coarse-grained Labels

Li, Zhenyu; Liu, Junyi; Wang, Jiarong; Liu, Jiahao; Yan, Tian; An, Dehai; Zhou, Caiqiu; Wang, Zhihua

doi:10.22323/1.415.0030

Abstract

To resist complex cyber-attacks, a Security Operations Center (SOC) named IHEPSOC has been developed and deployed at the Institute of High Energy Physics (IHEP) of the Chinese Academy of Sciences, where it has contributed to the reliability and security of the IHEP network. Integrating state-of-the-art cyber-attack detection methods into IHEPSOC to improve its threat detection capability has become a major task. Malicious traffic detection based on machine learning is an emerging security paradigm that can effectively detect both known and unknown cyber-attacks. However, existing studies usually adopt traditional supervised learning, which often encounters issues when applied to real-world production environments because of its implicit assumptions about the operating environment. For example, most studies rely on datasets that already have accurate labels, but labeling such datasets accurately requires significant manual effort. In addition, in real-world services the volume of benign traffic is larger than that of malicious traffic, and this imbalance between the benign and malicious classes also makes many machine learning detection models difficult to apply in production. To address these issues, we propose a detection method for class-imbalanced malicious traffic based on coarse-grained data labels, which achieves performance comparable to that of other supervised learning methods. We conducted three experiments on the Android Malware 2017 dataset and verified the practicability and effectiveness of the proposed method.