Lexical Mining of Malicious URLs for Classifying Android Malware


The prevalence of mobile malware has become a growing issue given the tight integration of mobile systems with our daily life. Most malware programs use URLs inside network traffic to forward commands to launch malicious activities. Therefore, the detection of malicious URLs can be essential in deterring such malicious activities. Traditional methods construct blacklists with verified URLs to identify malicious URLs, but their effectiveness is impaired by unknown malicious URLs. Recently, machine learning-based methods have been proposed for malware detection with improved performance. In this paper, we propose a novel URL detection method based on the Floating Centroids Method (FCM), which integrates supervised classification and unsupervised clustering in a coherent manner. The proposed method uses the lexical features of a URL to effectively identify malicious URLs while grouping similar URLs into the same cluster. Our experimental results show that a URL cluster exhibits unique behavioral patterns that can be used for malware detection with high accuracy. Moreover, the proposed behavioral clustering method facilitates the identification of malicious URL categories and unseen malware variants.
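To make the notion of lexical features concrete, the following minimal sketch shows how a URL string might be turned into a small numeric feature record using only the Python standard library. The function name `lexical_features` and the particular features chosen (lengths, path depth, query-parameter count, IP-literal host) are illustrative assumptions; the abstract does not specify the feature set used by the proposed method, and the sketch does not implement FCM itself.

```python
import re
from urllib.parse import urlparse, parse_qs


def lexical_features(url: str) -> dict:
    """Hypothetical lexical feature extractor (not the paper's feature set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # Split everything after the scheme on common URL delimiters.
    remainder = url.split("://", 1)[-1]
    tokens = [t for t in re.split(r"[/.?=&_\-]", remainder) if t]
    return {
        "url_length": len(url),
        "host_length": len(host),
        # Number of '/' separators in the path component.
        "path_depth": parsed.path.count("/"),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(c.isdigit() for c in url),
        # Number of distinct query parameter names.
        "num_query_params": len(parse_qs(parsed.query)),
        # True when the host looks like a dotted-quad IPv4 literal.
        "host_is_ip": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "longest_token": max((len(t) for t in tokens), default=0),
    }


if __name__ == "__main__":
    # Made-up example URL, used only to exercise the function.
    for name, value in lexical_features(
        "http://192.168.0.1/a/b/c.php?id=1&cmd=run"
    ).items():
        print(f"{name}: {value}")
```

A record of this kind could serve as the input vector to any classifier or clustering step; which features are actually discriminative would have to be determined empirically.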

International Conference on Security and Privacy in Communication Systems