Detecting Malicious Domain Name Based on the Web Page Structure Similarity
DOI: 10.23977/CNCI2020066
Author(s)
Xiaoyan Liu, Yue Shi, Yanan Cheng, Haiyan Xu, Zhaoxin Zhang
Corresponding Author
Xiaoyan Liu
ABSTRACT
To detect malicious domain names accurately, we propose a method based on the similarity of web page structures as represented by the Document Object Model (DOM). The key problem is how to compute the hierarchical similarity between web page structures quickly and effectively. The method first fetches the source code of the web page served at a domain name and parses its DOM tree. It then constructs, for each level of the DOM tree, a sequence of tags and attribute names that characterizes the structure of the domain's web page. Finally, it defines a DOM tree distance based on the idea of the Simhash algorithm to measure the similarity between web page structures. Experiments show that the method detects similar domain page structures effectively, with high accuracy and recall.
KEYWORDS
Document object model; web page structure; Simhash algorithm; hierarchical similarity
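
ILLUSTRATIVE SKETCH
The pipeline summarized in the abstract (per-level tag and attribute-name sequences, Simhash fingerprints, a DOM tree distance) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the token format `tag[attr,...]`, the use of truncated MD5 as the token hash, the 64-bit fingerprint size, the equal weighting of levels, and the names `LevelCollector`, `simhash`, `level_fingerprints`, and `dom_distance` are all assumptions made for this example.

```python
import hashlib
from html.parser import HTMLParser

# HTML void elements never have children, so they must not deepen the tree.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class LevelCollector(HTMLParser):
    """Collects one token per element, grouped by DOM depth (assumed format)."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.levels = {}  # depth -> list of "tag[attr1,attr2]" tokens

    def _record(self, tag, attrs):
        names = ",".join(sorted(name for name, _ in attrs))
        self.levels.setdefault(self.depth, []).append(f"{tag}[{names}]")

    def handle_starttag(self, tag, attrs):
        self._record(tag, attrs)
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: record it, depth is unchanged.
        self._record(tag, attrs)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


def simhash(tokens, bits=64):
    """Standard Simhash over a token list, using truncated MD5 (an assumption)."""
    weights = [0] * bits
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:bits // 8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if weights[i] > 0)


def level_fingerprints(html, bits=64):
    """Returns {depth: simhash of that level's tag/attribute-name tokens}."""
    parser = LevelCollector()
    parser.feed(html)
    parser.close()
    return {d: simhash(toks, bits) for d, toks in parser.levels.items()}


def dom_distance(html_a, html_b, bits=64):
    """Mean normalized Hamming distance over levels, in [0, 1] (assumed form)."""
    fa = level_fingerprints(html_a, bits)
    fb = level_fingerprints(html_b, bits)
    depths = set(fa) | set(fb)
    if not depths:
        return 0.0
    total = 0
    for d in depths:
        if d in fa and d in fb:
            total += bin(fa[d] ^ fb[d]).count("1")
        else:
            total += bits  # level present in only one page: maximal distance
    return total / (bits * len(depths))
```

Because the tokens contain only tag names and attribute names, two pages that differ in text and attribute values but share a template yield a distance of 0, while structurally different pages yield a larger value. The threshold at which two domains would be treated as similar is not given in the abstract and would have to be chosen empirically.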