Education, Science, Technology, Innovation and Life
Open Access

Research on Malicious Code Classification Based on Machine Learning

DOI: 10.23977/iset.2019.016

Author(s)

Min Ai, Zini Bie

Corresponding Author

Min Ai

ABSTRACT

With the rapid development of internet technology, malicious code is also evolving, posing huge challenges for existing malicious code analysis technology. Existing malicious code analysis techniques are mainly divided into static analysis methods and dynamic analysis methods. However, static analysis methods often cannot cope effectively with malicious code obfuscation, which limits the usefulness of static analysis. Compared with static analysis methods, dynamic analysis methods can effectively overcome the obfuscation of malicious code. However, dynamic analysis of malicious code has some shortcomings: (1) the execution of malicious code is strictly restricted by the environment, and some virtual environments cannot trigger execution of the code at all; (2) each execution of malicious code yields only a single execution path; (3) analyzing massive amounts of malicious code takes a long time, and the analysis efficiency needs to be improved. With the continuous development of machine learning, malicious code analysis based on machine learning has received extensive attention. However, existing machine learning-based malicious code classification methods often require manual design and participation in the feature extraction stage. This requires prior knowledge and cannot automatically learn the characteristics of malicious code, which affects the classification and clustering accuracy of malicious code to a certain extent. Therefore, in view of the shortcomings of current machine learning-based malicious code analysis methods, and drawing on the theory and methods of machine learning, this paper focuses on an in-depth study of the serialized representation of malicious code, static anti-obfuscation of malicious code, and classification methods for malicious code.
First, every binary file of malicious code is processed and converted into an n*k two-dimensional array, which is a vectorized representation of the malicious code. Then, appropriate machine learning methods are trained to explore a suitable model for malicious code classification.
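
The conversion of a binary file into an n*k array can be illustrated with a minimal sketch. The abstract does not specify how the bytes are mapped to array cells, so the following assumes one byte per cell, a fixed row width k, and zero padding of the last row; the names `bytes_to_matrix` and `file_to_matrix` and the default `k = 16` are hypothetical and not taken from the paper.

```python
from typing import List


def bytes_to_matrix(data: bytes, k: int = 16, pad: int = 0) -> List[List[int]]:
    """Reshape raw bytes into an n x k matrix of integers in the range 0..255.

    Assumptions (not from the paper): one byte per cell, rows of width k,
    and the final row padded with `pad` when len(data) is not a multiple of k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    values = list(data)  # iterating over bytes yields integers 0..255
    remainder = len(values) % k
    if remainder:
        values.extend([pad] * (k - remainder))
    # Slice the flat list into consecutive rows of length k.
    return [values[i:i + k] for i in range(0, len(values), k)]


def file_to_matrix(path: str, k: int = 16) -> List[List[int]]:
    """Read a file in binary mode and convert its contents to an n x k matrix."""
    with open(path, "rb") as fh:
        return bytes_to_matrix(fh.read(), k)


# Five illustrative bytes, reshaped with k = 4 (the last row is zero padded).
sample = bytes([0x4D, 0x5A, 0x90, 0x00, 0x03])
matrix = bytes_to_matrix(sample, k=4)
print(matrix)  # [[77, 90, 144, 0], [3, 0, 0, 0]]
```

Under these assumptions, n is determined by the file size and the chosen k, so files of different lengths yield arrays with different numbers of rows; a classifier that expects fixed-size input would additionally need truncation or padding to a common n.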

KEYWORDS

Machine learning, malicious code, CNN, SVM, GRU-SVM

All published work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright © 2016 - 2031 Clausius Scientific Press Inc. All Rights Reserved.