DATA MINING METHODS FOR MALWARE DETECTION

Date Issued:
2008
Abstract/Description:
This research investigates the use of data mining methods for detecting malware (malicious programs) and proposes a framework as an alternative to traditional signature-based detection. Traditional signature-based approaches fail on new and unknown malware, for which no signatures are available. We present a data mining framework to detect malicious programs. We collected, analyzed, and processed several thousand malicious and clean programs to identify the best features and to build models that classify a given program as malware or clean. Our research is closely related to information retrieval and classification techniques and borrows a number of ideas from those fields. We used a vector space model to represent the programs in our collection. Our data mining framework includes two distinct classes of experiments. The first consists of supervised learning experiments that used a dataset of several thousand malicious and clean program samples to train, validate, and test an array of classifiers. In the second class of experiments, we proposed using sequential association analysis for feature selection and automatic signature extraction. In our experiments, we achieved a detection rate as high as 98.4% and a false positive rate as low as 1.9% on novel malware.
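As a rough illustration of two ideas named in the abstract, the sketch below represents a program as a term-frequency vector in a vector space model and computes a detection rate and a false positive rate from predicted labels. It is not the dissertation's implementation. The choice of opcode bigrams as terms, the function names, and the label convention (1 = malware, 0 = clean) are all assumptions made for this example.

```python
from collections import Counter


def ngrams(opcodes, n=2):
    """Return the contiguous n-grams of an opcode sequence as tuples."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def build_vocabulary(programs, n=2):
    """Collect every n-gram seen in any program, in a fixed sorted order."""
    return sorted({gram for ops in programs for gram in ngrams(ops, n)})


def to_vector(opcodes, vocabulary, n=2):
    """Map one program to a term-frequency vector over the vocabulary."""
    counts = Counter(ngrams(opcodes, n))
    return [counts[term] for term in vocabulary]


def detection_and_false_positive_rate(y_true, y_pred):
    """Return (detection rate, false positive rate).

    Assumed label convention: 1 = malware, 0 = clean.
    Detection rate = TP / (TP + FN); false positive rate = FP / (FP + TN).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one malware and one clean sample")
    return tp / (tp + fn), fp / (fp + tn)


# Toy opcode sequences (hypothetical data, for illustration only).
program_a = ["push", "mov", "call", "push", "mov"]
program_b = ["mov", "xor", "jmp"]
vocabulary = build_vocabulary([program_a, program_b])
vector_a = to_vector(program_a, vocabulary)
vector_b = to_vector(program_b, vocabulary)
```

Vectors built this way could be passed to any standard classifier. The two rates correspond to the kinds of figures the abstract reports, computed here only on toy labels.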
Title: DATA MINING METHODS FOR MALWARE DETECTION.
Name(s): Siddiqui, Muazzam, Author
Wang, Morgan, Committee Chair
University of Central Florida, Degree Grantor
Type of Resource: text
Publisher: University of Central Florida
Language(s): English
Identifier: CFE0002303 (IID), ucf:47870 (fedora)
Note(s): 2008-08-01
Ph.D.
Sciences, Other
Doctorate
This record was generated from author submitted information.
Subject(s): Data Mining
Malware Detection
Machine Learning
Classification
Instruction Sequences
Signature Extraction
Predictive Modeling
Supervised Learning
Unsupervised Learning
Feature Selection
Feature Reduction
Persistent Link to This Record: http://purl.flvc.org/ucf/fd/CFE0002303
Restrictions on Access: public
Host Institution: UCF