- Title
- Methods for online feature selection for classification problems.
- Creator
-
Razmjoo, Alaleh, Zheng, Qipeng, Rabelo, Luis, Boginski, Vladimir, Xanthopoulos, Petros, University of Central Florida
- Abstract / Description
-
Online learning is a growing branch of machine learning which allows all traditional data mining techniques to be applied on an online stream of data in real-time. In this dissertation, we present three efficient algorithms for feature ranking in online classification problems. Each of the methods is tailored to work well with different types of classification tasks and has different advantages. The reason for this variety of algorithms is that, as with other machine learning solutions, there is usually no algorithm which works well for all types of tasks. The first method is an online sensitivity-based feature ranking (SFR) which is updated incrementally and is designed for classification tasks with continuous features. We take advantage of the concept of global sensitivity and rank features based on their impact on the outcome of the classification model. In the feature selection part, we use a two-stage filtering method in order to first eliminate highly correlated and redundant features and then eliminate irrelevant features in the second stage. One important advantage of our algorithm is its generality, which means the method works for correlated feature spaces without preprocessing. It can be implemented along with any single-pass online classification method with a separating hyperplane, such as SVMs. In the second method, with the help of probability theory, we propose an algorithm which measures the importance of the features by observing the changes in label prediction in case of feature substitution. A non-parametric version of the proposed method is presented to eliminate the distribution type assumptions. These methods are applicable to all data types, including mixed feature spaces. Finally, we present a class-based feature importance ranking method which evaluates the importance of each feature for each class; these sub-rankings are further exploited to train an ensemble of classifiers. The proposed methods will be thoroughly tested using benchmark datasets and the results will be discussed in the last chapter.
- Date Issued
- 2018
- Identifier
- CFE0007584, ucf:52567
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0007584
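The abstract above describes ranking features by how the predicted label changes when a feature's value is substituted. The following is only a minimal sketch of that general idea, not the dissertation's algorithm: it is offline and exhaustive rather than online, uses no probability model, and assumes a hypothetical fixed linear classifier (`predict`) and a toy data set. The names `predict` and `substitution_importance` are invented for illustration.

```python
def predict(weights, bias, x):
    """Hypothetical linear classifier: label 1 if w.x + b > 0, else 0."""
    return 1 if sum(w * v for w, v in zip(weights, x)) + bias > 0 else 0


def substitution_importance(weights, bias, rows):
    """Score each feature by the fraction of (row, donor) pairs for which
    replacing that feature's value with the donor row's value flips the
    predicted label. Higher score = more influential feature."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        flips = 0
        for x in rows:
            base = predict(weights, bias, x)
            for donor in rows:
                x_sub = list(x)
                x_sub[j] = donor[j]  # substitute feature j only
                if predict(weights, bias, x_sub) != base:
                    flips += 1
        scores.append(flips / (len(rows) ** 2))
    return scores


# Toy data (assumed): the classifier ignores feature 1 (weight 0.0).
rows = [[-2.0, 5.0], [-1.0, -3.0], [1.0, 4.0], [2.0, -6.0]]
scores = substitution_importance([1.0, 0.0], 0.0, rows)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(scores, ranking)
```

In this toy setup the zero-weight feature never changes the prediction, so it is ranked last; an online variant would have to update such scores incrementally as samples arrive.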
- Title
- DATA MINING METHODS FOR MALWARE DETECTION.
- Creator
-
Siddiqui, Muazzam, Wang, Morgan, University of Central Florida
- Abstract / Description
-
This research investigates the use of data mining methods for malware (malicious programs) detection and proposes a framework as an alternative to the traditional signature detection methods. The traditional approaches that use signatures to detect malicious programs fail for new and unknown malware, for which signatures are not available. We present a data mining framework to detect malicious programs. We collected, analyzed and processed several thousand malicious and clean programs to find the best features and build models that can classify a given program into a malware or a clean class. Our research is closely related to information retrieval and classification techniques and borrows a number of ideas from the field. We used a vector space model to represent the programs in our collection. Our data mining framework includes two separate and distinct classes of experiments. The first are the supervised learning experiments that used a dataset consisting of several thousand malicious and clean program samples to train, validate and test an array of classifiers. In the second class of experiments, we proposed using sequential association analysis for feature selection and automatic signature extraction. With our experiments, we were able to achieve as high as a 98.4% detection rate and as low as a 1.9% false positive rate on novel malware.
- Date Issued
- 2008
- Identifier
- CFE0002303, ucf:47870
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0002303
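The abstract above reports a detection rate and a false positive rate. As a reading aid, the sketch below shows how these two metrics are conventionally computed from binary labels, assuming 1 = malware and 0 = clean. The function name `detection_metrics` and the toy labels are assumptions for illustration; they are not the dissertation's data or results.

```python
def detection_metrics(y_true, y_pred):
    """Return (detection_rate, false_positive_rate) for binary labels,
    where 1 = malware and 0 = clean.

    detection rate      = TP / (TP + FN)  (share of malware that is caught)
    false positive rate = FP / (FP + TN)  (share of clean programs flagged)
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)


# Toy labels (assumed): 4 malware samples followed by 4 clean samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
detection_rate, false_positive_rate = detection_metrics(y_true, y_pred)
print(detection_rate, false_positive_rate)
```

The two rates are computed over different populations (malware samples and clean samples respectively), which is why a high detection rate and a low false positive rate can be reported together.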