Methods for online feature selection for classification problems
- Date Issued: 2018
- Abstract/Description:
- Online learning is a growing branch of machine learning that allows traditional data mining techniques to be applied to an online stream of data in real time. In this dissertation, we present three efficient algorithms for feature ranking in online classification problems. Each method is tailored to a different type of classification task and has its own advantages. The reason for this variety is that, as with other machine learning solutions, there is usually no single algorithm that works well for all types of tasks. The first method is an online sensitivity-based feature ranking (SFR), which is updated incrementally and is designed for classification tasks with continuous features. We take advantage of the concept of global sensitivity and rank features based on their impact on the outcome of the classification model. For feature selection, we use a two-stage filtering method that first eliminates highly correlated and redundant features and then eliminates irrelevant features. One important advantage of our algorithm is its generality: the method works for correlated feature spaces without preprocessing. It can be implemented alongside any single-pass online classification method with a separating hyperplane, such as SVMs. In the second method, we use probability theory to propose an algorithm that measures the importance of features by observing the changes in label prediction when a feature is substituted. A non-parametric version of the proposed method is presented to eliminate assumptions about the distribution type. These methods are applicable to all data types, including mixed feature spaces. Finally, we present a class-based feature importance ranking method that evaluates the importance of each feature for each class; these sub-rankings are further exploited to train an ensemble of classifiers.
The proposed methods will be thoroughly tested using benchmark datasets, and the results will be discussed in the last chapter.
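The abstract does not state how SFR computes global sensitivity. As a rough illustration only, the Python sketch below pairs a single-pass linear classifier (hinge-loss SGD, i.e. a linear SVM objective) with running per-feature statistics and scores each feature as |w_j|·σ_j. For a linear decision value w·x + b with independent features, feature j contributes w_j²σ_j² to its variance, so this score is one simple variance-based sensitivity measure. The class name `OnlineLinearSFR`, the learning rate, the regularization constant, and the |w_j|·σ_j score are assumptions made for illustration, not the dissertation's SFR; the two-stage correlation/relevance filter is not shown.

```python
import math
import random


class OnlineLinearSFR:
    """Hypothetical sketch, not the dissertation's SFR: a single-pass
    hinge-loss SGD classifier plus a running sensitivity score per feature."""

    def __init__(self, n_features, lr=0.01, reg=1e-4):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr    # assumed step size
        self.reg = reg  # assumed L2 regularization strength
        # Welford running statistics (count, mean, sum of squared deviations).
        self.n = 0
        self.mean = [0.0] * n_features
        self.m2 = [0.0] * n_features

    def partial_fit(self, x, y):
        """Process one sample x (list of floats) with label y in {-1, +1}."""
        # Update per-feature running mean and variance with Welford's algorithm.
        self.n += 1
        for j, xj in enumerate(x):
            delta = xj - self.mean[j]
            self.mean[j] += delta / self.n
            self.m2[j] += delta * (xj - self.mean[j])
        # One SGD step on the L2-regularized hinge loss.
        margin = y * (sum(wj * xj for wj, xj in zip(self.w, x)) + self.b)
        for j in range(len(self.w)):
            grad = self.reg * self.w[j]
            if margin < 1:
                grad -= y * x[j]
            self.w[j] -= self.lr * grad
        if margin < 1:
            self.b += self.lr * y

    def sensitivities(self):
        """Assumed score |w_j| * std_j: the spread of w.x + b due to feature j."""
        if self.n < 2:
            return [0.0] * len(self.w)
        return [abs(wj) * math.sqrt(m2 / (self.n - 1))
                for wj, m2 in zip(self.w, self.m2)]

    def ranking(self):
        """Feature indices ordered from most to least sensitive."""
        s = self.sensitivities()
        return sorted(range(len(s)), key=lambda j: s[j], reverse=True)


# Synthetic stream: the label depends strongly on feature 0, weakly on
# feature 1, and not at all on feature 2.
random.seed(0)
model = OnlineLinearSFR(n_features=3)
for _ in range(5000):
    x = [random.gauss(0, 1) for _ in range(3)]
    y = 1 if 3 * x[0] + 0.5 * x[1] > 0 else -1
    model.partial_fit(x, y)
print(model.ranking())
```

On this stream, feature 0 should typically rank first, and feature 2, which the label ignores, usually ends up with a small score. Because the ranking is recomputed from incrementally updated weights and statistics, it can be queried at any point in the stream without storing past samples.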
Title: | Methods for online feature selection for classification problems. |
---|---|---|
Name(s): | Razmjoo, Alaleh, Author; Zheng, Qipeng, Committee Chair; Rabelo, Luis, Committee Member; Boginski, Vladimir, Committee Member; Xanthopoulos, Petros, Committee Member; University of Central Florida, Degree Grantor | |
Type of Resource: | text | |
Date Issued: | 2018 | |
Publisher: | University of Central Florida | |
Language(s): | English | |
Identifier: | CFE0007584 (IID), ucf:52567 (fedora) | |
Note(s): | 2018-08-01; Ph.D.; Engineering and Computer Science, Industrial Engineering and Management Systems; Doctoral. This record was generated from author-submitted information. | |
Subject(s): | online machine learning -- feature selection -- classification | |
Persistent Link to This Record: | http://purl.flvc.org/ucf/fd/CFE0007584 | |
Restrictions on Access: | campus 2022-02-15 | |
Host Institution: | UCF |