
Methods for online feature selection for classification problems


Date Issued:
2018
Abstract/Description:
Online learning is a growing branch of machine learning that allows traditional data mining techniques to be applied to a stream of data in real time. In this dissertation, we present three efficient algorithms for feature ranking in online classification problems. Each method is tailored to a different type of classification task and has different advantages. The reason for this variety is that, as with other machine learning solutions, there is usually no single algorithm that works well for all types of tasks. The first method is an online sensitivity-based feature ranking (SFR) that is updated incrementally and is designed for classification tasks with continuous features. We take advantage of the concept of global sensitivity and rank features based on their impact on the outcome of the classification model. In the feature selection part, we use a two-stage filtering method that first eliminates highly correlated and redundant features and then eliminates irrelevant features. One important advantage of our algorithm is its generality: the method works for correlated feature spaces without preprocessing. It can be implemented along with any single-pass online classification method with a separating hyperplane, such as SVMs. In the second method, with the help of probability theory, we propose an algorithm that measures the importance of features by observing the changes in label prediction when a feature is substituted. A non-parametric version of the proposed method is presented to eliminate assumptions about the distribution type. These methods are applicable to all data types, including mixed feature spaces. Finally, we present a class-based feature importance ranking method that evaluates the importance of each feature for each class; these sub-rankings are further exploited to train an ensemble of classifiers. The proposed methods will be thoroughly tested using benchmark datasets, and the results will be discussed in the last chapter.
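The substitution idea behind the second method can be illustrated with a minimal sketch. This is not the dissertation's algorithm: the function name `substitution_importance`, the sliding window of past values, and the flip-rate score are all assumptions made here for illustration. For each incoming sample, one feature at a time is replaced with a value previously observed for that feature, and the score records how often the predicted label changes.

```python
import random
from collections import deque


def substitution_importance(predict, stream, n_features, window=100, seed=0):
    """Hypothetical flip-rate score: the fraction of substitutions of
    feature j that change the label returned by `predict`."""
    rng = random.Random(seed)
    # Bounded memory of recently seen values, one buffer per feature.
    history = [deque(maxlen=window) for _ in range(n_features)]
    flips = [0] * n_features
    trials = [0] * n_features
    for x in stream:
        base = predict(x)
        for j in range(n_features):
            if history[j]:
                x_sub = list(x)
                # Swap in a previously observed value of feature j.
                x_sub[j] = rng.choice(history[j])
                trials[j] += 1
                if predict(x_sub) != base:
                    flips[j] += 1
            history[j].append(x[j])
    return [f / t if t else 0.0 for f, t in zip(flips, trials)]


# Toy usage: the label depends only on feature 0, so feature 1 never flips it.
toy_stream = [((-1.0) ** i, 0.5 * i) for i in range(50)]
scores = substitution_importance(lambda x: 1 if x[0] > 0 else 0, toy_stream, 2)
```

In this toy setting the classifier is fixed; in an online setting, `predict` would be the current state of an incrementally trained model, and the scores would be updated as the model changes.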
Title: Methods for online feature selection for classification problems.
Name(s): Razmjoo, Alaleh, Author
Zheng, Qipeng, Committee Chair
Rabelo, Luis, Committee Member
Boginski, Vladimir, Committee Member
Xanthopoulos, Petros, Committee Member
University of Central Florida, Degree Grantor
Type of Resource: text
Publisher: University of Central Florida
Language(s): English
Identifier: CFE0007584 (IID), ucf:52567 (fedora)
Note(s): 2018-08-01
Ph.D.
Engineering and Computer Science, Industrial Engineering and Management Systems
Doctoral
This record was generated from author submitted information.
Subject(s): online machine learning -- feature selection -- classification
Persistent Link to This Record: http://purl.flvc.org/ucf/fd/CFE0007584
Restrictions on Access: campus 2022-02-15
Host Institution: UCF
