Bayesian Model Selection for Classification with Possibly Large Number of Groups

Title: Bayesian Model Selection for Classification with Possibly Large Number of Groups.
Name(s): Davis, Justin, Author
Pensky, Marianna, Committee Chair
Swanson, Jason, Committee Member
Richardson, Gary, Committee Member
Crampton, William, Committee Member
Ni, Liqiang, Committee Member
University of Central Florida, Degree Grantor
Type of Resource: text
Date Issued: 2011
Publisher: University of Central Florida
Language(s): English
Abstract/Description: The purpose of the present dissertation is to study model selection techniques specifically designed for classification of high-dimensional data with a large number of classes. To the best of our knowledge, this problem has not previously been studied in depth. We assume that the number of components p is much larger than the number of samples n, and that only a few of those p components are useful for subsequent classification. In what follows, we introduce two Bayesian models which take different approaches to the problem: one discards components with "almost constant" values (Model 1), while the other retains components whose between-group variation is larger than their within-group variation (Model 2). We show that particular cases of these two models recover familiar variance-based or ANOVA-based component selection. When there are only two classes and the features are a priori independent, Model 2 reduces to the Feature Annealed Independence Rule (FAIR) introduced by Fan and Fan (2008), and can therefore be viewed as a natural generalization of FAIR to the case of L > 2 classes. A nontrivial result of the dissertation is that the precision of feature selection under Model 2 improves as the number of classes grows. Subsequently, we examine the rate of misclassification with and without feature selection on the basis of Model 2.
Identifier: CFE0004097 (IID), ucf:49091 (fedora)
Note(s): 2011-12-01
Ph.D.
Sciences, Mathematics
Doctoral
This record was generated from author-submitted information.
Subject(s): HDLSS -- model selection -- classification
Persistent Link to This Record: http://purl.flvc.org/ucf/fd/CFE0004097
Restrictions on Access: public 2011-12-15
Host Institution: UCF
