Bayesian Model Selection for Classification with Possibly Large Number of Groups
| Field | Value |
|---|---|
| Title | Bayesian Model Selection for Classification with Possibly Large Number of Groups |
| Name(s) | Davis, Justin (Author); Pensky, Marianna (Committee Chair); Swanson, Jason (Committee Member); Richardson, Gary (Committee Member); Crampton, William (Committee Member); Ni, Liqiang (Committee Member); University of Central Florida (Degree Grantor) |
| Type of Resource | text |
| Date Issued | 2011 |
| Publisher | University of Central Florida |
| Language(s) | English |
| Abstract/Description | The purpose of the present dissertation is to study model selection techniques specifically designed for classification of high-dimensional data with a large number of classes. To the best of our knowledge, this problem has not previously been studied in depth. We assume that the number of components p is much larger than the number of samples n, and that only a few of those p components are useful for subsequent classification. We introduce two Bayesian models that take different approaches to the problem: one that discards components with "almost constant" values (Model 1) and one that retains components whose between-group variation exceeds their within-group variation (Model 2). We show that particular cases of these two models recover familiar variance-based or ANOVA-based component selection. When there are only two classes and the features are a priori independent, Model 2 reduces to the Feature Annealed Independence Rule (FAIR) introduced by Fan and Fan (2008), and can therefore be viewed as a natural generalization of FAIR to the case of L > 2 classes. A nontrivial result of the dissertation is that the precision of feature selection under Model 2 improves as the number of classes grows. Subsequently, we examine the rate of misclassification with and without feature selection on the basis of Model 2. (An illustrative sketch of the classical ANOVA-type screening criterion appears after this record.) |
| Identifier | CFE0004097 (IID), ucf:49091 (fedora) |
| Note(s) | 2011-12-01; Ph.D.; Sciences, Mathematics; Doctoral; This record was generated from author-submitted information. |
| Subject(s) | HDLSS -- model selection -- classification |
| Persistent Link to This Record | http://purl.flvc.org/ucf/fd/CFE0004097 |
| Restrictions on Access | public 2011-12-15 |
| Host Institution | UCF |
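
The abstract states that particular cases of Model 2 recover familiar ANOVA-based component selection. The following is a minimal sketch of that classical criterion only, not of the dissertation's Bayesian models: it screens each of p features by comparing between-group to within-group variation through a per-feature F-ratio. The function name `anova_screen`, the threshold value, and the synthetic data are illustrative assumptions, not taken from the source.

```python
import numpy as np

def anova_screen(X, y, threshold):
    """ANOVA-type feature screening: keep features whose between-group
    variation exceeds their within-group variation by a given factor
    (an F-ratio threshold).

    X : (n, p) data matrix, n samples, p features (p >> n allowed)
    y : (n,) integer class labels in {0, ..., L-1}
    """
    n, p = X.shape
    labels = np.unique(y)
    L = labels.size
    grand_mean = X.mean(axis=0)

    ss_between = np.zeros(p)
    ss_within = np.zeros(p)
    for g in labels:
        Xg = X[y == g]
        ng = Xg.shape[0]
        group_mean = Xg.mean(axis=0)
        # Between-group sum of squares: group means around the grand mean.
        ss_between += ng * (group_mean - grand_mean) ** 2
        # Within-group sum of squares: samples around their group mean.
        ss_within += ((Xg - group_mean) ** 2).sum(axis=0)

    # Per-feature F-statistic: between-group mean square over
    # within-group mean square.
    f_stat = (ss_between / (L - 1)) / (ss_within / (n - L))
    return np.flatnonzero(f_stat > threshold)
```

A small usage example in the p >> n spirit of the abstract, with L = 3 classes and signal planted in only the first 5 of 500 features:

```python
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 500))   # n = 60 samples, p = 500 features
y = np.repeat([0, 1, 2], 20)         # L = 3 classes
X[y == 0, :5] += 2.0                 # only the first 5 features carry signal
print(anova_screen(X, y, threshold=4.0))
```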