
AN ANALYSIS OF MISCLASSIFICATION RATES FOR DECISION TREES

Date Issued:
2007
Abstract/Description:
The decision tree is a well-known methodology for classification and regression. In this dissertation, we focus on the minimization of the misclassification rate for decision tree classifiers. We derive the necessary equations that provide the optimal tree prediction, the estimated risk of the tree's prediction, and the reliability of the tree's risk estimation. We carry out an extensive analysis of the application of Lidstone's law of succession for the estimation of the class probabilities. In contrast to existing research, we not only compute the expected values of the risks but also calculate the corresponding reliability of the risk (measured by standard deviations). We also provide an explicit expression of the k-norm estimation for the tree's misclassification rate that combines both the expected value and the reliability. Furthermore, our proposed and proven theorem on k-norm estimation suggests an efficient pruning algorithm that has a clear theoretical interpretation, is easily implemented, and does not require a validation set. Our experiments show that our proposed pruning algorithm quickly produces accurate trees and compares very favorably with two other well-known pruning algorithms: CCP of CART and EBP of C4.5. Finally, our work provides a deeper understanding of decision trees.
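For readers unfamiliar with Lidstone's law of succession, the following is a minimal sketch of the standard textbook form of the estimator applied to the class counts in a single tree leaf. It is not taken from the dissertation: the function name `lidstone_estimate`, the smoothing parameter `lam`, and the sample labels are illustrative assumptions.

```python
from collections import Counter


def lidstone_estimate(labels, classes, lam=1.0):
    """Estimate class probabilities with Lidstone's law of succession.

    For each class c: p(c) = (n_c + lam) / (N + lam * J), where n_c is the
    number of training examples of class c in the leaf, N is the total number
    of examples in the leaf, and J is the number of classes.
    Assumption: lam > 0 (lam = 1 gives Laplace's law of succession).
    """
    counts = Counter(labels)
    total = len(labels)
    num_classes = len(classes)
    return {c: (counts[c] + lam) / (total + lam * num_classes) for c in classes}


# Hypothetical leaf holding four training examples from a three-class problem.
leaf_labels = ["a", "a", "a", "b"]
probs = lidstone_estimate(leaf_labels, classes=["a", "b", "c"], lam=1.0)

# Predicting the most probable class gives an estimated misclassification
# rate of one minus that class's estimated probability.
predicted = max(probs, key=probs.get)
estimated_error = 1.0 - probs[predicted]
```

With `lam=1.0` the unseen class "c" receives a nonzero probability of 1/7 rather than 0, which is the practical effect of the smoothing. The dissertation's own risk and reliability formulas are not reproduced here.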
Name(s): Zhong, Mingyu, Author
Georgiopoulos, Michael, Committee Chair
University of Central Florida, Degree Grantor
Type of Resource: text
Publisher: University of Central Florida
Language(s): English
Identifier: CFE0001774 (IID), ucf:47271 (fedora)
Note(s): 2007-08-01
Ph.D.
Engineering and Computer Science, School of Electrical Engineering and Computer Science
Doctorate
This record was generated from author submitted information.
Subject(s): Artificial intelligence
Machine learning
Decision tree
Bayes classifier
Probability estimation
Law of succession
Persistent Link to This Record: http://purl.flvc.org/ucf/fd/CFE0001774
Restrictions on Access: public
Host Institution: UCF
