Robust, Scalable, and Provable Approaches to High Dimensional Unsupervised Learning

Date Issued: 2018
Abstract/Description:
This doctoral thesis focuses on three popular unsupervised learning problems: subspace clustering, robust PCA, and column sampling. For the subspace clustering problem, a new approach is presented. Termed Innovation Pursuit, it is a geometric solution to the subspace clustering problem in which subspaces are identified based on their relative novelties. A detailed mathematical analysis establishes sufficient conditions under which the proposed method correctly clusters the data points. Numerical simulations with both real and synthetic data demonstrate that Innovation Pursuit notably outperforms state-of-the-art subspace clustering algorithms. For the robust PCA problem, we focus on both outlier detection and matrix decomposition. For outlier detection, we present a new algorithm, termed Coherence Pursuit, in addition to two scalable randomized frameworks for implementing outlier detection algorithms. Coherence Pursuit is the first non-iterative robust PCA method that is provably robust to both unstructured and structured outliers. It is remarkably simple and notably outperforms existing methods in dealing with structured outliers. In the proposed randomized designs, we leverage the low-dimensional structure of the low-rank component to apply the robust PCA algorithm to a random sketch of the data rather than to the full-scale data. Importantly, it is shown analytically that the presented randomized designs can make the computational or sample complexity of the low-rank matrix recovery algorithm independent of the size of the data. Finally, we focus on the column sampling problem. A new sampling tool, dubbed Spatial Random Sampling, is presented, which performs random sampling in the spatial domain. The most compelling feature of Spatial Random Sampling is that it is the first unsupervised column sampling method that preserves the spatial distribution of the data.
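The abstract names Coherence Pursuit but does not spell out its steps. As a hedged illustration only, the sketch below assumes a coherence-based scoring rule of the kind commonly described for this method. Each data column is normalized to unit norm. Each column is then scored by how strongly it correlates with all the other columns, using the l2 norm of its row in the Gram matrix after the diagonal is zeroed. The highest-scoring columns are kept as inliers. The function names coherence_scores and select_inliers, the toy data, and the choice of the l2 row norm are hypothetical assumptions. This is not the thesis code, and the final step of recovering the low-rank subspace from the selected columns (for example, with an SVD) is omitted.

```python
import math
from typing import List, Sequence


def coherence_scores(columns: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical coherence score for each data point (column).

    Assumed formulation: normalize every column to unit l2 norm, take all
    pairwise inner products (the Gram matrix), skip the diagonal entries,
    and score each column by the l2 norm of its remaining row entries.
    Inliers that share a low-dimensional subspace tend to score higher.
    """
    # Step 1: project every column onto the unit sphere.
    unit = []
    for col in columns:
        norm = math.sqrt(sum(x * x for x in col))
        if norm == 0.0:
            raise ValueError("a zero column cannot be normalized")
        unit.append([x / norm for x in col])

    # Step 2: l2 norm of each Gram-matrix row with the diagonal set to zero.
    scores = []
    for i, u in enumerate(unit):
        total = 0.0
        for j, v in enumerate(unit):
            if i != j:
                g = sum(a * b for a, b in zip(u, v))
                total += g * g
        scores.append(math.sqrt(total))
    return scores


def select_inliers(columns: Sequence[Sequence[float]], num_inliers: int) -> List[int]:
    """Return the sorted indices of the num_inliers highest-scoring columns."""
    scores = coherence_scores(columns)
    ranked = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    return sorted(ranked[:num_inliers])


# Toy data (hypothetical): four inliers on the line spanned by (1, 1, 0)
# and two outliers pointing in other directions.
data = [
    [1.0, 1.0, 0.0],
    [2.0, 2.0, 0.0],
    [-1.0, -1.0, 0.0],
    [3.0, 3.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, -1.0, 0.5],
]
print(select_inliers(data, 4))  # [0, 1, 2, 3]
```

The method is non-iterative. In this sketch, each inlier is perfectly aligned with the three other inliers and receives a score near sqrt(3). Each outlier receives about 1/3, so a single ranking pass separates the two groups.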
Name(s): Rahmani, Mostafa, Author
Atia, George, Committee Chair
Vosoughi, Azadeh, Committee Member
Mikhael, Wasfy, Committee Member
Nashed, M, Committee Member
Pensky, Marianna, Committee Member
University of Central Florida, Degree Grantor
Type of Resource: text
Publisher: University of Central Florida
Language(s): English
Identifier: CFE0007083 (IID), ucf:52010 (fedora)
Note(s): 2018-05-01
Ph.D.
Engineering and Computer Science, Electrical Engineering and Computer Engineering
Doctoral
This record was generated from author submitted information.
Subject(s): Big Data Analysis -- Machine Learning -- Unsupervised Learning
Persistent Link to This Record: http://purl.flvc.org/ucf/fd/CFE0007083
Restrictions on Access: public 2018-05-15
Host Institution: UCF