Current Search: Hu, Haiyan
- Title
- Trust-Based Rating Prediction and Malicious Profile Detection in Online Social Recommender Systems.
- Creator
-
Davoudi, Anahita, Chatterjee, Mainak, Hu, Haiyan, Zou, Changchun, Rahnavard, Nazanin, University of Central Florida
- Abstract / Description
-
Online social networks and recommender systems have become an effective channel for influencing millions of users by facilitating the exchange and spread of information. This dissertation addresses multiple challenges faced by online social recommender systems: i) finding the extent of information spread; ii) predicting the rating of a product; and iii) detecting malicious profiles. Most of the research in this area does not capture social interactions and relies on empirical or statistical approaches without considering temporal aspects. We capture the temporal spread of information using a probabilistic model and use non-linear differential equations to model the diffusion process. To predict the rating of a product, we propose a social trust model and use the matrix factorization method to estimate a user's taste from the user-item rating matrix. The effect of the tastes of a user's friends is captured using a trust model based on similarities between users and their centralities. Similarity is modeled using the Vector Space Similarity and Pearson Correlation Coefficient algorithms, whereas degree, eigenvector, Katz, and PageRank are used to model centrality. As the rating of a product has tremendous influence on its saleability, social recommender systems are vulnerable to profile injection attacks that sway users' opinions towards favorable or unfavorable recommendations for a product. We propose a classification approach for detecting attackers based on attributes that indicate the likelihood that a user profile is that of an attacker. To evaluate performance, we inject push and nuke attacks and use precision and recall to measure how well the attackers are identified. All proposed models have been validated using datasets from Facebook, Epinions, and Digg. Results show that the proposed models better predict information spread and product ratings, and identify malicious user profiles with high accuracy and low false positives.
- Date Issued
- 2018
- Identifier
- CFE0007168, ucf:52245
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0007168
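The abstract above names Vector Space Similarity and the Pearson Correlation Coefficient as the user-similarity measures behind the trust model. As a rough illustration only, and not the dissertation's implementation, the two measures can be sketched over rating dictionaries as follows. The function names, the dictionary representation, and the choice to compute means over co-rated items only are assumptions made for this example.

```python
from math import sqrt


def vector_space_similarity(ratings_u, ratings_v):
    """Cosine similarity of two users over the items both have rated."""
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0
    dot = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = sqrt(sum(ratings_u[i] ** 2 for i in common))
    norm_v = sqrt(sum(ratings_v[i] ** 2 for i in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pearson_similarity(ratings_u, ratings_v):
    """Pearson correlation of two users over the items both have rated.

    Assumption: each user's mean is taken over co-rated items only.
    """
    common = set(ratings_u) & set(ratings_v)
    if len(common) < 2:
        return 0.0
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    mean_v = sum(ratings_v[i] for i in common) / len(common)
    cov = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    var_u = sum((ratings_u[i] - mean_u) ** 2 for i in common)
    var_v = sum((ratings_v[i] - mean_v) ** 2 for i in common)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / sqrt(var_u * var_v)


# Hypothetical ratings, keyed by item identifier.
alice = {"a": 5, "b": 3, "c": 1}
carol = {"a": 1, "b": 3, "c": 5}
print(vector_space_similarity(alice, carol), pearson_similarity(alice, carol))
```

Cosine similarity ignores each user's rating offset, while Pearson correlation centers the ratings first, so two users with opposite preferences can still have a positive cosine score but a negative Pearson score.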
- Title
- Computational Methods for Comparative Non-coding RNA Analysis: From Structural Motif Identification to Genome-wide Functional Classification.
- Creator
-
Zhong, Cuncong, Zhang, Shaojie, Hu, Haiyan, Hua, Kien, Li, Xiaoman, University of Central Florida
- Abstract / Description
-
Non-coding RNA (ncRNA) plays critical functional roles, such as regulation, catalysis, and modification, in biological systems. Non-coding RNAs exert their functions based on their specific structures, which makes a thorough understanding of their structures a key step towards their complete functional annotation. In this dissertation, we cover a suite of computational methods for the comparison of ncRNA secondary and 3D structures, and their applications to ncRNA molecular structural annotation and genome-wide functional survey. Specifically, we have contributed the following five computational methods. First, we have developed an alignment algorithm to compare RNA structural motifs, which are recurrent RNA 3D structural fragments. Second, we have improved upon the previous alignment algorithm by incorporating base-stacking information and devising a new branch-and-bound algorithm. Third, we have developed a clustering pipeline for RNA structural motif classification using the above alignment methods. Fourth, we have generalized the clustering pipeline to a genome-wide analysis of RNA secondary structures. Finally, we have devised an ultra-fast alignment algorithm for RNA secondary structure by using the sparse dynamic programming technique. A large number of novel RNA structural motif instances and ncRNA elements have been discovered throughout these studies. We anticipate that these computational methods will significantly facilitate the analysis of ncRNA structures in the future.
- Date Issued
- 2013
- Identifier
- CFE0004966, ucf:49580
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004966
- Title
- Finding Consensus Energy Folding Landscapes Between RNA Sequences.
- Creator
-
Burbridge, Joshua, Zhang, Shaojie, Hu, Haiyan, Jha, Sumit, University of Central Florida
- Abstract / Description
-
In molecular biology, the secondary structure of a ribonucleic acid (RNA) molecule is closely related to its biological function. One problem in structural bioinformatics is to determine the two- and three-dimensional structure of RNA using only sequencing information, which can be obtained at low cost. This entails designing sophisticated algorithms to simulate the process of RNA folding using detailed sets of thermodynamic parameters. The set of all chemically feasible structures an RNA molecule can assume, as well as the energy associated with each structure, is called its energy folding landscape. This research focuses on defining and solving the problem of finding the consensus landscape between multiple RNA molecules. Specifically, we discuss how this problem is equivalent to the problem of Balanced Global Network Alignment, and what effect a solution to this problem would have on our understanding of RNA. Because this problem is known to be NP-hard, we instead define an approximate consensus on a landscape of reduced size, which dramatically reduces the search space associated with the problem. We use the program RNASLOpt to enumerate all stable local optimal secondary structures in multiple landscapes within a certain energy and stability range of the minimum free energy (MFE) structure. We then encode these using an extended structural alphabet and perform sequence alignment using a structural substitution matrix to find and rank the best matches between the sets based on stability, energy, and structural distance. We apply this method to twenty landscapes from four sets of riboswitches from Bacillus subtilis in order to predict their native "on" and "off" structures. We find that this method significantly reduces the size of the list of candidate structures and raises the ranking of previously obscure secondary structures, resulting in more accurate predictions overall.
Advances in the field of structural bioinformatics can help elucidate the underlying mechanisms of many genetic diseases.
- Date Issued
- 2015
- Identifier
- CFE0006210, ucf:51109
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006210
- Title
- Relating First-person and Third-person Vision.
- Creator
-
Ardeshir Behrostaghi, Shervin, Borji, Ali, Shah, Mubarak, Hu, Haiyan, Atia, George, University of Central Florida
- Abstract / Description
-
Thanks to the availability and increasing popularity of wearable devices such as GoPro cameras, smart phones and glasses, we have access to a plethora of videos captured from the first-person (egocentric) perspective. Capturing the world from the perspective of one's self, egocentric videos bear characteristics distinct from the more traditional third-person (exocentric) videos. In many computer vision tasks (e.g. identification, action recognition, face recognition, pose estimation, etc.), the human actors are the main focus. Hence, detecting, localizing, and recognizing the human actor is often incorporated as a vital component. In an egocentric video, however, the person behind the camera is often the person of interest. This changes the nature of the task at hand, given that the camera holder is usually not visible in the content of his/her egocentric video. In other words, our knowledge of the visual appearance, pose, etc. of the egocentric camera holder is very limited, suggesting reliance on other cues in first-person videos. First- and third-person videos have been studied separately in the computer vision community. However, the relationship between first- and third-person vision has yet to be fully explored. Relating these two views systematically could potentially benefit many computer vision tasks and applications. This thesis studies this relationship in several aspects. We explore supervised and unsupervised approaches for relating these two views, seeking different objectives such as identification, temporal alignment, and action classification. We believe that this exploration could lead to a better understanding of the relationship between these two drastically different sources of information.
- Date Issued
- 2018
- Identifier
- CFE0007151, ucf:52322
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0007151
- Title
- X-ray Radiation Enabled Cancer Detection and Treatment with Nanoparticles.
- Creator
-
Hossain, Mainul, Su, Ming, Behal, Aman, Gong, Xun, Hu, Haiyan, Kapoor, Vikram, Deng, Weiwei, University of Central Florida
- Abstract / Description
-
Despite significant improvements in medical sciences over the last decade, cancer continues to be a major cause of death in humans throughout the world. Parallel to the efforts to understand the intricacies of cancer biology, researchers are continuously striving to develop effective cancer detection and treatment strategies. The use of nanotechnology in the modern era opens up a wide range of possibilities for diagnostics, therapies and preventive measures for cancer management. Although existing strategies of cancer detection and treatment using nanoparticles have proven successful in cancer imaging and targeted drug delivery, they are often limited by poor sensitivity, lack of specificity, complex sample preparation efforts and inherent toxicities associated with the nanoparticles, especially in the case of in vivo applications. Moreover, the detection of cancer is not necessarily integrated with treatment. X-rays have long been used in radiation therapy to kill cancer cells and also for imaging tumors inside the body using nanoparticles as contrast agents. However, X-rays, in combination with nanoparticles, can also be used for cancer diagnosis by detecting cancer biomarkers and circulating tumor cells. Moreover, the use of nanoparticles can also enhance the efficacy of X-ray radiation therapy for cancer treatment. This dissertation describes a novel in vitro technique for cancer detection and treatment using X-ray radiation and nanoparticles. Surfaces of synthesized metallic nanoparticles have been modified with appropriate ligands to specifically target cancer cells and biomarkers in vitro. Characteristic X-ray fluorescence signals from the X-ray irradiated nanoparticles are then used for detecting the presence of cancer. The method enables simultaneous detection of multiple cancer biomarkers, allowing accurate diagnosis and early detection of cancer.
Circulating tumor cells, which are the primary indicators of cancer metastasis, have also been detected; here the use of magnetic nanoparticles allows enrichment of rare cancer cells prior to detection. The approach is unique in that it integrates cancer detection and treatment under one platform, since X-rays have been shown to effectively kill cancer cells through radiation-induced DNA damage. Due to the high penetrating power of X-rays, the method has potential applications for in vivo detection and treatment of deeply buried cancers in humans. The effect of nanoparticle toxicity on multiple cell types has been investigated using conventional cytotoxicity assays for both unmodified nanoparticles and nanoparticles modified with a variety of surface coatings. Appropriate surface modifications have significantly reduced the inherent toxicity of nanoparticles, providing possibilities for future clinical applications. To investigate cellular damage caused by X-ray radiation, an on-chip biodosimeter has been fabricated based on three-dimensional microtissues, which allows direct monitoring of responses to X-ray exposure for multiple mammalian cell types. Damage to tumor cells caused by X-rays is known to be significantly higher in the presence of nanoparticles, which act as radiosensitizers and enhance localized radiation doses. An analytical approach is used to investigate the various parameters that affect the radiosensitizing properties of the nanoparticles. The results can be used to increase the efficacy of nanoparticle-aided X-ray radiation therapy for cancer treatment by appropriate choice of X-ray beam energy, nanoparticle size, material composition and location of the nanoparticle with respect to the tumor cell nucleus.
- Date Issued
- 2012
- Identifier
- CFE0004547, ucf:49242
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004547
- Title
- Computational Methods for Analyzing RNA Folding Landscapes and its Applications.
- Creator
-
Li, Yuan, Zhang, Shaojie, Hua, Kien, Jha, Sumit, Hu, Haiyan, Li, Xiaoman, University of Central Florida
- Abstract / Description
-
Non-protein-coding RNAs play critical regulatory roles in cellular life. Many ncRNAs fold into specific structures in order to perform their biological functions. Some of the RNAs, such as riboswitches, can even fold into alternative structural conformations in order to participate in different biological processes. In addition, these RNAs can transition dynamically between different functional structures along folding pathways on their energy landscapes. These alternative functional structures are usually energetically favored and are stable in their local energy landscapes. Moreover, conformational transitions between any pair of alternate structures usually involve high energy barriers, such that RNAs can become kinetically trapped by these stable and local optimal structures. We have proposed a suite of computational approaches for analyzing and discovering regulatory RNAs through studying folding pathways, alternative structures and energy landscapes associated with conformational transitions of regulatory RNAs. First, we developed an approach, RNAEAPath, which can predict low-barrier folding pathways between two conformational structures of a single RNA molecule. Using RNAEAPath, we can analyze folding pathways between two functional RNA structures, and therefore study the mechanism behind RNA functional transitions from a thermodynamic perspective. Second, we introduced an approach, RNASLOpt, for finding all the stable and local optimal structures on the energy landscape of a single RNA molecule. We can use the generated stable and local optimal structures to represent the RNA energy landscape in a compact manner. In addition, we applied RNASLOpt to several known riboswitches and predicted their alternate functional structures accurately. Third, we integrated a comparative approach with RNASLOpt, and developed RNAConSLOpt, which can find all the consensus stable and local optimal structures that are conserved among a set of homologous regulatory RNAs.
We can use RNAConSLOpt to predict alternate functional structures for regulatory RNA families. Finally, we have proposed a pipeline making use of RNAConSLOpt to computationally discover novel riboswitches in bacterial genomes. An application of the proposed pipeline to a set of bacteria in the Bacillus genus results in the re-discovery of many known riboswitches, and the detection of several novel putative riboswitch elements.
- Date Issued
- 2012
- Identifier
- CFE0004400, ucf:49365
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0004400
- Title
- Content-based Information Retrieval via Nearest Neighbor Search.
- Creator
-
Huang, Yinjie, Georgiopoulos, Michael, Anagnostopoulos, Georgios, Hu, Haiyan, Sukthankar, Gita, Ni, Liqiang, University of Central Florida
- Abstract / Description
-
Content-based information retrieval (CBIR) has attracted significant interest in the past few years. Given a search query, the search engine compares the query with all the stored information in the database through nearest neighbor search and returns the most similar items. We contribute the following to CBIR research: first, Distance Metric Learning (DML) is studied to improve the retrieval accuracy of nearest neighbor search; second, Hash Function Learning (HFL) is considered to accelerate the retrieval process. On the one hand, a new local metric learning framework is proposed: Reduced-Rank Local Metric Learning (R2LML). By considering a conical combination of Mahalanobis metrics, the proposed method is able to better capture information such as the data's similarity and location. A regularization term to suppress noise and avoid over-fitting is also incorporated into the formulation. Based on the different methods used to infer the weights for the local metric, we consider two frameworks: Transductive Reduced-Rank Local Metric Learning (T-R2LML), which utilizes transductive learning, and Efficient Reduced-Rank Local Metric Learning (E-R2LML), which employs a simpler and faster approximation method. In addition, we study the convergence properties of the proposed block coordinate descent algorithms for both frameworks. Extensive experiments show the superiority of our approaches. On the other hand, *Supervised Hash Learning (*SHL), which can be used in supervised, semi-supervised and unsupervised learning scenarios, is proposed in the dissertation. By considering several codewords that can be learned from the data, the proposed method naturally leads to several Support Vector Machine (SVM) problems. After providing an efficient training algorithm, we also study the theoretical generalization bound of the new hashing framework. In the final experiments, *SHL outperforms many other popular hash function learning methods.
Additionally, in order to cope with large data sets, we also conducted experiments on big data using a parallel computing software package, namely LIBSKYLARK.
- Date Issued
- 2016
- Identifier
- CFE0006327, ucf:51544
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006327
- Title
- Computational Approaches for Binning Metagenomic Reads.
- Creator
-
Wang, Ying, Hu, Haiyan, Li, Xiaoman, Zhang, Shaojie, Wu, Annie, Savage, Anna, University of Central Florida
- Abstract / Description
-
Metagenomics uses sequencing technologies to study genetic sequences from whole microbial communities. Binning metagenomic reads is the most fundamental step in metagenomic studies and is essential for understanding microbial functions, compositions, and interactions in environmental samples. Various taxonomy-dependent and taxonomy-independent approaches have been developed based on information such as sequence similarity, sequence composition, or k-mer frequency. However, there is still room for improvement, and it is still challenging to bin reads from species with similar or low abundance or to bin reads from unknown species. In this dissertation, we introduce one taxonomy-independent and three taxonomy-dependent approaches to improve the performance of metagenomic reads binning. The taxonomy-independent method, called MBBC, bins reads by considering k-mer frequency in reads without reference genomes. The first two taxonomy-dependent methods both bin reads by measuring the similarity of reads to the trained Markov chains from different taxa. The major difference between these two methods is that the first one selects the potential taxa with the taxonomical decision tree, while the second one, called MBMC, selects potential taxa using the ordinary least squares (OLS) method. The third taxonomy-dependent method bins reads by combining the methods of MBMC with clustering Markov chains from the assembled reads. In tests on both simulated and real datasets, these tools showed performance superior or comparable to various state-of-the-art methods. We anticipate that our tools can significantly improve the accuracy of metagenomic reads binning and thus be widely applied to real environmental samples.
- Date Issued
- 2016
- Identifier
- CFE0006515, ucf:51380
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006515
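The abstract above mentions k-mer frequency as a feature for binning reads without reference genomes. The snippet below is a minimal sketch of how such a feature could be computed for a single read. It is not the MBBC algorithm; the function name and the decision to skip k-mers containing characters other than A, C, G, T are assumptions made for illustration.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def kmer_frequencies(read, k):
    """Return the relative frequency of each k-mer in a read.

    Assumption: k-mers containing ambiguous bases (e.g. N) are skipped.
    """
    read = read.upper()
    counts = Counter()
    for start in range(len(read) - k + 1):
        kmer = read[start:start + k]
        if set(kmer) <= VALID_BASES:
            counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


# Hypothetical read, used only to show the shape of the output.
print(kmer_frequencies("ACGTAC", 2))
```

Reads from the same genome tend to share similar k-mer profiles, which is why vectors like this can serve as input to a clustering step.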
- Title
- Online, Supervised and Unsupervised Action Localization in Videos.
- Creator
-
Soomro, Khurram, Shah, Mubarak, Heinrich, Mark, Hu, Haiyan, Bagci, Ulas, Yun, Hae-Bum, University of Central Florida
- Abstract / Description
-
Action recognition classifies a given video among a set of action labels, whereas action localization determines the location of an action in addition to its class. The overall aim of this dissertation is action localization. Many of the existing action localization approaches exhaustively search (spatially and temporally) for an action in a video. However, as the search space increases with higher-resolution and longer-duration videos, it becomes impractical to use such sliding window techniques. The first part of this dissertation presents an efficient approach for localizing actions by learning contextual relations between different video regions in training. In testing, we use the context information to estimate the probability of each supervoxel belonging to the foreground action and use a Conditional Random Field (CRF) to localize actions. In the above method and typical approaches to this problem, localization is performed in an offline manner where all the video frames are processed together. This prevents timely localization and prediction of actions/interactions, an important consideration for many tasks including surveillance and human-machine interaction. Therefore, in the second part of this dissertation we propose an online approach to the challenging problem of localization and prediction of actions/interactions in videos. In this approach, we use human poses and superpixels in each frame to train discriminative appearance models and perform online prediction of actions/interactions with a Structural SVM. The above two approaches rely on human supervision in the form of assigning action class labels to videos and annotating actor bounding boxes in each frame of training videos. Therefore, in the third part of this dissertation we address the problem of unsupervised action localization.
Given unlabeled videos without annotations, this approach aims at: 1) discovering action classes using a discriminative clustering approach, and 2) localizing actions using a variant of the Knapsack problem.
- Date Issued
- 2017
- Identifier
- CFE0006917, ucf:51685
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006917
- Title
- Model Selection via Racing.
- Creator
-
Zhang, Tiantian, Georgiopoulos, Michael, Anagnostopoulos, Georgios, Wu, Annie, Hu, Haiyan, Nickerson, David, University of Central Florida
- Abstract / Description
-
Show moreModel Selection (MS) is an important aspect of machine learning, as necessitated by the No Free Lunch theorem. Briefly speaking, the task of MS is to identify a subset of models that are optimal in terms of pre-selected optimization criteria. There are many practical applications of MS, such as model parameter tuning, personalized recommendations, A/B testing, etc. Lately, some MS research has focused on trading off exactness of the optimization with somewhat alleviating the computational burden entailed. Recent attempts along this line include metaheuristics optimization, local search-based approaches, sequential model-based methods, portfolio algorithm approaches, and multi-armed bandits.Racing Algorithms (RAs) are an active research area in MS, which trade off some computational cost for a reduced, but acceptable likelihood that the models returned are indeed optimal among the given ensemble of models. All existing RAs in the literature are designed as Single-Objective Racing Algorithm (SORA) for Single-Objective Model Selection (SOMS), where a single optimization criterion is considered for measuring the goodness of models. Moreover, they are offline algorithms in which MS occurs before model deployment and the selected models are optimal in terms of their overall average performances on a validation set of problem instances. This work aims to investigate racing approaches along two distinct directions: Extreme Model Selection (EMS) and Multi-Objective Model Selection (MOMS). In EMS, given a problem instance and a limited computational budget shared among all the candidate models, one is interested in maximizing the final solution quality. In such a setting, MS occurs during model comparison in terms of maximum performance and involves no model validation. EMS is a natural framework for many applications. However, EMS problems remain unaddressed by current racing approaches. 
In this work, the first RA for EMS, named Max-Race, is developed, so that it optimizes the extreme solution quality by automatically allocating the computational resources among an ensemble of problem solvers for a given problem instance. In Max-Race, significant difference between the extreme performances of any pair of models is statistically inferred via a parametric hypothesis test under the Generalized Pareto Distribution (GPD) assumption. Experimental results have confirmed that Max-Race is capable of identifying the best extreme model with high accuracy and low computational cost. Furthermore, in machine learning, as well as in many real-world applications, a variety of MS problems are multi-objective in nature. MS which simultaneously considers multiple optimization criteria is referred to as MOMS. Under this scheme, a set of Pareto optimal models is sought that reflect a variety of compromises between optimization objectives. So far, MOMS problems have received little attention in the relevant literature. Therefore, this work also develops the first Multi-Objective Racing Algorithm (MORA) for a fixed-budget setting, namely S-Race. S-Race addresses MOMS in the proper sense of Pareto optimality. Its key decision mechanism is the non-parametric sign test, which is employed for inferring pairwise dominance relationships. Moreover, S-Race is able to strictly control the overall probability of falsely eliminating any non-dominated models at a user-specified significance level. Additionally, SPRINT-Race, the first MORA for a fixed-confidence setting, is also developed. In SPRINT-Race, pairwise dominance and non-dominance relationships are established via the Sequential Probability Ratio Test with an Indifference zone. Moreover, the overall probability of falsely eliminating any non-dominated models or mistakenly retaining any dominated models is controlled at a prescribed significance level. 
Extensive experimental analysis has demonstrated the efficiency and advantages of both S-Race and SPRINT-Race in MOMS.
- Date Issued
- 2016
- Identifier
- CFE0006203, ucf:51094
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006203
- Title
- Hashing for Multimedia Similarity Modeling and Large-Scale Retrieval.
- Creator
-
Li, Kai, Hua, Kien, Qi, GuoJun, Hu, Haiyan, Wang, Chung-Ching, University of Central Florida
- Abstract / Description
-
In recent years, the amount of multimedia data such as images, texts, and videos has been growing rapidly on the Internet. Motivated by such trends, this thesis is dedicated to exploiting hashing-based solutions to reveal multimedia data correlations and support intra-media and inter-media similarity search among huge volumes of multimedia data.
We start by investigating a hashing-based solution for audio-visual similarity modeling and apply it to the audio-visual sound source localization problem. We show that synchronized signals in audio and visual modalities demonstrate similar temporal changing patterns in certain feature spaces. We propose to use a permutation-based random hashing technique to capture the temporal order dynamics of audio and visual features by hashing them along the temporal axis into a common Hamming space. In this way, the audio-visual correlation problem is transformed into a similarity search problem in the Hamming space. Our hashing-based audio-visual similarity modeling has shown superior performance in the localization and segmentation of sounding objects in videos.
The success of the permutation-based hashing method motivates us to generalize and formally define the supervised ranking-based hashing problem, and study its application to large-scale image retrieval. Specifically, we propose an effective supervised learning procedure to learn optimized ranking-based hash functions that can be used for large-scale similarity search. Compared with the randomized version, the optimized ranking-based hash codes are much more compact and discriminative. Moreover, the method can be easily extended to kernel space to discover more complex ranking structures that cannot be revealed in linear subspaces. Experiments on large image datasets demonstrate the effectiveness of the proposed method for image retrieval.
We further studied the ranking-based hashing method for the cross-media similarity search problem. 
Specifically, we propose two optimization methods to jointly learn two groups of linear subspaces, one for each media type, so that features' ranking orders in different linear subspaces maximally preserve the cross-media similarities. Additionally, we develop this ranking-based hashing method in the cross-media context into a flexible hashing framework with a more general solution. We have demonstrated through extensive experiments on several real-world datasets that the proposed cross-media hashing method can achieve superior cross-media retrieval performance compared with several state-of-the-art algorithms.
Lastly, to make better use of the supervisory label information, as well as to further improve the efficiency and accuracy of supervised hashing, we propose a novel multimedia discrete hashing framework that optimizes an instance-wise loss objective, as opposed to pairwise losses, using an efficient discrete optimization method. In addition, the proposed method decouples binary code learning and hash function learning into two separate stages, thus making it equally applicable to both single-media and cross-media search. Extensive experiments on both single-media and cross-media retrieval tasks demonstrate the effectiveness of the proposed method.
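To make the idea of ranking-based hashing more concrete, here is a minimal Python sketch of a randomized rank hash in the winner-take-all style: each hash looks at a small random subset of feature dimensions and records which one holds the largest value. With a window of two dimensions each code entry is a single bit, so codes can be compared by Hamming distance. All names, the window size, and the seed are assumptions for illustration; the thesis's learned (supervised) hash functions and its exact randomized scheme are not reproduced here.

```python
import random

def make_index_subsets(dim, num_hashes, window=2, seed=0):
    """Draw, for each hash, the first `window` indices of a random permutation of the dimensions."""
    rng = random.Random(seed)
    return [rng.sample(range(dim), window) for _ in range(num_hashes)]

def rank_hash(x, subsets):
    """Code entry = position (within the subset) of the largest feature value.

    Only the relative order of feature values matters, so the code is
    unchanged by any strictly increasing transformation of x.
    """
    return [max(range(len(s)), key=lambda i: x[s[i]]) for s in subsets]

def hamming_distance(c1, c2):
    """Number of code positions at which two codes differ."""
    return sum(a != b for a, b in zip(c1, c2))
```

Because the code depends only on rank order, two feature vectors whose values rise and fall together receive similar codes even if their scales differ, which is the property that makes such codes suitable for comparing signals from different modalities.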
- Date Issued
- 2017
- Identifier
- CFE0006759, ucf:51840
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006759
- Title
- Transcriptional and Post-transcriptional Regulation of Gene Expression.
- Creator
-
Ding, Jun, Hu, Haiyan, Li, Xiaoman, Zhang, Shaojie, Jin, Yier, University of Central Florida
- Abstract / Description
-
Regulation of gene expression includes a variety of mechanisms to increase or decrease specific gene products. Gene expression can be regulated at any stage from transcription to post-transcription, and this regulation is essential to almost all living organisms, as it increases versatility and adaptability by allowing the cell to express the needed proteins. In this dissertation, we comprehensively studied gene regulation from both transcriptional and post-transcriptional points of view. Transcriptional regulation is the process by which cells regulate the transcription of DNA to RNA, thereby directing gene activity. Transcription factors (TFs) play a very important role in transcriptional regulation; they are proteins that bind to specific DNA sequences (regulatory elements) to regulate gene expression. Current studies on TF binding are still very limited, which leaves much to be improved in understanding the TF binding mechanism. To fill this gap, we proposed a variety of computational methods for predicting TF binding elements, which have proven to be more efficient and accurate than existing tools such as DREME and RSAT peaks-motif. On the other hand, studying only transcriptional gene regulation is not enough for a comprehensive understanding. Therefore, we also studied gene regulation at the post-transcriptional level. MicroRNAs (miRNAs) are believed to post-transcriptionally regulate the expression of thousands of target mRNAs, yet the miRNA binding mechanism is still not well understood. In this dissertation, we explored both traditional and novel features of miRNA binding and proposed several computational models for miRNA target prediction. The developed tools outperformed traditional miRNA target prediction methods (e.g., miRanda and TargetScan) in terms of prediction accuracy (precision, recall) and time efficiency.
- Date Issued
- 2016
- Identifier
- CFE0006098, ucf:51197
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0006098
- Title
- Automatic Detection of Brain Functional Disorder Using Imaging Data.
- Creator
-
Dey, Soumyabrata, Shah, Mubarak, Jha, Sumit, Hu, Haiyan, Weeks, Arthur, Rao, Ravishankar, University of Central Florida
- Abstract / Description
-
Recently, Attention Deficit Hyperactivity Disorder (ADHD) has been getting a lot of attention, mainly for two reasons. First, it is one of the most commonly found childhood behavioral disorders. Around 5-10% of children all over the world are diagnosed with ADHD. Second, the root cause of the problem is still unknown and therefore no biological measure exists to diagnose ADHD. Instead, doctors need to diagnose it based on clinical symptoms, such as inattention, impulsivity and hyperactivity, which are all subjective.
Functional Magnetic Resonance Imaging (fMRI) data has become a popular tool for understanding the functioning of the brain, such as identifying the brain regions responsible for different cognitive tasks or analyzing the statistical differences in brain functioning between diseased and control subjects. ADHD is also being studied using fMRI data. In this dissertation we aim to solve the problem of automatic diagnosis of ADHD subjects using their resting state fMRI (rs-fMRI) data.
As a core step of our approach, we model the functions of a brain as a connectivity network, which is expected to capture how synchronous different brain regions are in terms of their functional activities. The network is constructed by representing different brain regions as nodes, where any two nodes are connected by an edge if the correlation of their activity patterns is higher than some threshold. The brain regions, represented as the nodes of the network, can be selected at different granularities, e.g. single voxels or clusters of functionally homogeneous voxels. The topological differences between the constructed networks of the ADHD and control groups of subjects are then exploited in the classification approach.
We have developed a simple method employing the Bag-of-Words (BoW) framework for the classification of the ADHD subjects. 
We represent each node in the network by a 4-D feature vector: node degree and 3-D location. The 4-D vectors of all the network nodes of the training data are then grouped into a number of clusters using K-means, where each such cluster is termed a word. Finally, each subject is represented by a histogram (bag) of such words. A Support Vector Machine (SVM) classifier is used for the detection of the ADHD subjects using their histogram representation. The method is able to achieve 64% classification accuracy.
The above simple approach has several shortcomings. First, there is a loss of spatial information while constructing the histogram because it only counts the occurrences of words, ignoring their spatial positions. Second, features from the whole brain are used for classification, but some of the brain regions may not contain any useful information and may only increase the feature dimensions and noise of the system. Third, in our study we used only one network feature, the degree of a node, which measures the connectivity of the node, while other complex network features may be useful for solving the proposed problem.
In order to address the above shortcomings, we hypothesize that only a subset of the nodes of the network possesses important information for the classification of the ADHD subjects. To identify the important nodes of the network we have developed a novel algorithm. The algorithm repeatedly generates a random subset of nodes, each time extracting the features from the subset to compute the feature vector and perform classification. The subsets are then ranked based on classification accuracy, and the occurrences of each node in the top-ranked subsets are counted. Our algorithm selects the most frequently occurring nodes for the final classification. Furthermore, along with the node degree, we employ three more node features: network cycles, the varying distance degree and the edge weight sum. 
We concatenate the features of the selected nodes in a fixed order to preserve the relative spatial information. Experimental validation suggests that the use of the features from the nodes selected using our algorithm indeed helps to improve the classification accuracy. Also, our finding is in concordance with the existing literature, as the brain regions identified by our algorithm have been independently found by many other studies on ADHD. We achieved a classification accuracy of 69.59% using this approach. However, this method represents each voxel as a node of the network, which brings the number of nodes to several thousand. As a result, the network construction step becomes computationally very expensive. Another limitation of the approach is that the network features, which are computed for each node of the network, capture only the local structures while ignoring the global structure of the network.
Next, in order to capture the global structure of the networks, we use the Multi-Dimensional Scaling (MDS) technique to project all the subjects from an unknown network-space to a low dimensional space based on their inter-network distance measures. For the purpose of computing the distance between two networks, we represent each node by a set of attributes such as the node degree, the average power, the physical location, the neighbor node degrees, and the average powers of the neighbor nodes. The nodes of the two networks are then mapped in such a way that, over all pairs of nodes, the sum of the attribute distances, which is the inter-network distance, is minimized. To reduce the network computation cost, we enforce that the maximum relevant information is preserved with minimum redundancy. To achieve this, the nodes of the network are constructed from clusters of highly active voxels, where the activity levels of the voxels are measured based on the average power of their corresponding fMRI time-series. 
Our method shows promise as we achieve impressive classification accuracies (73.55%) on the ADHD-200 data set. Our results also reveal that the detection rates are higher when classification is performed separately on the male and female groups of subjects.
So far, we have only used the fMRI data for solving the ADHD diagnosis problem. Finally, we investigated the following questions. Do the structural brain images contain useful information related to the ADHD diagnosis problem? Can the classification accuracy of the automatic diagnosis system be improved by combining the information of the structural and functional brain data? Towards that end, we developed a new method to combine the information of structural and functional brain images in a late fusion framework. For structural data we input the gray matter (GM) brain images to a Convolutional Neural Network (CNN). The output of the CNN is a feature vector per subject, which is used to train the SVM classifier. For the functional data we compute the average power of each voxel based on its fMRI time series. The average power of the fMRI time series of a voxel measures the activity level of the voxel. We found significant differences in the voxel power distribution patterns of the ADHD and control groups of subjects. The Local Binary Pattern (LBP) texture feature is used on the voxel power map to capture these differences. We achieved 74.23% accuracy using GM features, 77.30% using LBP features and 79.14% using combined information.
In summary, this dissertation demonstrated that structural and functional brain imaging data are useful for the automatic detection of ADHD subjects, as we achieve impressive classification accuracies on the ADHD-200 data set. Our study also helps to identify the brain regions which are useful for ADHD subject classification. These findings can help in understanding the pathophysiology of the problem. 
Finally, we expect that our approaches will contribute towards the development of a biological measure for the diagnosis of the ADHD subjects.
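The network construction step described in the abstract (regions as nodes, an edge wherever the correlation of two activity patterns exceeds a threshold, and node degree as a feature) can be sketched in plain Python as follows. The function names, the use of Pearson correlation, and the toy time series are assumptions for illustration only; the dissertation's actual preprocessing, region selection, and threshold values are not given in this record.

```python
from math import sqrt

def pearson(u, v):
    """Pearson correlation of two equal-length time series (0.0 if either is constant)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return cov / (norm_u * norm_v)

def build_network(series, threshold):
    """Return the set of undirected edges (i, j), i < j, whose correlation exceeds the threshold."""
    n = len(series)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if pearson(series[i], series[j]) > threshold
    }

def node_degrees(num_nodes, edges):
    """Degree of each node: the number of edges incident to it."""
    degrees = [0] * num_nodes
    for i, j in edges:
        degrees[i] += 1
        degrees[j] += 1
    return degrees
```

In the Bag-of-Words variant described above, each node's degree would be combined with its 3-D location into a 4-D vector before clustering; that step is omitted here to keep the sketch short.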
- Date Issued
- 2014
- Identifier
- CFE0005786, ucf:50060
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0005786
- Title
- Learning Collective Behavior in Multi-relational Networks.
- Creator
-
Wang, Xi, Sukthankar, Gita, Tappen, Marshall, Georgiopoulos, Michael, Hu, Haiyan, Anagnostopoulos, Georgios, University of Central Florida
- Abstract / Description
-
With the rapid expansion of the Internet and WWW, the problem of analyzing social media data has received an increasing amount of attention in the past decade. The boom in social media platforms offers many possibilities to study human collective behavior and interactions on an unprecedented scale. In the past, much work has been done on the problem of learning from networked data with homogeneous topologies, where instances are explicitly or implicitly inter-connected by a single type of relationship. In contrast to traditional content-only classification methods, relational learning succeeds in improving classification performance by leveraging the correlation of the labels between linked instances. However, networked data extracted from social media, web pages, and bibliographic databases can contain entities of multiple classes that are linked for various causal reasons; hence, treating all links in a homogeneous way can limit the performance of relational classifiers. Learning the collective behavior and interactions in heterogeneous networks becomes much more complex.
The contributions of this dissertation include 1) two classification frameworks for identifying human collective behavior in multi-relational social networks; 2) unsupervised and supervised learning models for relationship prediction in multi-relational collaborative networks. Our methods improve the performance of homogeneous predictive models by differentiating heterogeneous relations and capturing the prominent interaction patterns underlying the network structure. The work has been evaluated on various real-world social networks. We believe that this study will be useful for analyzing human collective behavior and interactions, specifically in scenarios where the heterogeneous relationships in the network arise from various causal reasons.
- Date Issued
- 2014
- Identifier
- CFE0005439, ucf:50376
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0005439
- Title
- On Kernel-base Multi-Task Learning.
- Creator
-
Li, Cong, Georgiopoulos, Michael, Anagnostopoulos, Georgios, Tappen, Marshall, Hu, Haiyan, Ni, Liqiang, University of Central Florida
- Abstract / Description
-
Multi-Task Learning (MTL) has been an active research area in machine learning for two decades. By training multiple relevant tasks simultaneously with information shared across tasks, it is possible to improve the generalization performance of each task, compared to training each individual task independently. During the past decade, most MTL research has been based on the Regularization-Loss framework due to its flexibility in specifying various types of information sharing strategies, the opportunity it offers to yield kernel-based methods, and its capability in promoting sparse feature representations.
However, certain limitations exist in both theoretical and practical aspects of Regularization-Loss-based MTL. Theoretically, previous research on generalization bounds in connection to MTL Hypothesis Spaces (HSs), where data of all tasks are pre-processed by a (partially) common operator, has been limited in two aspects. First, all previous works assumed linearity of the operator, therefore completely excluding kernel-based MTL HSs, for which the operator is potentially non-linear. Second, all previous works, rather unnecessarily, assumed that all the task weights are constrained within norm-balls whose radii are equal. The requirement of equal radii leads to significant inflexibility of the relevant HSs, which may cause the generalization performance of the corresponding MTL models to deteriorate. Practically, various algorithms have been developed for kernel-based MTL models, due to the different characteristics of the formulations. Most of these algorithms are a burden to develop and end up being quite sophisticated, so that practitioners may face a hard task in interpreting and implementing them, especially when multiple models are involved. This is even more so when Multi-Task Multiple Kernel Learning (MT-MKL) models are considered. This research largely resolves the above limitations. 
Theoretically, a pair of new kernel-based HSs is proposed: one for single-kernel MTL and another for MT-MKL. Unlike previous works, we allow each task weight to be constrained within a norm-ball whose radius is learned during training. By deriving and analyzing the generalization bounds of these two HSs, we show that such flexibility indeed leads to much tighter generalization bounds, which often results in significantly better generalization performance. Based on this observation, a pair of new models is developed, one for each case: single-kernel MTL and MT-MKL. From a practical perspective, we propose a general MT-MKL framework that covers most of the prominent MT-MKL approaches, including our new MT-MKL formulation. Then, a general-purpose algorithm is developed to solve the framework, which can also be employed for training all other models subsumed by this framework. A series of experiments is conducted to assess the merits of the proposed model when trained by the new algorithm. Certain properties of our HSs and formulations are demonstrated, and the advantage of our model in terms of classification accuracy is shown via these experiments.
- Date Issued
- 2014
- Identifier
- CFE0005517, ucf:50321
- Format
- Document (PDF)
- PURL
- http://purl.flvc.org/ucf/fd/CFE0005517