You are here

NEW COMPUTATIONAL APPROACHES FOR MULTIPLE RNA ALIGNMENT AND RNA SEARCH

Download pdf | Full Screen View

Date Issued:
2009
Abstract/Description:
In this thesis we explore the the theory and history behind RNA alignment. Normal sequence alignments as studied by computer scientists can be completed in $O(n^2)$ time in the naive case. The process involves taking two input sequences and finding the list of edits that can transform one sequence into the other. This process is applied to biology in many forms, such as the creation of multiple alignments and the search of genomic sequences. When you take into account the RNA sequence structure the problem becomes even harder. Multiple RNA structure alignment is particularly challenging because covarying mutations make sequence information alone insufficient. Existing tools for multiple RNA alignments first generate pair-wise RNA structure alignments and then build the multiple alignment using only the sequence information. Here we present PMFastR, an algorithm which iteratively uses a sequence-structure alignment procedure to build a multiple RNA structure alignment. PMFastR also has low memory consumption allowing for the alignment of large sequences such as 16S and 23S rRNA. Specifically, we reduce the memory consumption to $\sim O(band^2*m)$ where $band$ is the banding size. Other solutions are $\sim O(n^2*m)$ where $n$ and $m$ are the lengths of the target and query respectively. The algorithm also provides a method to utilize a multi-core environment. We present results on benchmark data sets from BRAliBase, which shows PMFastR outperforms other state-of-the-art programs. Furthermore, we regenerate 607 Rfam seed alignments and show that our automated process creates similar multiple alignments to the manually-curated Rfam seed alignments. While these methods can also be applied directly to genome sequence search, the abundance of new multiple species genome alignments presents a new area for exploration. Many multiple alignments of whole genomes are available and these alignments keep growing in size. These alignments can provide more information to the searcher than just a single sequence. Using the methodology from sequence-structure alignment we developed AlnAlign, which searches an entire genome alignment using RNA sequence structure. While programs have been readily available to align alignments, this is the first to our knowledge that is specifically designed for RNA sequences. This algorithm is presented only in theory and is yet to be tested.
Title: NEW COMPUTATIONAL APPROACHES FOR MULTIPLE RNA ALIGNMENT AND RNA SEARCH.
25 views
10 downloads
Name(s): DeBlasio, Daniel, Author
Zhang, Shaojie, Committee Chair
University of Central Florida, Degree Grantor
Type of Resource: text
Date Issued: 2009
Publisher: University of Central Florida
Language(s): English
Abstract/Description: In this thesis we explore the the theory and history behind RNA alignment. Normal sequence alignments as studied by computer scientists can be completed in $O(n^2)$ time in the naive case. The process involves taking two input sequences and finding the list of edits that can transform one sequence into the other. This process is applied to biology in many forms, such as the creation of multiple alignments and the search of genomic sequences. When you take into account the RNA sequence structure the problem becomes even harder. Multiple RNA structure alignment is particularly challenging because covarying mutations make sequence information alone insufficient. Existing tools for multiple RNA alignments first generate pair-wise RNA structure alignments and then build the multiple alignment using only the sequence information. Here we present PMFastR, an algorithm which iteratively uses a sequence-structure alignment procedure to build a multiple RNA structure alignment. PMFastR also has low memory consumption allowing for the alignment of large sequences such as 16S and 23S rRNA. Specifically, we reduce the memory consumption to $\sim O(band^2*m)$ where $band$ is the banding size. Other solutions are $\sim O(n^2*m)$ where $n$ and $m$ are the lengths of the target and query respectively. The algorithm also provides a method to utilize a multi-core environment. We present results on benchmark data sets from BRAliBase, which shows PMFastR outperforms other state-of-the-art programs. Furthermore, we regenerate 607 Rfam seed alignments and show that our automated process creates similar multiple alignments to the manually-curated Rfam seed alignments. While these methods can also be applied directly to genome sequence search, the abundance of new multiple species genome alignments presents a new area for exploration. Many multiple alignments of whole genomes are available and these alignments keep growing in size. These alignments can provide more information to the searcher than just a single sequence. Using the methodology from sequence-structure alignment we developed AlnAlign, which searches an entire genome alignment using RNA sequence structure. While programs have been readily available to align alignments, this is the first to our knowledge that is specifically designed for RNA sequences. This algorithm is presented only in theory and is yet to be tested.
Identifier: CFE0002736 (IID), ucf:48166 (fedora)
Note(s): 2009-08-01
M.S.
Engineering and Computer Science, School of Electrical Engineering and Computer Science
Masters
This record was generated from author submitted information.
Subject(s): Bioinformatics
Compuational Biology
Computer Science
RNA
Alignment
Search
Memory Consumption
PMFastR
Persistent Link to This Record: http://purl.flvc.org/ucf/fd/CFE0002736
Restrictions on Access: campus 2010-07-01
Host Institution: UCF

In Collections