You are here
Efficient String Graph Construction Algorithm
- Date Issued:
- 2019
- Abstract/Description:
- In the field of genome assembly research where assemblers are dominated by de Bruijn graph-based approaches, string graph-based assembly approach is getting more attention because of its ability to losslessly retain information from sequence data. Despite the advantages provided by a string graph in repeat detection and in maintaining read coherence, the high computational cost for constructing a string graph hinders its usability for genome assembly. Even though different algorithms have been proposed over the last decade for string graph construction, efficiency is still a challenge due to the demand for processing a large amount of sequence data generated by NGS technologies. Therefore, in this thesis, we provide a novel, linear time and alphabet-size-independent algorithm SOF which uses the property of irreducible edges and transitive edges to efficiently construct string graph from an overlap graph. Experimental results show that SOF is at least 2 times faster than the string graph construction algorithm provided in SGA, one of the most popular string graph-based assembler, while maintaining almost the same memory footprint as SGA. Moreover, the availability of SOF as a subprogram in the SGA assembly pipeline will give user facilities to access the preprocessing and postprocessing steps for genome assembly provided in SGA.
Title: | Efficient String Graph Construction Algorithm. |
50 views
21 downloads |
---|---|---|
Name(s): |
Morshed, S.M. Iqbal, Author Yooseph, Shibu, Committee Chair Zhang, Shaojie, Committee Member Valliyil Thankachan, Sharma, Committee Member University of Central Florida, Degree Grantor |
|
Type of Resource: | text | |
Date Issued: | 2019 | |
Publisher: | University of Central Florida | |
Language(s): | English | |
Abstract/Description: | In the field of genome assembly research where assemblers are dominated by de Bruijn graph-based approaches, string graph-based assembly approach is getting more attention because of its ability to losslessly retain information from sequence data. Despite the advantages provided by a string graph in repeat detection and in maintaining read coherence, the high computational cost for constructing a string graph hinders its usability for genome assembly. Even though different algorithms have been proposed over the last decade for string graph construction, efficiency is still a challenge due to the demand for processing a large amount of sequence data generated by NGS technologies. Therefore, in this thesis, we provide a novel, linear time and alphabet-size-independent algorithm SOF which uses the property of irreducible edges and transitive edges to efficiently construct string graph from an overlap graph. Experimental results show that SOF is at least 2 times faster than the string graph construction algorithm provided in SGA, one of the most popular string graph-based assembler, while maintaining almost the same memory footprint as SGA. Moreover, the availability of SOF as a subprogram in the SGA assembly pipeline will give user facilities to access the preprocessing and postprocessing steps for genome assembly provided in SGA. | |
Identifier: | CFE0007504 (IID), ucf:52635 (fedora) | |
Note(s): |
2019-05-01 M.S. Engineering and Computer Science, Computer Science Masters This record was generated from author submitted information. |
|
Subject(s): | bioinformatics -- string graph -- sequence assembly | |
Persistent Link to This Record: | http://purl.flvc.org/ucf/fd/CFE0007504 | |
Restrictions on Access: | public 2019-05-15 | |
Host Institution: | UCF |