Date of Award

8-2008

Degree Type

Dissertation

Degree Name

Ph.D.

Degree Program

Engineering and Applied Science

Department

Computer Science

Major Professor

Summa, Christopher

Second Advisor

Winters-Hilt, Stephen

Third Advisor

Fu, Bin

Fourth Advisor

Chen, Huimin

Fifth Advisor

Zhu, Dongxiao

Abstract

Sequence analysis and structure analysis are two of the fundamental areas of bioinformatics research. This dissertation addresses protein structure problems, namely protein structure alignment and structure query, and genome sequence problems, namely haplotype reconstruction and genome rearrangement. It first presents an algorithm for pairwise protein structure alignment, tested on structures from the Protein Data Bank (PDB), which in many cases outperforms two well-known algorithms, DaliLite and CE. This preliminary algorithm is a graph-theoretic approach that uses the concept of "stars" to reduce the complexity of clique finding. The algorithm is then improved by introducing "double-center stars" into the graph and applying a self-learning strategy. Tested on a much larger set of protein structures, the improved algorithm shows better accuracy, especially in cases of weak similarity. Building on the improved alignment algorithm, a protein structure query algorithm is designed to search the PDB for similar structures. Compared with SSM, it performs better, with lower maximum and average Q-scores for the proteins it misses.

A related problem also arose: computing the diameter of a 3-D sequence of points. Its connection to sublinear-time computation is discussed. The diameter of a 3-D sequence is approximated by a series of sublinear-time deterministic, zero-error randomized, and bounded-error randomized algorithms. These algorithms yield a series of separations in the power of sublinear-time computation.

This dissertation also studies two genome sequence problems. A probabilistic model is proposed for reconstructing haplotypes from SNP matrices containing incompleteness and inconsistency errors. Experiments on simulated data show both high accuracy and high speed, consistent with the algorithm's provable efficiency and accuracy.
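The abstract does not spell out the diameter algorithms. As a point of reference only, here is a sketch of two things: the exact quadratic-time diameter (largest pairwise distance) of a set of 3-D points, and a simple linear-time 2-approximation from a single anchor point. This is not the dissertation's sublinear-time method. The function names `exact_diameter` and `two_approx_diameter` are hypothetical, and the sketch assumes the points are given as (x, y, z) tuples of floats.

```python
import math
from typing import Sequence, Tuple

# Hypothetical representation: a 3-D point as an (x, y, z) tuple of floats.
Point = Tuple[float, float, float]


def exact_diameter(points: Sequence[Point]) -> float:
    """Largest pairwise Euclidean distance, by brute force in O(n^2) time."""
    best = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            best = max(best, math.dist(points[i], points[j]))
    return best


def two_approx_diameter(points: Sequence[Point]) -> float:
    """Return d with d <= diameter <= 2 * d, in O(n) time.

    For any anchor p, the farthest distance from p is at most the diameter.
    By the triangle inequality, every pair (a, b) satisfies
    dist(a, b) <= dist(a, p) + dist(p, b) <= 2 * max_q dist(p, q).
    """
    anchor = points[0]  # any point works as the anchor
    return max(math.dist(anchor, q) for q in points)


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 12.0)]
    print(exact_diameter(pts), two_approx_diameter(pts))
```

For the three sample points, the exact diameter is 13.0 and the anchored estimate is 12.0, which lies within the guaranteed factor of 2. A sublinear-time algorithm cannot read every point, so this linear-time baseline only illustrates what an approximation guarantee looks like.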
Finally, a genome rearrangement problem is studied, and the concept of non-breaking similarity is introduced. Approximating the exemplar non-breaking similarity within a factor of n^(1-ε) is proven to be NP-hard. Nevertheless, polynomial-time algorithms are presented for several practical cases.

Rights

The University of New Orleans and its agents retain the non-exclusive license to archive and make accessible this dissertation or thesis in whole or in part in all forms of media, now or hereafter known. The author retains all other ownership rights to the copyright of the thesis or dissertation.

Recommended Citation

Zhao, Zhiyu, "Robust and Efficient Algorithms for Protein 3-D Structure Alignment and Genome Sequence Comparison" (2008). University of New Orleans Theses and Dissertations. 851.
https://scholarworks.uno.edu/td/851

ScholarWorks@UNO

University of New Orleans Theses and Dissertations

Robust and Efficient Algorithms for Protein 3-D Structure Alignment and Genome Sequence Comparison
