School of Electrical, Computer and Energy Engineering
PhD Final Oral Defense

Waveform Mapping and Time-Frequency Processing of Biological Sequences and Structures
by Lakshminarayan Ravichandran

July 13, 2011, 8:30am
GWC 409

Committee:
Dr. Antonia Papandreou-Suppappola (Co-Chair)
Dr. Andreas Spanias (Co-Chair)
Dr. Chaitali Chakrabarti
Dr. Cihan Tepedelenlioglu
Dr. Zoe Lacroix

Abstract

There is an enormous amount of genomic and proteomic data available for use in public databases. The genomic data, which is in the form of deoxyribonucleic acid (DNA), and the proteomic data, which is in the form of amino acids, play a vital role in the function of every living cell. Hence, there is a need to understand the organization and functionality of the DNA and protein regions. To address this need, various methods have been proposed from diverse disciplines such as biology, chemistry, physics, and computer and electrical engineering. Because this genomic and proteomic information is discrete in nature, it can be represented as numerical data and analyzed so that the results obtained are beneficial to mankind. Signal processing techniques have been used to efficiently analyze DNA and protein sequences, and the results obtained can be attributed to key biological properties. For example, signal processing techniques have been applied to the problem of sequence alignment, which compares and classifies regions of similarity in sequences based on their composition. Current state-of-the-art approaches for biological sequence querying and alignment require pre-processing and lack robustness to repetitions in the sequence. In addition, these approaches provide little support for efficiently querying subsequences, a process that is essential for tracking localized database matches.
In this manuscript, we first propose a query-based alignment method for biological sequences that maps sequences to time-domain waveforms before processing the waveforms for alignment in the time-frequency plane. The mapping uses waveforms, such as Gaussian functions, with unique sequence representations in the time-frequency plane. The proposed alignment method employs a robust querying algorithm that utilizes a time-frequency signal expansion whose basis function is matched to the basic waveform in the mapped sequences. The resulting WAVEQuery approach was demonstrated for both DNA and protein sequences using the matching pursuit decomposition as the signal basis expansion. We specifically evaluated the alignment localization of WAVEQuery over repetitive database segments, and we demonstrated its operation in real time without pre-processing. We also demonstrated that WAVEQuery significantly outperformed the biological sequence alignment method BLAST for DNA queries with repetitive segments. A generalized version of the WAVEQuery approach with the metaplectic transform is also described for protein sequence structure prediction.

Similarity between one protein’s structure and another’s implies that the two structures may have common functionalities. There is a need to look beyond comparing the primary structure (amino acid sequences) and to compare or align the protein secondary and tertiary structures, which occur in three-dimensional (3-D) space. This is done after considering the conformations in 3-D space due to the degrees of freedom possessed by these structures. We next present a novel directionality-based windowed chirp waveform representation for the protein 3-D structure and use this representation to compare protein structures using a matched filter approach.
The highlight of the approach is that the waveform representation embeds the directionality of the structure, is highly localized in 3-D space, incorporates hydrophobicity, a parameter that relates the sequence to the structure, and is linearly separable. This helps track similarities locally over segments of the structure rather than only over its entire length, enabling the classification of structures in distantly related proteins that have only partial structural similarities. The approach has been tested for pairwise alignment over the entire length of structures, alignment of multiple structures to form a phylogenetic tree, local alignment of structural segments, and basic classification of protein structural classes using directional descriptors for the protein structure.
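
The sequence-to-waveform mapping and querying idea described in the abstract can be illustrated with a minimal Python sketch. It is not the WAVEQuery implementation: it assumes a Gaussian-windowed sinusoid per nucleotide, with a hypothetical frequency assignment (`FREQS`) and window length (`SAMPLES_PER_SYMBOL`) chosen only for illustration, and it replaces the matching pursuit decomposition with a plain normalized correlation of the query waveform against each database position.

```python
import cmath
import math

# Assumed, illustrative parameters; none of these values come from the thesis.
SAMPLES_PER_SYMBOL = 32
FREQS = {"A": 1, "C": 3, "G": 5, "T": 7}  # cycles per symbol window


def symbol_waveform(symbol):
    """Return a Gaussian-windowed complex sinusoid for one nucleotide."""
    n = SAMPLES_PER_SYMBOL
    centre = (n - 1) / 2
    sigma = n / 6  # window edges sit roughly three sigma from the centre
    freq = FREQS[symbol]
    return [
        math.exp(-0.5 * ((k - centre) / sigma) ** 2)
        * cmath.exp(2j * math.pi * freq * k / n)
        for k in range(n)
    ]


def map_sequence(sequence):
    """Concatenate per-symbol waveforms: time shift = position, frequency = symbol."""
    samples = []
    for symbol in sequence:
        samples.extend(symbol_waveform(symbol))
    return samples


def find_query(database, query):
    """Correlate the query waveform with every symbol-aligned database segment.

    Returns (best_position, scores); a score of 1.0 indicates an exact match.
    """
    db_wave = map_sequence(database)
    q_wave = map_sequence(query)
    q_energy = sum(abs(x) ** 2 for x in q_wave)
    scores = []
    for pos in range(len(database) - len(query) + 1):
        start = pos * SAMPLES_PER_SYMBOL
        segment = db_wave[start:start + len(q_wave)]
        corr = sum(a * b.conjugate() for a, b in zip(segment, q_wave))
        scores.append(abs(corr) / q_energy)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


if __name__ == "__main__":
    # The database contains a repetitive "AC" run before the query occurrence.
    position, scores = find_query("ACACACACGTAC", "ACGT")
    print(position, [round(s, 2) for s in scores])
```

Because every symbol waveform in this sketch has the same energy, the normalized score reaches 1.0 only where the database segment equals the query, which is why the repetitive prefix in the example does not produce a false best match. A matching pursuit decomposition, as named in the abstract, would instead iteratively extract the time-frequency atoms themselves.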