School of Electrical, Computer and Energy Engineering
PhD Final Oral Defense
Waveform Mapping and Time-Frequency Processing
of Biological Sequences and Structures
by
Lakshminarayan Ravichandran
July 13, 2011
8:30am
GWC 409
Committee:
Dr. Antonia Papandreou-Suppappola (Co-Chair)
Dr. Andreas Spanias (Co-Chair)
Dr. Chaitali Chakrabarti
Dr. Cihan Tepedelenlioglu
Dr. Zoe Lacroix
Abstract
An enormous amount of genomic and proteomic data is available in public databases.
Genomic data, in the form of deoxyribonucleic acid (DNA), and proteomic data, in the
form of amino acids, play a vital role in the function of every living cell. Hence, there
is a need to understand the organization and functionality of DNA and protein regions.
To address this need, various methods have been proposed from diverse disciplines such
as biology, chemistry, physics, and computer and electrical engineering. Because this
genomic and proteomic information is discrete, it can be represented as numerical data
and analyzed to obtain results of practical benefit.
Signal processing techniques have been used to efficiently analyze DNA and
protein sequences, and the results obtained can be attributed to key biological properties.
For example, signal processing techniques have been applied to the problem of sequence
alignment, which compares and classifies regions of similarity in sequences based on
their composition. Current state-of-the-art approaches for biological sequence querying
and alignment require pre-processing and lack robustness to repetitions in the sequence.
In addition, these approaches provide little support for efficiently querying subsequences,
a process that is essential for tracking localized database matches.
In this manuscript, we first propose a query-based alignment method for biological
sequences that maps the sequences to time-domain waveforms and then processes the
waveforms for alignment in the time-frequency plane. The mapping uses waveforms,
such as Gaussian functions, with unique sequence representations in the time-frequency
plane. The proposed alignment method employs a robust querying algorithm that utilizes
a time-frequency signal expansion whose basis function is matched to the basic
waveform in the mapped sequences. The resulting WAVEQuery approach was
demonstrated for both deoxyribonucleic acid (DNA) and protein sequences using the
matching pursuit decomposition as the signal basis expansion. We specifically evaluated
the alignment localization of WAVEQuery over repetitive database segments, and we
demonstrated its operation in real-time without pre-processing. We also demonstrated
that WAVEQuery significantly outperformed the biological sequence alignment method
BLAST for queries with repetitive segments for DNA sequences. A generalized version
of the WAVEQuery approach with the metaplectic transform is also described for protein
sequence structure prediction.
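
The mapping and querying steps described above can be illustrated with a minimal
sketch. It is not the WAVEQuery implementation: the frequency assigned to each
nucleotide, the symbol length, the Gaussian spread, the restriction to symbol-aligned
time shifts, and the function names are all assumptions made for illustration. Each
nucleotide is mapped to a Gaussian atom at its own frequency, a basic matching pursuit
decomposes the database and query waveforms over the same atoms, and the query is
located by comparing the extracted atoms at each shift.

```python
import numpy as np

# Hypothetical parameters (not taken from the source).
FREQ = {"A": 0.05, "C": 0.15, "G": 0.25, "T": 0.35}  # cycles per sample
SYMBOL_LEN = 32   # samples per mapped symbol
SIGMA = 5.0       # Gaussian spread in samples


def gaussian_atom(freq):
    """Unit-energy Gaussian envelope modulated to the given frequency."""
    n = np.arange(SYMBOL_LEN) - (SYMBOL_LEN - 1) / 2.0
    g = np.exp(-0.5 * (n / SIGMA) ** 2) * np.exp(2j * np.pi * freq * n)
    return g / np.linalg.norm(g)


ATOMS = {sym: gaussian_atom(f) for sym, f in FREQ.items()}


def map_sequence(seq):
    """Map a DNA string to a time-domain waveform, one atom per symbol."""
    return np.concatenate([ATOMS[sym] for sym in seq])


def matching_pursuit(signal, n_iter):
    """Greedy decomposition over atoms placed at symbol-aligned time shifts.

    Returns a dict {slot index: symbol of the strongest atom found there}.
    """
    residual = np.array(signal, dtype=complex)
    n_slots = len(residual) // SYMBOL_LEN
    picked = {}
    for _ in range(n_iter):
        best_mag, best_slot, best_sym, best_coef = 0.0, None, None, 0j
        for k in range(n_slots):
            seg = residual[k * SYMBOL_LEN:(k + 1) * SYMBOL_LEN]
            for sym, a in ATOMS.items():
                coef = np.vdot(a, seg)  # inner product <atom, segment>
                if abs(coef) > best_mag:
                    best_mag, best_slot, best_sym, best_coef = abs(coef), k, sym, coef
        if best_slot is None:
            break
        start = best_slot * SYMBOL_LEN
        # Subtract the projection onto the selected atom from the residual.
        residual[start:start + SYMBOL_LEN] -= best_coef * ATOMS[best_sym]
        picked.setdefault(best_slot, best_sym)
    return picked


def wave_query(db_seq, query_seq):
    """Return (best shift, per-shift match counts) for the query in the database."""
    db = matching_pursuit(map_sequence(db_seq), len(db_seq))
    q = matching_pursuit(map_sequence(query_seq), len(query_seq))
    scores = []
    for shift in range(len(db_seq) - len(query_seq) + 1):
        scores.append(sum(
            q.get(k) is not None and db.get(shift + k) == q.get(k)
            for k in range(len(query_seq))
        ))
    return int(np.argmax(scores)), scores


if __name__ == "__main__":
    hit, scores = wave_query("ACGTTTGACCA", "TTGAC")
    print("best shift:", hit, "scores:", scores)
```

Because the atoms here are nearly orthogonal across frequencies, each matching pursuit
iteration recovers one symbol. A practical system would search a much richer dictionary
of time and frequency shifts, which this sketch deliberately avoids.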
Similarity between the structures of two proteins suggests that the two structures may
have common functionalities. There is a need to look beyond comparing the primary
structure (amino acid sequences) and to compare or align the protein secondary and
tertiary structures, which occur in three-dimensional (3-D) space. Such a comparison
must account for the conformations in 3-D space that arise from the degrees of freedom
possessed by these structures.
We next present a novel directionality-based windowed chirp waveform
representation for the protein 3-D structure and use this representation to compare protein
structures using a matched filter approach. The highlights of the approach are that we
embed directionality in the waveform representation, with the waveform being highly
localized in 3-D space; that we incorporate hydrophobicity, a parameter that relates the
sequence to the structure; and that we provide a linearly separable representation. This
helps track similarities locally over segments of the structure rather than over the
entire length of the protein structure, which enables the classification of structures in
distantly related proteins that have only partial structural similarities. This approach
has been tested for pairwise alignment over the entire length of structures, alignment
over multiple structures to form a phylogenetic tree, local alignment of structural
segments, and basic classification of protein structural classes using directional
descriptors for the protein structure.
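
As a rough illustration of the structural comparison described above, the following
sketch maps the direction between consecutive backbone points to a Gaussian-windowed
chirp and compares two structures with a matched filter (peak normalized
cross-correlation). The encoding is hypothetical: the way azimuth and elevation set the
chirp's center frequency and chirp rate, the window length, and all function names are
assumptions, and hydrophobicity is omitted. The sketch also uses absolute directions, so
it assumes the structures have already been placed in a common orientation.

```python
import numpy as np

WIN = 64       # samples per directional chirp (assumed)
SIGMA = 10.0   # Gaussian window spread in samples (assumed)


def directional_chirp(direction):
    """Windowed linear chirp encoding a 3-D direction (hypothetical encoding)."""
    x, y, z = direction / np.linalg.norm(direction)
    azimuth = np.arctan2(y, x)                   # in [-pi, pi]
    elevation = np.arcsin(np.clip(z, -1.0, 1.0))  # in [-pi/2, pi/2]
    f0 = 0.25 + 0.2 * azimuth / np.pi            # center frequency, cycles/sample
    rate = 0.002 * elevation / (np.pi / 2)       # chirp rate, cycles/sample^2
    n = np.arange(WIN) - (WIN - 1) / 2.0
    window = np.exp(-0.5 * (n / SIGMA) ** 2)
    s = window * np.exp(2j * np.pi * (f0 * n + 0.5 * rate * n ** 2))
    return s / np.linalg.norm(s)


def structure_waveform(coords):
    """Concatenate one chirp per step between consecutive backbone points."""
    steps = np.diff(np.asarray(coords, dtype=float), axis=0)
    return np.concatenate([directional_chirp(d) for d in steps])


def matched_filter_score(wave_a, wave_b):
    """Peak magnitude of the normalized cross-correlation, in [0, 1]."""
    corr = np.correlate(wave_a, wave_b, mode="full")
    return np.max(np.abs(corr)) / (np.linalg.norm(wave_a) * np.linalg.norm(wave_b))


if __name__ == "__main__":
    # Toy coordinates: a helix-like trace and a straight strand-like trace.
    t = np.arange(12) * np.deg2rad(100.0)
    helix = np.stack([2.3 * np.cos(t), 2.3 * np.sin(t), 1.5 * np.arange(12)], axis=1)
    strand = np.stack([3.3 * np.arange(12), np.zeros(12), np.zeros(12)], axis=1)

    wh, ws = structure_waveform(helix), structure_waveform(strand)
    print("helix vs helix :", matched_filter_score(wh, wh))
    print("helix vs strand:", matched_filter_score(wh, ws))
```

Correlating a short segment waveform against a longer structure waveform in the same
way gives a score at every lag, which is one simple way to look for local rather than
full-length similarity.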