BioInformatics
Abstract
To help predict how proteins will act in an organism,
biologists compare the amino acid sequences of many
proteins. There are 20 standard amino acids, and proteins
often consist of 300 or more of them. A “multiple alignment”
is performed on a collection of sequences to maximize the
regions where the amino acids match across all sequences.
Online tools are currently available for this task.
Once the multiple alignment is complete, a
tedious process begins: searching for contiguous
subsequences of the aligned group of protein sequences
that may be useful in determining properties of the
proteins’ functions. Subsequences selected for
further analysis are called “primers.” The primer search
is often done by hand and can take hours even for
short sequences.
This project consists of a Java program that
automates the primer search and a database that
organizes the results obtained after primers are generated.
The software allows the user to examine multiple primers
at once and to adjust primer lengths. Once the primers are
generated, lab tests are performed on them and the
results are entered into the database. The database can then
be queried to find results that might be useful to a biologist.
What is a Protein Sequence?
A string of amino acids, each represented by a single letter
There are 20 different amino acids
Typical proteins are about 300 amino acids long
EXAMPLE:
…ILVKMUTANKVKMU…
Multiple Alignment
Example
Shaded areas show regions of exact match.
A dash is placed in the shorter protein
sequence to achieve the alignment.
Redundancies in each column are then
removed.
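The column step above can be sketched in Java, the language the project uses. This is only an illustration under one assumption: that "removing redundancies" means reducing each alignment column to the distinct characters it contains. The class name `AlignmentColumns`, the method `distinctPerColumn`, and the toy alignment are hypothetical and do not come from the poster's software.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not the poster's code: collapses each column of a
// multiple alignment to the distinct characters that appear in it.
class AlignmentColumns {

    // For every column, returns the distinct characters seen in that column
    // (gap dashes included), in first-seen order.
    static List<Set<Character>> distinctPerColumn(List<String> aligned) {
        List<Set<Character>> columns = new ArrayList<>();
        if (aligned.isEmpty()) {
            return columns;
        }
        int width = aligned.get(0).length();
        for (String row : aligned) {
            if (row.length() != width) {
                throw new IllegalArgumentException("aligned rows must have equal length");
            }
        }
        for (int col = 0; col < width; col++) {
            Set<Character> residues = new LinkedHashSet<>();
            for (String row : aligned) {
                residues.add(row.charAt(col));
            }
            columns.add(residues);
        }
        return columns;
    }

    public static void main(String[] args) {
        // Made-up toy alignment; the dash marks a gap in the shorter sequence.
        List<String> aligned = Arrays.asList("ILVKM", "ILAKM", "IL-KM");
        System.out.println(distinctPerColumn(aligned));
    }
}
```

Columns that collapse to a single character correspond to the regions of exact match; columns with several characters are where the sequences disagree.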
Degeneracy Example
The codons are listed for each corresponding amino acid to
determine how many different ways each amino acid can be
produced from DNA.
The total degeneracy is the product of each amino acid’s
value. The higher this number is, the less certain we can be
of where the sequence originated, and the less useful it is
in experiments.
Inspection Window
This window allows the user to manipulate one particular
primer chosen from a multiple alignment.
The control buttons located at the bottom allow the length
and position of the primer to be changed, with the degeneracy
updated automatically.
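The degeneracy calculation can be illustrated with a short Java sketch. It assumes the standard genetic code, in which the 20 standard amino acids are encoded by 1 to 6 codons each, and it treats a primer as a plain string of single-letter codes. The names `PrimerDegeneracy`, `degeneracy`, and `windowDegeneracy` are hypothetical, as is the toy sequence; the poster's own program may work differently, for example on alignment columns with more than one residue.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the poster's code: degeneracy of a primer as the
// product of the number of codons that encode each of its amino acids.
class PrimerDegeneracy {

    // Codon counts per amino acid in the standard genetic code.
    private static final Map<Character, Integer> CODON_COUNTS = new HashMap<>();
    static {
        String residues = "ARNDCQEGHILKMFPSTWYV";
        int[] counts = {4, 6, 2, 2, 2, 2, 2, 4, 2, 3, 6, 2, 1, 2, 4, 6, 4, 1, 2, 4};
        for (int i = 0; i < counts.length; i++) {
            CODON_COUNTS.put(residues.charAt(i), counts[i]);
        }
    }

    // Multiplies the codon counts of every amino acid in the primer.
    static long degeneracy(String primer) {
        long total = 1;
        for (char aa : primer.toCharArray()) {
            Integer count = CODON_COUNTS.get(aa);
            if (count == null) {
                throw new IllegalArgumentException("no codon count for '" + aa + "'");
            }
            total = Math.multiplyExact(total, count);
        }
        return total;
    }

    // Degeneracy of the window that starts at 'start' and spans 'length'
    // residues, mimicking a change of primer position or length.
    static long windowDegeneracy(String sequence, int start, int length) {
        return degeneracy(sequence.substring(start, start + length));
    }

    public static void main(String[] args) {
        // M=1, K=2, W=1, A=4, L=6, so the product is 48.
        System.out.println(degeneracy("MKWAL"));
        // Window "KMT" of a made-up sequence: 2 * 1 * 4 = 8.
        System.out.println(windowDegeneracy("ILVKMTANKVKM", 3, 3));
    }
}
```

Recomputing `windowDegeneracy` whenever the start or length changes is one simple way to keep the displayed value current. The table covers only the 20 standard amino acids, so any other letter is rejected.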
Database of Primer Results
Biological Description of the Gene
Name of Gene
Nucleotide Sequence for Gene
Information for the Experiment
By clicking on Oligos, you can choose which oligos were present in the reaction.
Amino Acid Sequence
Oligos Contained in the Gene
By clicking on Observations, you can record results about each reaction.
Reactions for the Experiment
Data Mining
We want to find Association Rules based on data collected about
primers to make predictions about which ones to use.
Association Rules have the form LHS → RHS.
Interpretation: If every item in LHS occurs, then it is likely that all of
the items in RHS will also occur.
Example:
LHS = protein sequence A contains primers 1, 2 & 3
RHS = protein sequence A contains primers 4 & 5
Data Mining:
Support & Confidence
Support
How often do LHS & RHS occur together?
Confidence
Whenever LHS occurs, how often does RHS occur
as well?
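As a rough illustration of these two measures, the Java sketch below uses the common definitions: support is the fraction of all records that contain both LHS and RHS, and confidence is the fraction of records containing LHS that also contain RHS. It assumes each protein sequence is recorded as the set of primer numbers it contains. The class `RuleMetrics`, its methods, and the four sample records are made up for the example and are not taken from the project's database.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the poster's code: support and confidence of an
// association rule LHS -> RHS over sets of primer numbers.
class RuleMetrics {

    // Number of records that contain every item in 'items'.
    static long countContaining(List<Set<Integer>> records, Set<Integer> items) {
        long n = 0;
        for (Set<Integer> record : records) {
            if (record.containsAll(items)) {
                n++;
            }
        }
        return n;
    }

    // Fraction of all records in which LHS and RHS occur together.
    static double support(List<Set<Integer>> records, Set<Integer> lhs, Set<Integer> rhs) {
        if (records.isEmpty()) {
            return 0.0;
        }
        Set<Integer> both = new HashSet<>(lhs);
        both.addAll(rhs);
        return (double) countContaining(records, both) / records.size();
    }

    // Of the records that contain LHS, the fraction that also contain RHS.
    static double confidence(List<Set<Integer>> records, Set<Integer> lhs, Set<Integer> rhs) {
        long lhsCount = countContaining(records, lhs);
        if (lhsCount == 0) {
            return 0.0;
        }
        Set<Integer> both = new HashSet<>(lhs);
        both.addAll(rhs);
        return (double) countContaining(records, both) / lhsCount;
    }

    public static void main(String[] args) {
        // Made-up data: the primers found in four protein sequences.
        List<Set<Integer>> records = Arrays.asList(
                new HashSet<>(Arrays.asList(1, 2, 3, 4, 5)),
                new HashSet<>(Arrays.asList(1, 2, 3)),
                new HashSet<>(Arrays.asList(1, 2, 3, 4, 5)),
                new HashSet<>(Arrays.asList(2, 4)));
        Set<Integer> lhs = new HashSet<>(Arrays.asList(1, 2, 3));
        Set<Integer> rhs = new HashSet<>(Arrays.asList(4, 5));
        System.out.println("support = " + support(records, lhs, rhs));
        System.out.println("confidence = " + confidence(records, lhs, rhs));
    }
}
```

In this toy data the rule holds in 2 of 4 records, giving a support of 0.5, and in 2 of the 3 records that contain LHS, giving a confidence of about 0.67.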
Scope
The data set is small compared to online databanks.
Looking to larger sources in the future will help increase the support
of any predictions made.