BioInformatics

Abstract
To help predict how proteins will act in an organism, biologists compare the amino acid sequences of many proteins. There are 20 standard amino acids, and proteins often consist of 300 or more of them. A "multiple alignment" is performed on a collection of sequences to maximize the regions where the amino acids are similar across all sequences; online tools are currently available for this task. Once the multiple alignment is complete, a tedious process begins: searching the aligned sequences for contiguous subsequences that may be useful in determining properties of the proteins' functions. Subsequences selected for further analysis are called "primers." The primer search is often done by hand and can take hours even for short sequences. This project consists of a Java program that automates the primer search and a database that organizes the results obtained after primers are generated. The software lets the user examine multiple primers at once and adjust primer lengths. Once the primers are generated, lab tests are performed on them and the results are entered into the database, which can be queried for results that might be useful to a biologist.

What is a Protein Sequence?
- A string of amino acids, each represented by a single letter
- There are 20 different amino acids
- Typical proteins are about 300 amino acids long
- EXAMPLE: …ILVKMUTANKVKMU…

Multiple Alignment Example
Shaded areas show regions of exact match. A dash is placed in the shorter protein sequence to achieve the alignment. Redundancies in each column are then removed.

Degeneracy Example
The codons for each amino acid are listed to determine how many different ways that amino acid can be encoded in DNA. The total degeneracy of a primer is the product of these per-amino-acid values.
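The column-collapse and degeneracy steps described above can be sketched in Java, the project's language. This is a minimal sketch under stated assumptions, not the project's program: the class and method names are hypothetical, windows that cross a gap are simply skipped, and a column that still holds several different residues after redundancy removal is scored as the sum of their codon counts (an assumption; the poster only describes single amino acids). The final sliding-window search is one assumed way to automate the primer search described in the abstract.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PrimerDegeneracy {
    // Codons per amino acid in the standard genetic code (61 sense codons in total).
    static final Map<Character, Integer> CODONS = Map.ofEntries(
        Map.entry('A', 4), Map.entry('R', 6), Map.entry('N', 2), Map.entry('D', 2),
        Map.entry('C', 2), Map.entry('Q', 2), Map.entry('E', 2), Map.entry('G', 4),
        Map.entry('H', 2), Map.entry('I', 3), Map.entry('L', 6), Map.entry('K', 2),
        Map.entry('M', 1), Map.entry('F', 2), Map.entry('P', 4), Map.entry('S', 6),
        Map.entry('T', 4), Map.entry('W', 1), Map.entry('Y', 2), Map.entry('V', 4));

    // Remove redundancies: keep only the distinct characters in each alignment column.
    static List<Set<Character>> collapseColumns(List<String> aligned) {
        int len = aligned.get(0).length();
        List<Set<Character>> columns = new ArrayList<>();
        for (int i = 0; i < len; i++) {
            Set<Character> col = new LinkedHashSet<>();
            for (String seq : aligned) {
                if (seq.length() != len) {
                    throw new IllegalArgumentException("Aligned sequences must have equal length");
                }
                col.add(seq.charAt(i));
            }
            columns.add(col);
        }
        return columns;
    }

    // Assumption: a mixed column can be encoded by any codon of any of its residues.
    static long columnDegeneracy(Set<Character> col) {
        long sum = 0;
        for (char c : col) {
            Integer n = CODONS.get(c);
            if (n == null) {
                throw new IllegalArgumentException("No codon count for '" + c + "'");
            }
            sum += n;
        }
        return sum;
    }

    // Total degeneracy of a window: product of its column values (throws on overflow).
    static long windowDegeneracy(List<Set<Character>> columns, int start, int length) {
        long product = 1;
        for (int i = start; i < start + length; i++) {
            product = Math.multiplyExact(product, columnDegeneracy(columns.get(i)));
        }
        return product;
    }

    // Slide a fixed-length window, skip windows containing a gap, return the least degenerate start.
    static int bestWindowStart(List<Set<Character>> columns, int length) {
        int best = -1;
        long bestDegeneracy = Long.MAX_VALUE;
        for (int s = 0; s + length <= columns.size(); s++) {
            boolean hasGap = false;
            for (int i = s; i < s + length; i++) {
                if (columns.get(i).contains('-')) {
                    hasGap = true;
                }
            }
            if (hasGap) {
                continue;
            }
            long d = windowDegeneracy(columns, s, length);
            if (d < bestDegeneracy) {
                bestDegeneracy = d;
                best = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical toy alignment; the dash marks a gap in the shorter sequence.
        List<String> aligned = List.of("MKVLAW", "MKILAW", "MR-LAW");
        List<Set<Character>> columns = collapseColumns(aligned);
        int start = bestWindowStart(columns, 2);
        System.out.println("best start = " + start
            + ", degeneracy = " + windowDegeneracy(columns, start, 2));
    }
}
```

With this toy alignment, the gap-free windows of length 2 score 8 (M, K/R), 24 (L, A) and 4 (A, W), so the sketch picks the window starting at column 4 with degeneracy 4. Changing the `length` argument mimics adjusting the primer length.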
The higher the total degeneracy, the less certain we can be of where the sequence originated, and the less useful the primer is in experiments.

Inspection Window
This window allows the user to manipulate one particular primer chosen from a multiple alignment. The control buttons at the bottom change the length and position of the primer, and the degeneracy is updated automatically.

Database of Primer Results
The database screens show:
- Biological Description of the Gene: name of gene, nucleotide sequence for gene, amino acid sequence, and oligos contained in the gene
- Information for the Experiment: reactions for the experiment
By clicking on Oligos, you can choose which oligos occurred in a reaction. By clicking on Observations, you can record results about each reaction.

Data Mining
We want to find association rules, based on data collected about primers, to predict which primers to use. Association rules have the form LHS ⇒ RHS.
Interpretation: if every item in LHS occurs, then it is likely that all of the items in RHS will also occur.
Example: LHS = protein sequence A contains primers 1, 2 & 3; RHS = protein sequence A contains primers 4 & 5.

Data Mining: Support & Confidence
Support: how often do LHS and RHS occur together?
Confidence: whenever LHS occurs, how often does RHS occur as well?

Scope
The data set is small compared to online databanks. Drawing on larger sources will help increase the support of future predictions.
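Support and confidence, as defined above, can be computed directly from recorded results. The following is a hypothetical Java illustration, not the project's code: it treats each protein sequence as a set of primer numbers (made-up data modelled on the poster's example rule) and uses the standard definitions, support(LHS ⇒ RHS) = fraction of sequences containing every item of LHS ∪ RHS, and confidence = support(LHS ∪ RHS) / support(LHS).

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AssociationRule {
    // Union of two item sets (the items of LHS together with those of RHS).
    static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
        Set<Integer> both = new HashSet<>(a);
        both.addAll(b);
        return both;
    }

    // Support: fraction of transactions that contain every item in `items`.
    static double support(List<Set<Integer>> transactions, Set<Integer> items) {
        if (transactions.isEmpty()) {
            return 0.0;
        }
        long hits = transactions.stream().filter(t -> t.containsAll(items)).count();
        return (double) hits / transactions.size();
    }

    // Confidence of LHS => RHS: support(LHS ∪ RHS) / support(LHS); 0 if LHS never occurs.
    static double confidence(List<Set<Integer>> transactions, Set<Integer> lhs, Set<Integer> rhs) {
        double lhsSupport = support(transactions, lhs);
        return lhsSupport == 0.0 ? 0.0 : support(transactions, union(lhs, rhs)) / lhsSupport;
    }

    public static void main(String[] args) {
        // Hypothetical data: each set holds the primer numbers found in one protein sequence.
        List<Set<Integer>> sequences = List.of(
            Set.of(1, 2, 3, 4, 5),
            Set.of(1, 2, 3, 4),
            Set.of(1, 2, 3, 4, 5),
            Set.of(2, 5));
        Set<Integer> lhs = Set.of(1, 2, 3);
        Set<Integer> rhs = Set.of(4, 5);
        System.out.println("support = " + support(sequences, union(lhs, rhs)));
        System.out.println("confidence = " + confidence(sequences, lhs, rhs));
    }
}
```

For this data, LHS {1, 2, 3} appears in 3 of 4 sequences and LHS ∪ RHS in 2 of 4, so the rule has a support of 0.5 and a confidence of 2/3. A larger data set, as the Scope section notes, would make such estimates more trustworthy.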