BioInformatics
  Applying principles of computer science in a biological context

Outline
  Biological Background Information
  Problem Description
  My Project
  Previous Work
  Senior Thesis

Biological Data Sets
  Raw DNA sequences
  Macromolecular structures
  Genomes
  Protein Sequences

What is a Protein Sequence?
  A string of amino acids, each represented by a single letter
  There are 20 different amino acids
  Typical proteins are about 300 amino acids long
  …ILVKMUTANKVKMU…

Examples of Amino Acids

Importance of Protein Sequences
  Compare two or more sequences
  Determine similarities in their functions

Multiple Alignment
  Serves as input for my analysis
  Web-based programs available
  Maximizes areas of similarity

Multiple Alignment Example
  Shaded areas show regions of exact match.
  A dash is placed in the smaller protein sequence to achieve the alignment.
  Redundancies in each column are then removed.

Evaluating Usefulness of a Primer
  Similar does not always mean useful
  Why? Most amino acids can be encoded in more than one way
  Amino acids are coded by nucleotide triplets
  Nucleotides: A, T, C, G
  Triplet = Codon

Degeneracy Example

Current Methods
  Client: Biology Professor Steven Horton
  Manual search of primers
  Manual calculation of degeneracy

Requirements
  Automate the task of finding primers
  Automate degeneracy calculation
  Record and organize results
  Analyze data to make predictions
  Pattern Matching
  Data mining

Summer Research
  Analyzed solutions made by Software Engineering class in Spring 2003
  Combined the good design features from each project
  Made a prototype in Java

Senior Thesis Fall Term
  Finished the prototype
  Multiple window design
  Made algorithm more efficient using dynamic programming

The Prototype

Primer List Window

Inspection Window

Senior Thesis Winter Term
  Incorporated a system to record and analyze results produced from finding primers
  Utilized data mining tools

Data Mining: Association Rules
  Have the form LHS → RHS
  Interpretation: If every item in LHS occurs, then it is likely that all of the items in RHS will also occur
  Example:
    LHS = protein sequence A contains primers 1, 2 & 3
    RHS = protein sequence A contains primers 4 & 5
  Application: Find Association Rules based on Horton's data collected about primers

Protein Database

The End
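
Supplementary sketch: degeneracy calculation

The slides name the degeneracy calculation but do not spell it out. The following is a minimal sketch of one common way to compute it, not the prototype's actual code (which was written in Java). It assumes the standard genetic code, and it assumes that a primer's degeneracy is the product of the number of codons that can encode each of its amino acids. The names CODON_COUNTS, degeneracy, and rank_windows are hypothetical and chosen only for illustration.

```python
from math import prod

# Number of codons that encode each amino acid in the standard genetic code.
# (Assumption: the standard code applies; the 20 counts sum to 61 sense codons.)
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def degeneracy(peptide):
    """Return how many distinct DNA sequences could encode `peptide`.

    Raises KeyError for any letter outside the 20 standard amino acids.
    """
    return prod(CODON_COUNTS[aa] for aa in peptide)


def rank_windows(sequence, length):
    """Score every window of `length` residues in `sequence`.

    Returns (degeneracy, start, peptide) tuples, least degenerate first.
    This is a hypothetical helper; real primer selection would also
    require the window to be conserved across the multiple alignment.
    """
    windows = []
    for start in range(len(sequence) - length + 1):
        peptide = sequence[start:start + length]
        windows.append((degeneracy(peptide), start, peptide))
    return sorted(windows)
```

For instance, degeneracy("ILVKM") is 3 × 6 × 4 × 2 × 1 = 144, because I, L, V, K and M have 3, 6, 4, 2 and 1 codons respectively. Under this measure a lower value suggests a more specific primer, which is one way to read "similar does not always mean useful".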