Fast Search Protein Structure Prediction Algorithm for Almost Perfect Matches
By Jayakumar Rudhrasenan (S3047315)
Primary Supervisor: Prof. Heiko Schroder
Secondary Supervisor: Dr. Margaret Hamilton

Introduction: Bio-Informatics
What is Bio-Informatics?
Bio-Informatics is the science of developing computer databases and algorithms to facilitate biological research, especially in the area of genomics. Genomics is the study of genes and their functions.

Background - Protein Structure
How can we find the structure of a protein?
• X-ray Crystallography
• NMR Spectroscopy
[Figure: amino acid sequence and protein structure; labels: Amino acid, Protein Structure, Phi, Psi]

Where does Computer Science come into it?
Limitations of traditional lab work:
• Expensive: finding the structure through these methods is costly.
• Time consuming: it takes 6 to 12 months to determine the structure of a single protein.
Reasons:
• Some proteins don't crystallise.
• Some don't give good diffraction patterns.
• All proteins are fragile and difficult to handle.

Methods Available
This problem is being tackled in many ways. The methods fall into two broad groups:
• ab initio
• Homology modelling

What is homology modelling?
Homology modelling works on the principle that although each protein adopts a unique structure, there are only ~2,000 common folds among the superfamilies identified thus far.
If two protein sequences are aligned and their percentage similarity is above the 'twilight zone' of 20%, we can conclude that the sequences are homologous, that is, that they share a common ancestry. Below this zone it is not possible to say whether the identical amino acid residues are in fact evolutionarily linked or have arisen by chance.

What is Protein Structure Prediction?
In its most general form, it is the prediction of the relative position of each amino acid in the protein structure, using the structural details of other known proteins.

Why predict protein structure?
• The sequence-structure gap: 750 000 known sequences, 17 000 known structures
• Structural knowledge brings understanding of function and mechanism of action
• Can help in prediction of function
• Predicted structures can be used in structure-based drug design
• It can help us understand the effects of mutations on structure or function
• It is a very interesting scientific problem, still unsolved in its most general form after more than 20 years of effort

Protein Structure Prediction Algorithm
[Figure: a window sliding along the protein sequence for which the structure is unknown, compared against the sequences in the protein database]
Window size = 3. The algorithm can also be implemented with a window size of 5, 7 or 9. With a window size of 9 we look for almost perfect matches, as we won't get a perfect match with the database we have.

Algorithm – continued..
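The window search from the previous slide can be sketched in Python as follows. This is only an illustration under stated assumptions, not the author's implementation: the names `windows`, `mismatches` and `brute_force_search`, the `max_mismatches` parameter and the toy database are all hypothetical, and the step that looks up the Phi and Psi angles of each hit is not shown.

```python
from __future__ import annotations

from typing import Iterator


def windows(sequence: str, size: int) -> Iterator[tuple[int, str]]:
    """Yield (start, sub sequence) for every window of the given size."""
    for start in range(len(sequence) - size + 1):
        yield start, sequence[start:start + size]


def mismatches(a: str, b: str) -> int:
    """Count the positions at which two equal-length sub sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def brute_force_search(
    query: str,
    database: dict[str, str],
    size: int = 3,
    max_mismatches: int = 0,
) -> dict[int, list[tuple[str, int]]]:
    """Compare every window of the query with every window in the database.

    Returns, for each window start in the query, the (protein name, start)
    pairs of database windows that differ in at most max_mismatches places.
    """
    # Assumption: the database is a mapping from protein name to sequence.
    db_windows = [
        (name, start, sub)
        for name, seq in database.items()
        for start, sub in windows(seq, size)
    ]
    hits: dict[int, list[tuple[str, int]]] = {}
    for q_start, q_sub in windows(query, size):
        hits[q_start] = [
            (name, start)
            for name, start, sub in db_windows
            if mismatches(q_sub, sub) <= max_mismatches
        ]
    return hits
```

With a toy database `{"p1": "arndcq", "p2": "nfsarn"}` and the query `"arnfs"`, the window `arn` is found in both proteins, while `rnf` is found only when one mismatch is allowed (it then matches `rnd` in `p1`). Every query window is compared with every database window, so the cost grows with the product of the two counts.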
[Figure: Phi graph and Psi graph, each plotted against number of occurrences]

Limitations of this algorithm
Time consuming:
• Time taken to predict the structure of a protein: 2 hr PC time
• Time taken to predict the structure of 20,000 proteins: 2 x 20,000 = 40,000 hrs PC time

Why does it take time?
Each sub sequence of the unknown protein is compared with all the sub sequences of the proteins in the database. With a window size of 9, the number of sub strings in the database will be around 2 million. So there will be 2 million comparisons for each sub sequence in the unknown protein. “Unknown protein” here means a protein whose sequence is known but whose structure is not.

Fast Search Protein Structure Prediction Algorithm for Almost Perfect Matches
• Arrange the sub sequences so that there is a Hamming distance of one between neighbouring sub sequences.
What is Hamming distance?
The number of disagreeing bits between two binary vectors. It is used as a measure of dissimilarity.
E.g. 1000011 and 1000001: these two binary numbers differ by one bit.
A Hamming distance of one here means that each sub sequence differs from the one next to it by just one amino acid.

Continued…
• Maintain a table which stores the hop index value for a mismatch. For example:

Row number   Sub sequence   Jump to row number
1023         111110000      1027
1024         111110001
1025         111110002
1026         111110003
1027         111110013      1031
1028         111110012
1029         111110011
1030         111110010
1031         111110020      1035
. . .
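The ordering and the jump table above can be illustrated with a small Python sketch. It is an assumption about how such a table might be built, not the author's code: `hamming`, `gray_order` and `jump_row` are hypothetical names, the ordering used is a reflected Gray code over an arbitrary alphabet, and the jump rule shown covers only the exact-match case (if a query mismatches a row at some position, every following row that shares the same prefix up to that position mismatches there too, so those rows can be skipped).

```python
from __future__ import annotations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def gray_order(alphabet: str, length: int) -> list[str]:
    """List all strings of the given length so that neighbours differ in
    exactly one position (reflected Gray code; an assumed ordering)."""
    if length == 0:
        return [""]
    shorter = gray_order(alphabet, length - 1)
    rows: list[str] = []
    for i, symbol in enumerate(alphabet):
        # Reverse the tail list on every other symbol so that the rows on
        # either side of a block boundary share the same tail.
        tails = shorter if i % 2 == 0 else shorter[::-1]
        rows.extend(symbol + tail for tail in tails)
    return rows


def jump_row(rows: list[str], row: int, mismatch_pos: int) -> int:
    """Index of the first later row that no longer shares the prefix
    rows[row][:mismatch_pos + 1]; len(rows) if there is none."""
    prefix = rows[row][:mismatch_pos + 1]
    nxt = row + 1
    while nxt < len(rows) and rows[nxt].startswith(prefix):
        nxt += 1
    return nxt


if __name__ == "__main__":
    rows = gray_order("0123", 2)
    print(rows[:9])
    print(jump_row(rows, 0, 0), jump_row(rows, 4, 0))
```

For the alphabet `0123` and length 2, the first nine rows are 00, 01, 02, 03, 13, 12, 11, 10, 20, which is the same pattern as the last two digits in the table, and a mismatch in the first digit at row 0 jumps four rows ahead, just as row 1023 jumps to row 1027.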