Protein Prediction II
Exercise
T. Hamp & L. Richter

Exercise – Project Layout
General remarks (recap): report 60 pts, exam 40 pts; weekly presentations by each group, one bad presentation allowed; groups of 3-4 students.
Contact & questions: [email protected] only!
The exercise is taken from the CAFA competition: prediction of HPO terms (HPO: Human Phenotype Ontology).

Terms – Definitions and Explanations
Amino acids (aa): building blocks of proteins; 20 different amino acids are found in proteins.
Protein sequence: a string of characters representing a sequence of amino acids (a string over a 20-letter alphabet).
The protein sequence defines the protein structure and the protein function (within some limits).
Protein sequences are stored in large, publicly available repositories.
One of the best-known repositories is UniProt (http://www.uniprot.org/) and its section Swiss-Prot.
Besides the sequence, these databases also hold additional information about the protein.

Ontology (in information science)
Ontology: an ontology represents knowledge as a set of concepts within a domain, using a shared vocabulary to denote the types, properties and interrelationships of those concepts.
Human Phenotype Ontology (HPO): a set of concepts describing human appearance (shape, health, and so forth).
HPO concepts are hierarchically ordered, i.e. there is an “is-a” relationship; they are arranged in a tree-like fashion.

Our competition
Proteins are annotated (described) with experimentally determined information.
As time goes by, proteins are associated with information about experimentally confirmed effects on the human phenotype.
The associated terms are taken from the Human Phenotype Ontology.
Experimental determination is slow and expensive => we try to predict the associated HPO terms for the proteins that are not yet annotated.

More formal steps
Find a function that assigns a set of HPO terms T to a sequence s so that the number of false assignments is minimal and the number of true assignments is maximal.
Remember: the true evaluation is done after submission, when sequences that are not yet annotated receive experimentally determined annotations.

Tasks
Download the files from www.rostlab.org/~richter/pp2_files.tgz
Get familiar with the provided files, especially the column names (look them up at UniProt and HPO).
Read: http://biofunctionprediction.org/sites/default/files/IntroductionCAFA_pedja.pdf
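
The goal stated under "More formal steps" (few false assignments, many true assignments) can be made concrete with precision and recall over sets of terms. The following is only a minimal sketch and not the official scoring of the exercise or of CAFA. The toy hierarchy, the term identifiers (`T:root`, `T:a`, ...) and all function names are hypothetical and chosen for illustration. Because HPO terms stand in an “is-a” relationship, the sketch first adds all ancestors of each term before comparing the predicted set with the true set.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical toy "is-a" hierarchy: child term -> set of parent terms.
# These identifiers are placeholders, not real HPO terms.
TOY_PARENTS: Dict[str, Set[str]] = {
    "T:root": set(),
    "T:a": {"T:root"},
    "T:b": {"T:root"},
    "T:a1": {"T:a"},
    "T:a2": {"T:a"},
}


def propagate(terms: Iterable[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the given terms plus all their ancestors via "is-a"."""
    closed: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term in closed:
            continue
        closed.add(term)
        # Unknown terms simply have no parents in this sketch.
        stack.extend(parents.get(term, ()))
    return closed


def precision_recall_f1(predicted: Set[str], true: Set[str]) -> Tuple[float, float, float]:
    """Compare two term sets; an empty set yields 0.0 for its measure."""
    true_positives = len(predicted & true)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(true) if true else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def evaluate(predicted: Iterable[str], true: Iterable[str],
             parents: Dict[str, Set[str]]) -> Tuple[float, float, float]:
    """Propagate both term sets up the hierarchy, then score them."""
    return precision_recall_f1(propagate(predicted, parents),
                               propagate(true, parents))


if __name__ == "__main__":
    # The prediction misses the exact leaf but shares its ancestors.
    p, r, f = evaluate({"T:a1"}, {"T:a2"}, TOY_PARENTS)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy case the prediction `T:a1` and the true annotation `T:a2` differ, but after propagation they share `T:a` and `T:root`, so the prediction receives partial credit instead of a score of zero.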