
Protein Prediction II Exercise
T. Hamp & L. Richter
Exercise – Project Layout
General remarks – recap: Report 60 pts, exam 40 pts; weekly
presentations by each group (one bad presentation allowed); groups of
3-4 students
Contact & Questions: [email protected] only!
The exercise is taken from the CAFA competition
Prediction of HPO terms
HPO: Human Phenotype Ontology
Terms – Definitions and Explanations
Amino acids (aa): Building blocks of proteins; 20 different aa are
found in proteins
Protein sequence: String of characters representing a sequence of
amino acids (a string over a 20-letter alphabet)
The protein sequence determines the protein structure and the protein
function (within some limits)
Protein sequences are stored in large, publicly available repositories
One of the best-known repositories is UniProt
(http://www.uniprot.org/) and its section Swiss-Prot
Besides the sequence, these databases also hold additional information
about the protein
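To make the "string over a 20-letter alphabet" idea concrete, here is a minimal sketch that checks whether a string uses only the 20 standard one-letter amino acid codes. The function name and the example sequences are made up for illustration. Real database entries may also contain extra letters (for example X for an unknown residue), so a check like this is only a first sanity test.

```python
# The 20 standard amino acids in one-letter code.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def is_standard_protein_sequence(sequence: str) -> bool:
    """Return True if the sequence is non-empty and uses only the 20 standard letters.

    Hypothetical helper for illustration; it is not part of the exercise files.
    """
    return bool(sequence) and set(sequence.upper()) <= STANDARD_AMINO_ACIDS


if __name__ == "__main__":
    # Made-up example sequences.
    print(is_standard_protein_sequence("MKTAYIAKQR"))  # only standard letters
    print(is_standard_protein_sequence("MKTXB"))       # contains X and B
```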
Ontology (in information science)
Ontology: An ontology represents knowledge as a set of concepts
within a domain, using a shared vocabulary to denote the types, properties
and interrelationships of those concepts
Human Phenotype Ontology (HPO): Set of concepts describing
human appearance (shape, health, and so forth)
HPO concepts are hierarchically ordered, i.e. they are linked by an “is-a”
relationship
They are arranged in a tree-like fashion
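The “is-a” hierarchy implies that a protein annotated with a specific term is also described by all of that term's more general ancestors. The sketch below illustrates this on a toy hierarchy. The term names, the `IS_A` mapping, and both function names are invented for illustration and are not taken from the HPO files. The mapping allows several parents per term, which is slightly more general than a strict tree.

```python
# Toy is-a hierarchy: child term -> list of parent terms.
# All term names here are made up for illustration.
IS_A = {
    "phenotypic_abnormality": [],
    "abnormality_of_the_eye": ["phenotypic_abnormality"],
    "cataract": ["abnormality_of_the_eye"],
}


def ancestors(term, is_a):
    """Collect every term reachable via is-a links, excluding the term itself."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


def propagate(terms, is_a):
    """Return the given terms together with all of their ancestors."""
    result = set(terms)
    for term in terms:
        result |= ancestors(term, is_a)
    return result


if __name__ == "__main__":
    print(sorted(propagate({"cataract"}, IS_A)))
```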
Our competition
Proteins are annotated (described) with experimentally determined
information
As time goes by: Proteins are associated with information about
experimentally confirmed effects on the human phenotype
The associated terms are taken from the Human Phenotype Ontology
Experimental determination is slow and expensive
=> we try to predict the associated HPO terms for the not yet annotated proteins
More formal steps
Find a function that assigns a set of HPO terms T to a sequence s so
that the number of false assignments is minimal and the number of
true assignments is maximal
Remember: The true evaluation is done after submission, when sequences
not annotated so far receive experimentally determined annotations
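One common way to quantify "few false assignments, many true assignments" for a single sequence is to compare the predicted term set with the true term set using precision, recall, and their harmonic mean (F1). The sketch below assumes this measure purely for illustration; the slides do not state which score the actual evaluation uses, and the function name and term labels are made up.

```python
def precision_recall_f1(predicted, true):
    """Compare a predicted term set with the true term set for one sequence.

    precision = correct predictions / all predictions
    recall    = correct predictions / all true terms
    Empty sets yield 0.0 instead of a division by zero.
    """
    true_positives = len(predicted & true)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(true) if true else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up term labels for illustration only.
    predicted_terms = {"term_a", "term_b", "term_c"}
    true_terms = {"term_b", "term_c", "term_d", "term_e"}
    print(precision_recall_f1(predicted_terms, true_terms))
```

In this example two of the three predicted terms are correct and two of the four true terms are recovered, so precision is 2/3 and recall is 1/2.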
Tasks
Download files from www.rostlab.org/~richter/pp2_files.tgz
Get familiar with the provided files
Especially the column names (look them up at UniProt and HPO)
Read:
http://biofunctionprediction.org/sites/default/files/IntroductionCAFA_pedja.pdf