Prediction of protein function from sequence analysis
Rita Casadio
BIOCOMPUTING GROUP, University of Bologna, Italy

The “omic” era
Genome Sequencing Projects:
- Archaea: 74 species; In Progress: 52
- Bacteria: 973 species; In Progress: 2266 species
- Eukaryotic: Complete: 23; Draft Assembly: 318; In Progress: 359
http://www.ncbi.nlm.nih.gov/genomes/static/gpstat.html
Update: January 2010

The Data Bases of Biological Sequences and Structures
Example entry:
>BGAL_SULSO BETA-GALACTOSIDASE Sulfolobus solfataricus.
MYSFPNSFRFGWSQAGFQSEMGTPGSEDPNTDWYKWVHDPENMAAGLVSG
DLPENGPGYWGNYKTFHDNAQKMGLKIARLNVEWSRIFPNPLPRPQNFDE
SKQDVTEVEINENELKRLDEYANKDALNHYREIFKDLKSRGLYFILNMYH
WPLPLWLHDPIRVRRGDFTGPSGWLSTRTVYEFARFSAYIAWKFDDLVDE
YSTMNEPNVVGGLGYVGVKSGFPPGYLSFELSRRHMYNIIQAHARAYDGI
KSVSKKPVGIIYANSSFQPLTDKDMEAVEMAENDNRWWFFDAIIRGEITR
GNEKIVRDDLKGRLDWIGVNYYTRTVVKRTEKGYVSLGGYGHGCERNSVS
LAGLPTSDFGWEFFPEGLYDVLTKYWNRYHLYMYVTENGIADDADYQRPY
YLVSHVYQVHRAINSGADVRGYLHWSLADNYEWASGFSMRFGLLKVDYNT
KRLYWRPSALVYREIATNGAITDEIEHLNSVPPVKPLRH

- GenBank: 108,431,692 sequences; 106,533,156,756 nucleotides (35.5 HGE!)
- NR(*): 10,381,779 sequences; 3,542,056,219 residues
- SwissProt: 514,212 sequences; 180,900,945 residues
- PDB: 60,654 structures; membrane proteins <2%
(*) CDS translations+PDB+SwissProt+PIR+PRF
Update: January 2009

From Genotype to Phenotype
Genes in DNA... (about 30,000 in the human genome)
>protein kinase
acctgttgatggcgacagggactgtatgctgatct
atgctgatgcatgcatgctgactactgatgtgggg
gctattgacttgatgtctatc....
Over 20 million single mutations are known in genes …with different effects depending on variability
…code for proteins...
…proteins correspond to functions...
…when they are expressed (from 5000 to 10000 proteins per tissue)

Proteins interact ….in metabolic pathways
STRING 8: a global view on proteins and their functional interactions in 630 organisms. Jensen et al., 2009, Nucleic Acids Research, Vol. 37.
The Human Interactome in STRING
22,937 proteins and 1,482,533 interactions
http://string.embl.de

One problem of the “omic era”: Protein functional annotation

The Protein Data Bank
http://www.rcsb.org/pdb/home/home.do
No. of proteins with known structure: 57529

SCOP: Structural Classification of Proteins
Domains are hierarchically classified:
- class
- fold: proteins with secondary structures in the same arrangement and with the same topological connections
- superfamily: structural and functional features suggest a common evolutionary origin
- family: proteins with identities ≥30%, or with identities <30% but with similar structures and functions

From the Protein Sequence to the Structure and Function space (Lesk A., 2004)
[Figure: methods placed along a Sequence Identity (%) axis marked at 100%, 30% and 0%; region labels: PDB, New Folds]
• Sequence comparison
• Fold recognition
• Machine-learning aided alignment
• Threading
• Ab initio and de novo modelling
• Machine-learning prediction of structural features

From the Protein Sequence to the Structure space

What is protein function? What is a function?
For enzymes, function can be defined on the basis of the catalysed molecular reaction, e.g. aspartic aminotransferase (AST).
In biochemistry, a transaminase or aminotransferase is an enzyme that catalyzes a type of reaction between an amino acid and an α-keto acid. Specifically, this reaction (transamination) removes the amino group from the amino acid, leaving behind an α-keto acid, and transfers it to the reactant α-keto acid, converting that into an amino acid. These enzymes are important in the production of various amino acids, and measuring the concentrations of various transaminases in the blood is important in diagnosing and tracking many diseases. Transaminases require the coenzyme pyridoxal-phosphate, which is converted into pyridoxamine in the first phase of the reaction, when an amino acid is converted into a keto acid.
Enzyme-bound pyridoxamine in turn reacts with pyruvate, oxaloacetate, or alpha-ketoglutarate, giving alanine, aspartic acid, or glutamic acid, respectively. The presence of elevated transaminases can be an indicator of liver damage.

Enzyme Commission (E.C.) classification
A hierarchical classification for enzymes:
EC 2.6 Transferring nitrogenous groups
EC 2.6.1 Transaminases
EC 2.6.1.1 Aspartate transaminase
Other name(s): glutamic-oxaloacetic transaminase; glutamic-aspartic transaminase; transaminase A; AAT; AspT; 2-oxoglutarate-glutamate aminotransferase; aspartate α-ketoglutarate transaminase; aspartate aminotransferase; aspartate-2-oxoglutarate transaminase; aspartic acid aminotransferase; aspartic aminotransferase; aspartyl aminotransferase; AST; glutamate-oxalacetate aminotransferase; glutamate-oxalate transaminase; glutamic-aspartic aminotransferase; glutamic-oxalacetic transaminase; glutamic oxalic transaminase; GOT (enzyme); L-aspartate transaminase; L-aspartate-α-ketoglutarate transaminase; L-aspartate-2-ketoglutarate aminotransferase; L-aspartate-2-oxoglutarate aminotransferase; L-aspartate-2-oxoglutarate-transaminase; L-aspartic aminotransferase; oxaloacetate-aspartate aminotransferase; oxaloacetate transferase; aspartate:2-oxoglutarate aminotransferase; glutamate oxaloacetate transaminase
Systematic name: L-aspartate:2-oxoglutarate aminotransferase

Problems:
Isoforms, e.g. how to differentiate the function of the cytoplasmic aspartate aminotransferase from that of the mitochondrial isoform?
Non-enzymatic proteins

GO function vocabulary: http://www.geneontology.org/
The Ontologies
• Cellular component
• Biological process
• Molecular function

Gene Ontology classification: the human cytoplasmic aspartate transaminase
GO:0004069 GO:0005829 GO:0006533

One BIG problem of the “omic era”: Protein functional annotation

Functional annotation in silico by homology search
Aligned sequences: ADH1_SULSO, ADH_CLOBE, ADH_THEBR, ADH1_SOLTU, ADH2_LYCES, ADH1_ASPFL
[Alignment rows, run together in the source:]
----------MRAVRLVEIGKP--LSLQEIGVPKPKGPQVLIKVEAAGVCHSDVHMRQGRFGNLRIVE ----------MKGFAMLGINKLG---WIEKERPVAGSYDAIVRPLAVSPCTSDIHTVFEGA----------------MKGFAMLSIGKVG---WIEKEKPAPGPFDAIVRPLAVAPCTSDIHTVFEGA------MSTTVGQVIRCKAAVAWEAGKP--LVMEEVDVAPPQKMEVRLKILYTSLCHTDVYFWEAKG------MSTTVGQVIRCKAAVAWEAGKP--LVMEEVDVAPPQKMEVRLKILYTSLCHTDVYFWEAKG----------MSIPEMQWAQVAEQKGGP--LIYKQIPVPKPGPDEILVKVRYSGVCHTDLHALKGDW-------
Sequence comparison is performed with alignment programs.
Sequence identity 40%: similar structure and function (??)

Methods for similarity searches:
BLAST, Psi-BLAST (http://www.ncbi.nlm.nih.gov/BLAST/): sequence
Altschul et al., (1990) J Mol Biol 215:403-410
Altschul et al., (1997) Nucleic Acids Res. 25:3389-3402
Pfam (http://pfam.wustl.edu/hmmsearch.shtml): sequence/structure
Bateman et al., (2000) Nucleic Acids Research 28:263-266

Transfer by inheritance: function annotation transfer from sequence through homology
http://www.uniprot.org/   PDB

The annotation process at UniProt

Open problems of “inheritance through homology”
• Not all UniProt files are GO annotated
• The optimal threshold value of sequence identity for function transfer is not known
• Proteins contain multiple domains
• Proteins can share common domains without necessarily sharing the same function
• Different combinations of shared domains lead to different biological roles
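
The sequence identity figure that drives annotation transfer can be computed directly from two aligned sequences. The following is a minimal sketch in Python, not the method used in the slides. It assumes identity is counted only over columns where both sequences have a residue (other conventions for the denominator exist). The function names, the toy sequences, and the 40% default threshold are illustrative assumptions; the slides themselves note that the optimal threshold is not known.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where both rows have a residue.

    Assumption: gapped columns are excluded from the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0  # columns with a residue in both rows
    matches = 0   # compared columns holding identical residues
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def can_transfer(aligned_a: str, aligned_b: str, threshold: float = 40.0) -> bool:
    """Hypothetical rule: transfer annotation when identity reaches the threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy aligned rows (hypothetical, not taken from the alignment above).
query = "ACDE-FG"
annotated = "ACDQKFG"
print(round(percent_identity(query, annotated), 1), can_transfer(query, annotated))
```

A single identity cutoff like this ignores the multi-domain issues listed above: two proteins can pass the threshold over one shared domain and still differ in overall function.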