Bioinformatics and Functional Genomics, Chapter 8, Part 1

Protein analysis and proteomics (Part 1 of 2)
Monday, September 29, 2003
Introduction to Bioinformatics ME:440.714
J. Pevsner
[email protected]

Copyright notice
Many of the images in this powerpoint presentation are from Bioinformatics and Functional Genomics by Jonathan Pevsner (ISBN 0-471-21004-8). Copyright © 2003 by John Wiley & Sons, Inc. These images and materials may not be used without permission from the publisher. We welcome instructors to use these powerpoints for educational purposes, but please acknowledge the source. The book has a homepage at http://www.bioinfbook.org, including hyperlinks to the book chapters.

Outline for the course
Today: protein analysis and proteomics
Wednesday October 1: protein structure
Monday October 6: no class
Wednesday Oct. 8: multiple sequence alignment
Monday October 13: phylogeny
Wednesday October 15: phylogeny (continued)
Then we begin studying the Tree of Life (final part of course)

Outline for today
Protein analysis and proteomics
Individual proteins
• Protein families
• Physical properties
• Localization
• Function
Large-scale protein analysis
• 2D protein gels
• Yeast two-hybrid
• Rosetta Stone approach
• Pathways

DNA → RNA → protein
Page 224

Four perspectives on a protein
[1] Protein families
[2] Physical properties
[3] Protein localization
[4] Protein function
Gene ontology (GO):
-- cellular component
-- biological process
-- molecular function
Page 224

Perspective 1: Protein domains and motifs
Page 225

Definitions
Signature:
• a protein category such as a domain or motif
Domain:
• a region of a protein that can adopt a 3D structure
• a fold
• a family is a group of proteins that share a domain
• examples: zinc finger domain, immunoglobulin domain
Motif (or fingerprint):
• a short, conserved region of a protein
• typically 10 to 20 contiguous amino acid residues
Page 225

15 most common domains (human)
Domain                        Proteins
Zn finger, C2H2 type          1093
Immunoglobulin                1032
EGF-like                      471
Zn-finger, RING               458
Homeobox                      417
Pleckstrin-like               405
RNA-binding region RNP-1      400
SH3                           394
Calcium-binding EF-hand       392
Fibronectin, type III         300
PDZ/DHR/GLGF                  280
Small GTP-binding protein     261
BTB/POZ                       236
bHLH                          226
Cadherin                      226
Page 227

15 most common domains (various species)
The European Bioinformatics Institute (EBI) offers many key proteomics resources: http://www.ebi.ac.uk/proteome/
Page 227

Definition of a domain
According to InterPro at EBI (http://www.ebi.ac.uk/interpro/): A domain is an independent structural unit, found alone or in conjunction with other domains or repeats. Domains are evolutionarily related.
According to SMART (http://smart.embl-heidelberg.de): A domain is a conserved structural entity with distinctive secondary structure content and a hydrophobic core. Homologous domains with common functions usually show sequence similarities.
Page 226

Varieties of protein domains
• Extending along the length of a protein
• Occupying a subset of a protein sequence
• Occurring one or more times
Page 228

Example of a protein with domains: Methyl CpG binding protein 2 (MeCP2)
[Figure labels: MBD, TRD]
The protein includes a methylated DNA binding domain (MBD) and a transcriptional repression domain (TRD). MeCP2 is a transcriptional repressor. Mutations in the gene encoding MeCP2 cause Rett Syndrome, a neurological disorder affecting girls primarily.
Page 227

Result of an MeCP2 blastp search: A methyl-binding domain shared by several proteins
Page 228

Are proteins that share only a domain homologous?
Page 228
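
Sketch: representing domain architecture
The slides on varieties of domains and on the MeCP2 blastp result can be made concrete with a small data structure. The Python sketch below is only an illustration: the protein names, lengths, and domain coordinates are hypothetical placeholders (they are not taken from MeCP2 or any database), and the helper names are invented for this example. It reports how many copies of a domain a protein has, what fraction of the sequence the domain occupies, and which domain names two proteins share. It does not decide whether two proteins that share only a domain are homologous.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Domain:
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


@dataclass
class Protein:
    name: str
    length: int
    domains: list = field(default_factory=list)

    def copies(self, domain_name):
        """Number of times the named domain occurs in this protein."""
        return sum(1 for d in self.domains if d.name == domain_name)

    def coverage(self, domain_name):
        """Fraction of the sequence occupied by the named domain.

        Assumes the copies of a domain do not overlap one another.
        """
        covered = sum(d.end - d.start + 1
                      for d in self.domains if d.name == domain_name)
        return covered / self.length


def shared_domains(a, b):
    """Sorted names of domains present in both proteins."""
    names_a = {d.name for d in a.domains}
    names_b = {d.name for d in b.domains}
    return sorted(names_a & names_b)


# Hypothetical proteins; all coordinates are placeholders for illustration.
protein_a = Protein("example_repressor", 500,
                    [Domain("MBD", 100, 170), Domain("TRD", 210, 310)])
protein_b = Protein("example_mbd_protein", 400,
                    [Domain("MBD", 10, 80), Domain("MBD", 200, 270)])

print(shared_domains(protein_a, protein_b))
print(protein_b.copies("MBD"), round(protein_a.coverage("MBD"), 3))
```

Here the two placeholder proteins share only the MBD name; everything outside that region is unrelated in the model, which is the situation the question on the preceding slide asks about.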

Example of a multidomain protein: HIV-1 pol
• 1003 amino acids long
• cleaved into three proteins with distinct activities:
-- aspartyl protease
-- reverse transcriptase
-- integrase
We will explore HIV-1 pol and other proteins at the Expert Protein Analysis System (ExPASy) server. Visit www.expasy.org/
Page 229

[Figure slide]
Page 230

SwissProt entry for HIV-1 pol links to many databases
Page 230

ProDom entry for HIV-1 pol shows many related proteins
Page 231

Proteins can have both domains and patterns (motifs)
[Figure labels: Pattern (several residues), Pattern (several residues), Domain (aspartyl protease), Domain (reverse transcriptase)]
Page 231

[Figure slide]
Page 232

Definition of a motif
A motif (or fingerprint) is a short, conserved region of a protein. Its size is often 10 to 20 amino acids.
Simple motifs include transmembrane domains and phosphorylation sites. These do not imply homology when found in a group of proteins.
PROSITE (www.expasy.org/prosite) is a dictionary of motifs (there are currently 1600 entries). In PROSITE, a pattern is a qualitative motif description (a protein either matches a pattern, or not). In contrast, a profile is a quantitative motif description. We will encounter profiles in Pfam, ProDom, SMART, and other databases.
Page 231-233

Perspective 2: Physical properties of proteins
Page 233

[Figure slide]
Page 234

Physical properties of proteins
Many websites are available for the analysis of individual proteins. ExPASy and ISREC are two excellent resources.
The accuracy of these programs is variable. Predictions based on primary amino acid sequence (such as molecular weight prediction) are likely to be more trustworthy. For many other properties (such as posttranslational modification of proteins by specific sugars), experimental evidence may be required rather than prediction algorithms.
Page 236
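
Sketch: molecular weight from primary sequence
Molecular weight is a good example of a property that follows directly from the amino acid sequence. The Python sketch below is a minimal illustration, not the method used by ExPASy or any particular server. It assumes the standard average residue masses (in daltons) for the 20 common amino acids, adds one water for the free termini, and ignores posttranslational modifications, which is exactly the kind of property the slide says may need experimental evidence. The function and constant names are invented for this example.

```python
# Average residue masses in daltons (amino acid minus one water).
# These standard values are an assumption of this sketch, not from the slides.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water for the N-terminal H and C-terminal OH


def molecular_weight(sequence):
    """Approximate average molecular weight (Da) of an unmodified protein."""
    sequence = sequence.strip().upper()
    if not sequence:
        raise ValueError("empty sequence")
    unknown = sorted(set(sequence) - set(AVERAGE_RESIDUE_MASS))
    if unknown:
        raise ValueError(f"unsupported residue codes: {unknown}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


if __name__ == "__main__":
    # A short made-up peptide, used only to exercise the function.
    print(f"{molecular_weight('MKTAYIAKQR'):.1f} Da")
```

A single residue is a quick sanity check: molecular_weight("G") should come out close to the molecular weight of free glycine, about 75.07 Da.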

[Figure slides]
Pages 235, 235, 235, 236, 238, 238, 238

Syntaxin, SNAP-25 and VAMP are three proteins that interact via coiled-coil domains

Introduction to Perspectives 3 and 4: Gene Ontology (GO) Consortium
Page 237

The Gene Ontology Consortium
An ontology is a description of concepts. The GO Consortium compiles a dynamic, controlled vocabulary of terms related to gene products.
There are three organizing principles:
• Molecular function
• Biological process
• Cellular component
You can visit GO at http://www.geneontology.org. There is no centralized GO database. Instead, curators of organism-specific databases assign GO terms to gene products for each organism.
Page 237

GO terms are assigned to LocusLink entries
Page 241

[Figure slides]
Pages 241, 241, 241

The Gene Ontology Consortium: Evidence Codes
IC    Inferred by curator
IDA   Inferred from direct assay
IEA   Inferred from electronic annotation
IEP   Inferred from expression pattern
IGI   Inferred from genetic interaction
IMP   Inferred from mutant phenotype
IPI   Inferred from physical interaction
ISS   Inferred from sequence or structural similarity
NAS   Non-traceable author statement
ND    No biological data
TAS   Traceable author statement
Page 240

Perspective 3: Protein localization
Page 242

Protein localization
[Figure label: protein]
Page 242

Protein localization
Proteins may be localized to intracellular compartments, cytosol, the plasma membrane, or they may be secreted. Many proteins shuttle between multiple compartments.
A variety of algorithms predict localization, but this is essentially a cell biological question.
Page 240

[Figure slides]
Pages 242, 244, 244

Localization of 2,900 yeast proteins
Michael Snyder and colleagues incorporated epitope tags into thousands of S. cerevisiae cDNAs, and systematically localized proteins (Kumar et al., 2002).
See http://ygac.med.yale.edu for a database including 2,900 fluorescence micrographs.
Page 243
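
Sketch: working with GO evidence codes
The evidence codes listed earlier in this section lend themselves to a simple lookup table. The Python sketch below is an illustration only. The code-to-description mapping is copied from the slide; the grouping of IDA, IEP, IGI, IMP and IPI as "experimental" is an assumption of this sketch (it follows a common convention but is not stated in the slides), and the annotation tuples are hypothetical examples, not real database records.

```python
# Evidence codes and descriptions as listed on the slide.
EVIDENCE_CODES = {
    "IC": "Inferred by curator",
    "IDA": "Inferred from direct assay",
    "IEA": "Inferred from electronic annotation",
    "IEP": "Inferred from expression pattern",
    "IGI": "Inferred from genetic interaction",
    "IMP": "Inferred from mutant phenotype",
    "IPI": "Inferred from physical interaction",
    "ISS": "Inferred from sequence or structural similarity",
    "NAS": "Non-traceable author statement",
    "ND": "No biological data",
    "TAS": "Traceable author statement",
}

# Assumption of this sketch: codes that rest on a laboratory experiment.
EXPERIMENTAL_CODES = frozenset({"IDA", "IEP", "IGI", "IMP", "IPI"})


def describe(code):
    """Return the description for an evidence code, or raise ValueError."""
    try:
        return EVIDENCE_CODES[code.upper()]
    except KeyError:
        raise ValueError(f"unknown evidence code: {code!r}") from None


def filter_annotations(annotations, allowed_codes):
    """Keep (gene_product, go_term, evidence_code) tuples with allowed codes."""
    return [a for a in annotations if a[2] in allowed_codes]


# Hypothetical annotations for illustration only.
annotations = [
    ("geneA", "retinol binding", "IDA"),
    ("geneA", "extracellular region", "IEA"),
    ("geneB", "visual perception", "IMP"),
    ("geneC", "transport", "ISS"),
]

experimental = filter_annotations(annotations, EXPERIMENTAL_CODES)
for gene, term, code in experimental:
    print(gene, term, code, describe(code))
```

Filtering this way separates annotations backed by an assay from those assigned electronically (IEA) or by sequence similarity (ISS), which is often the first question to ask about a GO term attached to a gene product.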

Perspective 4: Protein function
Page 243

Protein function
Function refers to the role of a protein in the cell. We can consider protein function from a variety of perspectives.
Page 243

1. Biochemical function (molecular function)
   RBP binds retinol, could be a carrier
2. Functional assignment based on homology
   RBP could be a carrier too (figure label: other carrier proteins)
3. Function based on structure
   RBP forms a calyx
4. Function based on ligand binding specificity
   RBP binds vitamin A
5. Function based on cellular process
   RBP is abundant, soluble, secreted
6. Function based on biological process
   RBP is essential for vision
7. Function based on “proteomics” or high throughput “functional genomics”
   High throughput analyses show...
   RBP levels elevated in renal failure
   RBP levels decreased in liver disease
Page 245

Functional assignment of enzymes: the EC (Enzyme Commission) system
Class               Entries
Oxidoreductases     1,003
Transferases        1,076
Hydrolases          1,125
Lyases              356
Isomerases          156
Ligases             126
Page 246

Functional assignment of proteins: Clusters of Orthologous Groups (COGs)
• Information storage and processing
• Cellular processes
• Metabolism
• Poorly characterized
(Most useful for prokaryotes; we will describe COGs on Oct. 20)
Page 247

This lecture continues in part 2 with a discussion of two dimensional gels and the yeast two-hybrid system:
http://pevsnerlab.kennedykrieger.org/ppts/lecture_bioinf_ch8_part2.ppt