Protein Structure & Analysis
Biology 224
Dr. Tom Peavy
Sept 28 & 30
<Images from Bioinformatics and Functional Genomics by Jonathan Pevsner>

[Figure: a protein described from four perspectives: protein families, protein localization, protein function, and physical properties]

Gene ontology (GO):
--cellular component
--biological process
--molecular function

The Human Proteome Organisation (HUPO) Proteomics Standards Initiative (PSI)
http://www.psidev.info/
Work groups
• Gel Electrophoresis
• Mass Spectrometry
• Molecular Interactions
• Protein Modifications
• Proteomics Informatics
• Sample Processing
Themes
• Controlled vocabularies
• MIAPE: Minimum information about a proteomics experiment

Protein domains, motifs & signatures

Definitions
Signature:
• a protein category such as a domain or motif (a defining property of the protein or family)
Domain:
• a region of a protein that can adopt a 3D structure
• a fold
• a family is a group of proteins that share a domain
• examples: zinc finger domain, immunoglobulin domain
Motif (or fingerprint):
• a short, conserved region of a protein
• typically 10 to 20 contiguous amino acid residues

Definition of a domain
According to InterPro at EBI (http://www.ebi.ac.uk/interpro/):
A domain is an independent structural unit, found alone or in conjunction with other domains or repeats. Domains are evolutionarily related.
According to SMART (http://smart.embl-heidelberg.de):
A domain is a conserved structural entity with distinctive secondary structure content and a hydrophobic core. Homologous domains with common functions usually show sequence similarities.
15 most common domains (human)
Domain                        Proteins
Zn finger, C2H2 type          1093
Immunoglobulin                1032
EGF-like                      471
Zn-finger, RING               458
Homeobox                      417
Pleckstrin-like               405
RNA-binding region RNP-1      400
SH3                           394
Calcium-binding EF-hand       392
Fibronectin, type III         300
PDZ/DHR/GLGF                  280
Small GTP-binding protein     261
BTB/POZ                       236
bHLH                          226
Cadherin                      226

Varieties of protein domains
• Extending along the length of a protein
• Occupying a subset of a protein sequence
• Occurring one or more times

Example of a protein with domains: Methyl CpG binding protein 2 (MeCP2)
[Figure: MeCP2 with its MBD and TRD regions marked]
The protein includes a methylated DNA binding domain (MBD) and a transcriptional repression domain (TRD).
MeCP2 is a transcriptional repressor.
Mutations in the gene encoding MeCP2 cause Rett Syndrome, a neurological disorder affecting girls primarily.

Result of an MeCP2 blastp search: A methyl-binding domain shared by several proteins

Are proteins that share only a domain homologous?

Proteins can have both domains and patterns (motifs)
[Figure: a protein with two patterns (several residues each) and two domains (aspartyl protease, reverse transcriptase)]

The SwissProt entry for any protein provides highly useful information…

SwissProt entry for HIV-1 pol links to many databases

Definition of a motif
A motif (or fingerprint) is a short, conserved region of a protein. Its size is often 10 to 20 amino acids.
Simple motifs include transmembrane domains and phosphorylation sites. These do not imply homology when found in a group of proteins.
PROSITE (www.expasy.org/prosite) is a dictionary of motifs (there are currently 1600 entries). In PROSITE, a pattern is a qualitative motif description (a protein either matches a pattern, or not). In contrast, a profile is a quantitative motif description. Profiles are found in Pfam, ProDom, SMART, and other databases.
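Because a PROSITE pattern is qualitative (a sequence either matches or it does not), it can be checked with an ordinary regular expression. The sketch below is one possible translation, not PROSITE's own software. The function names prosite_to_regex and find_sites are made up for illustration, only the basic syntax elements are handled (x, [..], {..}, repeat counts, and the < and > anchors), and the test sequence is invented. N-{P}-[ST]-{P} is the PROSITE N-glycosylation site pattern.

```python
import re


def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern into a Python regular expression."""
    pieces = []
    for element in pattern.rstrip(".").split("-"):
        at_start = element.startswith("<")   # '<' anchors the N-terminus
        at_end = element.endswith(">")       # '>' anchors the C-terminus
        core = element.strip("<>")

        # Split off an optional repeat count such as x(2) or x(2,4).
        repeat = ""
        if core.endswith(")"):
            core, _, count = core[:-1].partition("(")
            repeat = "{" + count + "}"

        if core == "x":                      # x = any residue
            core = "."
        elif core.startswith("{"):           # {P} = any residue except P
            core = "[^" + core[1:-1] + "]"
        # [ST] and single letters such as N are already valid regex.

        pieces.append(("^" if at_start else "") + core + repeat
                      + ("$" if at_end else ""))
    return "".join(pieces)


def find_sites(pattern, sequence):
    """Return 1-based start positions of non-overlapping pattern matches."""
    regex = prosite_to_regex(pattern)
    return [m.start() + 1 for m in re.finditer(regex, sequence)]


# Invented sequence, for illustration only.
sequence = "MKNGTAANPSLLNASW"
print(prosite_to_regex("N-{P}-[ST]-{P}"))      # N[^P][ST][^P]
print(find_sites("N-{P}-[ST]-{P}", sequence))  # [3, 13]
```

The NPSL stretch at position 8 is not reported because the pattern forbids proline after the asparagine. A profile, by contrast, would assign each candidate site a score rather than a yes/no answer.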
Page 231-233

http://www.ebi.ac.uk/Databases/

InterPro
InterPro is a database of protein families, domains and functional sites in which identifiable features found in known proteins can be applied to unknown protein sequences.
http://www.ebi.ac.uk/interpro/

ExPASy Proteomics Server
The ExPASy (Expert Protein Analysis System) proteomics server of the Swiss Institute of Bioinformatics (SIB) is dedicated to the analysis of protein sequences and structures as well as 2-D PAGE.
http://ca.expasy.org/

PROSITE
Database of protein families and domains
http://ca.expasy.org/prosite/

Pfam
Pfam is a large collection of multiple sequence alignments and hidden Markov models covering many common protein domains.
http://www.sanger.ac.uk/Software/Pfam/index.shtml

PRINTS
PRINTS is a compendium of protein fingerprints.
http://umber.sbs.man.ac.uk/dbbrowser/PRINTS/

ProDom
The ProDom protein domain database consists of an automatic compilation of homologous domains.
http://prodes.toulouse.inra.fr/prodom/current/html/home.php
ProDom entry for HIV-1 pol shows many related proteins
Page 231

SMART
SMART (a Simple Modular Architecture Research Tool) allows the identification and annotation of genetically mobile domains and the analysis of domain architectures.
http://smart.embl-heidelberg.de/

PIR
The Protein Information Resource (PIR) houses the PIRSF, iProClass and iProLINK databases.
http://pir.georgetown.edu/

UniProt
www.uniprot.org
Three protein databases recently merged to form UniProt:
• SwissProt
• TrEMBL (translated European Molecular Biology Lab)
• Protein Information Resource (PIR)
You can search for information on your favorite protein there; a BLAST server is provided.

Finding a protein through ExPASy
1. Go to ExPASy (http://www.expasy.ch/).
2. If you know the SwissProt accession of your protein, enter it at the top.
3. Otherwise go into Swiss-Prot/TrEMBL, click SRS (Sequence Retrieval System), click Start, then click Continue, then search for your protein of interest.
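Step 2 only works if the accession is typed in the right form. As a rough sketch, the accession format documented by UniProt can be checked with a regular expression before searching. The helper name normalize_accession is hypothetical, and the example strings are used only for their format, not as claims about particular database entries.

```python
import re

# Accession format as documented by UniProt (6 or 10 characters).
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def normalize_accession(text):
    """Return the cleaned-up accession, or None if the format is wrong."""
    candidate = text.strip().upper()
    return candidate if ACCESSION_RE.fullmatch(candidate) else None


for raw in [" p99999 ", "A0A024R161", "MECP2_HUMAN"]:
    print(repr(raw), "->", normalize_accession(raw))
```

A string such as MECP2_HUMAN is an entry name rather than an accession, so it is rejected here and would go through the text search in step 3 instead.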
Page 230

Protein family classification and databases
• TIGRFAMs http://www.tigr.org/TIGRFAMs/index.shtml
• SUPERFAMILY http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/
• PANTHER http://www.pantherdb.org/
• PIRSF http://pir.georgetown.edu/iproclass/
• Gene3D http://www.biochem.ucl.ac.uk/bsm/cath/Gene3D/

Physical properties of proteins
Many websites are available for the analysis of individual proteins. ExPASy and ISREC are two excellent resources.
The accuracy of these programs is variable. Predictions based on primary amino acid sequence (such as molecular weight prediction) are likely to be more trustworthy. For many other properties (such as posttranslational modification of proteins by specific sugars), experimental evidence may be required rather than prediction algorithms.
Page 236

http://www.expasy.ch/
Page 230

Access a variety of protein analysis programs from the top right of the ExPASy home page
Page 235
Page 244

Proteomics: High throughput protein analysis
Proteomics is the study of the entire collection of proteins encoded by a genome.
The "proteome" refers to all the proteins in a cell and/or all the proteins in an organism.

Large-scale protein analysis
• 2D protein gels
• Yeast two-hybrid
• Rosetta Stone approach
• Pathways
Page 247

Two-dimensional protein gels
First dimension: isoelectric focusing
Second dimension: SDS-PAGE
Page 248

Two-dimensional protein gels
First dimension: isoelectric focusing
• Electrophorese ampholytes to establish a pH gradient
• Can use a pre-made strip
• Proteins migrate to their isoelectric point (pI), then stop (net charge is zero)
• Range of pI typically 4-9 (5-8 most common)
Page 248

Two-dimensional protein gels
Second dimension: SDS-PAGE
• Electrophorese proteins through an acrylamide matrix
• Proteins are charged and migrate through an electric field
• Conditions are denaturing (SDS) and reducing (2-mercaptoethanol)
• Can resolve hundreds to thousands of proteins
Page 248

Proteins identified on 2D gels (IEF/SDS-PAGE)
Direct protein microsequencing by Edman degradation
-- done at many core facilities (e.g. UC Davis)
-- typically need 5 picomoles
-- often get 10 to 20 amino acids sequenced
Protein mass analysis by MALDI-TOF
-- done at core facilities
-- often detect posttranslational modifications
-- matrix-assisted laser desorption/ionization time-of-flight mass spectrometry
Page 250-1
Page 252

Evaluation of 2D gels (IEF/SDS-PAGE)
Advantages:
• Visualize hundreds to thousands of proteins
• Improved identification of protein spots
Disadvantages:
• Limited number of samples can be processed
• Mostly abundant proteins visualized
• Technically difficult
Page 251

The Gene Ontology (GO) Consortium
An ontology is a description of concepts. The GO Consortium compiles a dynamic, controlled vocabulary of terms related to gene products.
There are three organizing principles:
• Molecular function
• Biological process
• Cellular component
GO terms are assigned to Entrez Gene entries.
Page 241

Example
Gene product: cytochrome c
GO entry terms:
--molecular function = electron transporter activity
--biological process = oxidative phosphorylation and induction of cell death
--cellular component = mitochondrial matrix and mitochondrial inner membrane

GO consortium (http://www.geneontology.org)
• No centralized GO database. Instead, curators of organism-specific databases assign GO terms to gene products for each organism.
• AmiGO is the searchable portion of the GO
--Gene symbol, name, UniProt accession numbers, and text searches can be used to find GO entries

The Gene Ontology Consortium: Evidence Codes
Code   Meaning
IC     Inferred by curator
IDA    Inferred from direct assay
IEA    Inferred from electronic annotation
IEP    Inferred from expression pattern
IGI    Inferred from genetic interaction
IMP    Inferred from mutant phenotype
IPI    Inferred from physical interaction
ISS    Inferred from sequence or structural similarity
NAS    Non-traceable author statement
ND     No biological data available
TAS    Traceable author statement
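Evidence codes matter in practice because they let you separate annotations backed by experiments from those assigned automatically. The sketch below shows one way this could be done. The GO terms are taken from the cytochrome c example above, but the evidence codes attached to them are placeholders chosen for illustration, not the real annotations, and the function name keep_by_evidence is hypothetical.

```python
# Evidence codes listed on the slide.
EVIDENCE_CODES = {
    "IC": "Inferred by curator",
    "IDA": "Inferred from direct assay",
    "IEA": "Inferred from electronic annotation",
    "IEP": "Inferred from expression pattern",
    "IGI": "Inferred from genetic interaction",
    "IMP": "Inferred from mutant phenotype",
    "IPI": "Inferred from physical interaction",
    "ISS": "Inferred from sequence or structural similarity",
    "NAS": "Non-traceable author statement",
    "ND": "No biological data available",
    "TAS": "Traceable author statement",
}

# Codes from the list above that rest on an experiment.
EXPERIMENTAL = {"IDA", "IEP", "IGI", "IMP", "IPI"}


def keep_by_evidence(annotations, allowed):
    """Keep only annotations whose evidence code is in the allowed set."""
    return [a for a in annotations if a["evidence"] in allowed]


# Placeholder evidence codes: assumptions for illustration only.
annotations = [
    {"term": "electron transporter activity", "evidence": "IDA"},
    {"term": "oxidative phosphorylation", "evidence": "IEA"},
    {"term": "mitochondrial inner membrane", "evidence": "TAS"},
]

for a in keep_by_evidence(annotations, EXPERIMENTAL):
    print(a["term"], "-", EVIDENCE_CODES[a["evidence"]])
```

Filtering out IEA annotations in this way is a common first step when only curated or experimentally supported terms are wanted.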