Locating Gene/Protein Information
January 11, 2011
Ansuman Chattopadhyay, PhD
Head, Molecular Biology Information Services
Health Sciences Library System, University of Pittsburgh
[email protected]
http://www.hsls.pitt.edu/molbio

Objectives
• Generate a gene list
• Mine gene/protein information

Topics
• Literature Informatics
• Gene / Protein Information Gateways
• Search Engine for MolBio / Bioinformatics Databases and Software

Topics: Literature Informatics
• Comprehensive search: MeSH term based PubMed search
• PubMed special topics query
• Next-generation literature search tools: GoPubMed, GLAD4U, HuGENavigator
http://www.hsls.pitt.edu/guides/genetics

Literature Informatics
Which genes/proteins are reported to be associated with the disease Schizophrenia?
• Citations: 19 million
• Journals: 5200
• Schizophrenia: 86,384.. 96234
• Schizophrenia gene: 5851…7295

Challenges in Literature Search
• Am I getting everything?
• Too much information.. How to digest?
• A list with citations

Medical Subject Headings (MeSH)
• The U.S. National Library of Medicine's controlled vocabulary (thesaurus)
• Arranged in a hierarchical manner called the MeSH Tree Structures
• Updated annually

MeSH Vocabulary
• Headings: over 24,000, representing concepts found in the biomedical literature (Body Weight, Kidney, Radioactive Waste)
• Subheadings: attached to headings to describe a specific aspect of a concept (adverse effects, metabolism, diagnosis, therapy)
• Supplementary Concept Records: over 172,000 terms in a separate chemical thesaurus, updated weekly (cordycepin, valspodar, tacrolimus binding protein 4)
• Publication Types (Letter, Review, Randomized Controlled Trial)

MeSH Tree Structure
A. Anatomy
B. Organisms
C. Diseases
D. Chemical and Drugs
E. Analytical, Diagnostic and Therapeutic Techniques and Equipment
F. Psychiatry and Psychology
G. Biological Sciences
H. Physical Sciences
I. Anthropology, Education, Sociology and Social Phenomena
J. Technology and Food and Beverages
K. Humanities
L. Information Science
M. Persons
N. Health Care
V. Publication Characteristics
Z. Geographic Locations

MeSH Indexing
Source: NLM

MeSH Indexing
Genes/Chemicals, MeSH Terms

PubMed Query Using MeSH
http://www.ncbi.nlm.nih.gov/mesh

Exercise
Find articles on “Dengue outbreaks in India” by searching PubMed using MeSH terms
Resources
• MeSH Browser: http://www.ncbi.nlm.nih.gov/mesh
• PubMed: http://www.ncbi.nlm.nih.gov/pubmed
Link to the video tutorial: http://media.hsls.pitt.edu/media/molbiovideos/pubmedsearch1.swf

Building PubMed Queries
Term (with Boolean operators)                                        # papers
Dengue AND Outbreaks                                                 823
Dengue* AND Outbreaks                                                746
Dengue AND Outbreaks AND India                                       131
Dengue* AND Outbreaks AND India                                      116
Dengue AND Outbreaks/statistics and numerical data AND India         7
Dengue* AND Outbreaks/statistics and numerical data AND India        7

Useful links for MeSH
• MeSH Browser: http://www.ncbi.nlm.nih.gov/mesh
• Link to Wikipedia, YouTube videos, blogs etc. on “medical subject heading”: http://www.kosmix.com/topic/Medical_Subject_Headings?
• 18 ways to improve your PubMed searches, by Carrie Iwema: http://bitesizebio.com/2008/03/05/18-ways-to-improve-yourpubmed-searches/
• Searching by using the MeSH Database, NCBI Handbook: http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=helppubmed&part=pubmedhelp#pubmedhelp.Searching_by_using_t

Exercise
Find genes that are reported to be associated with the disease SCHIZOPHRENIA by searching PubMed
Resources
• PubMed Clinical Queries: http://www.ncbi.nlm.nih.gov/pubmed/clinical
Link to the video tutorial: http://media.hsls.pitt.edu/media/molbiovideos/pubmedsearch2.swf

Topic-Specific PubMed Queries
http://www.nlm.nih.gov/bsd/special_queries.html

Research on Optimal Search Strategies

PubMed Special Topic Queries

Search Filters

PubMed Search Filter: Medical Genetics
("schizophrenia"[MeSH Terms] OR "schizophrenia"[All Fields]) AND (("genetics, medical"[MeSH Terms] OR ("genetics"[All Fields] AND "medical"[All Fields]) OR "medical genetics"[All Fields] OR ("medical"[All Fields] AND "genetics"[All Fields])) OR ("genotype"[MeSH Terms] OR "genotype"[All Fields]) OR "genetics"[Subheading] AND ("genetics"[Subheading] OR "genetics"[All Fields] OR "genetics"[MeSH Terms]))

PubMed Search Result Display

Latest Innovations in Literature Searching
GoPubMed: display search results sorted into meaningful topics and subtopics

GoPubMed
www.gopubmed.com

Exercise
Find genes that are reported to be associated with the disease SCHIZOPHRENIA by using GoPubMed
Resources
• GoPubMed: http://www.gopubmed.org/web/gopubmed/2?WEB10O00h00100090000
Link to the video tutorial: http://media.hsls.pitt.edu/media/clres2705/gopubmed.swf

GoPubMed Search Result

GoPubMed Search Result Analysis
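The field-tagged PubMed queries shown earlier in this section (MeSH terms and All Fields terms joined with Boolean operators) can also be assembled programmatically. The sketch below is only an illustration, not part of the original slides: it builds a simplified query from the schizophrenia and genotype terms that appear in the Medical Genetics filter and encodes it into a URL for NCBI's E-utilities ESearch service. The helper names (tagged, or_group, and_query) are hypothetical, and the use of ESearch is an assumption, since the slides only demonstrate the PubMed web interface.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities ESearch endpoint (not mentioned in the slides).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def tagged(term, field):
    """Return a quoted term with a PubMed field tag, e.g. "x"[MeSH Terms]."""
    return f'"{term}"[{field}]'


def or_group(*terms):
    """Join terms with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def and_query(*groups):
    """Join term groups with AND."""
    return " AND ".join(groups)


# Simplified version of the filter: disease terms AND genotype terms.
query = and_query(
    or_group(tagged("schizophrenia", "MeSH Terms"),
             tagged("schizophrenia", "All Fields")),
    or_group(tagged("genotype", "MeSH Terms"),
             tagged("genotype", "All Fields")),
)

# URL-encode the query; the URL is only built here, not fetched.
url = ESEARCH_URL + "?" + urlencode({"db": "pubmed", "term": query, "retmax": 20})

print(query)
print(url)
```

Keeping each OR group in its own parentheses makes the AND/OR precedence explicit, so the string does not depend on how PubMed orders unparenthesized operators.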
Latest Innovations in Literature Searching

PubMed driven Web Tools
http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/search/

Literature to Gene List
GLAD4U: http://bioinfo.vanderbilt.edu/glad4u/

Gene list to common functions

Literature to Gene list
http://www.quertle.info/

NIH Grant Applications to Gene List

Curated Molecular Databases

Molecular Databases
• Nucleic Acids Research: annual Database Issue
• NAR: annual Web Server Issue
• Oxford Journals: Bioinformatics
• BioMed Central: BMC Bioinformatics

Growth of bioinformatics tools

Growth of Molecular Databases
• 2011: 1330
• 2008: 1078
Source: Nodal Point Blog

GWA Studies Catalog
http://www.genome.gov/gwastudies/

Search Engine Just for Human Genetics
CDC HuGENavigator: http://hugenavigator.net/

Exercise
• Find human genes that are reported to be associated with asthma
• Find human SNPs that are reported to be associated with asthma
Resources
• HuGENavigator: http://hugenavigator.net/HuGENavigator/home.do
Link to the video tutorial: http://media.hsls.pitt.edu/media/clres2705/asthma.swf

Search Engine Just for Human Genetics
http://hugenavigator.net/HuGENavigator/huGEPedia.do

Find Disease Causing SNPs
http://hugenavigator.net/HuGENavigator/gWAHitStartPage.do
What SNPs are associated with “Schizophrenia”?

Hands-On exercise on lit search
• Which proteins are related to Alzheimer’s disease?
• Where/who are the leading centers and scientists for liver transplantation?
• Which hormones are associated with Autistic Disorder?

Gene/Protein Information Mining

Bioinformatics Databases & Software Providers
• National Center for Biotechnology Information (NCBI): Home page, Site map, Resource Guide
• European Bioinformatics Institute (EBI): Home page, Databases, Software

Gene Information Gateways
Open access resources:
• National Center for Biotechnology Information (NCBI): GenBank, RefSeq, Entrez Gene, Gene Expression Omnibus (GEO), OMIM

Protein Information Hubs
Open access resources:
• European Bioinformatics Institute (EBI): UniProt, InterPro, PROSITE, STRING
• UCSC Genome Bioinformatics: BLAT Search, Gene Detail Page

Protein Information Hubs
Open access resources:
• National Center for Biotechnology Information (NCBI): RefSeq, Entrez Gene, Conserved Domain Database (CDD), Molecular Modeling Database (MMDB), 3D structure viewer: Cn3D

Gene/Protein Information
• Chromosomal location, mRNA, genomic seq, orthologs, paralogs, regulatory elements
• Amino acid seq, domain architecture, protein structure, post translational modifications
• Gene expression, biological pathways, protein interaction map, disease association, biomarkers

Gene Questions?
• What is its genomic seq?
• How many splice variants are there?
• What is its intron-exon architecture?
• What is its function?
• In which tissues is it expressed?
• What are its neighboring genes?
• What diseases are associated with it?
• How can I get its cDNA clone?

NCBI: Entrez Gene
Chromosomal Localization, Genomic Sequence, mRNA Sequence, Amino acid Sequence, Homologous Sequences, Expression Profile, Disease, 3D Structure, SNP, Interacting Partners

Entrez Gene
Find:
• gene symbols and aliases
• sequences: genomic, mRNA, protein
• intron-exon architecture
• genomic context: neighboring and antisense genes
• interacting partners
• associated gene ontology terms: function, cellular component and biological process

Entrez Gene
• a searchable database of genes, from RefSeq genomes, and defined by sequence and/or located in the NCBI Map Viewer
• each record represents a single gene from a given organism
Statistics
• Gene: 7974 organisms
• GenBank: 160,000 organisms

NCBI Sequence Databases
• GenBank: archival database of nucleotide sequences from >160,000 organisms (More info)
• GenPept: conceptual translation of GenBank CDS
• RefSeq: based on GenBank records; non-redundant, expert-verified databases of reference sequences

International Nucleotide Sequence Database Collaboration

Primary vs Derivative databases

RefSeq Scope & Accessions
Genomic DNA
• NC_123456: complete genome, complete chromosome, complete plasmid
• NG_123456: genomic region
• NT_123456: genomic contig
mRNA: NM_123456
Protein: NP_123456
more about RefSeq scope and accessions...
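The RefSeq accession prefixes listed above lend themselves to a simple lookup. As a minimal sketch (not from the original slides), the function below maps the two-letter prefix of an accession to the molecule type given on the slide. The function name classify_refseq is hypothetical, the optional ".version" suffix handling is an assumption, and only the five prefixes named on the slide are covered; RefSeq defines others that this table deliberately omits.

```python
import re

# Prefix meanings copied from the "RefSeq Scope & Accessions" slide only.
REFSEQ_PREFIXES = {
    "NC": "genomic DNA: complete genome, complete chromosome, complete plasmid",
    "NG": "genomic DNA: genomic region",
    "NT": "genomic DNA: genomic contig",
    "NM": "mRNA",
    "NP": "protein",
}

# Two uppercase letters, underscore, digits, optional ".version" (assumed format).
_ACCESSION = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


def classify_refseq(accession):
    """Return the slide's description for a RefSeq-style accession, or None.

    None is returned both for strings that do not look like RefSeq
    accessions and for prefixes that are not listed on the slide.
    """
    match = _ACCESSION.match(accession.strip())
    if match is None:
        return None
    return REFSEQ_PREFIXES.get(match.group(1))


for acc in ["NM_123456", "NP_123456.2", "NC_123456", "U49845"]:
    print(acc, "->", classify_refseq(acc))
```

U49845, the sample GenBank record used elsewhere in this deck, has no RefSeq-style prefix, so the function returns None for it.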
RefSeq Status Codes
• Provisional
• Reviewed
• Predicted
• Genome Annotation
more about RefSeq status codes

Hands on
Find the mRNA sequence for your gene of interest (p53, BRCA1, EGFR, PLCg1)
• Start page: Entrez core nucleotide
• Use Limits, History and Preview Index

Sequence Format
• GenBank: Header, Features, Sequence
• FASTA: Sequence
Example: U49845, sample GenBank record
Sequence Revision History tool

Video Tutorials
http://www.hsls.pitt.edu/molbio/videos?c=3

Exercise
Find the mRNA sequence for the Reelin gene.

Gene Function
What is its function?
Entrez Gene Page:
• Summary (TOC)
• Gene Ontology
• GeneRIFs
• Pathways (TOC)
• Biosystems (Links)

Gene Ontology (GO)
Controlled vocabulary tagging
• Function
• Biological Processes
• Cellular Component

Gene Ontology (GO) and KEGG
• GO information page
• GO evidence codes
• KEGG information page

Function
How many splice variants are there? What are their sequences?
Entrez Gene Page:
• Genomic regions… (TOC)
• UCSC (Links)
Video Tutorials

Alternative Splicing

Intron-Exon Coordinates
What is its intron-exon architecture?
Entrez Gene Page: Display: change it from Full report to Gene Table
Video Tutorials

Neighboring Genes
What are its neighboring genes?
Entrez Gene Page: Genomic context (TOC)
Video Tutorials

Chromosomal location

Associated Diseases
What diseases are associated with it?
Entrez Gene Page:
• TOC: General Information_Phenotype
• Links: OMIM, HuGE Navigator
Video Tutorials

Homologene
What are its homologous genes?
Entrez Gene Page: Link: Homologene, change Display settings
Video Tutorials

Reagents
How can I get its cDNA clone? ..antibodies? ..siRNA?
Entrez Gene Page: TOC: Additional Links
• Research Materials
• Exact Antigen
Video Tutorials

Protein Information Gateways

UniProtKB: Universal Protein Resource
A comprehensive, centralized protein information resource
Developed by a consortium:
• European Bioinformatics Institute (EBI)
• the Swiss Institute of Bioinformatics (SIB)
• the Protein Information Resource (PIR)
Comprised of:
--Swiss-Prot: biologist-curated annotation data
--TrEMBL: computationally annotated data
--PIR-International Protein Sequence Database (PIR-PSD): the most comprehensive and expertly-curated protein sequence database in the public domain for over 20 years.
Funded by: NIH, NSF, the European Union and the Swiss Federal government
Link to Wiki, YouTube, Blogs and Tweets: http://www.kosmix.com/topic/uniprot?
Tutorial Video: http://www.youtube.com/watch?v=TCF3qWn7siI&feature=youtube_gdata

Protein Questions?
• What is its function?
• Amino acid sequence? … molecular wt? … isoelectric point (PI)? … post translational modifications? … presence of domain/pattern/profile? … hydrophobicity? … homologous orthologs? Etc.
• Structure? … secondary and tertiary?
• Interaction partner?
UniProt Video Tutorial
http://www.hsls.pitt.edu/molbio/videos/play?v=19

Protein Function from UniProtKB
UniProt Search: look under general annotation_Function, ontologies_keywords, geneontology

Protein Sequence
• UniProt: Sequence annotations; sequences
• Gene: Genomic regions, transcripts, and products; CCDS (consensus CDS report)
• UCSC: Sequence and links

Protein Sequence Analysis
• PTM: UniProt: Seq annt; IPA: Modifications and Regulation
• PI/MW: UniProt: Seq_Tool: Compute PI
• Hydrophobicity: UniProt: Seq_Tool: ProtScale
• Peptide Digest: UniProt: Seq_Tools: PeptideMass, PeptideCutter
• Homologous Seq: Entrez Gene: Homologene
• Domain/pattern: UniProt: Sequence annotation: InterPro; Entrez Gene: Conserved Domain

Protein Domain Resources
Protein Domain Databases: InterPro

Protein Domains
Wikipedia: A protein domain is a part of protein sequence and structure that can evolve, function, and exist independently of the rest of the protein chain. Each domain forms a compact three-dimensional structure and often can be independently stable and folded. Many proteins consist of several structural domains. One domain may appear in a variety of evolutionarily related proteins. Domains vary in length from about 25 amino acids up to 500 amino acids. The shortest domains, such as zinc fingers, are stabilized by metal ions or disulfide bridges. Domains often form functional units, such as the calcium-binding EF hand domain of calmodulin.
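For the Hydrophobicity entry in the Protein Sequence Analysis overview earlier in this section, which points to ProtScale, the underlying idea of a sliding-window hydropathy profile can be sketched in a few lines. This is an illustration only, not the tool's actual implementation: it assumes the Kyte-Doolittle hydropathy scale and a plain unweighted window average, and the function names and the default window of 9 are choices made here for the example.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Return the mean hydropathy of every full window along the sequence."""
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    # A KeyError here means the sequence contains a non-standard residue.
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(seq) - window + 1)
    ]


def most_hydrophobic_window(sequence, window=9):
    """Return (first_residue, last_residue, mean_score), 1-based inclusive."""
    profile = hydropathy_profile(sequence, window)
    best = max(range(len(profile)), key=profile.__getitem__)
    return best + 1, best + window, profile[best]


# Toy sequence: three isoleucines flanked by aspartates.
print(most_hydrophobic_window("DDDIIIDDD", window=3))
```

The window with the highest mean score marks the most hydrophobic stretch; on ties, the first such window is reported. Wider windows smooth the profile and are generally better suited to finding long hydrophobic segments than short patches.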
Protein Domain: SH3
Src homology 3 domains: SH3 domains bind to proline-rich ligands with moderate affinity and selectivity, preferentially to PxxP motifs. They play a role in the regulation of enzymes by intramolecular interactions, change the subcellular localization of signal pathway components, and mediate multiprotein complex assemblies.

Protein Structure
Primary, Secondary, Tertiary, Quaternary
Taken from Wikipedia
Useful links: http://www.kosmix.com/topic/protein_structure?

Protein Structure
NCBI

Finding Protein Structure

Structure Databases and Viewer
Databases:
• RCSB Protein Data Bank (PDB): State University of New Jersey (Rutgers), the San Diego Supercomputer Center at the University of California San Diego, the University of Wisconsin-Madison. Link: http://www.kosmix.com/topic/protein_data_bank?
• MMDB: NCBI's structure database is called MMDB (Molecular Modeling DataBase), and it is a subset of three-dimensional structures obtained from the Protein Data Bank (PDB), excluding theoretical models.
Viewer:
• Cn3D: a helper application for your web browser that allows you to view 3-dimensional structures from NCBI's Entrez retrieval service.
• RasMol: EBI
• FirstGlance in Jmol: a simple tool for macromolecular visualization. (More..)
Protein Structure
Search for the 3D structure of P53
• Entrez structure
• View the crystal structure of mouse p53 core domain (MMDB: 42987) or the crystal structure of a p53 core dimer bound to DNA (PDB: 2GEQ)

Manipulating the Structure Viewer Window

Find Similar Structure: NCBI VAST

NCBI BLink
BLink ("BLAST Link") displays the results of BLAST searches that have been done for every protein sequence in the Entrez Proteins data domain. To access it, follow the BLink link displayed beside any hit in the results of an Entrez Proteins search.

Hands-on Protein Structure
View the crystal structure of Chronophin (PDB entry: 2P69). A variant of this protein with mutations in its amino acid sequence has been isolated. Can you predict any effect of the mutations on its function?
Hint: Find the amino acid residues which are in close contact (3.5 Å) with PYRIDOXAL-5'-PHOSPHATE (PLP). Label the amino acids and save the picture in PNG format.
Learn more about the Chronophin structure at: http://kb-dev.psi-structuralgenomics.org/KB/archives.jsp?pageshow=3

Sequence Alignment in Cn3D
NCBI

Hands-On
Can you identify the human protein which contains the short peptide sequence GPDGMPVIYHGHTLTTKIKFSDVLHTIKE?
• What is its function?
• What is its calculated PI and molecular wt?
• Which region of this protein is most hydrophobic?
• Locate five experimentally verified S/T/Y phosphorylation sites present in this protein.
• Find the mouse and fruit fly orthologs of this human protein and report the % protein identity it shares with these orthologs.
• How many protein domains are reported to be present in this human protein? Find the location of its largest domain.

Licensed Tools for Gene/Protein Information

HSLS Licensed Tools
• BioBase
• Metacore
• Ingenuity IPA

Gene/Protein facts from Biobase
http://goo.gl/9wpwG

BioBase BioKnowledge Library

Protein Function from IPA

Search Engine for Bioinformatics Tools

Biomedical and Life Sciences Search Engines
• OBRC: University of Pittsburgh: http://www.hsls.pitt.edu/guides/genetics/obrc
• Vadlo: http://vadlo.com/
• OReFil: University of Tokyo: http://orefil.dbcls.jp/

Search.HSLS.MolBio
http://www.hsls.pitt.edu/molbio

Search.HSLS.MolBio
Integrated search system:
• Databases & Software
• Articles on Databases & Software
• Genes/Proteins
• Pathways
• Protocols
• Seminar/Talks Videos
• Recommended Articles
Tabbed browsing
Clustered search results

Search term: “phosphorylation”

Molecular Databases and Software: search term: “Phosphorylation”

Search Result Page

Citation Trackers

Search PubMed Articles on Databases and Software: “phosphorylation”

Articles on Databases and Software

Articles on Prediction of Phosphorylation Sites

Prediction of Phosphorylation Sites

MetaPredPS

Clustering Remix
Genes/Proteins Info

Entrez Gene

BioBase Knowledge Library

Protocols

Seminar Talks Video

Recommended Articles
Faculty of 1000 Biology: a literature awareness tool that highlights and reviews the most interesting papers published in the biological sciences, based on the recommendations of a faculty of well over 2300 selected leading researchers.

Faculty of 1000

Thank you! Any questions?
Carrie Iwema, [email protected], 412-383-6887
Ansuman Chattopadhyay, [email protected], 412-648-1297
http://www.hsls.pitt.edu/molbio