PRO AND MEDICAL GENETICS RESOURCES AT NCBI
DONNA MAGLOTT, PH.D.

OPPORTUNITIES
The medical genetics group is a relatively recent addition to the suite of resources at NCBI, and manages the NIH Genetic Testing Registry (GTR), ClinVar, and MedGen. These databases share the need to standardize representation of genes, proteins, small molecules, variation, conditions, and phenotypes, not only with respect to explicit terms, but also the relationships among those terms. This presentation will focus on opportunities for utilization of PRO in the NCBI’s Medical Genetics group.

CASE STUDIES

MEDICAL GENETICS: CLINVAR, GENE, GTR, MEDGEN

A QUICK TOUR
From the home page…

USING THE RESOURCE SECTIONS

TRY ALL SECTIONS

MAJOR DOMAINS OF INFORMATION
Concept: NCBI database/Resource; Used in
• Diseases and their defining features: MedGen (Diseases, Findings…); used in ClinVar, dbVar, Gene, GTR, PheGenI, dbGaP
• Drugs: MedGen (Pharmacologic Substance); used in ClinVar, GTR
• Genes and gene products: Gene, Nucleotide, ClinVar, dbSNP, dbVar, Protein, HomoloGene, GTR … RefSeq
• Biological processes, cellular components, molecular functions: --; used in Gene
• Interactions and pathways: Biosystems, Gene; used in Biosystems, Gene
• Variation: ClinVar, dbSNP, dbVar; used in ClinVar, dbSNP, dbVar…
Records connected by reciprocal, generic links via database identifiers

SOME TALKING POINTS
• Except for RefSeq, curation minimal
• RefSeq-based with pointers to UniProtKB
• Use ontologies to acquire and represent standard terms
• Point to ontologies, but not used to support node-based query interfaces
• Capturing primary data that can be used to drive development of ontologies
• Some user communities think in terms of nucleotide only
• Data being submitted with uncertain significance
• Look for opportunities for adding value to NCBI’s databases and tools

GENE AND DATA STANDARDS
• Name of the gene (nomenclature committees)
• Names of protein products
• Primary product (Swiss-Prot)
• Isoforms (RefSeq)
• Names of associated conditions (multiple)
• Descriptions of pathways (submitters)
• Biological processes, cellular components and molecular functions (GO)
• HIV interactions (NIAID)
• http://www.ncbi.nlm.nih.gov/gene?term=hiv1interactions[Properties]
• http://www.ncbi.nlm.nih.gov/projects/RefSeq/HIVInteractions/

HUMAN MISMATCH REPAIR

RESTRICT TO THOSE REPORTED TO BE DISEASE-CAUSING
www.ncbi.nlm.nih.gov/gene/4292
Phrase found in: Summary, Bibliography, Interactions, Pathways, Gene Ontology, General protein information, Reference sequences, Locus-specific databases, Titles of pathways, Descriptions of interactions

GENE<->PROTEIN

HOMOLOGENE

DISEASES AND PHENOTYPES
MEDGEN: UMLS, HPO, OMIM, ORDO, GTR

WHY MEDGEN?
• A stable node of identifiers within NCBI for disease names, their clinical features, and pharmacological substances
• Built on the foundation of a subset of UMLS, with supplements from HPO, OMIM (between UMLS releases), and submissions to GTR and ClinVar
• Primarily automated, but some overview by M.D.s and genetic counselors on staff, and feedback from the community

TERMS FROM UMLS/OMIM/GTR/CLINVAR

HIERARCHIES: CURATED BY GTR STAFF
Guided by OMIM’s clinical series and user feedback

HIERARCHIES: COMPUTED FROM NODES IN UMLS
Hierarchy from DNA Repair Deficiency Disorders

USING HPO FOR CLINICAL FEATURES
• Partial display
• Organized by top nodes of the ontology
• Each specific term supports a link to disorders manifesting that feature

CLINVAR: REPORTED VARIATION-PHENOTYPE RELATIONSHIP
Submitter archive (not curated)
• Variant
• Disease and/or phenotypes
• Interpretation
• Confidence

SUBSET OF A DETAILED RECORD
• Gene name and symbol
• Sequence ontology for molecular and functional consequences
• Diseases
• Identifiers and links
• Observed phenotypes (as distinct from those reported to be characteristic of the diagnostic term)
• Protein change from the variant

DATA SOURCES AND GROWTH

SUBMISSIONS FROM UNIPROT
Summarize submissions by genes, diseases, and phenotypes
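
The ClinVar slides list the parts of a submitted record (variant, gene, diseases, interpretation, observed phenotypes) and mention summarizing submissions by genes, diseases, and phenotypes. A minimal sketch of that kind of summary follows. The `SubmittedRecord` class, its field names, and the placeholder values are hypothetical, chosen only for illustration; they are not ClinVar's actual schema or data.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SubmittedRecord:
    """Hypothetical, simplified stand-in for one submitted record."""
    gene_symbol: str
    variant: str
    diseases: List[str]
    interpretation: str
    observed_phenotypes: List[str] = field(default_factory=list)


def summarize(records: List[SubmittedRecord]) -> Dict[str, Counter]:
    """Count how many records mention each gene, disease, and phenotype."""
    return {
        "genes": Counter(r.gene_symbol for r in records),
        "diseases": Counter(d for r in records for d in r.diseases),
        "phenotypes": Counter(
            p for r in records for p in r.observed_phenotypes
        ),
    }


# Placeholder records; none of these values come from a real database.
records = [
    SubmittedRecord("GENE_A", "variant_1", ["condition_X"], "pathogenic",
                    ["phenotype_1"]),
    SubmittedRecord("GENE_A", "variant_2", ["condition_X", "condition_Y"],
                    "uncertain significance"),
    SubmittedRecord("GENE_B", "variant_3", ["condition_Y"], "benign",
                    ["phenotype_1", "phenotype_2"]),
]

summary = summarize(records)
for axis, counts in summary.items():
    print(axis, dict(counts))
```

Keeping observed phenotypes in their own field, separate from the diagnostic terms, mirrors the distinction the slide draws between phenotypes observed in an individual and those reported to be characteristic of the disease.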

CURRENT STATUS: CLINGEN-RELATED
Diseases, Genes, Variants
Predictions
• Conserved sequence
• Conserved domains
• Pathways
http://www.clinicalgenome.org/

‘PHENOTYPE’ AND CLINGEN/CLINVAR
• Working group on phenotype
• Make distinctions among
• Disease category (body system, metabolic perturbation, cancer)
• Diagnosis
• Characteristic features
• General or gene-specific
• Diseases targeted by drugs for which the response is genetically determined
• Observed phenotypes
• HPO
• PhenoDB
• Indications for testing
• Standardization
• One ontology or many?
• Relationship to OMIM

VARIATION AND CLINGEN/CLINVAR
• Sequence Ontology for variant location and effect
• Coordinate with PharmGKB for pharmacogenomics
• Description of haplotypes
• No discussion yet about authorities for pathways, conserved domains, post-translational modifications

CURRENT STATUS: NCBI
• Working with UMLS to improve representation of terms and relationships
• Mapping concepts
• Reporting relationships
• Supplement current UMLS with HPO, Orphanet (ORDO, in progress), and recent data from OMIM
• Working with Clinical Pharmacogenetics Implementation Consortium (CPIC) and PharmGKB
• Representation of haplotypes/star alleles
• Drug responses/Disease target
• Consumer of ontologies to standardize terminology, with definitions
• Link to resource site
• Provide attribution
• Support term-specific queries

CURRENT STATUS: NCBI
• Queries currently term by term, not by node
• Some relationships based on links in Entrez
• Gene <-> disease
• Disease <-> clinical feature
• Variation <-> gene
• Some relationships explicit
• Genome -> transcript -> protein
• Nucleotide change -> protein change
• Some relationships reported as hierarchies
• GTR
• MedGen (MeSH)
• ORDO (in progress)

CURRENT STATUS: NCBI
• Maintenance
• Primarily automatic
• Some curatorial review by staff of ClinVar and NIH Genetic Testing Registry (GTR)
• Expect expanded review from the ClinGen group
• Data freely available by ftp or E-utilities
• ftp://ftp.ncbi.nih.gov/pub/clinvar/
• ftp://ftp.ncbi.nih.gov/gene/
• ftp://ftp.ncbi.nih.gov/pub/GTR/
• ftp://ftp.ncbi.nih.gov/pub/medgen/

ACKNOWLEDGEMENTS
Slava Gorelenkov (MedGen)
Melissa Landrum (ClinVar)
Jennifer Lee (GTR, ClinVar)
Terence Murphy (Gene)
Lon Phan (dbSNP/dbVar)
Kim Pruitt (RefSeq)
Wendy Rubinstein (GTR, MedGen)
Ming Ward (dbSNP)
and all their staff
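
The status slides note that the data are freely available by E-utilities as well as by ftp. As a minimal sketch of the E-utilities route, the code below builds ESearch request URLs for two terms that appear in the slides. The endpoint and the `db`, `term`, `retmax`, and `retmode` parameters come from NCBI's public E-utilities interface, not from the presentation; the helper name `esearch_url` is hypothetical. Nothing is sent over the network, so the code only shows how such a request would be formed.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities service (not given in the slides).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL for one Entrez database and one query term."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# The Gene query from the HIV interactions slide.
gene_query = esearch_url("gene", "hiv1interactions[Properties]")

# A MedGen search for the hierarchy named on the UMLS slide.
medgen_query = esearch_url("medgen", "DNA Repair Deficiency Disorders")

print(gene_query)
print(medgen_query)
```

Because queries are currently resolved term by term rather than by ontology node, each call searches for one term only; retrieving everything beneath a node would require expanding the node into its member terms first.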