Opinion TRENDS in Genetics Vol.21 No.6 June 2005

DNA, diseases and databases: disastrously deficient

George P. Patrinos1 and Anthony J. Brookes2
1 Erasmus University Medical Center, Faculty of Medicine and Health Sciences, MGC-Department of Cell Biology and Genetics, PO Box 1738, 3000 DR, Rotterdam, The Netherlands
2 Department of Genetics, University of Leicester, University Road, Leicester, UK LE1 7RH

Recent progress in disease genetics and genome-related medicine has been substantial, with vast amounts of data being generated. However, this progress has not been matched by adequate database projects that gather and organize these data to enable their useful exploitation. This research area is complex, entailing core databases, locus-specific databases, national mutation databases, genotype–phenotype databases and patient databases, and much work is required to develop and properly integrate these various resources. To promote this, we present a timely overview of the field, emphasize its over-riding importance and discuss the disastrously deficient progress made so far. Many factors contribute to this slow progress (e.g. technological hurdles, publication requirements, the short-sighted and popularist research system). A lack of targeted funding is arguably the most fundamental problem, but one that can be solved.

Introduction

Research into the genetic basis of disease has advanced in scale and sophistication, leading to increased rates of data production in many laboratories. Additionally, DNA diagnostics and electronic healthcare records are increasingly common features of medical practice. Therefore, it should be possible to integrate all of this information to establish a detailed understanding of how genetic differences impact human health. Nevertheless, current progress towards this goal is slow.
This is primarily due to the many challenges involved in computationally handling (gathering, exchanging, integrating and interpreting) the relevant primary information. These challenges are often technical in nature, but they also include restricted funding for this kind of research and an intrinsic research bias towards new data acquisition rather than old data management. Valuable discoveries that address the genetic basis of disease are therefore being squandered, and this can only be remedied by a significant enhancement of ‘mutation database’ and related activities. In this article, we highlight some of the main activities relating to mutation databases to: (i) describe the existing and emerging types of database in this domain; (ii) emphasize their potential applications in modern medical genetics; and (iii) comment on the key elements that are still missing and holding back the field.

Corresponding author: Patrinos, G.P. Available online 16 April 2005

Types of mutation databases

With great vision, Victor McKusick made the first serious efforts towards summarizing DNA variations and their clinical consequences when he published Mendelian Inheritance in Man (MIM), a paper compendium of information on genetic disorders and genes [1]. This is now distributed electronically [Online Mendelian Inheritance in Man (OMIM)] by the National Center for Biotechnology Information (NCBI) and updated on a daily basis (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM&itool=toolbar) [2]. Its strength stems from the quality and diversity of its content, but its structure is far from ideal for automated data mining. It is also not comprehensive and could not be expected to be, given the ever-accelerating pace of discovery of genetic mutations involved in human disease and model-organism phenotypes. Instead, to cope with this large and growing body of information there is a need for a range of diverse and suitably integrated databases.
An awareness of this need prompted the formation of the Human Genome Organization Mutation Database Initiative (HUGO-MDI) in the early 1990s [3], which then evolved into the Human Genome Variation Society (HGVS: http://www.hgvs.org). Today, the stated objective of the HGVS is ‘…to foster discovery and characterization of genomic variations, including population distribution and phenotypic associations’. More broadly, the various depositories that fall under the banner of ‘mutation databases’ can be categorized into two types: core (or central) databases and locus-specific databases (LSDBs). Some examples are given in Box 1.

0168-9525/$ - see front matter © 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.tig.2005.04.004

Box 1. Examples of useful mutation and related databases

• Online Mendelian Inheritance in Man (OMIM)
A comprehensive, authoritative and timely knowledgebase of human genes and genetic disorders compiled to support research and education in human genomics and the practice of clinical genetics. Each OMIM entry has a full-text summary of a genetically determined phenotype and/or gene and has numerous links to other genetic databases such as DNA and protein sequence, references, general and locus-specific mutation databases (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM&itool=toolbar).

• Human Gene Mutation Database (HGMD)
A database recording various types of mutation within the coding regions of human nuclear genes causing inherited disease. HGMD does not usually include mutations lacking obvious phenotypic consequences. Data are collected weekly by a combination of manual and computerized search procedures (http://www.hgmd.org).

• SNP databases
Repositories of single nucleotide polymorphisms (SNPs) and other localized variations across the genome. The most comprehensive SNP database is dbSNP at NCBI (http://www.ncbi.nlm.nih.gov/projects/SNP/); additional information is available at other sites such as the Human Genome Variation Database (http://hgvbase.cgb.ki.se/) and the International HapMap Project website (http://www.hapmap.org). Some projects have a specific research focus, such as the environmental genome SNP database (egSNP: https://dir-apps.niehs.nih.gov/egsnp/home.htm), which lists common SNPs in selected environmental-response genes. An extensive list of the SNP databases is provided at http://www.genomic.unimelb.edu.au/mdi/dblist/dblist.html.

• PharmGKB
A publicly available knowledgebase that consists of a central repository for clinical and genetic information that aids researchers in understanding how genetic variation among individuals contributes to differences in reactions to drugs (http://www.pharmgkb.org/).

• PhenomicDB
A multi-species genotype–phenotype database, merging public genotype and phenotype data from a wide range of human and other model organisms. Its user interface enables scientists to compare and browse known phenotypes simultaneously for a given gene or a set of genes from different organisms (http://www.phenomicDB.de).

The philosophy behind core databases is an attempt to capture all described mutations in all genes, but with each mutation being represented in limited detail. The included phenotype descriptions are generally cursory, making core databases of little value for those wishing to understand the subtleties of phenotypic variability. Core databases tend to include only mutations of large effect that result in mendelian patterns of inheritance, whereas sequence variations not associated with any clinical consequences or those associated with minor or uncertain clinical consequences are rarely catalogued. Thus, core databases provide a good overview of patterns of clinically relevant mutations and polymorphisms, but almost no fine detail to aid proper understanding. The best current example of a core database is the Human Gene Mutation Database (http://www.hgmd.org) [4], which by March 2005 contained >45 000 different lesions in almost 1 800 different nuclear genes, with new entries accumulating at an average rate of 2 300 per annum.

By contrast, LSDBs contain information about one or a few specific genes [5], usually related to a single disease. They are highly curated repositories of published and unpublished mutations within those genes and, as such, they complement the core databases. Data quality and completeness are typically good, with up to 50% of stored records pertaining to unpublished mutations. The data are also rich and informative. For example, LSDBs will typically present each of the multiple discoveries of recurrent mutational events, enabling mutation hotspots to be identified; when these mutations occur on different chromosomal backgrounds (linked to other mutations) such that they result in several, or different, disease features, these correlations are also recorded. A good example of an LSDB is HbVar (http://globin.cse.psu.edu/globin/hbvar) [6], a relational database of hemoglobin variants and thalassemia mutations, providing information on pathology, hematology, clinical presentation and laboratory findings for numerous DNA alterations. Gene and protein variants are annotated with respect to biochemical data, analytical techniques, structure, stability, function, literature references, and qualitative and quantitative distribution in ethnic groups and geographic locations [7]. As is common in LSDBs, entries can be accessed through summary listings or user-generated queries that can be highly specific. For information on >350 currently available LSDBs see http://www.hgvs.org, http://www.hgmd.org and Ref. [8].
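The way an LSDB keeps each independent report of a recurrent mutation, together with its chromosomal background, can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the `Observation` record, its field names and the placeholder values in the usage note are hypothetical and do not reflect the schema of HbVar or any other existing database.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One independent report of a mutation (hypothetical LSDB record)."""
    gene: str
    position: int    # hypothetical coordinate within the gene
    change: str      # nucleotide change, e.g. "G>A"
    haplotype: str   # chromosomal background the mutation was found on
    published: bool  # LSDBs also hold unpublished reports


def mutation_hotspots(observations, min_reports=2):
    """Return ((gene, position), count) pairs for sites reported at least
    `min_reports` times, most frequently reported first."""
    counts = Counter((o.gene, o.position) for o in observations)
    return [(site, n) for site, n in counts.most_common() if n >= min_reports]


def haplotype_backgrounds(observations):
    """Map each (gene, position, change) to the set of haplotypes it occurs on."""
    backgrounds = {}
    for o in observations:
        key = (o.gene, o.position, o.change)
        backgrounds.setdefault(key, set()).add(o.haplotype)
    return backgrounds
```

Given three invented reports, two of the same change at position 100 of a placeholder gene "GENE1" on backgrounds "hapA" and "hapB" and one at position 250, `mutation_hotspots` returns only the position-100 site, and `haplotype_backgrounds` shows that this mutation occurs on two backgrounds. Keeping one row per report, rather than one row per distinct mutation, is the design choice that makes both summaries possible.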
In addition to core databases and LSDBs, DNA variation is also recorded in various polymorphism databases, such as the single nucleotide polymorphism database (dbSNP; http://www.ncbi.nlm.nih.gov/projects/SNP/) [9], the HapMap Data Coordination Center (http://www.hapmap.org/) [10] and the Human Genome Variation Database (HGVbase; http://hgvbase.cgb.ki.se) [11,12]. These resources make no explicit attempt to connect DNA information to phenotypes, and they are not yet perfect in design or content [13–15], but they do make available an extensive list of ‘normal’ variation that occurs in the human genome. These databases are important because they help to complete the picture for any gene or region of interest, by summarizing all of the neutral variants that are typically not included in core databases or LSDBs.

Core databases and LSDBs thus share the same primary purpose of representing DNA variations that have a definitive or a probable phenotypic effect. The current databases are, however, too limited in number and in their degree of inter-connection to capture all of the information about pathogenic DNA mutations. This is because the modern research ethos fails to provide adequate incentives (i.e. publication options, peer recognition and funding) to encourage researchers to build new core databases or LSDBs. Initiatives designed to make it technically simple to set up and use such databases are welcome, such as specialized software [16,17] or interactive user interfaces (e.g. Genewindow; http://genewindow.nci.nih.gov), as are those that directly transfer data from clinical diagnostics laboratories into these depositories [18] (http://dmudb.org/); but these initiatives will not change the fundamental problem. Instead, the biomedical community must first appreciate the overwhelming need for improved mutation database systems to begin to solve this problem.
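To make the complementary roles of the three database categories in this section concrete, the following sketch combines per-gene records from a core database, an LSDB and a polymorphism database into a single view. The record layout (plain dictionaries with `gene`, `position`, `change` and an optional `phenotype`) and the function name are assumptions made for illustration; none of the databases named above is claimed to expose its data in this form.

```python
def merge_variant_views(core, lsdb, polymorphism):
    """Combine variant records from three hypothetical sources.

    Each argument is a list of dicts with keys "gene", "position", "change"
    and, optionally, "phenotype". The result maps (gene, position, change)
    to the list of contributing sources and the most detailed phenotype text.
    """
    merged = {}
    # Sources are visited from least to most detailed, so a later, richer
    # phenotype description (LSDB) replaces an earlier, cursory one (core).
    for source, records in (
        ("polymorphism", polymorphism),
        ("core", core),
        ("lsdb", lsdb),
    ):
        for rec in records:
            key = (rec["gene"], rec["position"], rec["change"])
            entry = merged.setdefault(key, {"sources": [], "phenotype": None})
            entry["sources"].append(source)
            if rec.get("phenotype"):
                entry["phenotype"] = rec["phenotype"]
    return merged
```

In this sketch a neutral variant known only to the polymorphism database appears with `phenotype` set to `None`, whereas a pathogenic mutation listed in both a core database and an LSDB carries the LSDB description. Real integration would also have to reconcile differing coordinate systems and nomenclature, which the sketch deliberately ignores.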
National mutation databases: a new trend

The spectrum of mutations observed for any gene or disease will often differ between population groups, and also between distinct ethnic groups within a geographical region. This is an important extra dimension to consider when building mutation depositories, and it is reflected in the emergence of several new national mutation databases (NMDBs) [19]. Not only do NMDBs help to elaborate the demographic history of human population groups, they are also a prerequisite to the optimization of national DNA diagnostic services. That is, they will provide essential reference information for use in the design of targeted mutation-detection efforts for clinical use, and they might also serve to enhance awareness among healthcare professionals, bio-scientists, patients and the public about the range of common genetic disorders (and their environmental correlates) suffered by particular population groups.

Two of the first online NMDBs are a Finnish database (http://www.findis.org) [20] and an Arabian database (http://www.agddb.org) [21]. Although rich in information, these particular resources unfortunately provide limited query capacity, particularly for allelic frequencies. They will, nevertheless, continue to develop user-friendly designs, offering good querying capacity and extensive expert data curation. The Hellenic and Cypriot NMDBs (available at http://www.goldenhelix.org/hellenic and http://www.goldenhelix.org/cypriot, respectively) are aiming higher, by introducing specialized database management software (ETHNOS) that enables both compound query formulation and restricted-access data entry so that all records are manually curated to ensure good and consistent data quality [17]. To maximize the utility of NMDBs, the way in which their content is provided needs to ensure a seamless integration with related content in LSDBs and core databases.
This is conceptually illustrated in Figure 1. Furthermore, extensive links to other external information (e.g. to OMIM and to various types of genome sequence annotation) would ideally be provided to connect NMDBs to the growing network of genomic databases.

[Figure 1 labels: Population I (A–F), Population II (G–O), Central database]
Figure 1. Relationships between various types of mutation databases. This depiction is borrowed from electronic commerce, and it uses the concept of an ancient temple to illustrate the fruitful synthesis of core, LSDBs and NMDBs. Core databases represent mutations from many genes but with only limited detail (frequently referred to as ‘mile wide and inch deep’ [43]) and so they are symbolized as the broad foundations of the temple. LSDBs provide extensive detail but only for a few genes (often referred to as ‘inch wide and mile deep’ [43,44]), and hence they are symbolized as the tall and narrow columns of the temple (A–O). NMDBs provide a layer on top of LSDBs (i.e. the roof of the temple), in that they specify population-specific details for alterations in many different genes; the height of each roof is indicative of the depth of recorded genetic diversity for a given population or ethnic group.

Genotype–phenotype databases: a look at the future

Excellent progress has been made in constructing databases of primary DNA information (i.e. genome sequence and polymorphism) and, as described previously, there has been some progress in creating mutation databases to catalog DNA alterations that have a phenotypic effect. However, the phenotype data in these resources are presented in a basic way, for example, in free text entries and/or with little detail. This situation needs to be improved, and the comprehensive analysis of phenotypes is required, a goal termed ‘phenomics’ [22–24]. Informatics solutions that support phenomics must be developed.

Bioinformatics for the phenomics era will have to solve new problems. Although it is relatively easy to create databases for ‘uni-dimensional’ DNA sequence information (comprising merely a four-letter code), devising generic data and database models for the ‘multi-dimensional’ boundless universe of diverse phenotypes remains a challenge. However, no major public database exists that currently presents extensive and sophisticated genotype–phenotype connections. OMIM remains the best contender in this category, although the PhenomicDB is a recent and innovative project, which, by means of orthologous gene relationships, aligns OMIM data with model organism data in a single database (http://www.phenomicDB.de) [25]. Also worthy of note is the PharmGKB project, which focuses on pharmacogenetics (http://www.pharmgkb.org/) [26,27].

So what will genotype–phenotype databases of the future need to achieve? They must aspire to be much more than multi-disease LSDBs or core mutation databases. Because their phenotype data content will be so diverse, computational exploitation of the data will not be possible with anything as simple as sequence similarity searches. Instead, sophisticated and intricate phenotype data models will be required to empower computational analyses, and these solutions will have to make rigorous use of extensive phenotype ontologies. Ultimately, genotype–phenotype databases will need to provide the full range of ‘omics’ data (e.g. transcriptomics, proteomics and metabolomics) that mechanistically connect genotype differences to phenotype consequences. Initially, however, such an all-encompassing ‘systems biology’ approach might be too daunting, and projects will probably first concentrate on tying DNA changes directly to phenotypes. Achieving even this, given the enormous amount of data now being generated, will depend on the creation of systematic and standardized ways to manage phenotype data, and this by itself will require good international
cooperation and open data sharing, as was so fruitful in the public effort to sequence the human genome.

Principal data components will be phenotype descriptions and information on genome–phenome relationships. There will be many sources of these data, including results from research into mendelian diseases, data from animal model studies, observations from genetic association studies, mutation findings from molecular diagnostic laboratories and patient data from clinical investigations. The wealth of data published in journals will need to be incorporated into these databases or risk being lost to the medical-informatics future (although it is currently unclear how this might be achieved). Certainly, publishers could do more to encourage authors to submit their findings to suitable electronic databases. Text mining software is an active area of research [28–30], but in the foreseeable future such tools will probably only provide sufficient functionality to help curators manually extract literature data (e.g. by finding and filtering publications), rather than accomplish this task without human intervention. For these reasons, the phenotype challenge we face is significant, and perhaps an international ‘Human Phenome Project’ [31] should now be organized as a follow-on to the sequencing phase of the Human Genome Project.

Patient databases: taking phenotype databases one step further

The construction of depositories with phenotype information keyed to many (or even all) individuals in a population could be considered the ultimate phenotype database. Certainly, when whole-genome sequencing becomes routine and personalized medicine is common, then ‘patient databases’ might be something we take for granted. But that is some way in the future.
In the meantime, the first efforts in that direction have been launched, driven by population-wide epidemiological projects initiated in various countries, such as Iceland, Estonia and the UK ([32] and references therein). These early endeavors focus mainly on common disorders rather than mendelian disease, and primarily intend to capture functional relationships between disease phenotypes and the underlying polymorphisms, genotypes and epigenotypes [33]. In time, they must be made relevant to monogenic disorders, because variable penetrance of even mendelian disease can be ascribed to modifier genes and to genomic variation [34] that pre-existed within the founding population of contemporary humans [35,36]. For example, the sickle cell Cd6 (A→T) and the β-thalassemia major Cd39 (C→T) mutations are found to exist in five and nine different haplotype backgrounds, respectively, accounting for much of the phenotype variability observed in affected individuals [37,38]. The same phenomenon is seen for the IVS I-110 (G→A) and IVS I-6 (T→C) β-thalassemia mutations ([39]; G.P. Patrinos, unpublished). Thus, a complete understanding of human disease genetics will require co-analysis of mendelian and common disease causation.

Patient databases raise particularly complex ethical challenges that demand careful attention. Primarily, the inclusion of clinical and molecular data connected to specific individuals must be performed in a way that ensures anonymity. How best to achieve this has not yet been established, but it is widely agreed that strict governance frameworks must be established to address all confidentiality concerns. Other issues that need to be considered include copyright and intellectual-property protection, the nature of informed consent, data-access rights, inferential relationships and so on.
There are no universally agreed solutions to these problems, but these issues must be resolved if patient databases and personalized medicine are to advance substantially.

Future considerations

As summarized above, database activities relating to DNA and disease data are disastrously deficient. They do not do justice to the wealth of data being generated or to the insights into disease mechanisms that could be gleaned if this information were properly managed. But as the saying goes, ‘necessity is the mother of invention’ [40], and so this situation will have to change, and arguably has already begun to do so. For example, the Human Genome Variation Society (http://www.hgvs.org/) has been established to support LSDBs and the related community, and a consortium of model-organism-database groups (http://www.gmod.org/) has come together to work on the phenotype database challenge [41]. The Human Genome Variation Database (HGVbase: http://hgvbase.cgb.ki.se/) [11,12] is being evolved into a genotype–phenotype database, and to this end has formed the ‘Phenofocus’ network (http://www.phenofocus.net/) to provide a discussion forum and contact list of interested parties. These and other projects have each progressed to varying degrees towards creating phenotype data models. For example, the HGVbase team along with the Japanese Biological Informatics Consortium have together led a multi-group international effort to propose a global standard data model for genome sequence polymorphism [Polymorphism Markup Language (PML); http://pml.ddbj.nig.ac.jp/], and their preliminary model for phenotype data and for genetic association findings will subsequently be offered for incorporation into PML.

On a purely technical level, one of the most urgent goals must be the creation of a powerful means for computationally representing phenotype data.
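As a rough illustration of what computationally representing phenotype data might involve, the sketch below anticipates several of the design properties this section goes on to list: terms drawn from a shared vocabulary, observations at any biological level, versioning, and no dependence on genome coordinates. It is not PML or any proposed standard; all class names, fields and the example identifiers are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class OntologyTerm:
    """A term from a shared vocabulary (identifiers used here are invented)."""
    term_id: str  # e.g. "EX:0001", a placeholder rather than a real ontology ID
    label: str


@dataclass(frozen=True)
class PhenotypeObservation:
    feature: OntologyTerm        # what was observed, as a standard term
    level: str                   # "molecular", "cellular", "organ-system", ...
    value: Optional[str] = None  # optional measurement or qualifier


@dataclass(frozen=True)
class PhenotypeRecord:
    record_id: str
    version: int = 1
    observations: Tuple[PhenotypeObservation, ...] = ()
    # The genotype is referenced by an opaque identifier only, so the record
    # stays valid if genome assemblies or variant coordinates change.
    genotype_ref: Optional[str] = None

    def revised(self, observations):
        """Return a new version of the record; the old version is kept intact."""
        return replace(self, version=self.version + 1,
                       observations=tuple(observations))
```

Calling `revised` on a version-1 record yields a version-2 record while leaving the original unchanged, which is one simple way to accommodate phenotypes whose description changes as tests and knowledge improve. A real model would also need an agreed exchange format, which this sketch does not attempt.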
An optimal design for such a phenotype representation would be:
(i) Standardized and widely accepted: just as the FASTA format made it possible for all DNA sequence depositories and analysis tools to interact easily, so must widely accepted standard data models and exchange formats be developed for phenotype data.
(ii) Ontology based: use of standard terms and meanings, as made available by the open biological ontology website (http://obo.sourceforge.net/), will be important for effective data integration and analysis.
(iii) Flexible: systems will need to manage diverse phenotypes that might concern molecular, organelle, cellular, organ-system or whole-organism features, with any level of detail.
(iv) Scalable: uses will range from small-scale studies to major databases, but core data models must be equally usable for these different applications.
(v) Adaptable: with time and improved knowledge, phenotypes evolve (e.g. the range of tests performed for a disease can change), implying a need for versioning.
(vi) Not dependent on the genome: the phenotype should be a stand-alone component, because our knowledge of its DNA basis will probably improve with time, and the genome sequence is not yet definitive.

Beyond the technical challenges, it will be even more difficult to overcome the problems associated with the way database research is organized, motivated and rewarded. For example, forming consensus opinions and truly committed consortia to create standards is not easy in the highly competitive world of science. This might explain in part why leading bioinformatics activities today are often conducted in large specialized centers (e.g. the European Bioinformatics Institute and the United States National Center for Biotechnology Information) where the political influence and critical mass are such that what they produce automatically becomes the de facto standard.
These groups, however, cannot build all of the necessary LSDBs, core databases, NMDBs and genotype–phenotype databases that are needed, but they could help others (biological domain experts) to build them and then integrate their efforts [42]. This kind of distributed and coordinated effort would also, ideally, be managed in close partnership with specialized journals [18] to ensure that contributors also have a means to publish their efforts.

Concluding remarks

The most fundamental hurdle of all that hinders progress in the mutation database field is limited funding. Because of this, almost all mutation databases that currently exist have been built by researchers ‘on the side’ for their own use, with a small degree of corporate sponsorship at best. To advance beyond this ‘cottage industry’ state of affairs, projects need to be increased in scale, quality and durability, and this can only happen if strategically minded funding agencies make available substantial targeted funds. The new databases that emerge will then need to find support for general maintenance and ongoing development. To provide this, the projects could perhaps be run as self-sustaining ‘businesses’ that charge the data suppliers or the data users (equivalent to how scientific journals work today). It might also be possible to develop novel forms of joint academic–corporate funding. The funding challenge is thus far from simple, but it might help the debate to note that funding agencies invest vast sums of money to create primary mutation data, yet then fail to direct sufficient funds to ensure that these data flow effectively to clinicians and scientists involved in disease research and patient care. This situation deserves to be remedied.

Acknowledgements

We thank Heikki Lehvaslaiho and Raymond Dalgleish for critical reading of this manuscript. Our research is supported by the European Commission (FP6, IST thematic area) through the INFOBIOMED NoE (IST-507585).
References

1 McKusick, V.A. (1966) Mendelian Inheritance in Man. A Catalog of Human Genes and Genetic Disorders (1st edn), Johns Hopkins University Press
2 Hamosh, A. et al. (2002) Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 30, 52–55
3 Cotton, R.G. et al. (1998) The HUGO mutation database initiative. Science 279, 10–11
4 Stenson, P.D. et al. (2003) Human gene mutation database (HGMD): 2003 update. Hum. Mutat. 21, 577–581
5 Beroud, C. (2005) The use of mutation databases in molecular diagnostics. In Molecular Diagnostics (Patrinos, G.P. and Ansorge, W., eds), Elsevier (in press)
6 Hardison, R.C. et al. (2002) HbVar: A relational database of human hemoglobin variants and thalassemia mutations at the globin gene server. Hum. Mutat. 19, 225–233
7 Patrinos, G.P. et al. (2004) Improvements in the HbVar database of human hemoglobin variants and thalassemia mutations for population and sequence variation studies. Nucleic Acids Res. 32 (Database issue), D537–D541
8 Claustres, M. et al. (2002) Time for a unified system of mutation description and reporting: a review of locus-specific mutation databases. Genome Res. 12, 680–688
9 Wheeler, D.L. et al. (2004) Database resources of the National Center for Biotechnology Information: update. Nucleic Acids Res. 32 (Database issue), D35–D40
10 The International HapMap Consortium. (2003) The international HapMap project. Nature 426, 789–796
11 Fredman, D. et al. (2004) HGVbase: a curated resource describing human DNA variation and phenotype relationships. Nucleic Acids Res. 32 (Database issue), D516–D519
12 Fredman, D. et al. (2002) HGVbase: a human sequence variation database emphasizing data quality and a broad spectrum of data sources. Nucleic Acids Res. 30, 387–391
13 Marsh, S. et al. (2002) SNP databases and pharmacogenetics: great start, but a long way to go. Hum. Mutat. 20, 174–179
14 Aerts, J. et al. (2002) Data mining of public SNP databases for the selection of intragenic SNPs. Hum. Mutat. 20, 162–173
15 Dvornyk, V. et al. (2004) Current limitations of SNP data from the public domain for studies of complex disorders: a test for ten candidate genes for obesity and osteoporosis. BMC Genet. 5, 4
16 Brown, A.F. and McKie, M.A. (2000) MuStaR and other software for locus-specific mutation databases. Hum. Mutat. 15, 76–85
17 Patrinos, G.P. et al. (2005) Hellenic National Mutation Database: a prototype database for mutations leading to inherited disorders in the Hellenic population. Hum. Mutat. 25, 327–333
18 Patrinos, G.P. and Wajcman, H. (2004) Recording human globin gene variation. Hemoglobin 28, v–vii
19 Horaitis, O. and Cotton, R.G. (2004) The challenge of documenting mutation across the genome: the human genome variation society approach. Hum. Mutat. 23, 447–452
20 Sipila, K. and Aula, P. (2002) Database for the mutations of the Finnish disease heritage. Hum. Mutat. 19, 16–22
21 Teebi, A.S. et al. (2002) Arab genetic disease database (AGDDB): a population-specific clinical and mutation database. Hum. Mutat. 19, 615–621
22 Gerlai, R. (2002) Phenomics: fiction or the future? Trends Neurosci. 25, 506–509
23 Scriver, C.R. (2004) After the genome - the phenome? J. Inherit. Metab. Dis. 27, 305–317
24 Hall, J.G. (2003) A clinician’s plea. Nat. Genet. 33, 440–442
25 Kahraman, A. et al. (2005) PhenomicDB: a multi-species genotype/phenotype database for comparative phenomics. Bioinformatics 21, 418–420
26 Licinio, J. (2004) PharmGKB: the pharmacogenetics and pharmacogenomics knowledge base. Pharmacogenomics J. 4, 1
27 Hewett, M. et al. (2002) PharmGKB: the pharmacogenetics knowledge base. Nucleic Acids Res. 30, 163–165
28 Zhou, G. et al. (2004) Recognizing names in biomedical texts: a machine learning approach. Bioinformatics 20, 1178–1190
29 Chiang, J.H. et al. (2004) GIS: a biomedical text-mining system for gene information discovery. Bioinformatics 20, 120–121
30 Hirschman, L. et al. (2002) Accomplishments and challenges in literature data mining for biology. Bioinformatics 18, 1553–1561
31 Freimer, N. and Sabatti, C. (2003) The human phenome project. Nat. Genet. 34, 15–21
32 Kaiser, J. (2002) Biobanks. Population databases boom, from Iceland to the U.S. Science 298, 1158–1161
33 Bjornsson, H.T. et al. (2004) An integrated epigenetic and genetic approach to common human disease. Trends Genet. 20, 350–358
34 Reich, D.E. and Lander, E.S. (2001) On the allelic spectrum of human disease. Trends Genet. 17, 502–510
35 Collins, A. et al. (1999) Genetic epidemiology of single-nucleotide polymorphisms. Proc. Natl. Acad. Sci. U. S. A. 96, 15173–15177
36 Lander, E.S. (1996) The new genomics: global views of biology. Science 274, 536–539
37 Labie, D. et al. (1985) Common haplotype dependency of high G gamma-globin gene expression and high Hb F levels in beta-thalassemia and sickle cell anemia patients. Proc. Natl. Acad. Sci. U. S. A. 82, 2111–2114
38 Pirastu, M. et al. (1987) The same beta-globin gene mutation is present on nine different beta-thalassemia chromosomes in a Sardinian population. Proc. Natl. Acad. Sci. U. S. A. 84, 2882–2885
39 Patrinos, G.P. et al. (2001) Agamma-haplotypes: a new group of genetic markers for thalassemic mutations inside the 5′ regulatory region of the human Agamma-globin gene. Am. J. Hematol. 66, 99–104
40 Plato (360 BC) The Republic
41 Stein, L.D. et al. (2002) The generic genome browser: a building block for a model organism system database. Genome Res. 12, 1599–1610
42 Stein, L. (2002) Creating a bioinformatics nation. Nature 417, 119–120
43 Anonymous. (1999) Newspaper and the Internet: caught in the web. Economist 352, 17–19
44 Scriver, C.R. et al. (2000) PAHdb: a locus-specific knowledgebase. Hum. Mutat. 15, 99–104