Opinion
TRENDS in Genetics Vol.21 No.6 June 2005
DNA, diseases and databases:
disastrously deficient
George P. Patrinos1 and Anthony J. Brookes2
1 Erasmus University Medical Center, Faculty of Medicine and Health Sciences, MGC-Department of Cell Biology and Genetics, PO Box 1738, 3000 DR, Rotterdam, The Netherlands
2 Department of Genetics, University of Leicester, University Road, Leicester, UK LE1 7RH
Recent progress in disease genetics and genome-related
medicine has been substantial, with vast amounts of
data being generated. However, this progress has not
been matched by adequate database projects that
gather and organize these data to enable their useful
exploitation. This research area is complex, entailing
core databases, locus-specific databases, national
mutation databases, genotype–phenotype databases
and patient databases – and much work is required to
develop and properly integrate these various resources.
To promote this, we present a timely overview of the
field, emphasize its over-riding importance and discuss
the disastrously deficient progress made so far. Many
factors contribute to this slow progress
(e.g. technological hurdles, publication requirements,
the short-sighted and popularist research system). A
lack of targeted funding is arguably the most fundamental problem, but one that can be solved.
Introduction
Research into the genetic basis of disease has advanced in
scale and sophistication, leading to increased rates of data
production in many laboratories. Additionally, DNA
diagnostics and electronic healthcare records are increasingly common features of medical practice. Therefore, it
should be possible to integrate all of this information to
establish a detailed understanding of how genetic differences impact human health. Nevertheless, current progress towards this goal is slow. This is primarily due to the
many challenges involved in computationally handling
(gathering, exchanging, integrating and interpreting) the
relevant primary information. These challenges are often
technical in nature, but they also include restricted
funding for this kind of research and an intrinsic research
bias towards new data acquisition rather than old data
management. Valuable discoveries that address the
genetic basis of disease are therefore being squandered,
and this can only be remedied by a significant enhancement of ‘mutation database’ and related activities.
In this article, we highlight some of the main activities
relating to mutation databases to: (i) describe the existing
and emerging types of database in this domain;
(ii) emphasize their potential applications in modern
medical genetics; and (iii) comment on the key elements
that are still missing and holding back the field.
Corresponding author: Patrinos, G.P. ([email protected]).
Available online 16 April 2005
Types of mutation databases
With great vision, Victor McKusick made the first serious
efforts towards summarizing DNA variations and their
clinical consequences when he published the Mendelian
Inheritance in Man (MIM) – a paper compendium of
information on genetic disorders and genes [1]. This is now
distributed electronically [Online Mendelian Inheritance
in Man (OMIM)] by the National Center for Biotechnology
Information (NCBI) and updated on a daily basis
(http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM&itool=toolbar) [2].
Its strength stems from the quality
and diversity of its content, but its structure is far
from ideal for automated data mining. It is also not
comprehensive and could not be expected to be, given
the ever-accelerating pace of discovery of genetic
mutations involved in human disease and model-organism phenotypes. Instead, to cope with this
large and growing body of information there is a
need for a range of diverse and suitably integrated
databases. An awareness of this need prompted the
formation of the Human Genome Organization
Mutation Database Initiative (HUGO-MDI) in the
early 1990s [3], which then evolved into the Human Genome
Variation Society (HGVS: http://www.hgvs.org). Today, the
stated objective of the HGVS is ‘…to foster discovery and
characterization of genomic variations, including population distribution and phenotypic associations’.
More broadly, the various depositories that fall under
the banner of ‘mutation databases’ can be categorized into
two types: core (or central) databases and locus-specific
databases (LSDBs). Some examples are given in Box 1.
The philosophy behind core databases is an attempt to
capture all described mutations in all genes, but with each
mutation being represented in limited detail. The included
phenotype descriptions are generally cursory, making core
databases of little value for those wishing to understand
the subtleties of phenotypic variability. Core databases
tend to include only mutations of large effect that result in
mendelian patterns of inheritance, whereas sequence
variations not associated with any clinical consequences
or those associated with minor or uncertain clinical
consequences are rarely catalogued. Thus, core databases
provide a good overview of patterns of clinically relevant
mutations and polymorphisms, but almost no fine detail to
aid proper understanding. The best current example of a
core database is the Human Gene Mutation Database
(http://www.hgmd.org) [4], which by March 2005 contained >45 000 different lesions in almost 1 800 different
nuclear genes, with new entries accumulating at an
average rate of 2 300 per annum.
www.sciencedirect.com 0168-9525/$ - see front matter © 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.tig.2005.04.004
Box 1. Examples of useful mutation and related databases
• Online Mendelian Inheritance in Man (OMIM™)
A comprehensive, authoritative and timely knowledgebase of
human genes and genetic disorders compiled to support research
and education in human genomics and the practice of clinical
genetics. Each OMIM entry has a full-text summary of a genetically
determined phenotype and/or gene and has numerous links to other
genetic databases such as DNA and protein sequence, references,
general and locus-specific mutation databases
(http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM&itool=toolbar).
• Human Gene Mutation Database (HGMD)
A database recording various types of mutation within the coding
regions of human nuclear genes causing inherited disease. HGMD
does not usually include mutations lacking obvious phenotypic
consequences. Data are collected weekly by a combination of manual
and computerized search procedures (http://www.hgmd.org).
• SNP databases
Repositories of single nucleotide polymorphisms (SNPs) and other
localized variations across the genome. The most comprehensive
SNP database is dbSNP at NCBI (http://www.ncbi.nlm.nih.gov/projects/SNP/);
additional information is available at other sites
such as the Human Genome Variation Database (http://hgvbase.cgb.ki.se/)
and the International HapMap Project website (http://www.hapmap.org).
Some projects have a specific research focus, such as
the environmental genome SNP database
(egSNP: https://dir-apps.niehs.nih.gov/egsnp/home.htm), which lists common SNPs in
selected environmental-response genes. An extensive list of the
SNP databases is provided at
http://www.genomic.unimelb.edu.au/mdi/dblist/dblist.html.
• PharmGKB
A publicly available knowledgebase that consists of a central
repository for clinical and genetic information that aids researchers
in understanding how genetic variation among individuals contributes to differences in reactions to drugs (http://www.pharmgkb.org/).
• PhenomicDB
A multi-species genotype–phenotype database, merging public
genotype and phenotype data from a wide range of human and
other model organisms. Its user interface enables scientists to
compare and browse known phenotypes simultaneously for a given
gene or a set of genes from different organisms
(http://www.phenomicDB.de).
By contrast, LSDBs contain information about one or a
few specific genes [5], usually related to a single disease.
They are highly curated repositories of published and
unpublished mutations within those genes and, as such,
they complement the core databases. Data quality and
completeness are typically good, with up to 50% of stored
records pertaining to unpublished mutations. The data
are also rich and informative. For example, LSDBs will
typically present each of the multiple discoveries of
recurrent mutational events, enabling mutation hotspots
to be identified; when these mutations occur on different
chromosomal backgrounds (linked to other mutations)
such that they result in several, or different, disease
features, these correlations are also recorded. A good
example of an LSDB is HbVar (http://globin.cse.psu.edu/globin/hbvar)
[6] – a relational database of hemoglobin
variants and thalassemia mutations, providing information on pathology, hematology, clinical presentation
and laboratory findings for numerous DNA alterations.
Gene and protein variants are annotated with respect to
biochemical data, analytical techniques, structure, stability, function, literature references, and qualitative and
quantitative distribution in ethnic groups and geographic
locations [7]. As is common in LSDBs, entries can be
accessed through summary listings or user-generated
queries that can be highly specific. For information on
>350 currently available LSDBs see http://www.hgvs.org,
http://www.hgmd.org and Ref. [8].
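The way an LSDB keeps every independent report of a recurrent mutation, together with its chromosomal background, can be illustrated with a small sketch. This is only an illustration under stated assumptions: the `Observation` record, its field names and the sample entries (`GENE1`, `var_a`, `hap1`) are hypothetical and are not taken from HbVar or any other database.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One independent report of a variant (hypothetical LSDB record)."""
    gene: str        # gene symbol
    variant: str     # variant label as recorded by the curator
    haplotype: str   # chromosomal background on which it was found
    published: bool  # False for unpublished submissions


def recurrent_variants(observations, min_reports=2):
    """Return variants reported at least `min_reports` times (candidate hotspots)."""
    counts = Counter((o.gene, o.variant) for o in observations)
    return {key: n for key, n in counts.items() if n >= min_reports}


def haplotype_backgrounds(observations):
    """Group the distinct chromosomal backgrounds seen for each variant."""
    backgrounds = {}
    for o in observations:
        backgrounds.setdefault((o.gene, o.variant), set()).add(o.haplotype)
    return backgrounds


# Made-up example records, for illustration only.
records = [
    Observation("GENE1", "var_a", "hap1", True),
    Observation("GENE1", "var_a", "hap2", False),
    Observation("GENE1", "var_a", "hap2", True),
    Observation("GENE1", "var_b", "hap1", True),
]

hotspots = recurrent_variants(records)
backgrounds = haplotype_backgrounds(records)
```

A core database, by contrast, would typically hold a single summary row per variant, so neither the report count nor the set of backgrounds could be recovered from it.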
In addition to core databases and LSDBs, DNA
variation is also recorded in various polymorphism
databases, such as the single nucleotide polymorphism
database (dbSNP; http://www.ncbi.nlm.nih.gov/projects/SNP/)
[9], the HAPMAP Data Coordination Center
(http://www.hapmap.org/) [10] and the Human Genome
Variation Database (HGVbase; http://hgvbase.cgb.ki.se)
[11–12]. These resources make no explicit attempt to
connect DNA information to phenotypes, and they are not
yet perfect in design or content [13–15], but they do make
available an extensive list of ‘normal’ variation that occurs
in the human genome. These databases are important
because they help to complete the picture for any gene or
region of interest, by summarizing all of the neutral
variants that are typically not included in core databases
or LSDBs.
Core databases and LSDBs thus share the same
primary purpose of representing DNA variations that
have a definitive or a probable phenotypic effect. The
current databases are, however, too limited in number and
in their degree of inter-connection to capture all of the
information about pathogenic DNA mutations. This is
because the modern research ethos fails to provide
adequate incentives (i.e. publication options, peer recognition and funding) to encourage researchers to build new
core databases or LSDBs. Initiatives designed to make it
technically simple to set up and use such databases are
welcomed, such as specialized software [16,17] or interactive user interfaces (e.g. Genewindow; http://genewindow.nci.nih.gov), as are those that directly transfer data
from clinical diagnostics laboratories into these depositories [18] (http://dmudb.org/); but these initiatives will
not change the fundamental problem. Instead, the
biomedical community must first appreciate the overwhelming need for improved mutation database systems
to begin to solve this problem.
National mutation databases: a new trend
The spectrum of mutations observed for any gene or
disease will often differ between population groups, and
also between distinct ethnic groups within a geographical
region. This is an important extra dimension to consider
when building mutation depositories, and it is reflected in
the emergence of several new national mutation databases (NMDBs) [19]. Not only do NMDBs help to elaborate
the demographic history of human population groups,
they are also a prerequisite to the optimization of national
DNA diagnostic services. That is, they will provide
essential reference information for use in the design of
targeted mutation-detection efforts for clinical use, and
they might also serve to enhance awareness among
healthcare professionals, bio-scientists, patients and the
public about the range of common genetic disorders (and
their environmental correlates) suffered by particular
population groups.
Two of the first online NMDBs are a Finnish database
(http://www.findis.org) [20] and an Arabian database
(http://www.agddb.org) [21]. Although rich in information,
these particular resources unfortunately provide limited
query capacity, particularly for allelic frequencies. They
will, nevertheless, continue to develop user-friendly
designs, offering good querying capacity and extensive
expert data curation. The Hellenic and Cypriot NMDBs
(available at http://www.goldenhelix.org/hellenic and
http://www.goldenhelix.org/cypriot, respectively) are aiming higher, by introducing a specialized database management software (ETHNOS) that enables both compound
query formulation and restricted-access data entry so that
all records are manually curated to ensure good and
consistent data quality [17].
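What a compound query over population-specific allele frequencies might look like can be sketched with an in-memory SQLite table. This is an assumption-laden illustration, not a description of ETHNOS or of any existing NMDB: the table name `allele_frequency`, its columns, and all of the rows (populations, gene, variants and frequencies) are invented for the example.

```python
import sqlite3

# Hypothetical schema: one row per variant per population group.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE allele_frequency (
           population TEXT,
           gene       TEXT,
           variant    TEXT,
           frequency  REAL
       )"""
)

# Made-up rows, for illustration only.
rows = [
    ("pop_A", "GENE1", "var_a", 0.30),
    ("pop_A", "GENE1", "var_b", 0.01),
    ("pop_A", "GENE1", "var_c", 0.10),
    ("pop_B", "GENE1", "var_a", 0.02),
]
conn.executemany("INSERT INTO allele_frequency VALUES (?, ?, ?, ?)", rows)


def compound_query(conn, population, gene, min_frequency):
    """Combine three criteria: population, gene and a frequency threshold."""
    cursor = conn.execute(
        "SELECT variant, frequency FROM allele_frequency "
        "WHERE population = ? AND gene = ? AND frequency >= ? "
        "ORDER BY frequency DESC",
        (population, gene, min_frequency),
    )
    return cursor.fetchall()


common_in_pop_a = compound_query(conn, "pop_A", "GENE1", 0.05)
```

A query of this shape is the kind of reference information a diagnostic laboratory could use to decide which variants to target first in a given population.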
To maximize the utility of NMDBs, the way in which
their content is provided needs to ensure a seamless
integration with related content in LSDBs and core
databases. This is conceptually illustrated in Figure 1.
Furthermore, extensive links to other external information (e.g. to OMIM and to various types of genome
sequence annotation) would ideally be provided to connect
NMDBs to the growing network of genomic databases.
Figure 1. Relationships between various types of mutation databases. This depiction is borrowed from electronic commerce, and it uses the concept of an ancient temple to
illustrate the fruitful synthesis of core, LSDBs and NMDBs. Core databases represent mutations from many genes but with only limited detail (frequently referred to as ‘mile
wide and inch deep’ [43]) and so they are symbolized as the broad foundations of the temple. LSDBs provide extensive detail but only for a few genes (often referred to as
‘inch wide and mile deep’ [43,44]), and hence they are symbolized as the tall and narrow columns of the temple (A–O). NMDBs provide a layer on top of LSDBs (i.e. the roof of
the temple), in that they specify population-specific details for alterations in many different genes; the height of each roof is indicative of the depth of recorded genetic
diversity for a given population or ethnic group.
[Figure image not reproduced; its labels are: Population I, Population II, columns A–O, Central database.]
Genotype–phenotype databases: a look at the future
Excellent progress has been made in constructing databases of primary DNA information (i.e. genome sequence
and polymorphism) and, as described previously, there has
been some progress in creating mutation databases to
catalog DNA alterations that have a phenotypic effect.
However, the phenotype data in these resources are
presented in a basic way, for example, in free text entries
and/or with little detail. This situation needs to be
improved, and the comprehensive analysis of phenotypes
is required – a goal termed ‘phenomics’ [22–24]. Informatics solutions that support phenomics must be
developed.
Bioinformatics for the phenomics era will have to solve
new problems. Although it is relatively easy to create
databases for ‘uni-dimensional’ DNA sequence information
(comprising merely a four-letter code), devising generic data
and database models for the ‘multi-dimensional’ boundless
universe of diverse phenotypes remains a challenge.
However, no major public database exists that currently
presents extensive and sophisticated genotype–phenotype
connections. OMIM remains the best contender in this
category, although the PhenomicDB is a recent and
innovative project, which, by means of orthologous gene
relationships, aligns OMIM data with model organism data
in a single database (http://www.phenomicDB.de) [25]. Also
worthy of note is the PharmGKB project, which focuses on
pharmacogenetics (http://www.pharmgkb.org/) [26–27].
So what will genotype–phenotype databases of the
future need to achieve? They must aspire to be much more
than multi-disease LSDBs or core mutation databases.
Because their phenotype data content will be so diverse,
computational exploitation of the data will not be possible
with anything as simple as sequence similarity searches.
Instead, sophisticated and intricate phenotype data
models will be required to empower computational
analyses, and these solutions will have to make rigorous
use of extensive phenotype ontologies. Ultimately genotype–phenotype databases will need to provide the full
range of ‘omics’ data (e.g. transcriptomics, proteomics and
metabolomics) that mechanistically connect genotype
differences to phenotype consequences. Initially however,
such an all-encompassing ‘systems biology’ approach
might be too daunting, and projects will probably first
concentrate on tying DNA changes directly to phenotypes.
Achieving even this, given the enormous amount of data
now being generated, will depend on the creation of
systematic and standardized ways to manage phenotype
data, and this by itself will require good international
cooperation and open data sharing as was so fruitful in the
public effort to sequence the human genome.
Principal data components will be phenotype descriptions and information on genome–phenome relationships.
There will be many sources of these data, including results
from research into mendelian diseases, data from animal
model studies, observations from genetic association
studies, mutation findings from molecular diagnostic
laboratories and patient data from clinical investigations.
The wealth of data published in journals will need to be
incorporated into these databases or risk being lost to the
medical-informatics future (although it is currently
unclear how this might be achieved). Certainly, publishers
could do more to encourage authors to submit their
findings to suitable electronic databases. Text mining
software is an active area of research [28–30], but in the
foreseeable future such tools will probably only provide
sufficient functionality to help curators manually extract
literature data (e.g. by finding and filtering publications),
rather than accomplish this task without human intervention. For these reasons, the phenotype challenge we
face is significant, and perhaps an international ‘Human
Phenome Project’ [31] should now be organized as a follow-on to the sequencing phase of the Human Genome Project.
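The curator-assisting role described above, in which software finds and filters publications but a human still extracts the data, can be sketched as a simple triage step. This is a deliberately naive keyword filter written for illustration; it is not one of the text-mining systems cited in [28–30], and the function name `triage`, the hint vocabulary and the sample abstracts are all assumptions.

```python
import re

# Hypothetical vocabulary suggesting that an abstract reports sequence variation.
MUTATION_HINTS = re.compile(
    r"\b(mutation|variant|polymorphism|deletion|insertion)s?\b",
    re.IGNORECASE,
)


def triage(abstracts, gene_symbols):
    """Shortlist abstracts that name a gene of interest and mention variation.

    Returns (publication id, matched gene symbols) pairs for a curator to read;
    nothing is extracted automatically.
    """
    shortlisted = []
    for pub_id, text in abstracts.items():
        genes = sorted(
            g for g in gene_symbols
            if re.search(rf"\b{re.escape(g)}\b", text)
        )
        if genes and MUTATION_HINTS.search(text):
            shortlisted.append((pub_id, genes))
    return shortlisted


# Made-up abstracts, for illustration only.
abstracts = {
    "pub1": "A novel mutation in GENE1 segregates with disease.",
    "pub2": "GENE1 expression during development.",
    "pub3": "Deletions were screened in unrelated families.",
}

to_review = triage(abstracts, {"GENE1"})
```

Only the first abstract is shortlisted here: the second names the gene without mentioning variation, and the third mentions variation without naming a gene of interest.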
Patient databases: taking phenotype databases one step
further
The construction of depositories with phenotype information keyed to many (or even all) individuals in a
population could be considered the ultimate phenotype
database. Certainly, when whole-genome sequencing
becomes routine and personalized medicine is common,
then ‘patient databases’ might be something we take for
granted. But that is some way in the future. In the
meantime, the first efforts in that direction have been
launched, driven by population-wide epidemiological
projects initiated in various countries, such as Iceland,
Estonia and the UK ([32] and references therein). These
early endeavors focus mainly on common disorders rather
than mendelian disease, and primarily intend to capture
functional relationships between disease phenotypes and
the underlying polymorphisms, genotypes and epigenotypes [33]. In time, they must be made relevant to
monogenic disorders, because variable penetrance of even
mendelian disease can be ascribed to modifier genes and to
genomic variation [34] that pre-existed within the founding population of contemporary humans [35–36]. For
example, the sickle cell Cd6 (A→T) and the β-thalassemia
major Cd39 (C→T) mutations are found to
exist in five and nine different haplotype backgrounds,
respectively, accounting for much of the phenotype
variability observed in affected individuals [37–38]. The
same phenomenon is seen for the IVS I-110 (G→A) and
IVS I-6 (T→C) β-thalassemia mutations ([39]; G.P.
Patrinos, unpublished). Thus, a complete understanding
of human disease genetics will require co-analysis of
mendelian and common disease causation.
Patient databases raise particularly complex ethical
challenges that demand careful attention. Primarily, the
inclusion of clinical and molecular data connected to
specific individuals must be performed in a way that
ensures anonymity. How best to achieve this has not yet
been established, but it is widely agreed that strict
governance frameworks must be established to address
all confidentiality concerns. Other issues that need to be
considered include copyright and intellectual-property
protection, the nature of informed consent, data-access
rights, inferential relationships and so on. There are no
universally agreed solutions to these problems, but these
issues must be resolved if patient databases and personalized medicine are to advance substantially.
Future considerations
As summarized above, database activities relating to DNA
and disease data are disastrously deficient. They are not
doing service to the wealth of data being generated or the
insights into disease mechanisms that could be gleaned if
this information were properly managed. But as the
saying goes ‘necessity is the mother of invention’ [40],
and so this situation will have to change, and arguably has
already begun to do so. For example, the Human Genome
Variation Society (http://www.hgvs.org/) has been established to support LSDBs and the related community, and a
consortium of model-organism-database groups
(http://www.gmod.org/) has come together to work on the
phenotype database challenge [41]. The Human Genome
Variation Database (HGVbase: http://hgvbase.cgb.ki.se/)
[11–12] is being evolved into a genotype–phenotype
database, and to this end has formed the ‘Phenofocus’
network (http://www.phenofocus.net/) to provide a discussion forum and contact list of interested parties. These and
other projects have each progressed to varying degrees
towards creating phenotype data models. For example, the
HGVbase team along with the Japanese Biological
Informatics Consortium have together led a multi-group
international effort to propose a global standard-data
model for genome sequence polymorphism [Polymorphism
Markup Language (PML); http://pml.ddbj.nig.ac.jp/], and
their preliminary model for phenotype data and for
genetic association findings will subsequently be offered
for incorporation into PML.
On a purely technical level, one of the most urgent goals
must be the creation of a powerful means for computationally representing phenotype data. An optimal design
for this would be:
(i) Standardized and widely accepted: just as the
FASTA format made it possible for all DNA sequence
depositories and analysis tools to interact easily, so
must widely accepted standard data models and
exchange formats be developed for phenotype data.
(ii) Ontology based: use of standard terms and meanings, as made available by the open biological ontology
website (http://obo.sourceforge.net/), will be important
for effective data integration and analysis.
(iii) Flexible: systems will need to manage diverse
phenotypes that might concern molecular, organelle,
cellular, organ-system or whole-organism features,
with any level of detail.
(iv) Scalable: uses will range from small-scale studies to
major databases, but core data models must be equally
useable for these different applications.
(v) Adaptable: with time and improved knowledge,
phenotypes evolve (e.g. the range of tests performed for
a disease can change), implying a need for versioning.
(vi) Not dependent on the genome: the phenotype
should be a stand-alone component, because our knowledge of its DNA basis will probably improve with time,
and the genome sequence is not yet definitive.
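Several of the design goals listed above (ontology-based terms, flexible level of detail, versioning and independence from the genome) can be made concrete with a minimal record sketch. It is offered only as an illustration under stated assumptions: the class `PhenotypeRecord`, its fields, the `EX:0001` term identifier and the `revise` helper are hypothetical and do not correspond to PML or to any published phenotype data model.

```python
from dataclasses import dataclass, field, replace
from typing import Optional

# Levels of biological organization named in the list above.
ALLOWED_LEVELS = {
    "molecular", "organelle", "cellular", "organ-system", "whole-organism",
}


@dataclass(frozen=True)
class PhenotypeRecord:
    """Hypothetical stand-alone phenotype record."""
    record_id: str
    ontology_term: str                 # identifier from a controlled vocabulary
    level: str                         # one of ALLOWED_LEVELS
    details: dict = field(default_factory=dict)  # any depth of detail
    version: int = 1                   # incremented on every revision
    genotype_ref: Optional[str] = None  # optional link; the record is valid without it

    def __post_init__(self):
        if self.level not in ALLOWED_LEVELS:
            raise ValueError(f"unknown level: {self.level}")


def revise(record, **changes):
    """Return a new version of the record, leaving the old one untouched."""
    return replace(record, version=record.version + 1, **changes)


# Made-up example: the test used to assess a phenotype changes over time.
first = PhenotypeRecord("rec1", "EX:0001", "cellular", {"test": "assay_v1"})
second = revise(first, details={"test": "assay_v2"})
```

Keeping `genotype_ref` optional reflects point (vi): the phenotype description stays usable while knowledge of its DNA basis is still incomplete.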
Beyond the technical challenges, it will be even more
difficult to overcome the problems associated with the way
database research is organized, motivated and rewarded.
For example, forming consensus opinions and truly
committed consortia to create standards is not easy in
the highly competitive world of science. This might
explain in part why leading bioinformatics activities
today are often conducted in large specialized centers
(e.g. the European Bioinformatics Institute and the
United States National Center for Biotechnology Information) where the political influence and critical mass is
such that what they produce automatically becomes the
de facto standard. These groups, however, cannot build all
of the necessary LSDBs, core databases, NMDBs and
genotype–phenotype databases that are needed, but they
could help others (biological domain experts) to build them
and then integrate their efforts [42]. This kind of
distributed and coordinated effort would also, ideally, be
managed in close partnership with specialized journals
[18] to ensure that contributors also have a means to
publish their efforts.
Concluding remarks
The most fundamental hurdle of all that hinders progress
in the mutation database field is limited funding. Because
of this, almost all mutation databases that currently exist
have been built by researchers ‘on the side’ for their own
use, with a small degree of corporate sponsorship at best.
To advance beyond this ‘cottage industry’ state of affairs,
projects need to be increased in scale, quality and
durability, and this can only happen if strategically
minded funding agencies make available substantial
targeted funds. The new databases that emerge will
then need to find support for general maintenance and
ongoing development. To provide this, the projects could
perhaps be run as self-sustaining ‘businesses’ that charge
the data suppliers or the data users (equivalent to how
scientific journals work today). It might also be possible to
develop novel forms of joint academic–corporate funding.
The funding challenge is thus far from simple, but it might
help the debate to note that funding agencies invest vast
sums of money to create primary mutation data, but they
then fail to direct sufficient funds to ensure that these data
flow effectively to clinicians and scientists involved in
disease research and patient care. This situation deserves
to be remedied.
Acknowledgements
We thank Heikki Lehvaslaiho and Raymond Dalgleish for critical reading
of this manuscript. Our research is supported by the European
Commission (FP6, IST thematic area) through the INFOBIOMED NoE
(IST-507585).
References
1 McKusick, V.A. (1966) Mendelian Inheritance in Man. A Catalog of
Human Genes and Genetic Disorders, (1st edn.), Johns Hopkins
University Press
2 Hamosh, A. et al. (2002) Online Mendelian Inheritance in Man
(OMIM), a knowledgebase of human genes and genetic disorders.
Nucleic Acids Res. 30, 52–55
3 Cotton, R.G. et al. (1998) The HUGO mutation database initiative.
Science 279, 10–11
4 Stenson, P.D. et al. (2003) Human gene mutation database (HGMD):
2003 update. Hum. Mutat. 21, 577–581
5 Beroud, C. (2005) The use of mutation databases in molecular
diagnostics. In Molecular Diagnostics (Patrinos, G.P. and Ansorge,
W., eds), Elsevier (in press)
6 Hardison, R.C. et al. (2002) HbVar: A relational database of human
hemoglobin variants and thalassemia mutations at the globin gene
server. Hum. Mutat. 19, 225–233
7 Patrinos, G.P. et al. (2004) Improvements in the HbVar database of
human hemoglobin variants and thalassemia mutations for population and sequence variation studies. Nucleic Acids Res. 32 (Database
issue), D537–D541
8 Claustres, M. et al. (2002) Time for a unified system of mutation
description and reporting: a review of locus-specific mutation
databases. Genome Res. 12, 680–688
9 Wheeler, D.L. et al. (2004) Database resources of the National Center
for Biotechnology Information: update. Nucleic Acids Res. 32
(Database issue), D35–D40
10 The International HapMap Consortium. (2003) The international
HapMap project. Nature 426, 789–796
11 Fredman, D. et al. (2004) HGVbase: a curated resource describing
human DNA variation and phenotype relationships. Nucleic Acids
Res. 32(Database issue), D516–D519
12 Fredman, D. et al. (2002) HGVbase: a human sequence variation
database emphasizing data quality and a broad spectrum of data
sources. Nucleic Acids Res. 30, 387–391
13 Marsh, S. et al. (2002) SNP databases and pharmacogenetics: great
start, but a long way to go. Hum. Mutat. 20, 174–179
14 Aerts, J. et al. (2002) Data mining of public SNP databases for the
selection of intragenic SNPs. Hum. Mutat. 20, 162–173
15 Dvornyk, V. et al. (2004) Current limitations of SNP data from the
public domain for studies of complex disorders: a test for ten candidate
genes for obesity and osteoporosis. BMC Genet. 5, 4
16 Brown, A.F. and McKie, M.A. (2000) MuStaR and other software for
locus-specific mutation databases. Hum. Mutat. 15, 76–85
17 Patrinos, G.P. et al. (2005) Hellenic National Mutation Database: a
prototype database for mutations leading to inherited disorders in the
Hellenic population. Hum. Mutat. 25, 327–333
18 Patrinos, G.P. and Wajcman, H. (2004) Recording human globin gene
variation. Hemoglobin 28, v-vii
19 Horaitis, O. and Cotton, R.G. (2004) The challenge of documenting
mutation across the genome: the human genome variation society
approach. Hum. Mutat. 23, 447–452
20 Sipila, K. and Aula, P. (2002) Database for the mutations of the
Finnish disease heritage. Hum. Mutat. 19, 16–22
21 Teebi, A.S. et al. (2002) Arab genetic disease database (AGDDB): a
population-specific clinical and mutation database. Hum. Mutat. 19,
615–621
22 Gerlai, R. (2002) Phenomics: fiction or the future? Trends Neurosci. 25,
506–509
23 Scriver, C.R. (2004) After the genome - the phenome? J. Inherit.
Metab. Dis. 27, 305–317
24 Hall, J.G. (2003) A clinician’s plea. Nat. Genet. 33, 440–442
25 Kahraman, A. et al. (2005) PhenomicDB: a multi-species genotype/
phenotype database for comparative phenomics. Bioinformatics 21,
418–420
26 Licinio, J. (2004) PharmGKB: the pharmacogenetics and pharmacogenomics knowledge base. Pharmacogenomics J. 4, 1
27 Hewett, M. et al. (2002) PharmGKB: the pharmacogenetics knowledge
base. Nucleic Acids Res. 30, 163–165
28 Zhou, G. et al. (2004) Recognizing names in biomedical texts: a
machine learning approach. Bioinformatics 20, 1178–1190
29 Chiang, J.H. et al. (2004) GIS: a biomedical text-mining system for
gene information discovery. Bioinformatics 20, 120–121
30 Hirschman, L. et al. (2002) Accomplishments and challenges in
literature data mining for biology. Bioinformatics 18, 1553–1561
31 Freimer, N. and Sabatti, C. (2003) The human phenome project. Nat.
Genet. 34, 15–21
32 Kaiser, J. (2002) Biobanks. Population databases boom, from Iceland
to the U.S. Science 298, 1158–1161
33 Bjornsson, H.T. et al. (2004) An integrated epigenetic and genetic
approach to common human disease. Trends Genet. 20, 350–358
34 Reich, D.E. and Lander, E.S. (2001) On the allelic spectrum of human
disease. Trends Genet. 17, 502–510
35 Collins, A. et al. (1999) Genetic epidemiology of single-nucleotide
polymorphisms. Proc. Natl. Acad. Sci. U. S. A. 96, 15173–15177
36 Lander, E.S. (1996) The new genomics: global views of biology. Science
274, 536–539
37 Labie, D. et al. (1985) Common haplotype dependency of high
G gamma-globin gene expression and high Hb F levels in beta-thalassemia and sickle cell anemia patients. Proc. Natl. Acad. Sci.
U. S. A. 82, 2111–2114
38 Pirastu, M. et al. (1987) The same beta-globin gene mutation is
present on nine different beta-thalassemia chromosomes in a
Sardinian population. Proc. Natl. Acad. Sci. U. S. A. 84,
2882–2885
39 Patrinos, G.P. et al. (2001) Agamma-haplotypes: a new group of
genetic markers for thalassemic mutations inside the 5′ regulatory
region of the human Agamma-globin gene. Am. J. Hematol. 66,
99–104
40 Plato (360 BC) The Republic
41 Stein, L.D. et al. (2002) The generic genome browser: a building
block for a model organism system database. Genome Res. 12,
1599–1610
42 Stein, L. (2002) Creating a bioinformatics nation. Nature 417,
119–120
43 Anonymous. (1999) Newspaper and the Internet: caught in the web.
Economist 352, 17–19
44 Scriver, C.R. et al. (2000) PAHdb: a locus-specific knowledgebase.
Hum. Mutat. 15, 99–104