Book reviews
research scientist wishing to become acquainted with the field.
Simon Dear
Director of Bioinformatics Engineering
GlaxoSmithKline
Gunnels Wood Road
Stevenage, UK
References
1. Letovsky, S. (Ed.) (1999), 'Bioinformatics: Databases and Systems', Kluwer Academic, Dordrecht.

Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins (Methods of Biochemical Analysis, 43)
Andreas D. Baxevanis and B. F. Francis Ouellette (Editors)
2nd Edn; John Wiley & Sons, New York; 2001; ISBN: 0 471 38390 2; 470pp. US$69.95 (pbk), US$164.95 (hbk)

As yet another indication that bioinformatics has come of age, this is the first-ever second edition of a textbook in the field. And it is an excellent general bioinformatics text and reference, perhaps even the best currently available. A check with Amazon revealed that there have been many sales of the book in Maryland, particularly Bethesda, MD. This sales blip is not so surprising given that it would be only slightly unfair to subtitle the book 'a guide to resources at the NCBI'. After an introduction to bioinformatics and the Internet, the next three chapters cover the NCBI data model, the NCBI's GenBank sequence database and Sequin as a method for submitting new data to GenBank at the NCBI. A whole chapter is given over to NCBI's Entrez database interrogation software, with specific examples, a table of Boolean syntax, and many screen-shots. On the other hand, the existence of EMBL and the EBI, and of the DNA Data Bank of Japan, is acknowledged, but only in the most peripheral manner. SRS, which many would argue does everything that Entrez does but does it more comprehensively, is not mentioned at all. Do not let all this deter you from appreciating the book, however; it is just a mildly chauvinistic 'Not Invented Here' idiosyncrasy. Far more important than such details as how to drive the Entrez data-mining backhoe is the fact that this book covers all the topics that, most of us would agree, make up bioinformatics in the twenty-first century. If your favourite method or program is not dealt with here, then a suitable, perhaps even better, equivalent will be.

There is coverage of pre-genomic era approaches, including a chapter on two-sequence alignment, substitution matrices, dotplots and homology searching. The treatment is more than mere mechanics and makes helpful suggestions about how to separate meaningful hits from false positives and statistical artefacts. This is followed by an exposition of protein multiple sequence alignments by Geoff Barton. It is somewhat locked into software written by Barton but nevertheless points out some useful general approaches and identifies some common potential pitfalls. These issues are extended and complemented by a chapter on phylogenetic analysis by Brinkman and Leipe. This latter chapter, in contrast with the rest of the book, which is commendably up to date, cites no work more recent than 1997, and I cannot believe that nothing new has happened in this field in the last four years. Nevertheless, it is a good introduction to phylogeny, substitution models and tree evaluation, and includes a gallop through available software. Three-dimensional structure, prediction, databases and visualisation software are also well dealt with in two separate chapters.

It is a tribute to the modernity of the book that a large proportion of it is devoted to big-sequence and genomic-era problems, approaches and solutions. Indeed, two chapters and 50 pages deal specifically with comparative and large-scale genome analysis. The first of these chapters deals with organism-specific databases and shows how clusters of orthologous groups (COGs) and other resources can be used to elucidate metabolic pathways. The large-scale genome analysis chapter deals more with issues of expression level, primarily serial analysis of gene expression (SAGE) methods. For those contemplating a large-scale sequencing project, there is a short but intense chapter on sequence assembly using Staden's Gap4 software. Maps and mapping databases have their own chapter, with a specific section called 'Complexities and pitfalls'. Baxevanis considers the multiplicity of programs for parsing genes out of genomic sequence and recommends a protocol for integrating them into an effective strategy. Wolfsberg and Landsman have written a nice exposition of expressed sequence tags, their clustering, and their relevance to gene prediction, gene expression and genetic variability. This chapter has a particularly large problem set, which gives a good flavour of the kinds of questions that are possible and productive. Indeed, almost all the chapters end with a problem set, the answers to which reside on the publisher's web site, which also hosts hotlinks to all the WWW resources cited in each chapter and to most, if not all, of the figures.

© HENRY STEWART PUBLICATIONS 1467-5463. BRIEFINGS IN BIOINFORMATICS. VOL 2. NO 4. 405–410. DECEMBER 2001

The most surprising element of the book was the final chapter, which offers a primer on Perl programming as a means to solving a genomics problem typical of what bioinformatics now encompasses. Instead of using a package such as GCG or Genejockey to analyse one or a few genes, molecular biologists may well want to abstract specific information about, say, all 19,000 genes from Caenorhabditis elegans. The data deluge has resulted in a paradigm shift. There is little agreement about what sorts of questions or analyses are appropriate for a recently sequenced genome. The data cowboys are riding out in all directions across the information prairie, whooping and sorting, corralling ideas and branding new ways of looking at the biological world. There are no standards, no packages out there. Perl is offered here as the appropriate tool for empowering open-ended, user-driven curiosity. Coincidentally, Gibas and Jambeck's book makes the same judgement. I have extensive experience of computing as a foreign language: I've taught myself Fortran and Basic and suffered formal courses in PL/1, Pascal and C. I think that Perl is a far more accessible general-purpose programming language than any of those. I suspect, however, that my computer-anxious but thoughtful and curious Head of Department will wonder why 'they' cannot write a plain-text biological interrogation language: more like Cobol than Corba (or Perl or even BioPerl), please.

So it's congratulations to the authors, editors and publisher for producing a weighty, authoritative, readable and attractive book. The colour plate idea, which must add to production costs, is held over from the previous edition but is now largely redundant because the book is so well integrated with the web. It's so good that it will sell many copies, even if graduate students have to go without food to afford it.
Andrew Lloyd
INCBI, the Irish EMBnet Node

Post-genome Informatics
Minoru Kanehisa
Oxford University Press, Oxford; 2000; ISBN 0 19 850327 X (hbk), 0 19 850326 1 (pbk); 148pp; US$35.00, £19.95 (pbk)

This book can be defined as a treatise on computational biology, whereas most books currently available on this subject can be considered handbooks. The existence of a treatise on bioinformatics suggests that, owing to genome sequencing projects and all the computational aspects involved in