Bioinformatics: Bringing it all together
Vol. 419, No. 6908 (17 October 2002)
Forget test tubes, petri dishes and pipettes. One of the few pieces of equipment
that can be honestly labelled ubiquitous in biology today is the computer.
Bioinformatics — the development and application of computational tools to
acquire, store, organize, archive, analyse and visualize biological data — is one of
biology's fastest-growing technologies.
Marina Chicurel is a science writer based in Santa Cruz.
Nature 419, 751 - 757 (2002); doi:10.1038/419751a
Bioinformatics: Bringing it all together
technology feature
MARINA CHICUREL
Biologists at the bench studying small networks of genes want user-friendly tools to
analyse their results and help them to plan experiments. They need accessible interfaces
that allow them to search databases, and compare their data with those of others (see
'Genome analysis at your fingertips').
At the other end of the spectrum, researchers analysing whole genomes, and drug-discovery
companies mining the genome for drug targets, want high-throughput analysis tools to
accelerate genome annotation and extract information from databases in more efficient and
sophisticated ways.
And all of those involved want more integration — integration of data across the hundreds,
if not thousands, of different databases, and visual integration of data to aid interpretation.
"The key to bioinformatics is integration, integration, integration," says bioinformatics
expert Jim Golden at Curagen spin-off 454 Corporation in Branford, Connecticut. "To
answer most interesting biological problems, you need to combine data from many data
sources," agrees Russ Altman, a biomedical informatics expert at Stanford University.
"However, creating seamless access to multiple data sources is extremely difficult."
Standard currencies
One of the most insidious problems is the lack of standard file formats and data-access
methods. But attempts to standardize them are gaining momentum. One success is the
distributed annotation system (DAS), a standard protocol developed by Lincoln Stein at
Cold Spring Harbor Laboratory in New York and his colleagues. "It's a simple solution to a
simple but obvious problem," says Stein. "There was no standard way of exchanging
sequence annotations."
DAS allows one computer to contact multiple servers to retrieve and integrate dispersed
genomic annotations associated with a particular sequence, such as predicted introns and
exons from one server and corresponding single-nucleotide polymorphisms (SNPs) from
another. It handles the annotations as elements associated with a particular stretch of
genomic sequence and so enables users to obtain a picture of that genome segment with all
of its associated annotations. Many providers of genome data, including WormBase,
FlyBase, the Ensembl server run by the European Bioinformatics Institute (EBI) and the
Sanger Institute near Cambridge, UK, and the genome browser at the University of
California, Santa Cruz, are currently running DAS servers.
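The idea of pulling annotations for one stretch of sequence from several servers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two "servers" are hypothetical in-memory lists rather than real DAS endpoints, and the `Annotation` class and `fetch_region` function are invented for this example and are not part of the DAS protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    source: str  # which server supplied the feature
    kind: str    # e.g. "exon", "intron", "SNP"
    start: int   # 1-based inclusive start on the reference sequence
    end: int     # 1-based inclusive end


def fetch_region(servers, start, end):
    """Collect every annotation overlapping [start, end] from all servers."""
    hits = []
    for name, features in servers.items():
        for kind, f_start, f_end in features:
            if f_start <= end and f_end >= start:  # the two intervals overlap
                hits.append(Annotation(name, kind, f_start, f_end))
    # Sort by position so the result reads like a single annotated segment.
    return sorted(hits, key=lambda a: (a.start, a.end, a.source))


# Hypothetical stand-ins for two remote annotation servers.
servers = {
    "gene_models": [("exon", 100, 250), ("intron", 251, 400), ("exon", 401, 520)],
    "variation": [("SNP", 180, 180), ("SNP", 900, 900)],
}

merged = fetch_region(servers, 150, 450)
for a in merged:
    print(f"{a.start:>5}-{a.end:<5} {a.kind:<7} {a.source}")
```

Because every feature is keyed to coordinates on the same reference sequence, the merge step needs no knowledge of how each server stores its data.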
Reckoning that data providers will never agree on a universal standard for representing
data, building database interfaces or writing access scripts, Stein thinks that web services
such as DAS are the best route to interoperability. Data providers only have to agree on a
small set of standards that define how their data and tools are presented to the outside
world.
And a 'registry' can keep track of which data sources implement which services. Scripts for
retrieving a particular type of data or operation consult the registry, as they would an
address book, to determine which data sources to query. A project of this type is
BioMOBY, led by Mark Wilkinson at the National Research Council in Saskatoon,
Canada. BioMOBY will be a powerful exploration tool, he says, because apart from
answering database queries, it will discover cross-references to other relevant data and
applications. Betting on BioMOBY's potential, several groups are encouraging its
development. "At the moment, we have the support of almost all of the model organism
databases," says Wilkinson.
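The address-book role of a registry can be illustrated with a small Python sketch. The `Registry` class, the service names and the data-source names below are all hypothetical; this is not BioMOBY's interface, only a minimal model of the lookup step the text describes.

```python
from collections import defaultdict


class Registry:
    """Maps a service name to the data sources that implement it."""

    def __init__(self):
        self._providers = defaultdict(list)

    def register(self, service, source):
        # Record that `source` offers `service`, ignoring duplicates.
        if source not in self._providers[service]:
            self._providers[service].append(source)

    def lookup(self, service):
        # Return a copy so callers cannot alter the registry by accident.
        return list(self._providers.get(service, []))


registry = Registry()
registry.register("fetch_sequence", "worm_database")
registry.register("fetch_sequence", "fly_database")
registry.register("fetch_phenotype", "worm_database")

# A retrieval script consults the registry before deciding whom to query.
targets = registry.lookup("fetch_sequence")
print(targets)
```

A script written this way never hard-codes which databases to contact, so new data sources become visible to it as soon as they register.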
Another indicator of the widespread desire for interoperability is the incorporation in
February 2002 of the Interoperable Informatics Infrastructure Consortium (I3C). With 14
member organizations — including Sun Microsystems of Santa Clara, California; IBM of
White Plains, New York; Millennium Pharmaceuticals and the Whitehead Institute for
Biomedical Research, both in Cambridge, Massachusetts — I3C is not a standards body,
but aims to develop and promote the adoption of common protocols.
To integrate the current set of non-standardized databases, researchers are relying on two
main strategies: warehousing and federation. A warehouse is a central database where data
from many different sources are brought together on one physical site. Entrez, the widely
used search-and-retrieval system developed by the US National Center for Biotechnology
Information in Bethesda, Maryland, is an example.
Access all areas
A popular tool is SRS produced by LION Bioscience of Heidelberg, Germany, which facilitates
access to a wide range of biological databases using a warehouse-like strategy. SRS is used in
the online genome portals maintained by Celera Genomics in Rockville, Maryland, and Incyte
Genomics in Palo Alto, California, and is the core technology of tools sold by LION.
Figure caption: Structure prediction: modelling a sequence homolog in LION's SRS 3D. (Credit: LION BIOSCIENCE)
Federation, on the other hand, links different databases so that they appear to be unified to
the end-user but are not physically integrated at a common site. A query engine takes a
complicated question requiring access to multiple databases and divides it into subqueries
that are sent to the individual databases. The answers are then reassembled and presented to
the user. Aventis Pharmaceuticals in Strasbourg, France, for example, has adopted IBM's
DiscoveryLink federating software to aid collaboration between its biologists and chemists in
drug development.
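The split-and-reassemble step of a federated query can be sketched as follows. This is a toy model, not DiscoveryLink or any real query engine: the two databases are hypothetical dictionaries, and `federated_query` is a name invented for the example.

```python
def federated_query(gene_ids, sources):
    """Send one subquery per source, then merge the answers by gene identifier."""
    combined = {gene: {} for gene in gene_ids}
    for name, lookup in sources.items():
        answers = lookup(gene_ids)          # subquery sent to one database
        for gene, value in answers.items():
            combined[gene][name] = value    # reassemble the answers per gene
    return combined


# Hypothetical stand-ins for two independent databases.
sequence_db = {"geneA": "ATGGCC", "geneB": "ATGTTT"}
compound_db = {"geneA": ["compound-1"]}

sources = {
    "sequence": lambda ids: {g: sequence_db[g] for g in ids if g in sequence_db},
    "compounds": lambda ids: {g: compound_db[g] for g in ids if g in compound_db},
}

result = federated_query(["geneA", "geneB"], sources)
print(result)
```

The data stay in their original stores; only the answers travel, which is why federation avoids the upkeep of local copies but depends on every source being queryable by a program.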
Which approach to use and when is much debated. "Updating and maintaining local copies
of external data collections in a warehouse is a major task," says bioinformatician Rolf
Apweiler at the EBI's lab in Hinxton, UK. Federation avoids this because the data are
accessed directly from the original source. But the bioinformatics databases you want to
query must be accessible for programmatic queries over the Internet, and most are not, says
Peter Karp, director of the bioinformatics research group at the non-profit research institute
SRI International in Menlo Park, California. "It's like installing a state-of-the-art telephone
exchange in a village without telephones."
Several projects combine the two approaches. On the industry side, IBM has set up a
partnership with LION to integrate DiscoveryLink with SRS. Particularly ambitious is the
public-domain Integr8 project led by Apweiler. His team aims to bring together some 25
major databases spanning a broad range of molecular data, from nucleotide sequences to
protein function. "We're trying to make an integrative layer on top of it all so that you can
easily zoom in on the sequence data linked to the gene, and then go to the genomic data, to
the transcriptional data and to the protein sequences. You'll have a sort of magnifying
glass," says Apweiler.
Knowledge is power
Smart systems that can answer complicated questions about different sorts of data are also
on the move. "A knowledge base is a fancy word for a database that allows you to do really
sophisticated queries," says bioinformatician Mark Yandell at the University of California,
Berkeley. Such databases often rely on vocabularies known as 'ontologies' (see 'Putting a
name on it') combined with frame-based systems, a way of representing data in computers
as objects within a hierarchy. One frame, for example, could be called 'protein', with slots
describing its relationships to other concepts, such as 'gene name', or 'post-translational
modifications'. So when a user asks a question about a protein, frames make it easy to
retrieve the name of the corresponding gene and the modifications the protein can undergo.
If the user asks for literature references, ontologies make it possible to retrieve not only
articles that include the protein name but also those about related genes or processes.
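A frame with slots, arranged in a hierarchy, might be modelled along these lines. The `Frame` class and the example values (`GENE1`, the slot names) are hypothetical and chosen only to mirror the 'protein' example above; real frame-based systems are considerably richer.

```python
class Frame:
    """A named object with slots; missing slots are inherited from the parent."""

    def __init__(self, name, parent=None, **slots):
        self.name = name
        self.parent = parent
        self.slots = slots

    def get(self, slot):
        # Walk up the hierarchy until some frame defines the slot.
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.parent
        raise KeyError(slot)


# General frame: facts shared by every protein.
protein = Frame("protein", molecule_type="polypeptide")

# Specific frame: one hypothetical protein with its own slots.
example_kinase = Frame(
    "example kinase",
    parent=protein,
    gene_name="GENE1",
    post_translational_modifications=["phosphorylation"],
)

print(example_kinase.get("gene_name"))
print(example_kinase.get("post_translational_modifications"))
print(example_kinase.get("molecule_type"))  # inherited from "protein"
```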
The Genome Knowledgebase, a collaborative project between Cold Spring Harbor
Laboratory, the EBI and the Gene Ontology Consortium, will have, among other
capabilities, the ability to make connections between disparate genomic data from different
species. "We store things specific to a species but allow a patchwork of evidence from
different species to weave together," says Ewan Birney, a bioinformatician at the EBI. So
when users pose questions about a biological process, they will get answers that incorporate
knowledge collected from various model organisms.
Knowledge bases are being developed for a wide variety of topics, but some researchers are
sceptical about their future. Information scientist Bruce Schatz of the University of Illinois
at Urbana-Champaign, for example, thinks that ontologies require too much expert effort to
generate and maintain. "All ontologies are eventually doomed," he says. Instead, he favours
a purely automated process of knowledge generation, such as concept-switching, which
relies on analysing the contextual relationships between phrases to identify underlying
concepts. Concept-switching algorithms, for example, allow users to start with a general
topic, such as mechanosensation, and explore its 'concept space', zeroing in on specific
terms such as the mechanosensory genes of a particular species.
Visualizing the genome
An essential component of bioinformatics is the ability to visualize retrieved data,
especially complex data, in ways that aid their interpretation. "Integration and visualization
are actually very closely related, because after you integrate information, the first thing you
want to do is display it," says Altman. "They're both parts of the issue of taking information
that's perfectly happy in a computer and turning it into information that a user is happy
digesting cognitively."
Genome browsers are particularly powerful, as they provide a bounded framework, the genome
sequence, onto which many different types of data can be mapped. The University of California,
Santa Cruz, for example, maintains a browser where users can simultaneously view the locations
of SNPs, predicted genes and mRNA sequences along a chosen genome stretch. "It's all about
linking," says principal investigator David Haussler. "It's about having it all at your
fingertips."
Figure caption: David Haussler: putting the picture together. (Credit: R.R. JONES)
Tools that compare genomes from different species are also proving their worth. The VISTA
project, developed and maintained by the Lawrence Berkeley National Laboratory in Berkeley,
California, allows biologists to align and compare large stretches of sequence from two or
more species. "It gives you a graphical output where you see peaks of conservation and valleys
of lack of conservation," says Edward Rubin, one of VISTA's developers.
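One simple way to produce such a peaks-and-valleys profile is to slide a fixed-size window along two already-aligned sequences and record the percentage of identical positions in each window. The sketch below assumes that approach for illustration; it is not a description of VISTA's own algorithm, and `window_identity` and the toy sequences are invented for the example.

```python
def window_identity(seq_a, seq_b, window):
    """Percent identity in each sliding window of two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    if not 0 < window <= len(seq_a):
        raise ValueError("window must be between 1 and the alignment length")
    scores = []
    for i in range(len(seq_a) - window + 1):
        pairs = zip(seq_a[i:i + window], seq_b[i:i + window])
        # Count identical columns; a gap ("-") never counts as a match.
        matches = sum(1 for x, y in pairs if x == y and x != "-")
        scores.append(100.0 * matches / window)
    return scores


# Toy alignment: the first eight columns are conserved, the last four are not.
species_1 = "ACGTACGTAAAA"
species_2 = "ACGTACGTCCCC"

profile = window_identity(species_1, species_2, window=4)
print(profile)
```

Plotting `profile` against window position would show a plateau over the conserved block that falls away to zero over the divergent tail.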
Spotfire of Somerville, Massachusetts, sells software that can transform all sorts of data
into images. Using Spotfire's DecisionSite, researchers at Monsanto in St Louis, Missouri,
represented as a 'heat map' the results of complex experiments that tracked changes in the
expression of thousands of genes and the concentrations of numerous metabolites during
maize development. It helped them to link the expression of certain genes to the presence or
absence of particular amino acids. "A lot of times it's through comparisons and
comparisons and comparisons that researchers see an interesting trend," says David Butler,
vice-president of product strategy at Spotfire.
Biologists are moving closer to their dream of data
integration. But open issues remain. Schatz worries that if
public support doesn't increase, industry may come to
dominate the field, providing suboptimal solutions for
scientists. "If a Celera-like company starts doing this kind of
activity and they get bought by Microsoft, which is an
entirely possible activity in the world at large, then it will be
too late. And then scientists will get whatever the major
customers of Microsoft want," he says.
But Celera's director of scientific content and analysis,
Richard Mural, advocates a centralized, industry-based
solution to integration and genome annotation. He notes that
there are few rewards for academic researchers for working
on such problems, and their focused interests can be hard to
reconcile with a global approach. "To really get it done
quickly and well, I think the commercial may be a stronger
model," he says.
Figure caption: Edward Rubin takes a graphical view. (Credit: ROY KALTSCHMIDT/LBL)
However these issues are resolved, the road ahead looks bright. "Ninety-nine percent of
bioinformatics is new stuff," says Haussler. "It's an enormous frontier."
Distributed annotation system
http://biodas.org
Interoperable Informatics Infrastructure Consortium
http://www.i3c.org
University of California, Santa Cruz, genome browser
http://genome.ucsc.edu
Genome Knowledgebase
http://www.genomeknowledge.org
Entrez system
http://www.ncbi.nlm.nih.gov/Entrez
Ensembl genome browser
http://www.ensembl.org
VISTA
http://www-gsd.lbl.gov/vista
17 October 2002
Nature 419, 751 - 752 (2002); doi:10.1038/419751b
Genome analysis at your fingertips
MARINA CHICUREL
Marina Chicurel is a science writer based in Santa Cruz.
The working biologist now has an enormous number
of options when it comes to bioinformatics tools. On
one hand, there is a lot of free high-quality software
in the public domain. On the other, researchers can
buy commercial products offering added features,
such as programs to streamline sequential tasks, to
access proprietary databases and to enhance data
security. And because software producers realize
that users' needs change and their products will
rarely be used in isolation, flexibility and modularity
are on the rise.
Figure caption: InforMax's BioAnnotator uses locally stored databases to find protein motifs. (Credit: INFORMAX)
An important trend has been the increasing integration and sophistication of tools available to
non-experts. A wide range of user-friendly packages
incorporating tools for nucleotide and protein sequence analysis are available from
companies such as MiraiBio, a Hitachi Software Engineering subsidiary based in Alameda,
California; DNASTAR in Madison, Wisconsin; InforMax in Bethesda, Maryland; and
Accelrys in San Diego, California. On the non-commercial side, the Biology WorkBench
maintained by the Supercomputer Center at the University of California, San Diego, is
particularly popular, offering more than 80 bioinformatics tools to more than 10,000
registered users. "It's a one-stop-shop for doing a lot of things," says lead developer
Shankar Subramaniam. "You can be sitting in front of any type of computer; as long as you
have a web browser, you can access it."
Software has also become more user-friendly. Back in the early 1990s, users of the GCG
Wisconsin package, the grandfather of molecular-biology packages (now sold by Accelrys),
had to work with UNIX-based systems. Although these systems are still preferred by some,
users can now point-and-click their way through a wide range of tasks on ordinary desktop
computers.
Another trend is the increased integration of data analysis with experimental design. The
needs of bench scientists don't always coincide with those of professional bioinformaticians
producing tools for whole-genome analyses. Genome projects require programs that can
efficiently, if not very accurately, process huge amounts of sequence data, but the biologist
in the lab is often interested in studying small sets of genes and their products with very
high precision. Last month, for example, InforMax released GenomBench, a tool that
allows users to predict the structure of genes and their splice variants, progressively refine
these predictions, and then design experiments to validate them. "It's an interactive tool that
can work with researchers not just to analyse the data they have, but to design the right
experiment to resolve ambiguities in the data," says Steve Lincoln, senior vice-president of
life-science informatics at the company.
Others are hooking up their software to catalogues of reagents. As just one example, the
genome browser run by the University of California, Santa Cruz, is being used in a
collaboration with the National Cancer Institute in Bethesda, Maryland, to identify new
genes to expand, and ultimately complete, the Mammalian Gene Collection — a set of
cDNA clones of expressed genes for human and mouse. The browser will be linked to the
collection's website, so that users can go straight from analysing an electronic
representation of a gene to ordering a clone.
A key trend in the development of commercial products is the emergence of workflows,
automated chains of operations that can dramatically increase analysis throughput. For
example, software producer geneticXchange of Menlo Park, California, recently
demonstrated a workflow that sorts gene-expression data generated by microarrays, looks
up the accession numbers that identify the selected genes, collects sequence information
from the US National Center for Biotechnology Information's UniGene database, gathers
annotation information from the LocusLink website, and goes to Medline to assemble a list
of relevant references. "You just hit a button and it does what might take a biologist 600
hours to do, in about five hours," says Mark Haselup, chief technical officer for the
company.
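A workflow of this kind is, at its core, a chain of steps in which each step consumes the output of the one before. The sketch below is a minimal illustration with hypothetical in-memory tables and made-up identifiers; it does not contact UniGene, LocusLink or Medline, and it is not geneticXchange's software.

```python
def run_workflow(data, steps):
    """Feed the output of each step into the next and return the final result."""
    for step in steps:
        data = step(data)
    return data


# Hypothetical stand-ins for microarray results and two lookup tables.
expression = {"probe1": 8.2, "probe2": 0.4, "probe3": 5.1}
accessions = {"probe1": "ACC0001", "probe3": "ACC0003"}
annotations = {"ACC0001": "kinase", "ACC0003": "transporter"}


def select_changed(levels, threshold=2.0):
    # Step 1: keep only probes whose signal passes the threshold.
    return [probe for probe, value in levels.items() if value >= threshold]


def to_accessions(probes):
    # Step 2: translate probe names into accession numbers.
    return [accessions[p] for p in probes if p in accessions]


def annotate(accession_list):
    # Step 3: attach an annotation to each accession number.
    return {acc: annotations.get(acc, "unknown") for acc in accession_list}


report = run_workflow(expression, [select_changed, to_accessions, annotate])
print(report)
```

Because the steps are ordinary functions in a list, a step can be swapped or inserted without touching the others, which is part of what makes automated chains attractive.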
Some commercial products are valuable because they're linked to otherwise unavailable
proprietary data. One of the main selling points of the Celera Discovery System, for
example, is the access it provides to the biotech firm's high-quality human and mouse
genome annotations. Unlike many other collections of annotations, a high proportion of
Celera's have been generated by manual curation (see 'Putting a name on it').
Commercial products often provide greater security for those who don't wish to manipulate
their unpublished or unpatented results openly over the Internet. Although some public sites
offer a degree of security, commercial packages usually have more protection options and
can be operated behind a firewall.
But the recurrent theme in the design of bioinformatics tools is the trend towards increased
integration. The Discovery Studio Gene package recently launched by Accelrys is a case in
point. "Results are put into a project database that has the ability to be accessed by a set of
applications that span both chemistry and biology," says Scott Kahn, senior vice-president
of life science at Accelrys. "We set up the ability to collaborate between domains."
Biology WorkBench
http://workbench.sdsc.edu
17 October 2002
Nature 419, 755 (2002); doi:10.1038/419755a
Putting a name on it
MARINA CHICUREL
Marina Chicurel is a science writer based in Santa Cruz.
A chasm separates sequence data from the biology of
organisms — and genome annotation will be the bridge,
says Lincoln Stein, a bioinformatics expert at Cold Spring
Harbor Laboratory in New York. Spanning three main
categories — nucleotide sequence, protein sequence and
biological process — annotation is the task of adding
layers of analysis and interpretation to the raw sequences.
The layers can be generated automatically by algorithms or
meticulously built up by experts in the hands-on process of
manual curation.
Figure caption: Lincoln Stein: bridging the gap. (Credit: BILL GEDDES)
Because manual curation is time-consuming and genome projects are generating data, and even
changing data, at an extraordinary pace, there is a strong motive to shift as much of the
burden as possible to automated procedures. A major task in the annotation of genomes,
especially large ones, is finding the genes. There are numerous gene-prediction algorithms that
combine statistical information
about gene features, such as splice sites, or compare stretches of genome sequence to
previously identified coding sequences, or combine both approaches. A new type of
algorithm, called a dual-genome predictor, uses data from two genomes to locate genes by
identifying regions of high similarity.
Each algorithm has its strengths and limitations, working better with certain genes and
genomes than with others. The GENSCAN gene-predicting algorithm, developed by Chris
Burge at the Massachusetts Institute of Technology, has become a workhorse for vertebrate
annotation and was one of the algorithms used in the landmark publications of the draft
human genome sequence. FGENESH, produced by software firm Softberry of Mount
Kisco, New York, proved particularly useful for the Syngenta-led annotation of the rice
genome sequence.
Good data preparation is also important. "A lot of the magic happens in the environment, not
the algorithm," says Ewan Birney, a bioinformatician at the European Bioinformatics Institute
(EBI) in Hinxton, near Cambridge, UK. "People often focus on the whizzy technology to the
detriment of the real smarts, which happen in the sanitization of data to present them to a
hard-core algorithm." Data sanitization includes steps such as masking repetitive sequences,
which can interfere with an algorithm's performance.
Figure caption: Automated annotation: Ewan Birney and Ensembl. (Credit: HEIKKI LEHVASLAIHO)
All current large-scale efforts involve a combination of automatic and manual approaches. "For
me it's
quite clear that they can only be complementary," says Rolf Apweiler at the EBI, who leads
annotation for the major protein databases SWISS-PROT and TrEMBL. "You can't
automate anything without having manual reference sets that you can rely on."
While Apweiler is tackling large-scale annotation, others are concentrating on finding
genes and proteins linked to a particular process, such as a disease. The bioinformatics and
drug-discovery company Inpharmatica in London, for example, provides annotation
databases and tools to identify potential drug targets.
Because of the plethora of different names given to the same genes and proteins in different
organisms, a growing trend is the use of 'ontologies' — controlled vocabularies in which
descriptive terms (such as gene and protein names) and the relationships between them are
consistently defined. One ontology that is now widely adopted is the Gene Ontology (GO),
but it doesn't cover all biology, and others have developed their own, often complementary,
ontologies. BioWisdom in Cambridge, UK, for example, sells information-retrieval and
analysis tools for drug discovery based on proprietary ontologies in fields such as oncology
and neuroscience.
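A controlled vocabulary with defined relationships can be sketched as two small tables: one mapping alternative names to a preferred term, and one recording which term is a kind of which. The terms below are illustrative only and are not copied from GO or any proprietary ontology; `canonical` and `expand_query` are names invented for this example.

```python
# Alternative names mapped to one preferred term (illustrative entries).
synonyms = {
    "phosphorylation of protein": "protein phosphorylation",
}

# Child term -> parent term ("is a kind of"), again purely illustrative.
is_a = {
    "protein phosphorylation": "protein modification",
    "protein modification": "metabolic process",
    "glycolysis": "metabolic process",
}


def canonical(name):
    """Return the preferred term for a name, or the name itself if unknown."""
    return synonyms.get(name, name)


def descendants(term):
    """All terms that are directly or indirectly a kind of `term`."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for child, parent in is_a.items():
            if parent == current and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def expand_query(name):
    """Search terms for a query: the preferred term plus all its descendants."""
    term = canonical(name)
    return {term} | descendants(term)


print(sorted(expand_query("protein modification")))
print(sorted(expand_query("phosphorylation of protein")))
```

Consistent definitions are what make the expansion safe: a search on a broad term can pull in records filed under narrower ones without relying on the words happening to match.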
Working as part of the Alliance for Cellular Signaling, a team led by Shankar Subramaniam
is developing an ontology that captures the different states of a protein, such as
phosphorylation state. This will serve as a foundation for the Molecule Pages, a
literature-derived database of signalling molecules and their interactions.
GO coordinator Midori Harris at the EBI and her colleagues are encouraging developers of
new ontologies to make them publicly available through GO's website. They hope this will
not only drive standardization, but will help to expand GO's capabilities by allowing the
creation of combinatorial terms derived from different ontologies.
But most researchers agree that tools are only part of the solution. "The passion for biology
often gets missed out here," says Birney. "People think it is all about finding technical
solutions that magically solve problems, but frankly, far more important is really wanting to
see the data hang together."
Gene Ontology Consortium
http://www.geneontology.org
European Bioinformatics Institute
http://www.ebi.ac.uk
Alliance for Cellular Signaling
http://www.afcs.org
17 October 2002
Nature 419, 759 - 761 (2002); doi:10.1038/419759a
table of suppliers
Company
Products/activity
Location
Sequence, genome
and geneexpression analysis
Accelrys
GCG Wisconsin package San Diego,
for sequence and genome California
analysis; Discovery
Studio for database
mining, genomics and
proteomics
URL
http://www.accelrys.com
Affibody
Software for genomics
data analysis and
management
Bromma,
Sweden
http://www.affibody.com
Aneda
Desktop bioinformatics
tools for genomics and
proteomics
Roslin, UK
http://www.anedabio.com
Knoxville,
Tennessee
http://www.apocom.com
ApoCom Genomics Desktop bioinformatics
tools for gene prediction
and gene-expression
analysis
Array Genetics
Protein information
Newtown,
database; tools for
Connecticut
genomics and proteomics
http://www.arraygenetics.com
BIOBASE
TRANSFAC family of
Wolfenbüttel,
databases; analysis tools Germany
for gene expression,
promoters and signalling
pathways; contract
bioinformatics
http://www.biobase.de
Biocomputing
Data-management
systems for genotyping
and phenotype data
Espoo, Finland http://www.biocomputing.fi
Bioinformatics
Solutions
Desktop bioinformatics
tools for sequence
analysis and structure
prediction
Waterloo,
Canada
http://www.bioinformaticssolutions.com
BioTools
Analysis software for
gene and protein
sequences and
chromatograms
Edmonton,
Canada
http://www.biotools.com
Cognia
Bioinformatics software, New York,
including BIOBASE
New York
http://www.cognia.com
software and databases
Curagen
GeneScape portal for
genome analysis tools
Branford,
Connecticut
http://www.curagen.com
Digital Gene
Technologies
TOGA gene-expression
analysis software
La Jolla,
California
http://www.dgt.com
DNASTAR
Desktop sequenceanalysis and genome
visualization software
Madison,
Wisconsin
http://www.dnastar.com
Entigen
BioNavigator platform
Sunnyvale,
for sequence and genome California
analysis
http://www.entigen.com
GATC Biotech
Accelrys, DNASTAR and Constance,
other bioinformatics
Germany
software; DNA
sequencing
http://www.gatc-biotech.com
Gene Codes
Sequencher sequence
assembly and analysis
software
http://www.genecodes.com
Gene-IT
Universal software for
Le Chesnay,
database management and France
genomics
http://www.gene-it.com
Genomatix
Genome and sequence
analysis tools; portals to
mouse and human
genomes
Munich,
Germany
http://www.genomatix.de
Genomic Solutions
Proteomics
bioinformatics tools
Ann Arbor,
Michigan
http://www.genomicsolutions.com
Geospiza
Servers and tools for
sequence assembly and
analysis
Seattle,
Washington
http://www.geospiza.com
Hitachi Software
Engineering
DNASIS desktop
bioinformatics software
for DNA sequence
assembly and analysis,
and analysis of
microarray data
Yokohama,
Japan
http://www.hitachisk.co.jp/English/index.html
Inpharmatica
Biopendium and
CeleraEdition
Biopendium proteome
annotation resources;
PharmaCarta large-scale
discovery informatics
platform
London, UK
http://www.inpharmatica.com
InforMax
Vector bioinformatics
Bethesda,
software for sequence,
Maryland
genome and microarray
data; Vector NTI for
Macintosh; LabShare for
data storage and
management
http://www.informaxinc.com
Iobion Informatics
GeneTraffic microarray
http://www.iobion.com
Ann Arbor,
Michigan
La Jolla,
data-management and
analysis software
California
iSenseIt
Microarray data analysis
and storage software;
oligonucleotide
computation
Bremen,
Germany
http://www.isenseit.com
LabBook
eLabBook web-enabled McClean,
electronic notebooks;
Virginia
annotated human genome
database and data-mining
tools
http://www.labbook.com
LabVelocity
Jellyfish desktop
bioinformatics software;
information services
LION Bioscience
Bioinformatics software, Heidelberg,
database development
Germany
and management;
DiscoveryCenter platform
for data integration;
contract bioinformatics
http://www.lionbioscience.com
MiraiBio
DNASIS desktop
Alameda,
software for DNA
California
sequence assembly and
analysis, protein sequence
analysis, and analysis of
microarray data
http://www.miraibio.com
Molecular Biology
Insights
Oligonucleotide
identification software
Cascade,
Colorado
http://www.oligo.net
Paracel
Software for sequence
assembly, analysis and
sequence-based
genotyping
Pasadena,
California
http://www.paracel.com
Premier Biosoft
Desktop bioinformatics
packages for sequence
analysis, primer design,
and two-hybrid protein
interactions
Palo Alto,
California
http://www.premierbiosoft.com
PubGene
PubGene public access
and commercial gene
databases and analysis
software
Oslo, Norway http://www.pubgene.com
Redasoft
Genetic mapping and
sequence analysis
software and REBASE
restriction enzyme
database
Toronto,
Ontario
http://www.redasoft.com
Rosetta BioSoftware Rosetta Resolver geneexpression data analysis
system
Kirkland,
Washington
http://www.rii.com
Silicon Genetics
Redwood City, http://www.sigenetics.com
California
MetaMine, GeNet and
GeneSpring microarray
analysis software
San Francisco, http://www.labvelocity.com
California
science factory
BRENDA enzymology
database; überTOOL
bioinformatics platform
for sequence, expression
and structural data
Cologne,
Germany
http://www.science-factory.com
Softberry
Software for sequence
Mount Kisco,
and genome analysis and New York
database searching
http://www.softberry.com
Southwest Parallel
Software
Bioinformatics software
packages
Albuquerque,
New Mexico
http://www.spsoft.com
Textco
Desktop bioinformatics
packages and electronic
lab notebook
West Lebanon, http://www.textco.com
New
Hampshire
X-MINE
Bioinformatics platform
storage and analysis of
genomics data
Brisbane,
California
http://www.XMine.com
Databases
Beilstein
Information
Chemical databases
San Leandro,
California
http://www.beilstein.com
Biomax Informatics
Annotated human
genome database;
customized data
management
Martinsried,
Germany
http://www.biomax.de
BioWisdom
Text search and
pharmacology and
oncology information
databases
Cambridge,
UK
http://www.biowisdom.com
Celera Genomics
Web-based tools for
accessing the Celera
annotated genomes
databases; bioinformatics
services
Rockville,
Maryland
http://www.celera.com
Compugen
GenCarta annotated
human genome,
transcriptome and
proteome database
Tel-Aviv,
Israel
http://www.cgen.com
DECODON
Software for 2D-gel
analysis and information
storage
Greifswald,
Germany
http://www.decodon.de
GeneLogic
Gene-expression
databases and software
for drug discovery
Gaithersburg,
Maryland
http://www.GeneLogic.com
Iconix
DrugMatrix databases
and software platform for
chemogenomics research
Mountain View,
California
http://www.iconixpharm.com
Incyte Genomics
Annotated gene and
expressed sequence tag
databases; Proteome
BioKnowledge Library
protein information
databases; bioinformatics
software
Palo Alto,
California
http://www.incyte.com
Lexicon Genetics
Gene knockout and gene
function databases and
bioinformatics for drug
discovery
The
Woodlands,
Texas
http://www.lexgen.com
LifeSpan
BioSciences
Gene-expression and
protein-localization
databases and data-mining tools
Seattle,
Washington
http://www.lsbio.com
MDL
Biological and chemical
information databases;
data-management
software
San Leandro,
California
http://www.mdli.com
Structural
Bioinformatics
Protein and protein-structure
databases; computational proteomics
San Diego,
California
http://www.strubix.com
Computer systems, middleware and laboratory information management systems (LIMS)
Amersham
Biosciences
Scierra Laboratory
Workflow System for
microarray and
sequencing data
Piscataway,
New Jersey
http://www.amershambiosciences.com
CLONDIAG
PARTISAN microarray
LIMS
Jena, Germany
http://www.clondiag.com
geneticXchange
K1 System middleware
platform for biological
data integration
Menlo Park,
California
http://www.geneticxchange.com
HeliXense
Software and system
infrastructure supporting
large-scale distributed
computing and biological
data management
Singapore
http://www.helixense.com
IBM
DiscoveryLink platform
for database integration;
data-management systems
White Plains,
New York
http://www.ibm.com/solutions/lifesciences
Mitsui Knowledge
Industry
LIMS; software for
membrane protein
secondary-structure
prediction, data
management and analysis
of gene-expression and
SNP data
Tokyo, Japan
http://bio.mki.co.jp
NEC
Computer systems and
networks
Tokyo, Japan
http://www.nec-global.com
Protedyne
LIMS middleware for
integration of network-enabled
laboratory software
Martinsried,
Germany
http://www.protedyne.com
Silicon Graphics
SGI servers for high-throughput
computing, visualization and
data management
San Francisco,
California
http://www.sgi.com
Sun Microsystems
Servers and workstations
for high-throughput
computing; universal
software platforms for
networks
Santa Clara,
California
http://www.sun.com
TimeLogic
DeCypher system for
accelerated
bioinformatics
Crystal Bay,
Nevada
http://www.timelogic.com
TurboWorx
Open computational
platforms for biological
research data including
bioinformatics
New Haven,
Connecticut
http://www.turbogenomics.com
Services
Aber Genomic
Computing
Design of data-mining
and predictive modelling
software
Aberystwyth,
UK
http://www.abergc.com
AGOWA
Genome and expressed
sequence tag analysis;
automated sequence
annotation; customized
bioinformatics services
Berlin,
Germany
http://www.agowa.de
BioInformatics
Services
Computational biology;
bioinformatics services
Rockville,
Maryland
http://www.bioinformaticsservices.com
Chemical
Computing Group
Bioinformatics software,
services and computer-aided
molecular design
Montreal,
Quebec,
Canada
http://www.chemcomp.com
Cyberell
Bioinformatics software
and services
Helsinki,
Finland
http://www.cyberell.com
ePitope Informatics
Epitope prediction over
the web
Durham, UK
http://www.epitope-informatics.com
GeneData
Bioinformatics systems
and services; database
development and
management
Basel,
Switzerland
http://www.genedata.com
Genometrix
Genotyping, gene
expression and
bioinformatics services
The
Woodlands,
Texas
http://www.genometrix.com
Keygene
DNA fingerprint analysis
software; contract
genomics and
bioinformatics services
Wageningen,
The Netherlands
http://www.keygene.com
NuGenesis
Scientific data
management services
Westborough,
Massachusetts
http://www.nugenesis.com
Sagitus Solutions
Bioinformatics software
development
Manchester,
UK
http://www.sagitussolutions.co.uk
SRI International
Contract informatics
services
Menlo Park,
California
http://www.sri.com
Tripos
Chemical libraries;
molecular modelling,
pharmacophore
perception and virtual
screening software;
contract informatics
St Louis,
Missouri
http://www.tripos.com
General
ALMA
Bioinformatica
Bioinformatics software,
consultancy and training
Madrid, Spain
http://www.almabioinfo.com
Applied Maths
Gel fingerprint analysis
and bioinformatics
software; contract
bioinformatics
Kortrijk,
Belgium
http://www.applied-maths.com
Bio-Rad
WorksBase
bioinformatics software
for proteomics
Hercules,
California
http://www.discover.bio-rad.com
BioSolveIt
Software for molecular
modelling, small-molecule
docking, protein
threading; bioinformatics
services and training
St Augustin,
Germany
http://www.biosolveit.de
Dalicon
Bioinformatics software
for large-scale data
management and analysis
Nijmegen,
The Netherlands
http://www.dalicon.com
MegaMetrics
Data-mining software for
microarray, proteomics
and SNP databases
Wyndmoor,
Pennsylvania
http://www.megametrics.com
Molecular Mining
Data-mining software
Kingston,
Ontario,
Canada
http://www.molecularmining.com
Partek
Pattern recognition and
interactive visualization
software; consulting
services
St Charles,
Missouri
http://www.partek.com
Spotfire
DecisionSite analytical
and statistical data-management
software
Somerville,
Massachusetts
http://www.spotfire.com
SPSS
Clementine statistical and
data-mining software;
Clementine microarray
application template
Chicago,
Illinois
http://www.spss.com
Zeptosens
SensiChip microarray
systems
Witterswil,
Switzerland
http://www.zeptosens.com