BIOINFORMATICS
Editorial
Grand challenges in bioinformatics
The protein folding problem has been one of the grand
challenges in computational molecular biology. The problem
is to predict the native three-dimensional structure of a protein
from its amino acid sequence. It is widely believed that the
amino acid sequence contains all the information necessary to
specify the correct three-dimensional structure, since
protein folding is apparently thermodynamically determined;
namely, given a proper environment, a protein would fold up
spontaneously. This is called Anfinsen’s thermodynamic
principle.
While this principle is well established in selected proteins
under in vitro experimental conditions, protein folding in vivo
is a more complex and dynamic process involving a number
of other molecules such as chaperones. The environment has
to be considered as a collection of various interactions with
molecules rather than a smooth thermodynamic environment.
It is not unreasonable to expect that the protein folding
problem cannot be solved for the majority of proteins in nature
without considering specific molecular interactions. This is
reminiscent of the problem of secondary structure prediction
in proteins. However good the algorithms developed for
secondary structure prediction are, the success rate will be
limited as long as only the short-range interactions are
considered. Similarly, however good the algorithms
developed for the three-dimensional structure prediction are,
the success rate will be limited as long as only the information
of a single molecule is examined.
In the era of whole-genome sequencing, we are faced with
another grand challenge problem, which may be called the
organism reconstruction problem. Given a complete genome
sequence, the problem is to predict computationally the
development of the adult from a single cell and its continual
function as a biological organism. Here again, a traditional
view is that the genome is a blueprint of life containing all the
necessary information that would make up an organism. A
clone can be made by replacing the nucleus, which is the
localized area containing all genetic information. Thus, this
might be called Dolly’s cloning principle.
Oxford University Press
According to this genetic determinism principle, we should
eventually be able to predict the function of every gene in the
genome by its sequence information alone. Implicitly, this
assumes that the environment of each gene is also computable
from the complete genome sequence because the function of
a molecule can only become meaningful in relation to its
environment. Therefore, the entire molecular architectures
and molecular reaction pathways in a germ cell, for example,
may be computable from the genomic sequence. We thus end
up asserting that the form and function of an organism are
represented in the nucleus.
In an alternative view, the genome is simply a warehouse of
parts, or building blocks of life, and a real blueprint of life is
written in the entire cell, perhaps as a network of molecular
interactions. Whichever view one takes, it is impossible in
practice to make full sense of the sequence data without
additional information, including the time and location of
expression and, especially, information on molecular
interactions. In fact, to obtain any functional clue about the
hypothetical proteins, which still account for one-third to
one-half of the genes in every genome that has been sequenced,
new systematic experiments are being designed to observe, for
example, gene–gene interactions by disruption experiments
and protein–protein interactions by yeast two-hybrid
experiments.
Bioinformatics has emerged as a major discipline due to the
rapid increase in sequence information, developing new
databases and computational technologies that help us to
understand the biological meaning encoded in the sequence
data. In a post-genomic era of systematic functional analysis,
the basis of bioinformatics is not only the complete catalogue
of building blocks, but also the complete catalogue of their
interactions. With this new level of information, the grand
challenge problems in bioinformatics, both old and new, and
both structural and functional, may one day be elucidated,
although not in the manner in which they were originally
formulated.
Minoru Kanehisa