Genome Informatics 13: 390–391 (2002)
Computational Structural Genomics of a Complete Minimal Organism
John-Marc Chandonia1 [email protected]
David E. Konerding1 [email protected]
Darryl G. Allen1 [email protected]
In-Geol Choi1 [email protected]
Hisao Yokota1 [email protected]
Steven E. Brenner1,2 [email protected]
1 Berkeley Structural Genomics Center, Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
2 Department of Plant and Microbial Biology, 111 Koshland Hall, University of California, Berkeley, CA 94720-3102, USA
Keywords: structural genomics, functional prediction from structure
1 Introduction
Structural genomics aims to provide an experimental structure or computational model of every
tractable protein in a complete genome. A considerable fraction of the genes in all sequenced genomes
have no known function, and have diverged sufficiently from functionally characterized homologues
that the evolutionary relationship cannot be detected from sequence alone. Determining the structure of these proteins may reveal distant homology, which can be used to infer cellular and molecular
functions. The structure is also important for acquiring a detailed understanding of enzymatic catalysis and interaction with small molecule ligands and other proteins. More generally, knowledge of
an increasingly complete repertoire of protein structures will aid structure prediction methods, improve understanding of protein structure, and ultimately lend insight into molecular interactions and
pathways.
Mycoplasma genitalium has the smallest bacterial genome, with only 480 proteins. Its close relative,
Mycoplasma pneumoniae, has 677. The small size of these genomes should allow us to obtain structures
or models for all tractable proteins in these genomes within 5 years. This is expected to yield important
insight into the minimal set of genes necessary for life; many genes in more complex organisms may
be variations on genes in the minimal set.
2 Methods and Results
2.1 Project Overview
An overview of the project is shown in Figure 1. We use computational methods to identify and model
proteins whose structures can be predicted. Other proteins and their homologues (especially those
from hyperthermophiles) will be screened computationally to find ones which are likely to be amenable
to experimental characterization. Solved structures are analyzed using a variety of computational
methods in order to provide experimentally testable hypotheses about their cellular functions and
potential binding partners. Current results are available on the Berkeley Structural Genomics Center
website [6].
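The triage described above can be sketched in a few lines of Python. This is only an illustration of the decision order implied by the overview, not the project's actual pipeline: the `Protein` fields, the `Route` names, and the `triage` function are all hypothetical, and the flags would in practice come from sequence searches and tractability screens that are not shown here.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    """Hypothetical outcomes of the computational triage step."""
    SOLVED = "structure already known"
    MODEL = "model computationally from a homolog of known structure"
    EXPERIMENT = "candidate for experimental characterization"
    INTRACTABLE = "set aside as not tractable"


@dataclass
class Protein:
    name: str
    has_structure: bool = False          # an experimental structure exists
    homolog_has_structure: bool = False  # a detectable homolog has a structure
    tractable: bool = True               # assumed output of a tractability screen


def triage(protein: Protein) -> Route:
    """Route one protein, checking the cheapest options first."""
    if protein.has_structure:
        return Route.SOLVED
    if protein.homolog_has_structure:
        return Route.MODEL
    if not protein.tractable:
        return Route.INTRACTABLE
    return Route.EXPERIMENT


# Invented example entries; the names are placeholders, not real annotations.
proteins = [
    Protein("example_1", has_structure=True),
    Protein("example_2", homolog_has_structure=True),
    Protein("example_3", tractable=False),
    Protein("example_4"),
]
routes = {p.name: triage(p) for p in proteins}
```

Only proteins routed to `Route.EXPERIMENT` would go on to target selection; the rest are either already covered or excluded.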
2.2 Function from Structure
Even when proteins have diverged too far to be recognizably similar by sequence comparison, their
folds are often conserved. Recognition of homology from structure allows inference of likely functional
relatedness, which may then be evaluated precisely. We are assessing and developing several novel
techniques for analyzing protein structure. Once these are determined to be robust, we will apply
them to the structures determined as part of this project. One method is direct comparison
with functionally characterized homologues, using a structural alignment tool such as MINAREA [3].
Another method is analysis of the surface properties of the protein, possibly coupled with an analysis
of conservation among homologues of a protein from multiple species. A more thorough means of
using information from multiple homologues is the Evolutionary Trace method [4]. A dendrogram of
related sequences is constructed using multiple homologues of a gene from various organisms. Each
residue in the protein can then be traced in the tree, providing an evolutionary perspective in which
to evaluate the structural or functional role of that residue. Finally, ligands revealed in the crystal
structures themselves can provide clues as to the function of the protein. In the crystal structure of
MJ0577, a “hypothetical protein” from Methanococcus jannaschii, a bound ATP was observed. Prior
to crystallization, there was no biochemical evidence that the protein bound ATP. This led to the
hypothesis that the protein functions as a molecular switch in combination with other proteins [5].
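The core idea of tracing residues through a tree can be illustrated with a simplified sketch. It assumes the dendrogram has already been cut into groups of aligned sequences, and it labels each alignment column as conserved (one residue across all groups), class-specific (invariant within every group but different between groups), or neutral (variable within at least one group). The function name `trace_residues`, the treatment of gaps as neutral, and the toy sequences are assumptions made for illustration; this is not the published Evolutionary Trace implementation [4].

```python
def trace_residues(groups):
    """Label each alignment column given sequences grouped by tree branch.

    groups: list of lists of equal-length aligned sequences, one inner list
    per branch obtained by cutting the dendrogram at a chosen level.
    Returns one label per column: "conserved", "class-specific" or "neutral".
    """
    sequences = [seq for group in groups for seq in group]
    if not sequences:
        raise ValueError("at least one sequence is required")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same aligned length")

    labels = []
    for col in range(length):
        # Set of residues observed at this column within each group.
        per_group = [{seq[col] for seq in group} for group in groups]
        if any(len(residues) != 1 or "-" in residues for residues in per_group):
            # Variable (or gapped) inside some group: carries no trace signal.
            labels.append("neutral")
        elif len(set.union(*per_group)) == 1:
            # The same residue in every group.
            labels.append("conserved")
        else:
            # Invariant within each group, but differing between groups.
            labels.append("class-specific")
    return labels


# Toy alignment with two groups of two sequences each (invented data).
example_groups = [["ACDG", "ACEG"], ["ACDK", "ACFK"]]
example_labels = trace_residues(example_groups)
# example_labels == ["conserved", "conserved", "neutral", "class-specific"]
```

Mapping the conserved and class-specific positions onto a solved structure is what allows clusters of such residues to be interpreted as candidate functional sites.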
[Figure 1, panels A–F:]
Representation of the proteins in a single genome. The proteins are illustrated as points in some arbitrary sequence space. Stars indicate proteins of known structure. Proteins whose structures are not tractable are eliminated.
Multiple genomes. We work in the context of all fully-sequenced genomes. Colors indicate different genomes’ proteins.
Sequence-identified families. By sequence similarity, it is possible to recognize homology among the proteins and construct families.
Target selection. A family is selected for experimental structural characterization, and a target from within that family is highlighted.
Structure-identified families. Often structural similarity will reveal homology, even when the families lack significant sequence similarity.
Analysis. The solved structure is analyzed and structural similarity identified. The structure is also used to make models of homologs.
Figure 1: Project overview; more details are provided in [1, 2].
References
[1] Brenner, S.E., A tour of structural genomics, Nature Reviews Genetics, 2:801–809, 2001.
[2] Brenner, S.E., Target selection for structural genomics, Nature Structural Biology, Structural Genomics Supplement, 7:967–969, 2000.
[3] Falicov, A. and Cohen, F.E., A surface of minimum area metric for the structural comparison of proteins, J. Mol. Biol., 258:871–892, 1996.
[4] Lichtarge, O., Bourne, H.R., and Cohen, F.E., An evolutionary trace method defines binding surfaces common to protein families, J. Mol. Biol., 257:342–358, 1996.
[5] Zarembinski, T.I., Hung, L.W., Mueller-Dieckmann, H.J., Kim, K.K., Yokota, H., Kim, R., and Kim, S.H., Structure-based assignment of the biochemical function of a hypothetical protein: A test case of structural genomics, Proc. Natl. Acad. Sci. USA, 95:15189–15193, 1998.
[6] http://www.strgen.org/ – Berkeley Structural Genomics Center website.