Computational functional genomics
The goal of computational functional genomics is to assign the function,
localization, and interactions of genes (proteins) from the genome organisation,
homology to other proteins, occurrence in different species ...
- phylogenetic profiles
- assignment of function and localization
- combination with operon method, rosetta stone method, genome
neighborhood method
10. Lecture WS 2003/04
Bioinformatics III
Assigning protein functions by comparative genome
analysis: protein phylogenetic profiles
Hypothesis: functionally linked proteins evolve in a correlated fashion, and,
therefore, they have homologs in the same subset of organisms.
In general, pairs of functionally linked proteins have no amino acid sequence
similarity with each other and, therefore, cannot be linked by conventional sequence-alignment techniques.
Phylogenetic profile of a particular protein:
a string with n entries, each one bit, where n corresponds to the number of
genomes. The presence of a homolog to a given protein in the nth genome is
indicated by an entry of 1 at the nth position. If no homolog is found, the entry is 0.
Variation: assign 1/E-value from BLAST to distinguish levels of similarity.
Pellegrini et al. PNAS 96, 4285 (1999)
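The profile definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name build_profiles and the toy homolog table are hypothetical, and in practice the presence/absence calls would come from a homology search such as BLAST.

```python
from typing import Dict, List, Set


def build_profiles(homologs: Dict[str, Set[str]], genomes: List[str]) -> Dict[str, str]:
    """Return one bit string per protein: '1' if a homolog exists in that genome, else '0'."""
    return {
        protein: "".join("1" if genome in present else "0" for genome in genomes)
        for protein, present in homologs.items()
    }


# Hypothetical toy data: which reference genomes contain a homolog of each query protein.
genomes = ["S. cerevisiae", "H. influenzae", "B. subtilis"]
homologs = {
    "P1": {"S. cerevisiae", "B. subtilis"},
    "P2": {"H. influenzae"},
    "P3": {"S. cerevisiae", "B. subtilis"},
}

profiles = build_profiles(homologs, genomes)
print(profiles)
```

The order of the genome list fixes the bit positions, so it has to be the same for every protein being compared.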
Protein phylogenetic profiles
Illustrate method for hypothetical case of four
fully sequenced genomes (from E. coli,
Saccharomyces cerevisiae, Haemophilus
influenzae, and Bacillus subtilis) in which we
focus on seven proteins (P1-P7).
For each E. coli protein, a profile is
constructed, indicating which genomes code
for homologs of the protein. We next cluster
the profiles to determine which proteins share
the same profiles. Proteins with identical (or
similar) profiles are boxed to indicate that
they are likely to be functionally linked.
Boxes connected by lines have phylogenetic
profiles that differ by one bit and are termed
neighbors.
Pellegrini et al. PNAS 96, 4285 (1999)
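The boxing and neighbor relation described above amounts to grouping identical bit strings and connecting groups at Hamming distance 1. The sketch below assumes profiles are equal-length strings of '0' and '1'; the names hamming and group_and_link and the toy profiles are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(x != y for x, y in zip(a, b))


def group_and_link(profiles: Dict[str, str]) -> Tuple[Dict[str, List[str]], List[Tuple[str, str]]]:
    """Group proteins with identical profiles ('boxes') and list box pairs that differ by one bit."""
    boxes = defaultdict(list)
    for protein, profile in profiles.items():
        boxes[profile].append(protein)
    neighbors = [
        (p, q) for p, q in combinations(sorted(boxes), 2) if hamming(p, q) == 1
    ]
    return dict(boxes), neighbors


# Hypothetical toy profiles over three genomes.
toy_profiles = {"P1": "110", "P2": "110", "P3": "100", "P4": "011"}
boxes, neighbors = group_and_link(toy_profiles)
print(boxes)
print(neighbors)
```

Here P1 and P2 share a box, and that box is a neighbor of the box containing P3.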
Test the method for known case
To test whether proteins with similar phylogenetic profiles are functionally linked,
examine the phylogenetic profiles for two proteins that are known to participate in
structural complexes: the ribosomal protein RL7 and the flagellar structural protein
FlgL,
as well as a protein known to participate in a metabolic pathway, the histidine
biosynthetic protein HIS5.
Identify all other E. coli ORFs with phylogenetic profiles identical to those of these three
proteins, and the ORFs whose profiles differ by one bit.
Pellegrini et al. PNAS 96, 4285 (1999)
Three phylogenetic profiles for E. coli proteins
Proteins with phylogenetic profiles in the neighborhood of
ribosomal protein RL7 (A), flagellar structural protein FlgL
(B), and histidine biosynthetic protein His5 (C). All
proteins with profiles identical to the query proteins are
shown in the double boxes. All the proteins
with profiles that differed by one bit are
shown in the single boxes. Proteins in
bold participate in the same complex or pathway
as the query protein. Proteins in italics participate in a
different but related complex or pathway. Proteins with
identical profiles are shown within the same box. Single
lines between boxes represent a one-bit difference
between the two profiles. Homologous proteins are
connected by a dashed line or are indented. Each protein
is labeled by a four-digit E. coli gene number, a SwissProt
gene name, and a brief description. Note that proteins
within a box or in boxes connected by a line have similar
functions. Proteins in the double boxes in A, B, and C
have 11, 6, and 10 ones, respectively, in their
phylogenetic profiles, of a possible 16 for the 17 genomes
presently sequenced.
Pellegrini et al. PNAS 96, 4285 (1999)
results from phylogenetic profile analysis
The phylogenetic profile of a protein describes the presence or absence of
homologs in organisms.
Proteins that make up multimeric structural complexes are likely to have similar
profiles.
Also, proteins that are known to participate in a given biochemical pathway are
likely to be neighbors in the space of phylogenetic profiles.
Proteins that are functionally linked are far more likely to be neighbors in profile
space than randomly selected proteins. However, only a fraction of all possible
neighbors is found within a group.
Therefore, not all functionally linked proteins have similar profiles.
They may fall into multiple clusters in profile space.
Interestingly, hypothetical proteins are also more likely to be neighbors than random
proteins, suggesting that many hypothetical proteins are part of uncharacterized
pathways or complexes.
Pellegrini et al. PNAS 96, 4285 (1999)
Localizing proteins in the cell
from their phylogenetic profiles
Observation: proteins localized to a given organelle by experiments tend to share
a characteristic phylogenetic distribution of their homologs – the phylogenetic
profile.
Marcotte et al. PNAS 97, 12115 (2000)
Phylogenetic profile of yeast proteins
(A) The mean phylogenetic profiles (horizontal bars of 31
elements) of yeast proteins experimentally localized to
different cellular locations. Each profile shows the distribution
among genomes of homologs of proteins from one subcellular
location.
Plasma Mb, plasma membrane. Colors express the average
degree of sequence similarity of proteins in that organelle to
their sequence homologs in the indicated genomes, with red
indicating greater average similarity and blue indicating less.
(B) A tree of the observed relationships among the yeast
proteins from different subcellular compartments. Overlaid on
the tree is our interpretation of the relationships, showing
ellipses clustering compartments thought to be derived from
the progenitor of mitochondria (orange ellipse) and of the
eukaryote nucleus (yellow ellipse). A matrix of pairwise
Euclidean distances was calculated between the mean
phylogenetic profiles (A) of proteins known to be localized in
each compartment. A tree was generated from this matrix by
the neighbor-joining method implemented in PHYLIP 3.5C.
Marcotte et al. PNAS 97, 12115 (2000)
Classification scheme
The scheme by which proteins are
classified into mitochondrial or
nonmitochondrial cellular
localizations. Each horizontal bar is
a phylogenetic profile; that for the
protein of interest x0 is compared
with the mean profiles for
mitochondrial and nonmitochondrial
proteins to determine its
localization.
In this example, the protein of
interest is assigned to the
mitochondrion because the query
protein's phylogenetic profile more
closely resembles the mean profile
of mitochondrial proteins than the
mean profile of cytosolic proteins.
Marcotte et al. PNAS 97, 12115 (2000)
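The classification scheme can be sketched as a nearest-mean-profile rule. Note the assumptions: the slide only says the query profile "more closely resembles" one mean profile, so the use of Euclidean distance here is a guess (it is the distance the authors use for the compartment tree), and all names and numbers below are hypothetical toy values.

```python
import math
from typing import Dict, List, Sequence


def mean_profile(profiles: Sequence[Sequence[float]]) -> List[float]:
    """Column-wise mean of a set of equal-length profiles."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two profiles (assumed similarity measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(query: Sequence[float], class_means: Dict[str, List[float]]) -> str:
    """Assign the query to the class whose mean profile is closest."""
    return min(class_means, key=lambda label: euclidean(query, class_means[label]))


# Hypothetical training profiles over four genomes.
training = {
    "mitochondrial": [[1, 1, 0, 1], [1, 1, 0, 0]],
    "nonmitochondrial": [[0, 0, 1, 1], [0, 1, 1, 1]],
}
means = {label: mean_profile(rows) for label, rows in training.items()}

x0 = [1, 1, 0, 0]  # profile of the protein of interest
print(classify(x0, means))
```

Profiles built from graded similarity scores instead of bits can be passed to the same functions unchanged.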
Assignment of nuclear genome-encoded proteins to Mitochondria
Assignment of
nuclear genome-encoded proteins
to mitochondria.
(Left) For yeast,
a jackknife test
on experimentally
localized yeast
proteins showing
the method coverage
(fraction of mitochondrial proteins correctly
assigned) plotted versus
the method accuracy
(fraction of proteins
assigned to mitochondria
known to be mitochondrial).
(Inset) The (noncumulative)
number of known (gray curve) and newly predicted (black
curve) mitochondrial proteins for each coverage level, along
with the number of known false positive predictions (white
curve). One hundred jackknife trials were performed, randomly
removing 10% of the proteins for each trial.
(Right) Predicted localization of experimentally
localized worm proteins by using yeast proteins as
the training set.
Marcotte et al.
PNAS 97, 12115 (2000)
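The two quantities plotted in the jackknife test follow directly from their definitions in the caption: coverage is the fraction of known mitochondrial proteins that are recovered, and accuracy is the fraction of mitochondrial assignments that are correct. A minimal sketch, with hypothetical protein identifiers and the simplifying assumption that every predicted protein has a known localization:

```python
from typing import Set, Tuple


def coverage_and_accuracy(predicted: Set[str], known: Set[str]) -> Tuple[float, float]:
    """Return (coverage, accuracy) for a set of mitochondrial predictions.

    coverage = correctly assigned / all known mitochondrial proteins
    accuracy = correctly assigned / all proteins assigned to mitochondria
    """
    correct = predicted & known
    coverage = len(correct) / len(known) if known else 0.0
    accuracy = len(correct) / len(predicted) if predicted else 0.0
    return coverage, accuracy


# Hypothetical identifiers, for illustration only.
known_mito = {"a", "b", "c", "d"}
predicted_mito = {"a", "b", "e"}

coverage, accuracy = coverage_and_accuracy(predicted_mito, known_mito)
print(coverage, accuracy)
```

In other fields these two numbers are usually called recall and precision.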
Functions of mitochondrial proteins
Functions of yeast mitochondrial proteins are
plotted for known mitochondrial proteins (upper
three pie charts) and for the newly predicted
mitochondrial proteins (lower pie chart). Each pie
chart shows the percentage of proteins with a given
function. Known mitochondrial proteins can be
operationally divided into three populations: those
with homologs in eubacteria or archaea
(prokaryote-derived mitochondrial proteins), those
with homologs only in other eukaryotes (eukaryote-derived mitochondrial proteins), and those without
detectable homologs in the set of complete
genomes (organism-specific mitochondrial
proteins). Many functional systems, such as the
mitochondrial ribosome, have components from
more than one category of genes. The organism-specific mitochondrial proteins may be conserved
in related species; many of the yeast-specific
genes are conserved in other fungi as well,
although absent in the more distantly related
eukaryotes listed in Fig. 1A. Functional categories
are defined as in the MIPS (Munich Information
Center for Protein Sequences) database (29). For
this analysis, mitochondrial proteins were predicted
with an accuracy of 70% as scored by the self-consistency test.
Marcotte et al. PNAS 97, 12115 (2000)
Inference of protein function and protein lineages in
Mycobacterium tuberculosis based on prokaryotic
genome organization
One difference between prokaryotic and eukaryotic genomes is the
organization of the prokaryotic genome into multi-gene units, known as
operons.
Prokaryotic operon organization enables the highly controlled co-expression of
multiple genes, by transcribing them together onto a single transcript.
The encoded proteins of common operons often have related functions, form
common complexes, or participate in shared biochemical pathways.
Strong, Mallick, Pellegrini, Thompson, Eisenberg
Genome Biology 2003 4:R59
Prokaryotic operon organization
(a) Prokaryotic operon organization. Genes A,
B, and C are transcribed together onto a
single polycistronic transcript, which is then
translated to produce three separate proteins.
Proteins originating from genes of a common
operon often have similar functions, interact
physically through protein-protein
interactions, or participate in shared
biochemical pathways.
(b) Functional Linkages based on the Operon
method. Genes A, B and C are 'linked' if the
intergenic nucleotide distance between pairs
of adjacent genes is less than or equal to the
specified threshold. In this case the distance
between gene A and B, and the distance
between gene B and C is less than the
hypothetical distance threshold, thereby
allowing links between all possible sets of
genes.
Strong, Mallick, Pellegrini, Thompson, Eisenberg
Genome Biology 2003 4:R59
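The Operon method in (b) can be sketched as follows: walk along the genome, collect runs of adjacent genes on the same strand whose intergenic gaps stay within the threshold, and link every pair of genes inside a run. This is an illustrative sketch, not the authors' code. The Gene record, the coordinates and the default threshold of 100 bp are assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # first base of the gene (1-based, inclusive)
    end: int     # last base of the gene (inclusive)
    strand: str  # "+" or "-"


def operon_links(genes: List[Gene], threshold: int = 100) -> List[Tuple[str, str]]:
    """Link all genes within a run of same-strand neighbors separated by <= threshold bp."""
    ordered = sorted(genes, key=lambda g: g.start)
    runs, current = [], []
    for gene in ordered:
        if current:
            previous = current[-1]
            gap = gene.start - previous.end - 1  # intergenic distance in bp
            if gene.strand != previous.strand or gap > threshold:
                runs.append(current)  # close the run and start a new one
                current = []
        current.append(gene)
    if current:
        runs.append(current)
    links = []
    for run in runs:
        links.extend((a.name, b.name) for a, b in combinations(run, 2))
    return links


# Hypothetical coordinates: A, B, C are tightly spaced; D is far away; E is on the other strand.
toy_genes = [
    Gene("A", 1, 300, "+"),
    Gene("B", 321, 600, "+"),
    Gene("C", 650, 900, "+"),
    Gene("D", 1200, 1500, "+"),
    Gene("E", 1520, 1800, "-"),
]
links = operon_links(toy_genes, threshold=100)
print(links)
```

Overlapping genes give a negative gap, which this sketch treats as within the threshold.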
Prokaryotic operon organization
Although the operon structure has been well studied at the biochemical level in
microorganisms such as E. coli, genome-wide operon organization in pathogenic
organisms, such as M. tuberculosis, remains largely unknown.
One can exploit the conservation of certain genetic elements present in many
prokaryotic organisms, including M. tuberculosis, to learn about operon structure
and gene function:
- promoter elements at -10 and -35 bp
- ribosome binding sites (RBS)
- the 5' and 3' untranslated regions (UTR)
Strong, Mallick, Pellegrini, Thompson, Eisenberg
Genome Biology 2003 4:R59
Independent vs. consecutive transcription
Schematic representation of the
minimum genetic requirements for
adjacent genes that are transcribed
independently and those transcribed
together as a single operon.
Cases 1, 2 and 3 depict instances
where gene A and gene B are
transcribed independently as
distinct transcriptional units, while
Case 4 depicts genes organized into
a common operon.
The minimum requirement for genes
of a common operon is only a RBS,
while Case 3 emphasizes the
numerous genetic elements required
if gene A and gene B are organized
into separate transcription units.
Strong, Mallick, Pellegrini, Thompson, Eisenberg
Genome Biology 2003 4:R59
gene linkage based on Operon method
Strong et al.,Genome Biology (2003) 4:R59
Conservation of Swissprot annotation
Swissprot-keyword recovery scores as a
function of combined intergenic
distances between pairs of genes in a
run. All gene members of a run
(bordered on each side by genes in
opposite orientations) were linked and
given a value equal to the combined
intergenic distances between them.
While the keyword recovery of genes
linked by a combined intergenic
distance less than 150 bp is fairly high
(34-52%), it is apparent that as the total
intergenic distance increases above 150
bp, there is a decrease in keyword
recovery. At combined intergenic
distances above 250 bp the keyword
recovery is comparable to that of
randomly linked genes.
Strong et al.,Genome Biology (2003) 4:R59
Combine computational methods of functional assignment
4 methods for functional assignment used:
Operon method (intergenic distance criterion)
Rosetta Stone (RS): genes A and B have
common function if a fused gene AB is found in
any other organism
Phylogenetic Profile (PP)
Conserved Gene Neighbor (GN) method:
identify genes that are in close proximity in
multiple genomes
Keyword recovery scores for the Operon method
alone and in combination with RS, PP, and GN
methods. Notice that the combination of either
RS, PP, or GN has a dramatic effect on the
keyword recovery, with the best score resulting
from a combination of the 100 bp Operon, RS
and PP methods.
Strong et al.,Genome Biology (2003) 4:R59
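One simple way to combine methods is to keep only those gene pairs that every selected method links. Whether the authors combine methods exactly this way is an assumption; the slide only reports that combinations raise keyword recovery. The gene names and link lists below are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Link = Tuple[str, str]


def normalize(links: Iterable[Link]) -> Set[FrozenSet[str]]:
    """Treat links as unordered pairs so that (A, B) and (B, A) are the same link."""
    return {frozenset(pair) for pair in links}


def combine(*link_lists: Iterable[Link]) -> Set[FrozenSet[str]]:
    """Keep only links supported by every method passed in (set intersection)."""
    sets = [normalize(links) for links in link_lists]
    return set.intersection(*sets) if sets else set()


# Hypothetical links predicted by three methods.
operon = [("geneA", "geneB"), ("geneB", "geneC")]
rosetta_stone = [("geneB", "geneA")]
phylo_profile = [("geneA", "geneB"), ("geneC", "geneD")]

combined = combine(operon, rosetta_stone, phylo_profile)
print(combined)
```

Requiring agreement trades coverage for confidence: fewer links survive, but each is supported by independent evidence.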
Distance profiles of adjacent genes
Distance profile of adjacent M. tuberculosis genes in the same orientation that are
functionally linked by the Rosetta Stone, Phylogenetic Profiles or conserved Gene Neighbor
methods, compared to adjacent genes in the same orientation that are not linked by these
methods.
Strong et al.,Genome Biology (2003) 4:R59
Gene finding
(c) Distance profile of adjacent genes
in the same orientation in
experimentally documented operons
in E. coli. E. coli operon data obtained
from RegulonDB.
The linked profile (a) yielded a mean
intergenic distance of 27 base pairs,
as compared with (b) 94 base pairs for
the mean intergenic distance for
genes not linked by any of the three
methods. This demonstrates that
adjacent genes in the same
orientation that have small intergenic
spacing are more likely to be
functionally linked than those that are
separated farther apart.
Strong et al.,Genome Biology (2003) 4:R59
Determine operon distance threshold
Keyword recovery and
maximum false positive fraction
scores as the Operon distance
threshold increases from 0 bp to
300 bp. Notice the decrease in
the keyword recovery and the
increase in maximum false
positive fraction as the distance
threshold increases.
Strong et al.,Genome Biology (2003) 4:R59
Verify predictions on known examples
Comparison of the genomic
organization of the leucine
biosynthesis genes in M.
tuberculosis and S. pombe.
(a) Genomic organization of
the leuC and leuD genes of
M. tuberculosis. (b) S. pombe
alpha-isopropylmalate
isomerase, containing both
the leuC and leuD coding
regions in a single fusion
gene.
This example illustrates the
power of the Rosetta Stone,
Phylogenetic Profile, Gene
Neighbor and Operon
methods to infer a functional
linkage, in this case one that
is already established.
Strong et al.,Genome Biology (2003) 4:R59
Inference of protein function
Inference of M. tuberculosis
protein function and operon
organization based on multiple
method overlap.
(a) Inference of an operon
encoding members involved in
thiamine biosynthesis.
(b) Operon inference for a
region possibly involved in RNA
degradation.
(c) Functional links and operon
inference for a region likely to
be involved in cell wall
metabolism. In these cases,
inferences are made for the
functions of uncharacterized
genes by their functional
linkages to genes of known
function.
Strong et al.,Genome Biology (2003) 4:R59
Identification of novel genes
Identification of two novel genes linked to the arabinogalactan biosynthesis pathway, an
important target of M. tuberculosis specific drugs. Based on the close proximity of adjacent
genes (Operon method) and the functional linkage established by the Rosetta Stone
method, the authors infer that Rv1503c and Rv1504c may be organized into a common
operon. Both genes also have functional links to the genes rfe and rmlB, important
components in the arabinogalactan biosynthesis pathway.
Strong et al.,Genome Biology (2003) 4:R59
Assignment of possible function
A unique M. tuberculosis
gene linked to a
glutamine synthetase
paralog. Few homologs of
Rv1879 exist in
prokaryotes, but some
plants and certain fungi
contain a fusion protein
containing domains
homologous to both
Rv1879 and to glutamine
synthetase. The Operon
and Rosetta Stone
linkages suggest a
possible role for Rv1879,
and a possible functional
association with the glnA3
gene product.
Strong et al.,Genome Biology (2003) 4:R59
Discovery of uncharacterized cellular systems
by genome-wide analysis of functional linkages
Find computational approaches for finding gene and protein interactions to
complement and extend experimental approaches such as:
- synthetic lethal and suppressor screens
- yeast two-hybrid experiments
- high-throughput mass spectrometry interaction assays.
Approach followed here: phylogenetic profiles
Date & Marcotte
Nat Biotech 21, 1055 - 1062 (2003)
Identify novel cellular systems
Top: Using computational genetics, the
genome-wide protein network of an
organism is reconstructed.
Middle: Suitable candidate clusters that
contain three or more linked proteins, at
least 50% of which are uncharacterized, are
selected for further evaluation.
Bottom: Such core clusters are then
extended to include operon partners and
other proteins that are naturally linked with
the protein cluster.
Thick boxes and lines indicate proteins in the core
cluster; thin boxes and lines indicate proteins extending
the core cluster. Shaded boxes represent homologs; thick
gray lines represent links to operon partners.
Date & Marcotte
Nat Biotech 21, 1055 - 1062 (2003)
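The selection step in the middle panel can be sketched as a filter over clusters of the link network. The slide does not say how clusters are delimited; the sketch below assumes a cluster is a connected component of the linkage graph, and all protein names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Set, Tuple


def candidate_clusters(
    links: Iterable[Tuple[str, str]],
    uncharacterized: Set[str],
    min_size: int = 3,
    min_fraction: float = 0.5,
) -> List[Set[str]]:
    """Return connected components with >= min_size proteins, >= min_fraction of them uncharacterized."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)

    seen: Set[str] = set()
    clusters = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        unknown = len(component & uncharacterized)
        if len(component) >= min_size and unknown / len(component) >= min_fraction:
            clusters.append(component)
    return clusters


# Hypothetical network: three components, only the first passes both criteria.
toy_links = [
    ("p1", "p2"), ("p2", "p3"),                  # 3 proteins, 2 uncharacterized
    ("p4", "p5"),                                # too small
    ("p6", "p7"), ("p7", "p8"), ("p6", "p8"),    # 3 proteins, only 1 uncharacterized
]
clusters = candidate_clusters(toy_links, uncharacterized={"p1", "p3", "p6"})
print(clusters)
```

Extending a core cluster with operon partners, as in the bottom panel, would be a separate step on top of this filter.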
Metric of phylogenetic profile similarity
The mutual information MI(A,B) measures the similarity of a pair of phylogenetic
profiles A and B.
MI(A,B) is maximum when there is complete covariance between the occurrences
of the genes A and B and tends to 0 as variation decreases or the gene
occurrences vary independently.
$MI(A,B) = H(A) + H(B) - H(A,B)$
$H(A) = -\sum_a p(a) \ln p(a)$
H(A) represents the marginal entropy of the probability distribution p(a) of gene A occurring
among the organisms in the reference database, and
$H(A,B) = -\sum_a \sum_b p(a,b) \ln p(a,b)$
represents the joint entropy of the probability distribution p(a,b) of
occurrences of genes A and B across the set of reference organisms.
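A minimal sketch of the mutual information computation, estimating the probabilities from symbol frequencies across the reference genomes. The function names are hypothetical. The code accepts any discrete symbols per genome: with strictly binary profiles the score cannot exceed ln 2 (about 0.69), so score thresholds above that value presuppose profiles with more than two levels, for example binned similarity scores.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Sequence


def entropy(counts: Iterable[int], total: int) -> float:
    """Shannon entropy (natural log) of a distribution given as raw counts."""
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def mutual_information(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """MI(A,B) = H(A) + H(B) - H(A,B) for two equal-length profiles."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(a)
    h_a = entropy(Counter(a).values(), n)
    h_b = entropy(Counter(b).values(), n)
    h_ab = entropy(Counter(zip(a, b)).values(), n)
    return h_a + h_b - h_ab


print(mutual_information("1100", "1100"))  # identical profiles: equals H(A)
print(mutual_information("1100", "1010"))  # independent occurrence: close to 0
print(mutual_information("1111", "1010"))  # no variation in A: close to 0
```

The three calls illustrate the limiting cases named above: complete covariance, independent variation, and vanishing variation.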
Quality of functional linkages
The inherent information in phylogenetic
profiles can be seen from the distributions
of scores from comparisons of all possible
protein pairs in each of seven organisms.
Pairwise comparisons of actual
phylogenetic profiles (solid lines) show
significantly more similar profiles
(indicated by larger mutual information
values) than pairwise comparisons of
shuffled profiles (dashed lines). Mutual
information scores MI between shuffled
profiles exceed 0.7 at a rate of 1 in 10^7
pairs, whereas scores between actual
profiles are greater than 1.2, indicating
that scores above 0.7 are statistically
likely to indicate legitimate functional
linkages between pairs of genes.
Date & Marcotte
Nat Biotech 21, 1055 - 1062 (2003)
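The comparison against shuffled profiles can be sketched as a permutation test: score the real pair, then score many pairs in which one profile has been randomly permuted. How the authors shuffled their profiles is not stated on the slide; permuting the entries of one profile is an assumption, and the profiles, trial count and seed below are hypothetical.

```python
import math
import random
from collections import Counter
from typing import List, Sequence


def mutual_information(a: Sequence, b: Sequence) -> float:
    """MI(A,B) = H(A) + H(B) - H(A,B), with probabilities estimated from frequencies."""
    n = len(a)

    def h(symbols) -> float:
        return -sum((c / n) * math.log(c / n) for c in Counter(symbols).values())

    return h(a) + h(b) - h(list(zip(a, b)))


def shuffled_scores(a: Sequence, b: Sequence, trials: int = 1000, seed: int = 0) -> List[float]:
    """Mutual information between a and randomly permuted copies of b."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    shuffled = list(b)
    scores = []
    for _ in range(trials):
        rng.shuffle(shuffled)
        scores.append(mutual_information(a, shuffled))
    return scores


# Hypothetical pair of perfectly co-occurring genes over 12 genomes.
profile_a = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0]
profile_b = list(profile_a)

observed = mutual_information(profile_a, profile_b)
null = shuffled_scores(profile_a, profile_b, trials=200)
fraction = sum(score >= observed for score in null) / len(null)
print(observed, fraction)
```

The fraction of shuffled scores reaching the observed value is a rough empirical estimate of how often such a score arises by chance.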
Quality of functional linkages
1,131 S. cerevisiae and 1,231 E. coli
proteins whose functions are precisely
known were used to test the quality of
the phylogenetic profile linkages.
The quality of predicted functional
linkages, measured as the mutual
information scores between all pairs of
phylogenetic profiles, is plotted versus
the agreement between the proteins'
experimentally known pathways,
measured as the Jaccard coefficient
between the proteins' pathway
memberships in the KEGG database [24].
Each point represents the average
values for 1,000 pairs of proteins.
Shuffled profiles rarely show high mutual
information values (inset).
Date & Marcotte
Nat Biotech 21, 1055 - 1062 (2003)
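The Jaccard coefficient used to score pathway agreement is the size of the intersection of two sets divided by the size of their union. A minimal sketch, with hypothetical pathway memberships and the convention (an assumption) that two proteins without any pathway annotation score 0:

```python
from typing import Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical pathway memberships of two proteins.
pathways_x = {"glycolysis", "TCA cycle"}
pathways_y = {"TCA cycle", "pentose phosphate pathway"}

print(jaccard(pathways_x, pathways_y))
```

A value of 1 means the two proteins belong to exactly the same pathways; 0 means they share none.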
Quality of functional linkages
Mutual information scores plotted versus pathway
similarity on a linear scale show increasing trends.
The solid and dashed lines represent analytical
curves fit to the data of b by least squares. Scores
of 0.75 indicate approximately 35–50% accurate
predictions by this test; higher scores approach
100% functional accuracy.
For comparison, the percentage of proteins that
share no pathways in common shows a decreasing
trend as mutual information values increase
(inset). The accuracies of experimentally
determined protein interactions from large-scale
yeast two-hybrid screens [14, 15], indicating 14% and
44% accuracies, and mass spectrometry
experiments [16, 17], indicating 27% and 76%
accuracies, are shown with the dot-dashed
horizontal lines. As in b, each point represents the
average values of 1,000 pairs of proteins.
Date & Marcotte
Nat Biotech 21, 1055 - 1062 (2003)
Predicted genome-wide protein networks for yeast
Proteins are represented as vertices,
and derived functional linkages are
shown as lines connecting the
corresponding proteins. All linkages
with scores above a mutual
information value of 0.75 are drawn,
essentially by modeling the linkages
as springs that pull functionally linked
proteins together on the page. (Thus,
the lengths of the lines are not
meaningful, only the connections).
Groups of proteins sharing functional links
are seen to cluster together, representing
portions of genetic or functional networks.
Systems in gray circles are labeled with their
corresponding functions.
(For visual clarity, small protein networks,
including 1 five-protein system, 2 four-protein systems and 31 two-protein
systems, have been omitted.)
Date & Marcotte
Nat Biotech 21, 1055 - 1062 (2003)
Predicted networks for pathogenic E. coli O157:H7
All linkages with scores above a mutual
information value of 0.85 are included.
For visual clarity, small protein networks, including
1 six-protein system, 2 four-protein systems, 9
three-protein systems and 40 two-protein systems
have been omitted.
Date & Marcotte
Nat Biotech 21, 1055 - 1062 (2003)
Clusters representing potentially new pathways
Clusters representing potentially
new pathways selected from
reconstructions of genome-wide
interaction networks of four
different organisms.
Boxes with thicker borders, and bold
lines denote the cluster core. Each
cluster was extended to include
operon partners, as well as
secondarily linked proteins that are
naturally grouped with the proteins in
the cluster but with a mutual
information value less than the
selected threshold; these are
represented by dotted lines and
boxes with thinner borders.
Thick red lines represent connections
between genes in an operon,
whereas colored boxes represent
homologous proteins. All selected
core clusters are composed of
proteins, at least 50% of which lack
precise functional assignments.
Boxes with dashed outlines represent
such uncharacterized proteins.
Date & Marcotte
Nat Biotech 21, 1055 - 1062 (2003)
Phylogenetic profiles for new gene clusters
The genes corresponding to
proteins within a cluster show
similar patterns of presence and
absence, indicated by red and
blue squares, respectively,
among the 57 genomes, labeled
across the top. The intensity of
red denotes the degree of
homology between the protein
labeled at the left with the best
matching protein sequence of
the corresponding genome.
Deeper red indicates stronger
sequence similarity, blue
indicates no detectable
similarity (BLAST E-value ≥ 1).
Date & Marcotte
Nat Biotech 21, 1055 - 1062 (2003)
Why can one still find entirely new systems?
In well-characterized systems like yeast, ca. 90% of the uncharacterized proteins
are linked in networks to proteins of known function. Most uncharacterized proteins
therefore appear to be additional components of known systems.
The few characterized proteins of the novel cellular systems detected here seem
to be strongly biased towards metabolic functions that occur commonly as more or
less discrete systems within cells, which can easily be coinherited or horizontally
transferred.
The presented analysis seems an ideal way of discovering such systems. Of
course, it cannot indicate the precise biological function of these systems.
In traditional biology, the biological knowledge extended gradually along known
sets of pathways, rather than sampling all pathways evenly.
Again, the presented approach allows new discoveries.
Date & Marcotte
Nat Biotech 21, 1055 - 1062 (2003)
Summary
Computational functional annotation of genes may be based on
(a) annotation by homology to genes with known function in other organisms
(b) combination of several search techniques, as presented today.
Proteins often have multiple functions! We need to detect all of them.
The search techniques under (b) are biology-driven. This area is still in the
exploratory phase. Soon certain rules will emerge and allow more
sophisticated computational techniques to be applied: a job for computer
scientists/bioinformaticians.