Evolution & Design Principles in Biology: a consequence of evolution and natural selection
Rui Alves
University of Lleida
Course website: http://web.udl.es/usuaris/pg193845/Bioinformatics_2009/
Part I: Molecular Evolution
Theory of Evolution
• Evolution is the theory that allows us to understand how organisms came to be the way they are.
• In probabilistic terms, it is likely that all living beings today originated from a single type of cell.
• These cells divided and occupied ecological niches, where they adapted to the new environments through natural selection.
How did the first cell create different cells?
• Neutral mutation (e.g. by error in genome replication)
• Deleterious mutation (e.g. by error in genome replication)
• Advantageous mutation (e.g. by error in genome replication)
And then there was sex…
Why sex?
• Asexual reproduction is quicker and easier → more offspring per individual.
• Sex may limit harmful mutations:
– Asexual: all offspring get all mutations.
– Sexual: random distribution of mutations. Those with the most harmful ones tend not to reproduce.
• Sex may generate beneficial gene combinations:
– Adaptation to a changing environment
– Adaptation to all aspects of a constant environment
– Can separate beneficial mutations from harmful ones
– Samples a larger space of gene combinations
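The mutation-load argument can be illustrated with a toy simulation. This is a minimal sketch under assumptions that are not on the slides: haploid parents, harmful mutations at distinct loci, free recombination (each locus inherited from a randomly chosen parent), crude truncation selection, and hypothetical helper names such as `sexual_offspring`. Clones all carry the parent's full load, while sexual offspring show a spread of loads, so removing the most loaded ones lowers the load of the next generation.

```python
import random
from statistics import mean

def asexual_offspring(parent, n):
    """Clonal reproduction: every offspring inherits exactly the parent's mutated loci."""
    return [set(parent) for _ in range(n)]

def sexual_offspring(mom, dad, n, rng):
    """Hypothetical haploid model with free recombination: at each locus the
    child copies the allele of one parent chosen at random."""
    loci = sorted(mom | dad)
    kids = []
    for _ in range(n):
        child = set()
        for locus in loci:
            donor = mom if rng.random() < 0.5 else dad
            if locus in donor:
                child.add(locus)
        kids.append(child)
    return kids

rng = random.Random(1)
mom = set(range(0, 10))     # 10 harmful mutations at loci 0-9
dad = set(range(10, 20))    # 10 different harmful mutations at loci 10-19

clone_loads = [len(c) for c in asexual_offspring(mom, 1000)]
sex_loads = [len(k) for k in sexual_offspring(mom, dad, 1000, rng)]

# Crude truncation selection: only offspring with fewer than 10 mutations reproduce.
survivors = [load for load in sex_loads if load < 10]

print(set(clone_loads))                  # {10}: every clone carries the full load
print(min(sex_loads), max(sex_loads))    # sexual offspring vary around 10
print(round(mean(survivors), 2))         # survivors carry fewer than 10 on average
```

Under free recombination each sexual offspring here carries a Binomial(20, 0.5) number of mutations, so the mean load is unchanged but its variance is what selection can act on.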
What drives cells to adapt?
• A new niche, or new conditions in an old niche
• A new (better adapted) mutation
How do new genes and proteins appear?
• Genes (proteins) are built by combining domains.
• New proteins may appear either by intradomain mutation or by combining existing domains of other proteins.
Cell Division
[Figure: successive rounds of cell division]
The Coalescent
• This model of cellular evolution has implications for molecular evolution.
• Coalescent theory: a retrospective model of population genetics that traces all alleles of a gene in a sample from a population to a single ancestral copy shared by all members of the population, known as the most recent common ancestor.
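Coalescent theory is usually formalized as Kingman's coalescent, a standard model swapped in here for illustration rather than something derived on these slides. A minimal sketch, assuming a neutral, constant-size population and time measured in coalescent units (N generations for a haploid population of size N): while k lineages remain, the waiting time to the next merger is exponential with rate k(k − 1)/2, and the expected time to the most recent common ancestor of n lineages is 2(1 − 1/n). Function names are hypothetical.

```python
import random
from statistics import mean

def tmrca(n, rng):
    """Simulated time to the most recent common ancestor of n sampled lineages."""
    t = 0.0
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2        # number of pairs that could coalesce
        t += rng.expovariate(rate)    # exponential waiting time to the next merger
    return t

def expected_tmrca(n):
    """Analytical expectation: sum over k = 2..n of 2 / (k(k - 1)) = 2(1 - 1/n)."""
    return 2 * (1 - 1 / n)

rng = random.Random(42)
sims = [tmrca(10, rng) for _ in range(20000)]
print(round(mean(sims), 2), expected_tmrca(10))
```

The last merger (two lineages, rate 1) alone takes one coalescent unit on average, so it accounts for more than half of the expected total for any sample size.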
Why is the coalescent the de facto
standard today?
Alternatives?
Current sequences have evolved from the same
original sequence (Coalescent)
Current sequences have converged to a similar
sequence from multiple origins of life
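Back-of-the-envelope support for divergence
ACDEFGHIKLMNPQRSTVWY
A EDYAHIKLMNPQRGTVWY
The two 20-residue sequences are identical at 14 positions and differ at 6.
[Figure: changes between amino acids AAi and AAk at a site]
With $p_1$ and $p_2$ the per-site probabilities from the figure, $\log p_1 < 0$ and $\log p_2 < 0$, so
$p_{tot} = p_1 p_2$, with $p_1 p_2 < p_1$ and $p_1 p_2 < p_2$
Convergence: $p_{tot}^{14}\, p_1^{6} = p_2^{14}\, p_1^{20}$
Divergence: $p_2^{14}\, p_1^{6}$
Which is more likely?
$\dfrac{\text{Convergence}}{\text{Divergence}} = p_1^{14} < 1$, so divergence is more likely.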
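The comparison can be checked numerically. This sketch follows the slide's bookkeeping (identical sites cost $p_{tot} = p_1 p_2$ under convergence and $p_2$ under divergence; differing sites cost $p_1$ in both cases); the probability values are hypothetical, and the blank at position 2 of the second sequence is written as "-" on the assumption that it is a gap.

```python
import math

def site_counts(s1, s2):
    """Number of identical and differing aligned positions."""
    same = sum(a == b for a, b in zip(s1, s2))
    return same, len(s1) - same

def log_convergence_over_divergence(n_same, n_diff, p1, p2):
    """log(P_convergence / P_divergence) under the slide's per-site bookkeeping."""
    log_conv = n_same * math.log(p1 * p2) + n_diff * math.log(p1)  # p_tot^n_same * p1^n_diff
    log_div = n_same * math.log(p2) + n_diff * math.log(p1)        # p2^n_same * p1^n_diff
    return log_conv - log_div                                      # equals n_same * log(p1)

seq_a = "ACDEFGHIKLMNPQRSTVWY"
seq_b = "A-EDYAHIKLMNPQRGTVWY"   # blank at position 2 assumed to be a gap
n_same, n_diff = site_counts(seq_a, seq_b)
p1, p2 = 0.05, 0.9               # hypothetical values; any p1 in (0, 1) gives a negative result
print(n_same, n_diff)            # 14 6
print(log_convergence_over_divergence(n_same, n_diff, p1, p2))
```

A negative log-ratio means convergence is less likely than divergence, whatever the exact value of $p_1$.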
About the mutational process
Point mutations:
• Transitions (A↔G, C↔T) are more frequent than transversions (all other
substitutions)
• In mammals, the CpG dinucleotide is frequently mutated to TG or CA (possibly
related to the fact that most CpG dinucleotides are methylated at the C-residues)
• Microsatellites frequently increase or decrease in size (possibly due to polymerase
slippage during replication)
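The transition/transversion distinction is easy to compute from an alignment. A minimal sketch (function names hypothetical), assuming ungapped, aligned DNA sequences made only of A, C, G and T: transitions stay within purines (A, G) or within pyrimidines (C, T), and everything else is a transversion.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def substitution_type(a, b):
    """Classify an aligned nucleotide pair as 'same', 'transition' or 'transversion'."""
    a, b = a.upper(), b.upper()
    if not {a, b} <= PURINES | PYRIMIDINES:
        raise ValueError(f"unsupported characters: {a!r}, {b!r}")
    if a == b:
        return "same"
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"      # A<->G or C<->T
    return "transversion"        # purine <-> pyrimidine

def ts_tv_counts(seq1, seq2):
    """Count transitions and transversions between two ungapped aligned DNA sequences."""
    counts = {"same": 0, "transition": 0, "transversion": 0}
    for a, b in zip(seq1, seq2):
        counts[substitution_type(a, b)] += 1
    return counts["transition"], counts["transversion"]

print(ts_tv_counts("ACGTCG", "GTGACA"))   # (3, 1)
```

Each base has one possible transition but two possible transversions, so without any bias the transition/transversion ratio would be about 0.5; observed ratios well above that reflect the bias described above.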
Gene and genome duplications (complete or partial) may lead to:
• pseudogenes: function-less copies of genes which rapidly accumulate (mostly deleterious) mutations, useful for estimating mutation rates!
• new genes after functional diversification
Chromosomal rearrangements (inversions and translocations) may lead to:
• meiotic incompatibilities, speciation
Estimated mutation rates:
• Human nuclear DNA: 3-5×10⁻⁹ per year
• Human mitochondrial DNA: 3-5×10⁻⁸ per year
• RNA and retroviruses: ~10⁻² per year
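These rates can be turned into rough expectations with a simple molecular-clock calculation. This sketch assumes the rates are per site per year, that substitutions are neutral and clock-like, and that divergence is small enough to ignore multiple hits at the same site; the function names are hypothetical. Because two lineages that split t years ago both accumulate changes, the expected proportion of differing sites is K ≈ 2μt.

```python
def expected_divergence(rate, years):
    """Expected fraction of differing sites between two lineages: K = 2 * rate * t."""
    return 2 * rate * years

def divergence_time(k, rate):
    """Invert the clock: t = K / (2 * rate)."""
    return k / (2 * rate)

# Midpoints of the ranges listed above (assumed here to be per-site rates)
nuclear = 4e-9
mito = 4e-8

print(expected_divergence(nuclear, 1_000_000))   # about 0.008
print(expected_divergence(mito, 1_000_000))      # about 0.08, ten times more
print(divergence_time(0.008, nuclear))           # about 1e6 years
```

For larger K the linear formula underestimates the true number of substitutions, which is what corrections based on substitution models address.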
Consequences of the coalescent model?
So what if we accept the coalescent model?
A1-A6     TSRISEIRR
A7        PSRISEIRR
A8-A9     PKRISEVRR
A10-A11   PQRISAIQR
A12-A13   PQRISTIQR
A14       ASHLHNLQR
A15-A17   TKHLQELQRE
A18       SKHLHELQRD
A19       PKNLHELQKD
A20       SKRLHEVQSE
[Figure: grouping A1-A6 with A7 under the ancestor A'1-7, and A10-A11 with A12-A13 under the ancestor A'10-13]
A'1-7     (P/T)SRISEIRR
A8-A9     PKRISEVRR
A'10-13   PQRIS(A/T)IQR
A14       ASHLHNLQR
A15-A17   TKHLQELQR
A18       SKHLHELQR
A19       PKNLHELQK
A20       SKRLHEVQS
Values shown under the nine alignment columns: 4 3 3 2 4 5 3 2 3
The study of sequence alignments can give information about the evolution of different organisms!
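The ambiguous ancestors in the alignment, (P/T)SRISEIRR for A'1-7 and PQRIS(A/T)IQR for A'10-13, are what a Fitch-style small-parsimony step produces when two sibling sequences are merged: at each column keep the intersection of the residue sets when it is non-empty, otherwise take the union and count one change. This is a swapped-in illustration, since the slides do not name the reconstruction method, and the helper names are hypothetical.

```python
def as_sets(seq):
    """Represent each column of a sequence as a set of possible residues."""
    return [{c} for c in seq]

def fitch_merge(left, right):
    """One Fitch step: merge two children into an ancestral state-set sequence.
    Returns the ancestor and the number of changes needed."""
    ancestor, changes = [], 0
    for x, y in zip(left, right):
        common = x & y
        if common:
            ancestor.append(common)
        else:
            ancestor.append(x | y)
            changes += 1
    return ancestor, changes

def show(state_sets):
    """Write unambiguous columns as a letter and ambiguous ones as (X/Y)."""
    return "".join(
        next(iter(s)) if len(s) == 1 else "(" + "/".join(sorted(s)) + ")"
        for s in state_sets
    )

anc_1_7, n1 = fitch_merge(as_sets("TSRISEIRR"), as_sets("PSRISEIRR"))
anc_10_13, n2 = fitch_merge(as_sets("PQRISAIQR"), as_sets("PQRISTIQR"))
print(show(anc_1_7), n1)     # (P/T)SRISEIRR 1
print(show(anc_10_13), n2)   # PQRIS(A/T)IQR 1
```

Applying `fitch_merge` repeatedly up a tree and summing the changes gives the parsimony score of that tree.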
Phylogenetic tree reconstruction, overview
Computational challenge: there is an enormous number of different (unrooted) tree topologies, even for a relatively small number of sequences:
3 sequences: 1
4 sequences: 3
5 sequences: 15
10 sequences: 2,027,025
20 sequences: 221,643,095,476,699,771,875
Consequence: most tree construction algorithms are heuristic methods that are not guaranteed to find the optimal topology.
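The topology counts listed above are the numbers of unrooted binary trees with n labelled leaves, (2n − 5)!! = 3 · 5 · … · (2n − 5). A short sketch (function name hypothetical) reproduces them exactly using Python's arbitrary-precision integers.

```python
def unrooted_topologies(n):
    """Number of distinct unrooted binary tree topologies for n >= 3 labelled leaves."""
    if n < 3:
        raise ValueError("need at least 3 sequences")
    count = 1
    for k in range(3, 2 * n - 4, 2):   # odd factors 3, 5, ..., 2n - 5
        count *= k
    return count

for n in (3, 4, 5, 10, 20):
    print(n, unrooted_topologies(n))
```

Rooted trees are counted by (2n − 3)!!, i.e. `unrooted_topologies(n + 1)`, which grows even faster.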
Input data for two major classes of algorithms:
1. Input data: a distance matrix. Examples: UPGMA, neighbor-joining.
2. Input data: a multiple alignment. Examples: parsimony, maximum likelihood.
Distance matrix methods use distances computed from pairwise or multiple
alignments as input.
Building phylogenetic trees of proteins
[Figure: proteins A, B, C, D, … taken from Genome 1, Genome 2, Genome 3, …]
Distance-based phylogenetic trees
A1  ACTDEEGGGGSRGHI…
A2  A-TEEDGGAASRGHI…
A3  ACFDDEGGGGSRGHL…
…
Pairwise substitutions: A1-A2: 5; A1-A3: 3; A2-A3: 8
[Figure: resulting tree, in which A1 is joined to A2 by a branch of length 5 and to A3 by a branch of length 3]
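The substitution counts on this slide can be reproduced from the visible part of the alignment, and for three sequences the branch lengths follow directly from the three-point formula, a standard result used here as an illustration since the slide does not name a method. A1 gets a branch of length 0, so it sits at the internal node between A2 and A3, matching 5 + 3 = 8. Function names are hypothetical, and only the shown residues are used; the slide's "…" is not filled in.

```python
from itertools import combinations

def substitutions(a, b):
    """Count aligned positions that differ (a gap '-' is counted as a difference)."""
    return sum(x != y for x, y in zip(a, b))

def distance_matrix(seqs):
    """Pairwise substitution counts for every pair of named sequences."""
    return {(p, q): substitutions(seqs[p], seqs[q]) for p, q in combinations(sorted(seqs), 2)}

def three_leaf_branches(d12, d13, d23):
    """Branch lengths of the only unrooted tree on three leaves that fits additive distances."""
    return (d12 + d13 - d23) / 2, (d12 + d23 - d13) / 2, (d13 + d23 - d12) / 2

seqs = {"A1": "ACTDEEGGGGSRGHI", "A2": "A-TEEDGGAASRGHI", "A3": "ACFDDEGGGGSRGHL"}
d = distance_matrix(seqs)
print(d)  # {('A1', 'A2'): 5, ('A1', 'A3'): 3, ('A2', 'A3'): 8}
print(three_leaf_branches(d["A1", "A2"], d["A1", "A3"], d["A2", "A3"]))  # (0.0, 5.0, 3.0)
```

With more than three sequences the distances are usually not perfectly additive, which is where methods such as UPGMA or neighbor-joining come in.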
Maximum likelihood phylogenetic trees
Alignment:
A1  ACTDEEGGGGSRGHI…
A2  A-TEEDGGAASRGHI…
A3  ACFDDEGGGGSRGHL…
…
Probability of aa substitution:
        A       -       E       D       …
A       1       0.01    0.2     0.09    …
-       0.01    1       0.0001  0.0001  …
E       0.2     0.0001  1       0.5     …
D       0.09    0.0001  0.5     1       …
…
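To make the role of the substitution table concrete, the sketch below scores an aligned pair of short sequences by multiplying the table entries site by site (summing logs), assuming sites evolve independently. It uses only the table's entries for A, gap, E and D, so the example sequences are hypothetical; a real maximum-likelihood tree search also sums over unobserved ancestral states and optimizes branch lengths, which this sketch does not do.

```python
import math

# Symmetric substitution-probability table with the slide's illustrative values
# (only A, gap '-', E and D are given; the '…' entries are not filled in).
TABLE = {
    ("A", "A"): 1.0, ("A", "-"): 0.01, ("A", "E"): 0.2, ("A", "D"): 0.09,
    ("-", "-"): 1.0, ("-", "E"): 0.0001, ("-", "D"): 0.0001,
    ("E", "E"): 1.0, ("E", "D"): 0.5,
    ("D", "D"): 1.0,
}

def pair_prob(x, y):
    """Look up P(x, y) in either order, since the table is symmetric."""
    return TABLE[(x, y)] if (x, y) in TABLE else TABLE[(y, x)]

def pair_log_likelihood(s1, s2):
    """Log of the product of per-site probabilities (sites assumed independent)."""
    return sum(math.log(pair_prob(x, y)) for x, y in zip(s1, s2))

# Hypothetical toy sequences using only residues present in the table
print(math.exp(pair_log_likelihood("AEDA", "ADDA")))   # one E/D mismatch: about 0.5
print(math.exp(pair_log_likelihood("AEDA", "A-DA")))   # one E/gap mismatch: about 0.0001
```

Working with summed logs rather than raw products avoids numerical underflow on long alignments.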
Maximum likelihood phylogenetic trees
[Figure: alignment of A1, A2 and A3 (5 substitutions between A1 and A2, 3 between A1 and A3, 8 between A2 and A3) and candidate trees labeled p(1,2), p(1,3) and p(2,3)]
p(2,3) > p(1,2) > p(1,3)
Statistical evaluation of trees: bootstrapping
[Figure: tree with leaves 1–5 and internal nodes 6–8]
Motivation: some branching patterns in a tree may be uncertain for statistical reasons (short sequences, small number of mutational events).
Goal of bootstrapping: to assess the statistical robustness of each edge of the tree.
Note that each edge divides the leaf nodes into two subsets. For instance, edge 7–8 divides the leaves into subsets {1,2,3} and {4,5}. However, is this short edge statistically robust?
Method: try to generate trees from subsets of the input data as follows:
• Randomly modify the input MSA by eliminating some columns and replacing them with existing ones. This results in duplication of columns.
• Compute a tree for each modified input MSA.
• For each edge of the tree derived from the real MSA, determine the fraction of trees derived from modified MSAs that contain an edge dividing the leaves into the same subsets. This fraction is called the bootstrap value.
Edges with low bootstrap values (e.g. <0.9) are considered unreliable.
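The bootstrap procedure above can be sketched for the simplest non-trivial case, four sequences, where each unrooted tree has exactly one internal edge. Two simplifications are assumptions rather than part of the slides: columns are resampled with replacement (the usual way of "eliminating some columns and replacing them with existing ones"), and each replicate tree is chosen with the four-point criterion on Hamming distances, i.e. the split {a,b}|{c,d} that minimizes d(a,b) + d(c,d). Sequences and function names are hypothetical.

```python
import random

def hamming(a, b):
    """Number of differing columns between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))

def quartet_split(seqs):
    """Pick the internal edge {a,b}|{c,d} minimizing d(a,b) + d(c,d) (four-point criterion).
    The split is represented by the side containing the alphabetically first name."""
    a, b, c, d = sorted(seqs)

    def dist(x, y):
        return hamming(seqs[x], seqs[y])

    candidates = [
        (frozenset({a, b}), dist(a, b) + dist(c, d)),
        (frozenset({a, c}), dist(a, c) + dist(b, d)),
        (frozenset({a, d}), dist(a, d) + dist(b, c)),
    ]
    # min() returns the first of several equal scores, so ties go to the earlier split
    return min(candidates, key=lambda item: item[1])[0]

def bootstrap_support(seqs, n_reps=100, seed=0):
    """Fraction of column-resampled replicates that recover the original split."""
    rng = random.Random(seed)
    names = sorted(seqs)
    length = len(seqs[names[0]])
    reference = quartet_split(seqs)
    hits = 0
    for _ in range(n_reps):
        cols = [rng.randrange(length) for _ in range(length)]   # sample columns with replacement
        replicate = {name: "".join(seqs[name][i] for i in cols) for name in names}
        if quartet_split(replicate) == reference:
            hits += 1
    return reference, hits / n_reps

# Hypothetical four-sequence alignment with a clear {A,B}|{C,D} split
seqs = {"A": "ACDEFGHIKL", "B": "ACDEFGHIKM", "C": "WYDEFGHPQR", "D": "WYDEFGHPQS"}
split, support = bootstrap_support(seqs, n_reps=200)
print(sorted(split), support)
```

In this toy alignment no column favours a competing split, so support is maximal; with real data, conflicting columns lower the value, and an edge below the 0.9 threshold would be treated as unreliable.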
Other Trees
• Use genomes
• Use Enzymomes
• Use whatever group of molecules is important for a given function
Part II: Design principles
Outline
• What are design principles
• How to study design principles
• Examples
What are design principles?
• Recurrent qualitative or quantitative rules that are
observed in similar types of systems as a solution
to a given functional problem
• Exist at different levels
[Figure: examples at different levels, such as nuclear targeting sequences and an operon (Gene 1, Gene 2, Gene 3)]
How can design principles emerge in
molecular biology?
• Intelligent design?
Not a scientific hypothesis; off the table
• Evolution?
Makes sense, but how could such regularities
emerge?
Climbing down Mount Improbable
• Over time, edged stones would accumulate on the slope.
• Smooth, round stones accumulate at the bottom.
Design Principles:
- Smooth, roundish rocks roll down the mountain.
- Edged, flat rocks don't.
Design principles in molecular biology
• Similarly, if a topology or set of parameters
has appeared through mutation and it can be
shown to create a molecular network that
functionally outperforms all other possible
alternatives in a given set of conditions, one
can talk about a design principle for the
system under those conditions.
[sensu engineering]
Index of talk
• How to identify design principles
• Design principles in:
– Gene expression
– Metabolic networks
– Signal transduction
– Development
• Design principles, what are they good for?
• Summary
First step, define the alternatives
[Figure: a regulator that controls a gene either positively (+) or negatively (−); a pathway X0 → X1 → X2 → X3 with alternative feedback designs]
How strong should the feedback be?
[Figure: time course of X3 (X3 versus t)]
Then, create models for each alternative
[Figure: models of the regulator/gene alternatives and of the X0 → X1 → X2 → X3 pathway alternatives]
Finally:
• Compare the dynamic behavior of the models for the two or more alternatives with respect to physiologically relevant criteria.
The demand theory for gene expression
[Figure: a regulator that controls a gene either positively (+) or negatively (−)]
• Are there situations where positive regulation of gene expression outperforms negative regulation of gene expression and vice versa?
Regulating gene expression has principles
• Positive regulator:
– More effective when the gene product is in demand for a large fraction of the life cycle.
– Less sensitive to noise if the signal is low.
• Negative regulator:
– More effective when the gene product is in demand for a small fraction of the life cycle.
– Less sensitive to noise if the signal is high.
Genetics 149:1665; PNAS 103:3999; PNAS 104:7151; Nature 405:590
Negative overall feedback is a design principle in metabolic biosynthesis
[Figure: biosynthetic pathway X0 → X1 → X2 → X3 with negative feedback from the end product X3]
• Negative overall feedback:
– More effective in coupling production to demand.
– More robust to fluctuations.
Bioinformatics 16:786; Biophysical J. 79:2290
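A toy comparison in the spirit of "define the alternatives, model them, compare" shows how overall feedback couples production to demand. Everything here is a hypothetical model, not the one used in the cited papers: X0 is constant, the first step is inhibited by X3 through a Hill-type term, intermediate steps are irreversible so that at steady state every step carries the same flux, and X3 is consumed at rate demand · X3. Without feedback the pathway flux ignores demand; with feedback, flux rises when demand rises.

```python
def x3_steady_state(demand, supply=1.0, ki=1.0, n=4, feedback=True):
    """Steady-state X3 where production equals consumption (demand * X3), by bisection.
    supply stands for a hypothetical lumped rate k1 * X0; ki and n are hypothetical
    inhibition constant and Hill coefficient."""
    def net(x3):
        inflow = supply / (1 + (x3 / ki) ** n) if feedback else supply
        return inflow - demand * x3            # strictly decreasing in x3
    lo, hi = 0.0, supply / demand + 1.0        # net(lo) > 0 and net(hi) < 0
    for _ in range(200):
        mid = (lo + hi) / 2
        if net(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def pathway_flux(demand, **kwargs):
    """At steady state the flux through the unbranched pathway equals the consumption of X3."""
    return demand * x3_steady_state(demand, **kwargs)

for fb in (False, True):
    j1 = pathway_flux(1.0, feedback=fb)
    j2 = pathway_flux(2.0, feedback=fb)
    print(f"feedback={fb}: flux at demand 1 -> {j1:.3f}, at demand 2 -> {j2:.3f}")
```

Without feedback, doubling demand halves X3 and leaves flux unchanged; with feedback, X3 falls less and the pathway delivers more product.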
Bifunctional sensors can be a design principle in signal transduction
[Figure: signal, sensor, effector and effect, shown once with a bifunctional sensor and once with an independent deactivator]
• Bifunctional sensor:
– Performs best against cross-talk
• Independent deactivator:
– Better integrator of signals
Mol. Microbiol. 48:25; Mol. Microbiol. 68:1196
Design principles in development
[Figure: a signal regulating a gene through a regulator, positively (+) or negatively (−), under four conditions: high demand/low signal, low demand/low signal, high demand/high signal, low demand/high signal]
Genetics 149:1665; PNAS 103:3999; PNAS 104:7151; Nature 405:590
Biological design principles help us understand why biology works as it does
[Figure: growth rate and expression of important genes over time after a heat shock]
• Biological design principles may connect molecular determinants to functional effectiveness.
BMC Bioinformatics 7:184
Underlying assumption
• The evolution of molecular networks can be treated in terms of modules.
• Work in the group of Uri Alon suggests that:
– networks evolving to meet simultaneous goals evolve in a modular fashion
– networks evolving to meet a single goal evolve globally
• Modularity seems like a reasonable first assumption
PNAS 102:13773; PLOS Comp Biol 4:e1000206; BMC Evol Biol 7:169
The good news about function
• Sometimes, you get things for free!
• For example:
– networks that are responsive to signals may, simply because they are responsive, have built-in noise buffering.
– Functions associated with marginally stable proteins are favored because, given the large dimensions of sequence space, most randomly selected sequences have a structure that is only marginally stable.
PNAS 100:14463; PNAS 103:6435; Proteins 46:105
How can biological design principles be applied?
• Design of molecular circuits with specific behaviors!
[Figure: examples of circuit behaviors: bistable systems, stable systems, oscillations, unstable systems]
Cell 113:597; PLoS Comput Biol 5:e1000319; PNAS 106:6435
Summary
• Design principles can be found in molecular
networks.
• Such principles can sometimes be connected to selection for functional effectiveness.
• Even in the absence of such a connection, if
they are valid they can be used to build
biological circuits with specific behaviors.