RISE AND FALL OF GENE FAMILIES: Dynamics of Their Expansion

Evolution of Plant Stress Responsiveness:
Genome-wide and Gene Family Level Analysis
Shin-Han Shiu
Department of Plant Biology
KBS, 1/18, 2008
Outline
- Major interests and why
- Gene families and stress responsiveness:
  - The interplay between gene family expansion, duplication mechanism, and the elusive selection pressure
- The Receptor Kinase family as an example
  - One of the biggest plant gene families and its involvement in plant biotic interactions
- If there is enough time, the short story on plant pseudogenes
  - When can you call a gene a pseudogene?
Major interests
- Molecular evolutionary patterns
- Source of selection pressure: abiotic and biotic stress conditions
- Target of selection: duplicate genes
- Genetic basis of adaptation
Where do all these duplicates come from?
- Whole genome duplication
- Tandem duplication
- Segmental duplication
- Replicative transposition
Plasticity of plant gene contents
- 3 whole genome duplications in the Arabidopsis thaliana lineage over the past ~150 million years
- Expected gene content, starting from 15,000* and doubling at each duplication: 30,000, 60,000, 120,000
- Observed Arabidopsis gene content: 21,000**
- More "recent" retentions in plants

*: Number of orthologous groups in shared families between Arabidopsis and rice.
**: Number of genes in shared families.
Shiu et al. (2006) PNAS
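The "Expected" numbers are plain doubling arithmetic: if every duplicate from each of the 3 whole genome duplications had been kept, ~15,000 ancestral genes would become 120,000. A minimal sketch, assuming perfect doubling and no losses; the retained fraction it prints (observed 21,000 over the 120,000 ceiling) is an illustration derived from the slide's numbers, not a figure from the paper.

```python
def expected_after_wgd(ancestral: int, n_wgd: int) -> list[int]:
    """Gene counts after each whole genome duplication (WGD),
    assuming every duplicate is retained."""
    counts = []
    current = ancestral
    for _ in range(n_wgd):
        current *= 2  # a WGD doubles every gene in the genome
        counts.append(current)
    return counts


expected = expected_after_wgd(15_000, 3)
observed = 21_000
print("expected after each WGD:", expected)
print("fraction of the no-loss expectation observed:", observed / expected[-1])
```

The gap between 120,000 and 21,000 is the point of the slide: most duplicates are eventually lost.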
Plant Gene Family Evolution: Major questions
- What is the rate of gene gain in plants?
- Do certain types of genes have a higher gain rate?
- What is the influence of duplication mechanisms?
- Finally, how do genes that are responsive to stresses behave?
  - AtGenExpress microarray dataset: 22 stress conditions
Measuring Lineage-specific Gain
- Orthologous group and lineage-specific gain
  - Reconcile species and gene trees
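Reconciling a gene tree with the species tree places each duplication on a species-tree branch; duplications that map onto the A. thaliana branch alone are A. thaliana lineage-specific gains. The slide does not say which reconciliation method was used, so this is a minimal sketch of the standard LCA-mapping approach, assuming fully resolved, correctly rooted trees, the species tree (M, (R, (P, A))) for moss, rice, poplar and A. thaliana, and a hypothetical "species|gene" leaf label.

```python
from typing import Union

# A tree is either a leaf name (str) or a (left, right) pair.
Tree = Union[str, tuple]

# Assumed species tree: moss (M), rice (R), poplar (P), A. thaliana (A).
SPECIES_TREE: Tree = ("M", ("R", ("P", "A")))


def leaf_set(tree: Tree) -> frozenset:
    """Species names under a species-tree node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return leaf_set(tree[0]) | leaf_set(tree[1])


def clades(tree: Tree) -> list:
    """Every clade of the species tree, as a set of species names."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    return [leaf_set(tree)] + clades(tree[0]) + clades(tree[1])


def lca(species: frozenset, all_clades: list) -> frozenset:
    """Smallest species-tree clade containing all the given species."""
    return min((c for c in all_clades if species <= c), key=len)


def duplication_nodes(gene_tree: Tree, all_clades: list) -> list:
    """LCA reconciliation: return the species clade of every duplication.

    Leaves are labelled "species|gene" (hypothetical format). An internal
    node is a duplication when it maps to the same species clade as at
    least one of its children.
    """
    def walk(node):
        if isinstance(node, str):
            return frozenset([node.split("|")[0]]), []
        left, dups_left = walk(node[0])
        right, dups_right = walk(node[1])
        here = lca(left | right, all_clades)
        dups = dups_left + dups_right
        if here in (left, right):
            dups.append(here)
        return here, dups

    return walk(gene_tree)[1]


all_clades = clades(SPECIES_TREE)
# One moss gene plus three A. thaliana genes from two A-specific duplications
og = ("M|g1", (("A|g1", "A|g2"), "A|g3"))
print(duplication_nodes(og, all_clades))
```

Counting the returned clades equal to {"A"} gives the number of A. thaliana-specific gains in that orthologous group.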
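Retention rate along the A. thaliana lineage
- Diminishing rate of retention over time

[Figure: tree of moss (M), rice (R), poplar (P) and A. thaliana (A), with branches 1-3 along the A. thaliana lineage]

| Branch | Retained (R) | Rate (R/My) |
|---|---|---|
| 1 | 1521-1576 | 3.0-3.2 |
| 2 | 734-1479 | 9.2-18.5 |
| 3 | 5774-6995 | 48.1-58.3 |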
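The rate column is the number of retained duplicates divided by branch duration in million years (My). The durations themselves are not on the slide; dividing each retained count by its rate recovers roughly 500, 80 and 120 My for branches 1-3. A sketch of that back-calculation, treating these durations as inferred arithmetic rather than stated values:

```python
# (retained duplicates, rate in R/My) ranges copied from the table above
TABLE = {
    1: ((1521, 1576), (3.0, 3.2)),
    2: ((734, 1479), (9.2, 18.5)),
    3: ((5774, 6995), (48.1, 58.3)),
}


def implied_duration(retained: float, rate: float) -> float:
    """Branch length in My implied by rate = retained / duration."""
    return retained / rate


for branch, ((r_lo, r_hi), (k_lo, k_hi)) in TABLE.items():
    durations = (implied_duration(r_lo, k_lo), implied_duration(r_hi, k_hi))
    print(f"branch {branch}: {min(durations):.1f}-{max(durations):.1f} My")
```

Because the most recent branch has by far the highest rate, the table reads as recent duplicates still awaiting loss rather than a recent burst of duplication alone.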
Expansion at the gene family level
- Lineage-specific gains per family in one plant lineage are moderately correlated with gains in the other lineage.
  - E.g. a family with 3 OGs, containing 5 moss genes and 9 A. thaliana genes in total

[Figure: P. patens lineage-specific gains (y axis, 0-150) vs. A. thaliana lineage-specific gains (x axis, 0-300) per family; fitted line y = 0.32x, r² = 0.33; labeled families include LRR, Protein kinase, Kinesin, ABC transporter, AP2, Mito_carr, UDPGT, PPR, P450, NB-ARC and C1]
Rensing et al. (2008) Science
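A fitted line written as y = 0.32x, with no intercept, suggests a least-squares fit through the origin, whose slope is b = Σxy / Σx². That reading is an assumption, since the slide shows only the equation and r². A sketch using made-up per-family gain counts (not the Rensing et al. data), with r² computed here as the squared Pearson correlation:

```python
def slope_through_origin(x, y):
    """Least-squares slope b for the model y = b * x (no intercept)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def pearson_r2(x, y):
    """Squared Pearson correlation between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical per-family gains (x: A. thaliana lineage, y: P. patens lineage);
# illustrative numbers only.
at_gains = [0, 2, 5, 12, 40, 90, 250]
moss_gains = [1, 0, 4, 2, 20, 15, 80]
print(f"y = {slope_through_origin(at_gains, moss_gains):.2f}x, "
      f"r^2 = {pearson_r2(at_gains, moss_gains):.2f}")
```

Note that r² for a no-intercept model is sometimes defined differently (uncentered); which definition the slide uses is not stated.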
Expansion at the orthologous group level
[Figure: log(freq) of orthologous groups (OGs) plotted against log(OG size)]

Two major patterns in OG expansion
- Convergent expansion
- Single lineage expansion

[Figure: heat maps of OG size (0 to >6 genes) in Arabidopsis thaliana (x axis) against poplar, rice and moss (y axes); color scale log2(Obs/Exp) from −0.7 to 0.7]
Enrichment: $\log\left(N_{\mathrm{Obs}} / N_{\mathrm{Exp}}\right)$
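Each heat-map cell compares the observed number of OGs with a given size pair against an expected number. The slide does not say how N_Exp was obtained; a common choice, assumed here, is independence of the two species' OG sizes, so Exp = row total × column total / N. Convergent expansion then shows up as enrichment where both sizes are large, single-lineage expansion where only one is.

```python
import math


def log2_enrichment(counts):
    """log2(Obs/Exp) for each cell of a contingency table of OG counts.

    counts[i][j]: number of OGs with size class i in one species and size
    class j in A. thaliana. Exp assumes the two sizes are independent.
    Empty cells get -inf.
    """
    row_tot = [sum(row) for row in counts]
    col_tot = [sum(col) for col in zip(*counts)]
    n = sum(row_tot)
    result = []
    for i, row in enumerate(counts):
        out_row = []
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n
            out_row.append(math.log2(obs / exp) if obs > 0 else float("-inf"))
        result.append(out_row)
    return result


# Hypothetical 2x2 example: OGs are small/small or large/large more often
# than independence predicts (a "convergent" pattern).
print(log2_enrichment([[30, 10], [10, 30]]))
```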
Expansion patterns and duplication mechanisms
- Comparison of tandem/non-tandem gene ratios between expansion patterns (A: A. thaliana, M: moss, R: rice, P: poplar)
  - e.g. for A-M orthology:

| OG type | Tandem | Non-tandem | Ratio |
|---|---|---|---|
| Convergent | 756 | 4500 | 0.17 |
| Single-lineage | 848 | 2918 | 0.30 |

| Orthology | Method for defining OG | Tandem ratio: convergent¹ < single-lineage² | P value³ |
|---|---|---|---|
| A-M | Similarity | 0.17 (756/4500) < 0.30 (848/2918) | 2.2×10⁻²³ |
| A-M | Tree | 0.16 (831/5297) < 0.40 (1443/3566) | 9.4×10⁻⁸⁸ |
| A-R | Similarity | 0.31 (959/3115) < 0.47 (644/1375) | 3.1×10⁻¹² |
| A-R | Tree | 0.27 (844/3073) < 0.50 (1631/3294) | 2.3×10⁻³³ |
| A-P | Similarity | 0.29 (1141/3944) < 0.60 (741/1234) | 7.2×10⁻³⁸ |
| A-P | Tree | 0.26 (1014/3930) < 0.64 (1578/2452) | 1.0×10⁻⁸³ |
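Each row compares two proportions from a 2×2 table (convergent vs. single-lineage OGs, tandem vs. non-tandem genes). The slide's footnote naming the test is not preserved, so this sketch uses a Pearson chi-square test with 1 degree of freedom as a stand-in; its P values need not match the table exactly. For 1 df the upper-tail probability is erfc(√(χ²/2)), so only the standard library is needed.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p-value with 1 degree of freedom).
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    stat = 0.0
    for obs, r, k in ((a, 0, 0), (b, 0, 1), (c, 1, 0), (d, 1, 1)):
        exp = rows[r] * cols[k] / n  # expected count under independence
        stat += (obs - exp) ** 2 / exp
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2))


# A-M, similarity-based OGs: rows = convergent / single-lineage,
# columns = tandem / non-tandem
stat, p = chi2_2x2(756, 4500, 848, 2918)
print(f"chi2 = {stat:.1f}, P = {p:.1e}")
```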
Summary I
- Duplicate gene turnover
  - Even though some duplicates are retained for millions of years, the majority are lost over a time scale of hundreds of millions of years.
- The degree of lineage-specific expansion is similar at the family level, but with substantial variation
- Expansion patterns fall into two major categories
  - Convergent expansion
  - Single lineage expansion
- Orthologous groups with single lineage expansion
  - Tend to be enriched in tandemly repeated genes
What's so special about tandem genes?
- Duplication rate (events per unit time):
  - Whole genome duplication: 1 event / ~50 million years
  - Tandem duplication: multiple events / generation
- Rate of recombination
  - Recombination rate: pathogen attack > control (Lucht et al., 2002. Nature.)
  - Recombination rate: tandem > non-tandem (Zhang & Gaut, 2003. Genome Res.)
Gene family expansion and functional bias
- Questions:
  - What types of genes tend to experience expansion?
  - What is the influence of duplication mechanism?
- Classification of genes:
  - In OGs without expansion
  - In OGs with expansion
- Gene Ontology: a controlled vocabulary describing
  - Gene functions
    - e.g. protein kinase: attaches phosphates onto itself or other proteins, serving as a molecular switch
  - Biological processes involved
    - e.g. serine/threonine phosphorylation: the process of attaching phosphate onto the amino acids Ser or Thr
  - Location within the cell
    - e.g. plasma membrane
Functional bias of gene retention
- Stress response categories are over-represented in the vascular plant lineage
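Over-representation of a GO category among retained (expanded) genes is commonly assessed with a one-sided hypergeometric test, equivalent to a one-sided Fisher's exact test. The slides do not name the test, so treat this as one plausible choice rather than the authors' procedure. The parameter names are a hypothetical convention: M genes in total, n of them annotated with the category, N genes in the study set, k of those annotated.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(M total, n annotated, N drawn).

    Uses exact integer binomial coefficients, so it is slow for very large
    genomes but free of floating-point overflow.
    """
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


# Hypothetical numbers: 20,000 genes, 500 annotated "response to stress",
# 1,000 genes in expanded OGs, 40 of them annotated (25 expected by chance).
print(hypergeom_sf(40, 20_000, 500, 1_000))
```

With many GO categories tested at once, a multiple-testing correction would normally be applied to these P values.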
Cellular component categories: T vs. NT
- Tandem: extracellular region, cell surface, endomembrane
- Non-tandem: cytosol, cytoskeleton, nucleus
Biological process categories: T vs. NT
- Tandem: kinases, glucosinolate transferase, toxin responses
- Non-tandem: regulation & hormone metabolism

[Figure: graph of Gene Ontology biological process categories (metabolism, regulation, response to stimulus, transport, cell communication and their subcategories) compared between tandem and non-tandem genes]
Biological process categories: T vs. NT (contd.)
- Tandem: response to stimuli, various transport functions
- Non-tandem: cell-cell communication and hormone response

[Figure: graph of Gene Ontology biological process categories compared between tandem and non-tandem genes]
Stress responsiveness
- Expression data set:
  - Arabidopsis thaliana
  - Under 22 abiotic and biotic stress conditions
- Definition: stress responsiveness
  - For a given gene
    - E_T: expression level under the stress condition
    - E_C: expression level under the mock-treatment control
    - If E_T >> E_C: significant UP regulation
    - If E_T << E_C: significant DOWN regulation
- Question: do stress-responsive genes tend to be those that were gained throughout plant evolution?
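The E_T vs. E_C definition can be made concrete as a log2 fold-change rule. The slides say "significant" without giving thresholds or a test, so the 2-fold cutoff here (min_log2_fc = 1.0) is a hypothetical choice, and a real analysis would add a statistical test across replicates.

```python
import math


def call_response(e_t: float, e_c: float, min_log2_fc: float = 1.0) -> str:
    """Classify one gene from stress (E_T) and mock-control (E_C) expression.

    Hypothetical rule: UP if log2(E_T / E_C) >= min_log2_fc, DOWN if
    <= -min_log2_fc, otherwise unchanged. No replicate-based test is done.
    """
    if e_t <= 0 or e_c <= 0:
        raise ValueError("expression levels must be positive")
    lfc = math.log2(e_t / e_c)
    if lfc >= min_log2_fc:
        return "UP"
    if lfc <= -min_log2_fc:
        return "DOWN"
    return "unchanged"


for e_t, e_c in [(400, 100), (25, 100), (120, 100)]:
    print(e_t, e_c, call_response(e_t, e_c))
```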
Expansion of responsive genes and conditions
- Genes in expanded OGs tend to be enriched in stress-responsive genes

[Table: for up- and down-regulated genes, tests of enrichment in expanded OGs (Exp) and of tandem vs. non-tandem bias (T/N) in A-M, A-R and A-P orthologous groups, under abiotic stress conditions (UV-B, Wounding, Cold4C, Drought, Salt, Osmotic, Heat) and biotic stress conditions (AvrRpm1, DC3000, Flg22, GST-NPP1, HrcC-, HrpZ, P. infestans, Psph). +: significant at the 5% level; T and N mark a significant tandem or non-tandem bias.]
Stress responsiveness and duplication mechanisms
- Enrichment of tandemly over non-tandemly expanded genes under biotic conditions

[Table: enrichment in expanded OGs (Exp) and tandem vs. non-tandem comparison (T/N) for A-M, A-R and A-P orthologous groups, for up- and down-regulated genes, under abiotic (UV-B, Wounding, Cold4C, Drought, Salt, Osmotic, Heat) and biotic (AvrRpm1, DC3000, Flg22, GST-NPP1, HrcC-, HrpZ, P. infestans, Psph) stress conditions]
+: Significant at the 5% level
T: tandem >> non-tandem
N: non-tandem >> tandem
Tandem genes tend to be "biotically" responsive
- This does not mean that biotic-responsive genes tend to be tandem
  - Among GO molecular function categories that are enriched in genes responding to biotic stresses:

[Figure: GO molecular function categories across biotic stress conditions (AvrRpm1, DC3000, Flg22, GST-NPP1, HrcC-, HrpZ, P. infestans, Psph), grouped as "Tandem >> non-tandem" or "Non-tandem >> tandem"; ns: not significant]
Categories shown: DNA binding, nucleic acid binding, transcription factor, transcription regulator, binding, ion binding, metal ion binding, transition metal ion binding, carbohydrate binding, oxidoreductase, transferase, glycosyltransferase, UDP-glycosyltransferase, kinase activity
Summary II
- Over the course of plant evolution, retention rate:
  - Stress response genes >> genome average
- True for genes up-regulated under both biotic and abiotic stress conditions
- Influence of duplication mechanism on retention rate, particularly for biotic stress conditions:
  - Tandem >> non-tandem
- However, genes responsive to biotic stimuli are not necessarily tandem
  - It depends on their location in the signaling network
  - e.g. plant receptor kinases: biotic -> tandem
  - e.g. transcription factors -> non-tandem, presumably from WGD
Receptor Kinase
Arabidopsis
Transmembrane Kinase 1
Shiu & Bleecker (2001) Science’s STKE
Functional bias: the Receptor-Like Kinase family
Shiu & Bleecker (2001) PNAS
The Kinase superfamily
- Family size differences imply differential expansion
  - Kinase: >1000 in A. thaliana, >1600 in Oryza sativa
  - RLK/Pelle: ~600 in At, ~1200 in Os
- Animal homologs:
  - Drosophila: Pelle
  - Mammalian: IRAKs
Shiu et al. (2004) Plant Cell
Receptor kinase configuration
[Figure: receptor kinase configurations (RLK: extracellular domain (ECD) + kinase; RLCK: kinase only) and other kinases, with counts in Arabidopsis thaliana (A), Populus trichocarpa (P), Oryza sativa (O), Physcomitrella patens (M), Chlamydomonas reinhardtii and Ostreococcus tauri; extracellular-domain innovations shown include LysM, GDPD, Thaumatin, CHASE, DUF26, LRR and GH18]
Functional bias: motivated by RLK studies
Shiu et al., 2004 Plant Cell
Stress responsiveness of RLKs
- RLKs are more responsive to stress than the genome average

[Table: statistical tests for up- and down-regulation of RLKs (RLK) and of tandem vs. non-tandem RLKs (T/N) under abiotic stress conditions (UV-B, Wounding, Drought, Cold4C, Heat, Salt, Osmotic) and biotic stress conditions (Flg22, GST-NPP1, HrcC-, P. infestans, Psph, HrpZ, AvrRpm1, DC3000); cells are marked O, U, T, N or na]
Stress responsiveness of RLKs

Tandem RLKs are more responsive to biotic stress than non-tandem RLKs
Stress responsiveness and tandem RLKs
- Responsiveness (R) of an RLK subfamily
  - For subfamilies with ≥ 10 genes
  - i: subfamily
  - j: condition
  - UP: # of up-regulated genes
  - DN: # of down-regulated genes
  - N_i: # of genes in subfamily i

$$R_i = \frac{\sum_j UP_j}{N_i} \quad \text{or} \quad R_i = \frac{\sum_j DN_j}{N_i}$$
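The responsiveness formula sums responsive-gene counts over all conditions and divides by subfamily size. A minimal sketch of that calculation; the subfamily names and counts are hypothetical, and N_i is taken to be the number of genes in subfamily i.

```python
def responsiveness(counts_per_condition, n_genes, min_size=10):
    """R_i = (sum over conditions j of UP_j) / N_i for one subfamily i.

    counts_per_condition: number of up-regulated (or down-regulated) genes
    of the subfamily in each stress condition j.
    Returns None for subfamilies smaller than min_size genes.
    """
    if n_genes < min_size:
        return None
    return sum(counts_per_condition) / n_genes


# Hypothetical subfamilies: name -> (up-regulated counts per condition, size)
subfamilies = {
    "subfamily_A": ([3, 0, 5, 8], 16),
    "subfamily_B": ([1, 1, 0, 0], 40),
    "subfamily_C": ([2, 2, 2, 2], 6),  # below the >= 10 gene cutoff
}
for name, (ups, size) in subfamilies.items():
    print(name, responsiveness(ups, size))
```

Because counts are summed across conditions, R_i can exceed 1: it is the average number of conditions in which a member gene responds.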
The "RLK swarm" model
In the context of biotic stress signaling networks
[Figure: signaling network components labeled T > NT or NT > T]
Summary III
- Innovation in the RLK/Pelle family
  - Most RK configurations were established > 700 million years ago.
  - Plenty of evidence of domain shuffling, but the rate is not high.
  - Shuffled domains suggest involvement in biotic stress perception.
- History of expansion
  - 4 major turnover patterns
  - Substantially more recent gains in poplar and rice
  - Mostly involving subfamilies with many tandem repeats
- Stress responsiveness
  - RLK > genome average
  - Tandem > non-tandem
  - Biotic > abiotic
  - Stress-responsive genes are not necessarily tandem
Plant pseudogenes
- Pseudogenes are:
  - Genomic DNA sequences similar to normal genes but nonfunctional
- For protein-coding genes, "non-functional" to many means:
  - They have frameshift mutations or premature stop codons
  - They are not transcribed into mRNA
  - They exhibit signatures of neutral evolution
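The first criterion (frameshifts and premature stops) can be checked directly on a coding sequence. This is a hypothetical helper, not a published pipeline: it uses a length that is not a multiple of 3 as a crude frameshift proxy and scans for in-frame stop codons before the last codon. Real pipelines align the candidate to its functional paralog to locate indels.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def disabling_features(cds: str) -> list[str]:
    """Flag frameshift and premature-stop signatures in a coding sequence.

    Hypothetical helper: a length not divisible by 3 is reported as a
    frameshift; any stop codon before the final codon is reported as
    premature (codons numbered from 1).
    """
    seq = cds.upper()
    problems = []
    if len(seq) % 3 != 0:
        problems.append("frameshift")
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    for idx, codon in enumerate(codons[:-1]):  # the last codon may be the real stop
        if codon in STOP_CODONS:
            problems.append(f"premature stop at codon {idx + 1}")
    return problems


print(disabling_features("ATGAAATAA"))
print(disabling_features("ATGTAAAAATAA"))
```

The other two criteria (absence of transcription, neutral evolution) need expression data and sequence comparisons, which this sketch does not cover.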
Pseudogene numbers and family size
Gene family size is generally correlated with the number of pseudogenes in the family in question.

[Figure: number of pseudogenes (0-120) vs. domain family size (0-800); labeled families include Ank, Pkinase, Pkinase_tyr, PPR, NB-ARC, LRR_1, P450, Myb_DNA_binding, zf-C3H4, LRRNT-2, RRM1 and F-box]

| Family size (S) | Slope | Spearman's rank (ρ) | p-value |
|---|---|---|---|
| Overall | 0.1247 | 0.5484 | <2.2e-16 |
| S < 10 | 0.0967 | 0.3000 | <2.2e-16 |
| 10 ≤ S < 25 | 0.1259 | 0.2291 | 3.94e-5 |
| 25 ≤ S < 50 | 0.1950 | 0.2307 | 0.0209 |
| 50 ≤ S < 100 | 0.1650 | 0.3317 | 0.0152 |
| S > 100 | 0.1177 | 0.6042 | 3.20e-4 |
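Spearman's ρ is the Pearson correlation of ranks. Ties matter here because pseudogene counts are small integers, so tied values receive the average of their ranks. A minimal sketch with hypothetical inputs; how the slope column was fitted is not stated on the slide.

```python
def average_ranks(values):
    """1-based ranks, with tied values receiving the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical family sizes and pseudogene counts
sizes = [3, 8, 15, 40, 120, 600]
pseudogenes = [0, 1, 1, 4, 10, 70]
print(spearman_rho(sizes, pseudogenes))
```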
Selection pressure on pseudogenes

Pseudogenes still show signatures of purifying selection
Determine pseudogene expression
- Tiling microarray
  - Covers the whole genome, regardless of the annotation
  - Can distinguish sense and antisense transcripts

[Figure: transcript array vs. tiling array, across exons, UTRs, introns, cis-regulatory elements, novel genes and MARs (matrix attachment regions)]
Selection pressure on pseudogenes

Pseudogenes still show signatures of purifying selection
[Figure panels: A. Arabidopsis; B. Rice]
Summary IV
- Relationship between gene family size and the number of pseudogenes
  - Positively correlated
  - Larger gene families tend to lose genes more frequently than smaller families
- Pseudogenes still show signatures of purifying selection
  - This may be mostly because pseudogenization events occurred relatively recently
- Pseudogenes are still expressed
  - At levels significantly higher than intron antisense expression
  - In rice, pseudogene expression is even as high as that of presumably functional genes
Acknowledgement
- Lab members
  - Melissa Lehti-Shiu
  - Gaurav Moghe
  - Cheng Zou
- Past member
  - Kosuke Hanada, RIKEN
- Collaborators
  - Jeff Conner
  - Gregg Howe, PRL
  - Rong Jin, CSE
  - Doug Schemske
  - Mike Thomashow, PRL
- Funding:


Thanks!