Periodic clusters
Non periodic clusters
That was only the beginning…
The human cell cycle
G1-Phase
S-Phase
G2-Phase
M-Phase
The proliferation cluster genes are cell cycle periodic
[Figure: gene expression of the proliferation cluster across samples, with G2/M, G1/S and CHR labels; distribution of cell cycle periodicity (proportion vs. CCP score) for all genes and for proliferation genes]
The cell cycle motifs are enriched among the periodic genes
[Figure: CHR, ELK1, CDE, E2F and NFY motifs plotted relative to the TSS; annotation: "Not in the cluster, mutated in cancer"]
Tabach et al. Mol Sys Biol 2005
Potential regulatory motifs in 3’ UTRs
Finding 3’ UTR elements associated with high/low transcript stability (in yeast)
Entire genome
AAGCTTCC CCTACAAC
Reverse the inference flow
[Figure: clustering of expression profiles (expression vs. time/tissues) and motif finding, with the flow reversed: diagnosing motifs using expression]
Once we reverse the inference order we can:
• Enumerate and score all possible k-mer motifs
• Examine the effect of “mutations” on motifs
• Examine the effect of motif location within the promoter
• Examine the effect of motif combinations and of distances within a combination
• More?
…But the correlation between gene cluster and motifs is imprecise in both directions:
• there are genes in the cluster without the motif
• and many genes with the motif do not respond.
If gene control is multifactorial, groups of genes defined by a common motif will not be mutually disjoint, so partitioning the data into disjoint clusters will cause loss of information.
A k-mer enumeration method: score every possible k-mer for an association with expression level
• A_g is the expression level of gene g
• C is a basal expression level (same for all genes g)
• N_μg is an integer equal to the number of occurrences of motif μ in gene g
• M is a set of motifs
• F_μ is the increase/decrease in expression level caused by the presence of motif μ (same for all genes g)
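The slide's equation did not survive in this transcript. The definitions are consistent with an additive model of the form A_g = C + Σ F_μ N_μg (sum over motifs μ in M), but that form is an assumption here. The sketch below scores one k-mer at a time by fitting the single-motif version, A_g = C + F · N_g, with ordinary least squares. The function names and the toy data are hypothetical, not taken from the slides.

```python
from math import sqrt

def count_motif(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)

def score_motif(promoters, expression, motif):
    """Least-squares fit of A_g = C + F * N_g for one motif (assumed model form).

    Returns (C, F, r), where r is the Pearson correlation between
    motif counts and expression levels.
    """
    counts = [count_motif(p, motif) for p in promoters]
    n_genes = len(promoters)
    mean_n = sum(counts) / n_genes
    mean_a = sum(expression) / n_genes
    s_nn = sum((n - mean_n) ** 2 for n in counts)
    s_aa = sum((a - mean_a) ** 2 for a in expression)
    s_na = sum((n - mean_n) * (a - mean_a) for n, a in zip(counts, expression))
    if s_nn == 0 or s_aa == 0:
        # Motif count or expression is constant: nothing to fit.
        return mean_a, 0.0, 0.0
    F = s_na / s_nn
    C = mean_a - F * mean_n
    r = s_na / sqrt(s_nn * s_aa)
    return C, F, r

# Toy data built so that expression = 1 + 2 * (number of AAGC occurrences).
promoters = ["AAGCTT", "TTTTTT", "AAGCAAGC", "CCCCCC"]
expression = [3.0, 1.0, 5.0, 1.0]
C, F, r = score_motif(promoters, expression, "AAGC")
```

Scoring every k-mer amounts to calling `score_motif` once per candidate and ranking by |r| or by F. Fitting several motifs jointly, as the set M suggests, would need a multivariate least-squares routine instead.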
Motif characterization through Expression Coherence (EC)
[Figure: two panels of expression level vs. time for genes sharing a motif, one with EC score = 0.05 and one with EC score = 0.5]
ScanACE (Hughes et al.)
Expression coherence score, intuition
[Figure: four example gene sets (1 to 4) with EC1 = 0, EC2 = 0.66, EC3 = 0.2 and EC4 = 0.2; the threshold distance D is marked]
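The slides give only the intuition and a threshold distance D. A minimal sketch follows, assuming the EC score of a gene set is the fraction of gene pairs whose normalized expression profiles lie within Euclidean distance D of each other. The per-gene normalization (mean 0, standard deviation 1), the use of "≤" and the function names are assumptions for illustration.

```python
from itertools import combinations
from math import sqrt

def normalize(profile):
    """Scale one expression profile to mean 0 and standard deviation 1.

    Assumes the profile is not constant (non-zero standard deviation).
    """
    mean = sum(profile) / len(profile)
    sd = sqrt(sum((x - mean) ** 2 for x in profile) / len(profile))
    return [(x - mean) / sd for x in profile]

def expression_coherence(profiles, threshold):
    """Fraction of gene pairs closer than `threshold` (assumed EC definition)."""
    normalized = [normalize(p) for p in profiles]
    pairs = list(combinations(normalized, 2))
    if not pairs:
        return 0.0
    close = sum(
        1 for a, b in pairs
        if sqrt(sum((x - y) ** 2 for x, y in zip(a, b))) <= threshold
    )
    return close / len(pairs)

# Three genes with the same profile shape plus one with the opposite shape:
# 3 of the 6 pairs are close, so the score is 0.5.
profiles = [[1, 2, 3], [2, 4, 6], [10, 20, 30], [3, 2, 1]]
ec = expression_coherence(profiles, threshold=0.5)
```

A set of genes that share a motif but have unrelated profiles scores near 0, while a tightly co-expressed set scores near 1.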
Interaction of motifs
[Figure: expression level profiles for genes with only M1, with only M2, and with M1 AND M2; G2 is marked]
Synergistic motifs
A combination of two motifs is called ‘synergistic’ if the expression coherence score of the genes that have both motifs is significantly higher than the scores of the genes that have either of the motifs.
Example: Mcm1 and SFF
A global map of combinatorial expression control
Pilpel et al. Nature Genetics 2001
Conditions: heat shock, cell cycle, sporulation, diauxic shift, MAPK signaling, DNA damage
• High connectivity
• Hubs
• Alternative partners in various conditions
[Figure: motif network with nodes STRE, PHO4, CCA, ALPHA1, mRPE8, mRPE57, AFT1, PDR, SWI5, MIG1, mRPE69, RAP1, mRPE72, GCN4, CSRE, SFF', mRPE34, MCB, mRPE58, MCM1, mRPE6, RPN4, ECB, BAS1, SCB, LYS14, ABF1, SFF, STE12, ALPHA2, MCM1', ALPHA1', HAP234, mRRPE, PAC, mRRSE3]
Deduced network
Properties
[Figure: panels labeled correlation (scale -1 to 1), coherence (scale 0.2 to 0.8), expression, necessity, sufficiency, hierarchy and TF-TF interaction; labels Mbp1, Ndt80, Ume6, Swi4, MCM1', Fkh1, MCB, MSE, URS1, SCB, SFF', G1, G2]
Ho et al. Nature. 2002
Detect the effect of mutations in a motif
Distance and orientation of motifs affect expression profiles
[Figure: expression coherence and 1-correlation plotted against distance in b.p., ranging from "PAC is closer" to "mRRPE is closer"; motif arrangements are drawn relative to the ATG]
Some typical expression patterns
A Bayesian approach (conditional probability)
Xi can be “1” to denote any of the following, or “0” otherwise:
• The presence of motif m
• Its distance from the TSS is < N
• It is on the coding strand
• It neighbors another motif m’
ei = being expressed in pattern i
Example: two rRNA processing motifs
• The two motifs work together
• The two motifs’ orientation matters
The procedure
• Given that P(N|D) = P(N) * P(D|N) / P(D):
• Search the space of possible networks N for one that maximizes the above probability
• It is impossible to enumerate all possible networks
• Use cross validation: partition the data into 5 gene sets, learn the rules based on all but one set and test on the left-out set, each time.
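The cross-validation step can be sketched as follows. This is not the Bayesian network search itself: the "rule" learned here is just a table of conditional probabilities P(pattern | features) estimated by counting, which is a deliberate simplification. The gene representation (a tuple of 0/1 features plus a pattern label), the function names and the toy data are assumptions for illustration.

```python
from collections import Counter, defaultdict

def learn_cpt(genes):
    """Estimate P(pattern | features) by counting.

    genes: list of (features, pattern), where features is a tuple of 0/1 values.
    """
    counts = defaultdict(Counter)
    for features, pattern in genes:
        counts[features][pattern] += 1
    return {
        features: {p: n / sum(c.values()) for p, n in c.items()}
        for features, c in counts.items()
    }

def cross_validate(genes, k=5):
    """Split genes into k sets; learn on k-1 of them, test on the left-out one."""
    folds = [genes[i::k] for i in range(k)]
    accuracies = []
    for i, test in enumerate(folds):
        train = [g for j, fold in enumerate(folds) if j != i for g in fold]
        cpt = learn_cpt(train)
        correct = 0
        for features, pattern in test:
            dist = cpt.get(features)
            # Predict the most probable pattern; unseen feature tuples get no prediction.
            predicted = max(dist, key=dist.get) if dist else None
            correct += predicted == pattern
        accuracies.append(correct / len(test) if test else 0.0)
    return accuracies

# Toy data: genes with both motifs present show pattern 4, genes with neither show pattern 1.
genes = [((1, 1), 4)] * 10 + [((0, 0), 1)] * 10
accuracies = cross_validate(genes, k=5)
```

Each accuracy is measured on genes the table never saw during learning, which is the point of leaving one set out each time.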
For example: what does it take to belong to expression pattern (4)?
• Need to have RRPE and PAC
• If PAC is not within 140 bp of the ATG, but RRPE is within 240 bp, then the probability of pattern 4 is 22%
• If PAC is within 140 bp and RRPE is within 240 bp, then the chance is 100%
Inferring various logical conditions (“gates”) on motif combinations
The Bayesian network predicts expression profiles very accurately
Can make useful predictions in worm
The modern synthetic approach
Motif discovery from evolutionary conservation data
Species: S. cerevisiae, S. mikatae, S. kudriavzevii, S. bayanus, S. castellii, S. kluyveri
• S. mikatae, S. kudriavzevii and S. bayanus: intergenic sequences average 59 to 67% identity to their S. cerevisiae orthologs in global alignments
• S. castellii and S. kluyveri: ~40% identity to S. cerevisiae
Nucleotide conservation in promoters is highest close to the TSS
[Figure: conservation profiles for TATA-containing genes and for all genes]
A set of discovered motifs
NATURE | VOL 434 | 17 MARCH 2005
The data
• Examined intergenic regions of human, mouse, rat and dog
• ~18,000 genes
• “Promoters”: 4 kb centered on the TSS
• 3’ UTRs based on RNA annotations
• 64 Mb and 15 Mb in total, respectively
• Negative control: introns, ~120 Mb
• % of alignable sequence: promoters 51% (44% upstream and 58% downstream of the TSS), 3’ UTRs 73%, introns 34%, entire genome 28%
The phylogenetic trees
Questions:
• How would addition of species affect analyses?
• What if the sequences were not only mammalian?
An example: a known binding site of Err-α in the GABPA promoter
Questions:
• What is the “meaning” of the other conserved positions?
Discovery of new motifs: exhaustive enumeration of all 6-mers
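The slides do not give the scoring used after enumeration, so the following is only a sketch under stated assumptions. It enumerates all 4^6 = 4096 6-mers, counts each one's occurrences in a reference sequence, counts how many of those occurrences are identical at the same aligned position in every other species, and compares the conserved count with a background rate using a binomial z-score. The ungapped toy alignment, the background rate and all function names are hypothetical.

```python
from itertools import product
from math import sqrt

def all_kmers(k=6):
    """Every possible DNA k-mer; 4**k of them."""
    return ["".join(p) for p in product("ACGT", repeat=k)]

def conservation_counts(alignments, k=6):
    """Count total and fully conserved occurrences of each k-mer.

    alignments: list of tuples of equal-length, ungapped aligned sequences,
    with the reference species first (a simplifying assumption).
    Returns {kmer: (total, conserved)}.
    """
    counts = {kmer: [0, 0] for kmer in all_kmers(k)}
    for seqs in alignments:
        ref = seqs[0]
        for i in range(len(ref) - k + 1):
            kmer = ref[i:i + k]
            if kmer not in counts:
                continue  # skip windows containing N or other symbols
            counts[kmer][0] += 1
            if all(s[i:i + k] == kmer for s in seqs[1:]):
                counts[kmer][1] += 1
    return {kmer: tuple(c) for kmer, c in counts.items()}

def conservation_z(total, conserved, background_rate):
    """Binomial z-score of the conserved count against a background rate."""
    if total == 0:
        return 0.0
    sd = sqrt(total * background_rate * (1 - background_rate))
    return (conserved - total * background_rate) / sd

# Toy three-species alignment; the third sequence differs at one position.
alignments = [("TGACCTTG", "TGACCTTG", "TGACCTAG")]
counts = conservation_counts(alignments)
z = conservation_z(total=100, conserved=30, background_rate=0.1)
```

In this toy case only the first window, TGACCT, is conserved in all three sequences. On real data the background rate would have to be estimated from the alignments themselves, for example from random k-mers.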
Targets of new motifs showed defined expression patterns
Motifs often show a clear positional bias: close to the TSS
The same methods, applied to 3’ UTRs, reveal strand-specific motifs