Periodic clusters / Non-periodic clusters

That was only the beginning…

The human cell cycle
G1 phase, S phase, G2 phase, M phase

The proliferation cluster genes are cell cycle periodic
[Figure: gene expression across samples, with G2/M, G1/S and CHR labels; distribution of cell cycle periodicity (proportion vs. CCP score) for all genes and for proliferation genes]

The cell cycle motifs are enriched among the periodic genes
[Figure: CHR, ELK1, CDE, E2F and NFY motifs positioned relative to the TSS; annotation: "Not in the cluster, mutated in cancer"]
Tabach et al. Mol Sys Biol 2005

Potential regulatory motifs in 3' UTRs
Finding 3' UTR elements associated with high/low transcript stability (in yeast)
[Figure: entire genome; example motifs AAGCTTCC and CCTACAAC]

Reverse the inference flow
[Figure: motif finding and clustering; expression vs. time/tissues]

Diagnosing motifs using expression
Once we reverse the inference order we can:
• Enumerate and score all possible k-mer motifs
• Examine the effect of "mutations" on motifs
• Examine the effect of motif location within the promoter
• Examine the effect of motif combinations, and of distances within a combination
• More?

…But the correlation between gene cluster and motifs is imprecise in both directions:
• there are genes in the cluster without the motif
• and many genes with the motif do not respond.

If gene control is multifactorial, groups of genes defined by a common motif will not be mutually disjoint, so partitioning the data into disjoint clusters will cause loss of information.
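The first bullet above, enumerating and scoring all possible k-mer motifs, can be illustrated with a minimal Python sketch. The scoring rule used here (mean expression of genes carrying the k-mer minus mean expression of genes lacking it) is an assumption chosen for simplicity, not the method of the slides; the function names and the toy data layout (dictionaries keyed by gene name) are hypothetical.

```python
from itertools import product
from statistics import mean


def enumerate_kmers(k, alphabet="ACGT"):
    """Return every possible k-mer over the alphabet (4**k strings for DNA)."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def count_occurrences(seq, kmer):
    """Count occurrences of kmer in seq, overlapping matches included."""
    k = len(kmer)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == kmer)


def score_kmers(promoters, expression, k):
    """Score each k-mer by an assumed, simple association measure:
    mean expression of genes with the k-mer minus mean expression of
    genes without it. k-mers present in all or in no genes are skipped."""
    scores = {}
    for kmer in enumerate_kmers(k):
        with_motif, without_motif = [], []
        for gene, seq in promoters.items():
            if count_occurrences(seq, kmer) > 0:
                with_motif.append(expression[gene])
            else:
                without_motif.append(expression[gene])
        if with_motif and without_motif:
            scores[kmer] = mean(with_motif) - mean(without_motif)
    return scores


# Toy data: hypothetical promoter sequences and expression levels.
promoters = {"g1": "ACGTAC", "g2": "TTACGT", "g3": "GGGGGG", "g4": "CCCCCC"}
expression = {"g1": 2.0, "g2": 2.0, "g3": 0.0, "g4": 0.0}
scores = score_kmers(promoters, expression, k=3)
top = max(scores, key=scores.get)
```

Because the scan starts from the motifs rather than from clusters, a gene can contribute to the scores of many k-mers at once, which is the point of reversing the inference order.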
A k-mer enumeration method: score every possible k-mer for an association with expression level
The definitions below correspond to an additive model, A_g = C + Σ_{μ∈M} F_μ · N_μg:
• A_g is the expression level of gene g
• C is a basal expression level (same for all g)
• The integer N_μg equals the number of occurrences of motif μ in gene g
• M is a set of motifs
• F_μ is the increase/decrease in expression level caused by the presence of motif μ (same for all g)

Motif characterization through Expression Coherence (EC)
[Figure: two gene sets, expression level vs. time, with EC score = 0.05 and EC score = 0.5]
ScanACE (Hughes et al.)

Expression coherence score, intuition
[Figure: four example gene sets with threshold distance D; EC1 = 0, EC2 = 0.66, EC3 = 0.2, EC4 = 0.2]

Interaction of motifs
[Figure: expression level of genes with only M1, only M2, and M1 AND M2; G2 label]

Synergistic motifs
A combination of two motifs is called 'synergistic' if the expression coherence score of the genes that have the two motifs is significantly higher than the scores of the genes that have either of the motifs.
Example: Mcm1, SFF

A global map of combinatorial expression control
Pilpel et al. Nature Genetics 2001
Conditions: heat shock, cell cycle, sporulation, diauxic shift, MAPK signaling, DNA damage
• High connectivity
• Hubs
• Alternative partners in various conditions
[Figure: motif network with nodes STRE, PHO4, CCA, ALPHA1, mRPE8, mRPE57, AFT1, PDR, SWI5, MIG1, mRPE69, RAP1, mRPE72, GCN4, CSRE, SFF', mRPE34, MCB, mRPE58, MCM1, mRPE6, RPN4, ECB, BAS1, SCB, LYS14, ABF1, SFF, STE12, ALPHA2, MCM1', ALPHA1', HAP234, mRRPE, PAC, mRRSE3]

Deduced network properties
[Figure: panels labeled Necessity, Sufficiency, Hierarchy and TF-TF interaction; axes: correlation and expression coherence; G1 and G2 labels; factors Mbp1, Ndt80, Ume6, Swi4, MCM1', Fkh1; motifs MCB, MSE, URS1, SCB, MCM1', SFF']
Ho et al. Nature 2002

Detect the effect of mutations in a motif

Distance and orientation of motifs affect expression profiles
[Figure: motif arrangements relative to the ATG; 1-correlation and expression coherence vs. distance in bp, with regions labeled "mRRPE is closer" and "PAC is closer"]

Some typical expression patterns

A Bayesian approach (conditional probability)
X_i could be "1" to denote:
• The presence of motif m
• Its distance from the TSS is < N
• It is on the coding strand
• It neighbors another motif m'
or "0" otherwise.
e_i = being expressed in pattern i

Example: two rRNA processing motifs
• The two motifs work together
• The two motifs' orientation matters

The procedure
• Given that P(N|D) = P(N) * P(D|N) / P(D), where N is a candidate network and D is the data:
• Search the space of possible Ns for one that maximizes the above probability
• It is impossible to enumerate all possible networks
• Use cross validation: partition the data into 5 gene sets, learn the rules from all but one set and test on the left-out set, each time

For example: what does it take to belong to expression pattern (4)?
• Need to have RRPE and PAC
• If PAC is not within 140 bp of the ATG, but RRPE is within 240 bp, then the probability of pattern 4 is 22%
• If PAC is within 140 bp and RRPE is within 240 bp, then the probability is 100%

Inferring various logical conditions ("gates") on motif combinations

The Bayesian network predicts expression profiles very accurately

Can make useful predictions in worm

The modern synthetic approach

Motif discovery from evolutionary conservation data
• S. cerevisiae
• S. mikatae, S. kudriavzevii, S. bayanus: their intergenic sequences average 59 to 67% identity to their S. cerevisiae orthologs in global alignments
• S. castellii and S. kluyveri: ~40% identity to S. cerevisiae

Nucleotide conservation in promoters is highest close to the TSS
[Figure: conservation by position for TATA-containing genes and for all genes]
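The percent-identity figures quoted for the yeast intergenic alignments can be made concrete with a short Python sketch. It assumes two sequences that have already been globally aligned to equal length, with "-" as the gap character, and it assumes that gapped columns are excluded from the denominator. Conventions for treating gaps differ, and the slides do not say which one was used, so this is an illustration rather than the original computation. The function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical positions between two pre-aligned sequences.

    Assumption: columns where either sequence has a gap are ignored,
    so the denominator is the number of ungapped aligned columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns (assumed convention)
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy alignment: 5 ungapped columns, 4 of them identical.
identity = percent_identity("ACGT-A", "ACCTTA")
```

Applying the same function to sliding windows along a promoter alignment would give a per-position conservation profile of the kind summarized in the slide on conservation near the TSS.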
A set of discovered motifs
NATURE | VOL 434 | 17 MARCH 2005

The data
• Examined intergenic regions of human, mouse, rat and dog
• ~18,000 genes
• "Promoters": 4 kb centered on the TSS
• 3' UTRs based on RNA annotations
• 64 Mb and 15 Mb in total, respectively
• Negative control: introns, ~120 Mb
• % of alignable sequence: promoters 51% (44% upstream and 58% downstream of the TSS), 3' UTRs 73%, introns 34%, entire genome 28%

The phylogenetic trees
Questions:
• How would the addition of species affect the analyses?
• What if the sequences were not only mammalian?

An example: a known binding site of Err-α in the GABPA promoter
Questions:
• What is the "meaning" of the other conserved positions?

Discovery of new motifs: exhaustive enumeration of all 6-mers

Targets of new motifs showed defined expression patterns

Motifs often show a clear positional bias: close to the TSS

The same methods applied to 3' UTRs reveal strand-specific motifs
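As an illustration of exhaustive 6-mer enumeration combined with conservation, the Python sketch below computes, for every 6-mer found in a reference sequence, the fraction of its occurrences that are identical at the same alignment columns in all other species. This is one plausible scoring rule under stated assumptions, not necessarily the statistic used in the study: the input is assumed to be a set of equal-length, column-aligned sequences, and windows containing gaps or ambiguous bases in the reference are skipped. The function name and the toy sequences are hypothetical.

```python
from collections import Counter


def kmer_conservation_rates(reference, others, k=6):
    """For each k-mer occurring in the reference, return the fraction of
    its occurrences that are perfectly conserved in every other sequence.

    Assumption: all sequences are column-aligned and of equal length.
    """
    for seq in others:
        if len(seq) != len(reference):
            raise ValueError("all aligned sequences must have equal length")
    total = Counter()
    conserved = Counter()
    for i in range(len(reference) - k + 1):
        kmer = reference[i:i + k]
        if any(base not in "ACGT" for base in kmer):
            continue  # skip windows with gaps or ambiguous bases
        total[kmer] += 1
        if all(seq[i:i + k] == kmer for seq in others):
            conserved[kmer] += 1
    return {kmer: conserved[kmer] / total[kmer] for kmer in total}


# Toy alignment: one reference and two other species.
reference = "TGACCTTGACCT"
others = ["TGACCTTGACGT", "TGACCTAGACCT"]
rates = kmer_conservation_rates(reference, others)
```

A real analysis would compare each rate with a background expectation, for example the rate observed for the same k-mer in the intron negative control listed in the data slide.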