Replication associated strand asymmetries in mammalian genomes
In silico detection of replication origins

Maxime Huvet, Marie Touchon, Yves d'Aubenton-Carafa, Claude Thermes, Samuel Nicolay, Benjamin Audit, Edward Brodie of Brodie, Alain Arneodo
(CGM, Gif sur Yvette) (ENS-Lyon)
Support: CNRS, ACI IMPBio, ANR

« SECOND PARITY RULE »
Long genome sequence fragments tend to show, on the same strand: fA = fT and fG = fC
[Figure: A content versus T content (Mb) and G content versus C content (Mb) on a single strand; left panels: Bacteria/Archaebacteria, right panels: human chromosomes]

LARGE SCALE PROPERTIES OF GENOMIC MUTATIONS
Same mutation/repair processes on the 2 DNA strands
Same values of complementary substitution rates
[Diagram: substitutions between A, G, T and C]
At equilibrium, Second Parity Rule (PR2): fA = fT and fG = fC (at large scales)
(Chargaff, 1962; Sueoka, Lobry, 1995)

What mechanisms cause composition asymmetries?
REPLICATION: asymmetry of mutation/repair processes between leading and lagging strands
[Diagram: replication origin with the leading and lagging strands and their 5' and 3' ends]
EUBACTERIA: G > C and T > A in the leading strand
$S_{GC} = (n_G - n_C)/(n_G + n_C) > 0$
$S_{TA} = (n_T - n_A)/(n_T + n_A) > 0$

Composition asymmetry in prokaryotes
[Figure: Bacillus subtilis, $S_{GC} = (n_G - n_C)/(n_G + n_C)$ computed in 1 kb windows versus position (× 10^6 bp), with ORI and TER marked; leading strand: G > C, lagging strand: G < C]
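The skew definitions above can be illustrated with a short sketch. It assumes the sequence is available as a plain string of A, C, G, T read on one strand; the function names `skews` and `skew_profile` and the handling of windows without countable bases are illustrative choices, not taken from the slides.

```python
from typing import List, Tuple


def skews(seq: str) -> Tuple[float, float]:
    """Return (S_TA, S_GC) for one sequence fragment read on a single strand.

    S_TA = (nT - nA) / (nT + nA) and S_GC = (nG - nC) / (nG + nC).
    Assumption: a fragment with no A/T (or no G/C) gets a skew of 0.0.
    """
    s = seq.upper()
    n_a, n_t, n_g, n_c = (s.count(base) for base in "ATGC")
    s_ta = (n_t - n_a) / (n_t + n_a) if (n_t + n_a) else 0.0
    s_gc = (n_g - n_c) / (n_g + n_c) if (n_g + n_c) else 0.0
    return s_ta, s_gc


def skew_profile(seq: str, window: int = 1000) -> List[float]:
    """Total skew S = S_TA + S_GC in consecutive non-overlapping windows.

    The default of 1000 bases mirrors the 1 kb windows quoted on the slide;
    a trailing fragment shorter than `window` is ignored.
    """
    profile = []
    for start in range(0, len(seq) - window + 1, window):
        s_ta, s_gc = skews(seq[start:start + window])
        profile.append(s_ta + s_gc)
    return profile
```

On a toy input such as `skew_profile("GGGC" + "CCCG", window=4)`, the first window has a positive skew and the second a negative one, which is the kind of sign change the Bacillus subtilis profile shows on either side of the origin.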
TRANSCRIPTION: asymmetry of mutation/repair processes between transcribed and non-transcribed strands
[Diagram: RNA polymerase on the transcribed strand, with the non-transcribed strand and the 5' and 3' ends marked]
EUBACTERIA: G > C and T > A on the non-transcribed strand
$S_{GC} = (n_G - n_C)/(n_G + n_C) > 0$
$S_{TA} = (n_T - n_A)/(n_T + n_A) > 0$

Skew profiles associated to transcription and replication in Eubacteria
$S = S_{TA} + S_{GC}$
[Diagram: replicative skew profile (lagging and leading strands on either side of ORI), transcriptional skew profile (transcribed and non-transcribed strands, genes on the (+) and (-) strands), and the superposition of replication and transcription]
[Figure: skew S along the Bacillus subtilis genome (Mbp); legend: genes (strand +), genes (strand -), intergenic regions]

STRAND ASYMMETRIES IN EUKARYOTES?

1. Strand asymmetries associated to transcription in the human genome

Strand asymmetries associated to transcription in human genes
≈ 12 000 genes (no exons, no repeats): introns (126 000) and intergenic sequences on both sides
[Figure: $S_{TA} = (n_T - n_A)/(n_T + n_A)$ and $S_{GC} = (n_G - n_C)/(n_G + n_C)$ versus distance (kb) from the 5' and 3' ends of genes; upward jumps (5'), downward jumps (3')]
Mean skew associated to transcription: ∆S = $S_{TA} + S_{GC}$ ~ 7%

2. Strand asymmetries associated to replication in the human genome

Skew profiles around human replication origins
[Figure: skew S around ORI; legend: genes (strand +), genes (strand -), intergenic regions]

Superimposition of replication and transcription biases
[Figure: skew S around ORI; legend: genes (strand +), genes (strand -), intergenic regions]
Transcription: ∆S ~ ± 7%
Replication: ∆S ~ + 14%

Conservation of skew profiles in mammalian genomes
[Figure: skew profiles in human, mouse, rat, dog]

Conservation of replication origins in mammalian genomes

3. In silico detection of replication origins in the human genome

Detection of upward jumps associated to replication
Main problem:
• necessity to avoid the jumps due only to transcription
[Diagram: skew S across genes (mean size: 30 kb) lying between ORIs; scale bars 100 kb and 1 Mb]
Scale of analysis:
• larger than typical size of genes
• smaller than typical size of replicons
Necessity of multi-scale analysis

Multi scale jump detection using the wavelet transform
[Figure: skew S and its first derivative at scales w = 10 kb, 50 kb, 100 kb and 200 kb; small w: numerous jumps, high precision; large w: few jumps, low precision]
• Signal smoothed at large scale (200 kb)
• Identification of transitions
• Position of transitions (1 kb)

Asymmetry of the human genome
[Figure: histograms of jump amplitude (%), upward versus downward]

« Factory roof » skew profiles
[Figure: skew S versus position x (Mb)]

« Factory roofs » around experimentally determined replication origins
[Figure: skew S versus position x (kb) around TOP1 and MCM4]

Conservation of potential origins in mammalian genomes
[Figure: human, mouse, dog]

Model of eukaryotic replicon
Replication termination sites: distributed between fixed adjacent origins
[Diagram: prokaryote (origin O, terminus T): skew S after several cycles; eukaryote (Ori 1, Ori 2): skew at each cycle and after N cycles]

Detection of factory roofs using the wavelet transform
[Figure: factory roof wavelets]
759 « factory roofs » spanning ~ 40% of the human genome

ASYMMETRY OF HUMAN GENOME
[Figure labels: factory roofs = 40 %; factory roofs < 1 %]

EUKARYOTIC REPLICON MODEL
[Diagram: transcriptional skew profile (transcribed and non-transcribed strands), replicative skew profile between two ORIs, and the superposition of transcription and replication]

Comparison with replication timing data
[Figure: replication timing (early to late) versus position on human chromosome 6 (Mbp), with origins (ori) marked; Woodfine et al., Cell Cycle (2005)]

GENE ORGANISATION IN HUMAN CHROMOSOMES
Organisation of transcription around predicted replication origins
Co-orientation of transcription and replication

Model of mammalian chromatin organization
[Diagram: genomic DNA, open chromatin, ORIs and skew S]
Replication origins are situated at the center of open chromatin regions

Conclusions
• Existence of replication-coupled strand asymmetries in the human genome
• Replication origins correspond to large transitions of skew profiles
• These transitions are conserved in mammalian genomes
• Detection of more than one thousand putative origins active in germ-line cells
• « Factory roof » profiles: regularly distributed termination sites
• Essential role of replication in organisation of gene order and expression
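The multi-scale jump detection outlined in section 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the slides do not name the analyzing wavelet, so the first derivative of a Gaussian is assumed here; scales are expressed in numbers of windows of the skew profile rather than in kb; and all function names and the threshold are hypothetical.

```python
import math
from typing import List


def dog_kernel(scale: float) -> List[float]:
    """First derivative of an (unnormalised) Gaussian, sampled on [-4*scale, 4*scale]."""
    half = int(math.ceil(4 * scale))
    return [-x / scale ** 2 * math.exp(-x * x / (2 * scale ** 2))
            for x in range(-half, half + 1)]


def wavelet_response(signal: List[float], scale: float) -> List[float]:
    """Convolve the skew profile with the kernel: the derivative of the smoothed signal.

    With this normalisation the response at an isolated step is close to the
    step amplitude. Edges are handled by repeating the first and last values.
    """
    kernel = dog_kernel(scale)
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = min(max(i - (j - half), 0), n - 1)
            acc += weight * signal[idx]
        out.append(acc)
    return out


def upward_jumps(signal: List[float], scale: float, threshold: float) -> List[int]:
    """Indices where the response is a local maximum above `threshold`."""
    resp = wavelet_response(signal, scale)
    return [i for i in range(1, len(resp) - 1)
            if resp[i] > threshold and resp[i] >= resp[i - 1] and resp[i] > resp[i + 1]]


def multiscale_upward_jumps(signal: List[float], coarse: int, fine: int,
                            threshold: float) -> List[int]:
    """Detect jumps at a coarse scale, then refine each position at a fine scale.

    The coarse scale filters out small-scale jumps; the fine scale gives a
    more precise position within +/- `coarse` windows of each coarse detection.
    """
    fine_resp = wavelet_response(signal, fine)
    refined = []
    for pos in upward_jumps(signal, coarse, threshold):
        lo, hi = max(0, pos - coarse), min(len(signal), pos + coarse + 1)
        refined.append(max(range(lo, hi), key=fine_resp.__getitem__))
    return refined
```

For a toy profile such as `[-0.07] * 50 + [0.07] * 50`, a call like `multiscale_upward_jumps(profile, coarse=5, fine=2, threshold=0.1)` reports a single upward jump at the step, while the mirrored (downward) step yields none. In practice the coarse scale would be chosen between the typical gene size and the typical replicon size, as stated on the slide.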