Causes of insertion sequences abundance in prokaryotic genomes? A problem of size

Marie Touchon, E.P.C. Rocha
Atelier de BioInformatique, Université Pierre et Marie Curie, Paris
Unité Génétique des Génomes Bactériens, Institut Pasteur, Paris

IS elements

The simplest form of transposable elements:
- 700 to 2500 bp
- coding only the information allowing their mobility

Ability to generate mutations:
- by insertion within genes
- by activating genes on insertion upstream
- by generating extensive DNA rearrangements

They have been found to shuttle the transfer of adaptive traits such as:
- antibiotic resistance
- virulence
- new metabolic capabilities

Their exact nature is still debated. Selfish or advantageous?
- genomic parasites
- beneficial agents

Causes of insertion sequences abundance in prokaryotic genomes?

The reasons are largely unknown and widely speculated upon.

Hypotheses:
- IS family specificity
- Genome size
- Frequency of horizontal gene transfer
- Pathogenicity
- Type of ecological associations
- Human sedentarisation

The current availability of hundreds of genomes makes many of these hypotheses testable.

IS elements identification

Problem: ISs annotations are heterogeneous, inaccurate or insufficient.
Solution: reannotation of ISs by a comparative study, adopting the nomenclature defined by Chandler (1998):
- ISs have one or two consecutive ORFs encoding the transposase protein
- ISs are grouped into 21 distinct families

ISs reannotation

(1) ISs CDS detection: all annotated CDS of genome x are compared against the ISs database (Chandler et al.)
(2) IS elements reconstitution
[Figure: example CDS arrangements IS1A-IS1B, IS1A-IS21A-IS21B-IS1B and IS1A-IS3A-IS3B-IS1A, reconstituted into IS1, IS21 and IS3 elements]
(3) ISs complete or partial: ISs fragments (> 20% length difference), ISs with internal insertion, partial elements

ISs reannotation: reassessment

- 262 genomes, 8123 ISs elements
- [Figure: annotated ISs CDS compared with detected ISs CDS; labels 1194 (11%), 8823 (89%), 2115 (22%)]
- [Figure: number of detected ISs CDS against number of annotated ISs CDS; fit Y = 0.77 (0.02) X + 5.86 (1.89); R² = 0.81 (P < 0.0001); R = 0.95 (P < 0.0001); labelled genome: Shigella flexneri]
- 83% of ISs are complete (may be active)
- Only 20% (1994) of GenBank ISs had a consistent classification

Distribution of ISs in 262 genomes

The absence of ISs is not anecdotal:
- 24% of genomes lack ISs
- 48% of genomes have 0 to 10 ISs

High variability:
- of the number of ISs per genome
- of the number of ISs families per genome

Genomes labelled on the figure:
- Sulfolobus solfataricus (archaebacteria)
- Bacillus halodurans (firmicute)
- Nitrobacter winogradskyi (α-proteobacteria)
- Bordetella pertussis (β-proteobacteria)
- Shigella sonnei (γ-proteobacteria)

[Figure: number of genomes against number of ISs, and number of genomes against number of ISs families]

Association with phylogenetic inertia

Rapid dynamics of gain and loss.
The number of ISs evolves so fast that there is no historical correlation.

The effect of IS family specificity

[Figure: incongruent phylogenetic trees; clade labels Firmicute, Proteo, Proteo, Entero; values 100% and 90%]
High diversity of ISs is found within strains or closely related species.

The effect of IS family specificity: examples

Pseudomonas syringae pv. tomato, pv. syringae and pv. phaseolicola:
- 139 ISs = 10 IS3 + 42 IS5 + 23 IS21 + 40 IS66 + 10 IS1111 + 13 ISNCY + 1 IS91
- 18 ISs = 14 IS3 + 1 IS5 + 1 IS66 + 1 IS110 + 1 IS630 (pv. syringae)
- 116 ISs = 7 IS3 + 43 IS5 + 7 IS21 + 2 IS66 + 1 IS1111 + 1 ISNCY + 3 IS91 + 52 IS256

This effect is unlikely to explain the variability of ISs.

The effect of genome size

- Wilcoxon test: p < 0.0001 (groups of N = 64 and N = 198 genomes)
- Spearman's r = 0.63, p < 0.0001

Strong association between genome size and IS number (and density).
The larger the genome, the more IS elements it contains.

The effect of horizontal gene transfer

Putative orthologs: reciprocal best hits, proteins with > 90% similarity and < 20% length difference.
Strain-specific region: a region exclusive to one strain that contains at least ten consecutive genes without an ortholog.
[Figure: lists of orthologs between strains A, B and C (gene i in A, gene j in B) delimiting a strain A specific region; example genome E. coli O157:H7 Sakai]
Databases used: Prophage database (Nestle, Casjens, 2003); HGT database (Garcia-Vallve, 2003).

The effect of horizontal gene transfer

- Wilcoxon test: p < 0.0001
- t-test: p < 0.001
- Spearman's r = 0.31, p > 0.1 (NS)
- [Figure labels: 11.4%, 5.2%]

Genomes lacking ISs have fewer HGT regions.
ISs are ~4 times more concentrated in HGT regions.
HGT may be a determinant of the presence of ISs, but not of their abundance.

The effect of horizontal gene transfer

- Spearman's r = 0.84, p < 0.0001

IS family diversity in HGT regions is almost as high as in the entire genome.
HGT is a necessary but not sufficient condition for the presence of ISs.
The intensity of HGT is not a significant determinant of IS abundance.

The effect of pathogenicity

Pathogens with many ISs:
- Yersinia pestis (plague)
- Shigella flexneri, Shigella sonnei (dysentery)
- Bordetella pertussis (whooping cough)

[Figure: two comparisons, Wilcoxon test p < 0.001 and Wilcoxon test p > 0.5; labels 4.3, N = 100, 3.6, 153, IS = 0]
No association between the presence of ISs and pathogenicity.

The effect of the type of ecological association

[Figure labels: 8%, 17%, 55%, 100%]
Strong association between the frequency of ISs and the facultative character of the ecological associations.

We removed genomes lacking ISs (possibly under sexual isolation).
Kruskal-Wallis test: p > 0.5 (NS)

Stepwise multiple regression on the number of ISs:

Covariate                 Cumulative R²
Genome size               0.4
Ecological association    0.47
Frequency of HGT          0.47

Genome size is the most important variable.
Lifestyle is a nonsignificant determinant.

The effect of human sedentarisation (Mira et al., 2006)

1) Genomes with many ISs are from prokaryotes associated with humans or domesticated animals and plants.
2) Large intra-genomic IS expansions are recent.

Kruskal-Wallis test: p > 0.5 (NS)
[Figure: categories of association with man: not, indirectly, directly]
No evidence that man-related prokaryotes have more ISs.

Genome size explains ~40% of the variance in IS abundance

The smaller the genome, the lower the number but also the lower the density of ISs.

- Selection could favor small genomes: optimal use of resources and replication time (an increase in genome size caused by ISs could be counter-selected).
  [Figure: density of ISs (/Mb) for fast and slow growth; Wilcoxon test: p < 0.05]
  Genomes with fewer ISs correspond to the slowest-growing prokaryotes.
- ISs are selected to generate genetic variation (such selection should be stronger in larger genomes).

One explanation fits the available data well

- Selection against transposition in genomes with a higher density of deleterious transposition targets:
  - transposition inactivates genes with high probability
  - the total number of essential genes is ~300
  - a further 200-300 genes are nearly ubiquitous
  - 500 nearly essential genes

The abundance of IS elements in genomes could be mostly a question of space for transposition events that are not highly deleterious.

Conclusions

- High diversity of ISs is found within strains or closely related species.
- The number of ISs evolves so fast that there is no historical correlation.
- HGT may be a determinant of the presence of ISs, but not of their abundance.
- Surprisingly, genome size alone is the best predictor of IS number and density.
- Selection against transposition in genomes with a higher density of deleterious transposition targets.

Impacts of IS abundance?
IS expansion:
- increases the rate of genome rearrangements
  [Figure: observed and expected % of breakpoints that coincide with ISs against number of ISs; Bordetella parapertussis, Bordetella bronchiseptica]
- increases the number of pseudogenes
  [Figure: O/E R gene/intergene against number of ISs]

Acknowledgements

E.P.C. Rocha
Institut Pasteur
A. Danchin
La Région Ile de France

Examples

- Pseudomonas syringae syringae = 18 ISs: 14 IS3, 1 IS5, 1 IS630, 1 IS66, 1 IS110
- Nitrobacter winogradskyi = 117 ISs: 37 IS3, 32 IS5, 27 IS630, 2 IS21, 14 IS481, 4 ISNCY
- Shigella sonnei = 372 ISs: 107 IS3, 157 IS1, 16 IS630, 33 IS4, 25 IS21, 1 IS66, 1 IS91, 18 IS110, 3 IS605, 3 IS1111, 4 ISAs1, 2 ISNCY

Association with stability?

Large repeats decrease genome stability (Rocha, Trends Genetics, 03).
[Figure: stability against density of repeats]
But not IS elements?
[Figure: stability against number of ISs]

Association with phylogenetic inertia?

The number of ISs evolves so fast that there is no historical correlation.

Two scenarios

[Figure: two scenarios, beneficial agents and genomic parasites; steps shown: +IS acquisition, +IS expansion, -IS deletion, lineage loss]

Association with lifestyle?

Species                      Lifestyle              Number of ISs
Burkholderia pseudomallei    Facultative pathogen   36
Burkholderia mallei          Obligatory pathogen    152
Escherichia coli K12         Commensal              52
Shigella flexneri            Obligatory pathogen    298
Bordetella bronchiseptica    Facultative pathogen   2
Bordetella pertussis         Obligatory pathogen    247

-> Link with lifestyle: host restriction, niche change, ...

Association with recent rearrangements?

[Figure: observed and expected % of breakpoints that coincide with ISs against number of ISs; genomes shown: Bordetella bronchiseptica, Bordetella parapertussis, Yersinia pestis, Yersinia pseudotuberculosis]
IS expansion promoted frequent genomic rearrangements.

[Figure: pairwise genome comparisons with similarity labels: B. pertussis (247 ISs), Bordetella parapertussis (32 ISs) and B. bronchiseptica, 99% similarity; Shigella flexneri and E. coli K12, 99% similarity; S. enterica Typhimurium and S. enterica enterica serovar Typhi, 99% similarity; E. coli K12 and S. enterica Typhimurium, 90% similarity]
IS expansion increases the rate of genome rearrangements.

Association with pseudogenes?

[Figure: ISs inserted in genes (ortholog Or1 or Or2 interrupted by an IS in genome A or B, compared with Or1' and Or2') and ISs inserted in the intergenic region between orthologs]

$R_{\text{pseudo}} = \dfrac{\text{Number of ISs in genes}}{\text{Number of ISs in intergenes}}$

[Figure: O/E R pseudo against number of ISs]
IS expansion increases the number of pseudogenes.

Conclusions

High variability:
- of the number of ISs per genome
- of the number of ISs families per genome
- of the number of ISs copies per family

ISs have been recently acquired (HGT).

IS expansion:
- is associated with lifestyle or niche change
- increases the rate of genome rearrangements
- increases the number of pseudogenes

[Figure: +IS acquisition, -IS deletion, +IS expansion, lineage loss]

Conclusions

- ISs are frequent but not ubiquitous.
- The number of ISs and of IS families varies a lot.
- Lack of association between stability and the number of ISs.
- The presence of ISs is associated with lifestyle.
- IS expansion increases the rate of genome rearrangements.
- IS expansion increases the number of pseudogenes.

[Figure labels: beneficial agents, genomic parasites]

How many ISs?

High variability:
- of the number of ISs per genome
- of the number of ISs families per genome
- of the number of ISs per family

[Figure: number of genomes against number of ISs, and number of genomes against number of ISs families]
[Figure: log(number of ISs per genome) against number of ISs families]

- B. pertussis: 16 IS110, 229 IS481
- S. sonnei: 157 IS1, 106 IS3, 33 IS4, 25 IS21
- S. flexneri: 112-108 IS1, 126-124 IS3, 34-22 IS4

Hypothesis I

ISs induce short spikes of instability, which are averaged out in a deep phylogenetic analysis.

Hypothesis II

Invasions of highly replicative ISs lead to deleterious instability and lineage loss.
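
Supplementary sketch

The horizontal gene transfer slides define a putative ortholog (reciprocal best hit, > 90% similarity, < 20% length difference) and a strain-specific region (at least ten consecutive genes without an ortholog). A minimal Python sketch of those two rules is given below. It is not the authors' pipeline: the function names, the normalisation of the length difference by the longer protein, and the treatment of the chromosome as linear rather than circular are all assumptions made for illustration.

```python
from typing import List, Tuple


def passes_ortholog_filter(similarity_pct: float, len_a: int, len_b: int) -> bool:
    """Thresholds quoted on the slide: > 90% similarity, < 20% length difference.

    Assumption: the length difference is taken relative to the longer protein.
    The reciprocal-best-hit search itself is assumed to be done upstream.
    """
    length_diff = abs(len_a - len_b) / max(len_a, len_b)
    return similarity_pct > 90.0 and length_diff < 0.20


def strain_specific_regions(
    has_ortholog: List[bool], min_genes: int = 10
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) gene indices of every run of at least
    `min_genes` consecutive genes lacking an ortholog.

    Assumption: genes are given in chromosomal order and the chromosome is
    treated as linear (runs spanning the origin are not joined).
    """
    regions = []
    run_start = None
    for i, flag in enumerate(has_ortholog):
        if not flag:
            if run_start is None:
                run_start = i  # a new run of ortholog-less genes begins
        else:
            if run_start is not None and i - run_start >= min_genes:
                regions.append((run_start, i - 1))
            run_start = None
    # Close a run that extends to the last gene.
    if run_start is not None and len(has_ortholog) - run_start >= min_genes:
        regions.append((run_start, len(has_ortholog) - 1))
    return regions


# Hypothetical gene order: 3 shared genes, 12 without ortholog, 2 shared,
# 4 without ortholog, 1 shared.
flags = [True] * 3 + [False] * 12 + [True] * 2 + [False] * 4 + [True]
print(strain_specific_regions(flags))  # [(3, 14)]
```

Only the run of 12 genes qualifies; the run of 4 falls below the ten-gene threshold. The density of ISs inside and outside such regions could then be compared, as on the slide reporting that ISs are ~4 times more concentrated in HGT regions.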