REPETITIVE EXTRAGENIC PALINDROMIC SEQUENCES IN PSEUDOMONAS SYRINGAE PV. TOMATO DC3000 GENOME: EXTRAGENIC SIGNALS FOR GENOME REANNOTATION

RAQUEL TOBES AND EDUARDO PAREJA
BIOINFORMATICS UNIT, ERA7 INFORMATION TECHNOLOGIES, GRANADA, SPAIN
[email protected]

P. syringae is an agriculturally important plant pathogen with at least 50 pathovars based on host specificity. P. syringae enters plant leaves through stomata (1) and produces necrotic lesions that are often surrounded by chlorotic halos. The genome of P. syringae pathovar tomato DC3000 has recently been described (1) and is considered a model for most animal and plant pathogens in the gamma Proteobacteria. In this group, pathogenicity seems to rely on type III secretion systems (TTSS) that inject virulence effector proteins into host cells (1).

Repetitive Extragenic Palindromic (REP) sequences were first described in Enterobacteriaceae and later in Pseudomonas putida. We have detected a species-specific repetitive sequence, scattered over the chromosome of P. syringae DC3000, that has the features of REP sequences. Finding REP sequences in P. syringae as well confirms the broad presence of this type of repetitive sequence in bacteria. We have characterized these sequences and, based on REP sequence features, we have refined the annotation of the P. syringae DC3000 genome.
REP sequences are difficult to detect for three reasons: first, they change from species to species (they are species-specific); second, the sequence changes slightly from copy to copy within a species (they are imperfect repeats); and third, they are only partially palindromic. These idiosyncratic features explain why special search tools are needed to find them. To detect REP sequences we used a BLAST-based strategy specially designed to detect repetitive sequences in the extragenic space of genomes (R. Tobes and E. Pareja, manuscript in preparation).

Similar to the distribution of REP sequences described for the Escherichia coli and P. putida chromosomes (2), REP sequences of P. syringae DC3000 are located preferentially in intergenic spaces between convergent genes and are scarce in spaces bounded by divergent genes. Relative to the number of spaces between convergent genes in the P. syringae chromosome, the REP sequence frequency in these spaces is almost three times the expected frequency. In contrast, the number of REP sequences in spaces bounded by divergent genes is about 1/3 of the expected frequency.

Some REP sequences appear grouped in clusters. We have detected 54 pairs of REP sequences, 10 clusters of three REP sequences, 1 cluster of four, 4 clusters of five, 4 clusters of six, 1 cluster of nine, 2 clusters of ten and 2 clusters of thirteen REP sequences. Within each cluster there are two distinct types of sequence, one corresponding to the positively oriented REP sequences and the other to the negatively oriented ones. In addition, the spacers between REP elements also correspond to two alternating types of sequence that are specific to each cluster.
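The comparison between convergent and divergent intergenic spaces described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' tool: the `Gene` record, the function names and the toy coordinates are hypothetical, the chromosome is treated as linear (the gap that wraps around the origin is ignored), and the expected count for each class assumes that REP sequences would otherwise be spread evenly over intergenic spaces regardless of flanking gene orientation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal gene record (not taken from the source)."""
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def classify_spaces(genes):
    """Yield (start, end, kind) for each gap between adjacent genes.

    The chromosome is treated as linear, so the gap spanning the
    origin of a circular chromosome is not reported.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    for left, right in zip(ordered, ordered[1:]):
        if right.start <= left.end + 1:
            continue  # overlapping or abutting genes leave no space
        if left.strand == "+" and right.strand == "-":
            kind = "convergent"   # genes point towards each other
        elif left.strand == "-" and right.strand == "+":
            kind = "divergent"    # genes point away from each other
        else:
            kind = "tandem"       # both genes on the same strand
        yield left.end + 1, right.start - 1, kind


def enrichment(genes, rep_positions):
    """Return observed/expected REP counts per class of intergenic space.

    Assumption: the expected count of a class is proportional to the
    number of spaces of that class. REP positions that fall inside
    genes are ignored here.
    """
    spaces = list(classify_spaces(genes))
    n_spaces = Counter(kind for _, _, kind in spaces)
    observed = Counter()
    for pos in rep_positions:
        for start, end, kind in spaces:
            if start <= pos <= end:
                observed[kind] += 1
                break
    total_obs = sum(observed.values())
    total_spaces = sum(n_spaces.values())
    ratios = {}
    for kind, n in n_spaces.items():
        expected = total_obs * n / total_spaces
        ratios[kind] = observed[kind] / expected if expected else float("nan")
    return ratios


# Toy data, invented for illustration only.
toy_genes = [
    Gene(1, 100, "+"), Gene(201, 300, "-"), Gene(401, 500, "+"),
    Gene(601, 700, "+"), Gene(801, 900, "-"),
]
toy_reps = [150, 160, 550, 750]
print(enrichment(toy_genes, toy_reps))
```

A ratio above 1 for a class means that REP sequences are over-represented in that kind of space relative to the even-spread expectation; a ratio below 1 means they are under-represented.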
Thus, the DNA fragments that follow positively oriented REP sequences within a cluster share a practically identical sequence, while the fragments that follow negatively oriented REP sequences share another sequence.

Considering the conserved palindromy of REP sequences and the conserved arrangement of the clusters, it is probable that REP sequences adopt conformations with peculiar secondary structures that are especially suited to specific recognition by proteins. REP sequences are binding sites for proteins with decisive roles in DNA physiology, such as IHF, HU, gyrase and DNA polymerase I. This links REP sequences to processes as important as DNA packing, DNA supercoiling and replication.

REP sequences are fundamentally extragenic and maintain a conserved sequence along the genome. This conservation, and the roles in which REP sequences are involved, do not seem compatible with an intragenic presence: evolutionary pressure would not maintain the same conserved DNA sequence within different genes. According to the current annotation of the P. syringae DC3000 genome, 36 REP sequences are intragenic. We analyzed each intragenic REP sequence individually, searching for possible misassignments made during the open reading frame (ORF) determination process. To detect predicted genes that were not real genes, we evaluated three points:
1. Very short length of the gene.
2. Presence of several REP sequences within the gene.
3. No genes with BLAST similarity to the gene could be found.
Applying these criteria, we detected 32 cases in which the intragenic REP sequences corresponded to genes that did not seem to be real genes.

The presence of REP sequences in a gene could also indicate that the gene is shorter than predicted. To determine whether a gene containing a REP sequence was shorter, we analyzed all its BLAST-similar genes.
If the majority of them lacked the gene fragment corresponding to the intragenic REP, we concluded that the ORF was probably shorter. We found five cases in which the genes seemed to be shorter than predicted. In our study, we found data supporting an ORF misprediction for all genes containing a REP sequence. Hence, our analysis allows us to conclude that probably all P. syringae REP sequences are extragenic.

Applying a similar strategy, in this case based on the general absence of REP sequences from intergenic spaces between divergent genes, we analyzed all REP sequences located within this type of space. With one exception, in every case we found that one or both of the genes bounding the space encoded hypothetical proteins or proteins from mobile elements. The finding of REP sequences in this type of space can therefore be used to refine genome annotation. In addition, REP sequences located between divergent genes can mark points in the genome that may have undergone some kind of genomic plasticity event (recombination, transposition, insertion, inversion).

The annotation of each new genome depends on the annotations of previously published genomes, because the majority of genes are annotated by sequence similarity. This system has a multiplicative effect on error propagation, which underlines the importance of high-quality annotation for each gene in each genome, since it will influence future annotations. Our analysis of the location of all REP elements in the chromosome of P. syringae DC3000 allows us to affirm that REP elements are located in extragenic space. We have used this feature to detect ORF misassignments, thus improving the annotation of P. syringae DC3000. We propose REP sequences as markers of extragenicity that can be useful for gene prediction and for refining the annotation of bacterial genomes.
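The screening of intragenic REP sequences described above (three points for doubtful genes, plus the majority rule for genes that are probably shorter) could be organized along the following lines. This is a minimal sketch under assumptions, not the procedure actually used: the `OrfEvidence` record and the function names are hypothetical, the source does not say how the three points were weighed against each other (so the sketch only reports which ones are met), and the 300 bp cutoff is an arbitrary placeholder because the source says only "very short".

```python
from dataclasses import dataclass, field


@dataclass
class OrfEvidence:
    """Hypothetical summary of the evidence gathered for one predicted ORF."""
    name: str
    length_bp: int
    rep_count: int  # REP sequences that fall inside the predicted ORF
    # One entry per BLAST-similar gene: True if that homolog still
    # contains the fragment where the REP sequence lies, False if not.
    homolog_covers_rep: list = field(default_factory=list)


# Placeholder cutoff: the source gives no number for "very short".
SHORT_ORF_BP = 300


def doubtful_gene_flags(orf, short_bp=SHORT_ORF_BP):
    """Return which of the three points from the text this ORF meets."""
    flags = []
    if orf.length_bp < short_bp:
        flags.append("very short")
    if orf.rep_count > 1:
        flags.append("several REP sequences")
    if not orf.homolog_covers_rep:
        flags.append("no BLAST-similar genes")
    return flags


def probably_shorter(orf):
    """Majority rule: most homologs lack the fragment holding the REP."""
    hits = orf.homolog_covers_rep
    if not hits:
        return False  # no homologs, so the rule cannot be applied
    lacking = sum(1 for covers in hits if not covers)
    return lacking > len(hits) / 2


# Invented examples for illustration only.
orf_a = OrfEvidence("orfA", length_bp=150, rep_count=2)
orf_b = OrfEvidence("orfB", length_bp=1200, rep_count=1,
                    homolog_covers_rep=[False, False, True])
print(orf_a.name, doubtful_gene_flags(orf_a), probably_shorter(orf_a))
print(orf_b.name, doubtful_gene_flags(orf_b), probably_shorter(orf_b))
```

Keeping the two checks separate mirrors the text, which treats "not a real gene" and "real gene, but shorter than predicted" as different outcomes.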
The absence of P. syringae REP elements from the principal pathogenicity gene clusters suggests that genome fragments lacking REP sequences may point to regions recently acquired from other organisms, and that REP sequences could be a new tracer for gaining insight into key aspects of bacterial genome evolution, especially for studying the acquisition of pathogenicity. In addition, as the P. syringae REP sequence is species-specific with respect to the sequenced genomes, it is an exceptional candidate for use as a fingerprint in precise genotyping and epidemiological studies.

1. Buell, C.R., Joardar, V., Lindeberg, M., Selengut, J., Paulsen, I.T., Gwinn, M.L., Dodson, R.J., Deboy, R.T., Durkin, A.S., Kolonay, J.F. et al., The complete genome sequence of the Arabidopsis and tomato pathogen Pseudomonas syringae pv. tomato DC3000. Proc. Natl. Acad. Sci. U S A. 100 (2003) 10181-10186.
2. Aranda-Olmedo, I., Tobes, R., Manzanera, M., Ramos, J.L., Marques, S., Species-specific repetitive extragenic palindromic (REP) sequences in Pseudomonas putida. Nucleic Acids Res. 30 (2002) 1826-1833.