REPETITIVE EXTRAGENIC PALINDROMIC SEQUENCES IN PSEUDOMONAS SYRINGAE PV. TOMATO DC3000 GENOME: EXTRAGENIC SIGNALS FOR GENOME REANNOTATION
RAQUEL TOBES AND EDUARDO PAREJA
BIOINFORMATICS UNIT, ERA7 INFORMATION TECHNOLOGIES, GRANADA, SPAIN
The absence of P. syringae REP elements from the principal pathogenicity gene clusters suggests that genome fragments lacking REP sequences could point to regions recently acquired from other organisms. REP sequences could therefore serve as a new tracer for gaining insight into key aspects of bacterial genome evolution, especially the acquisition of pathogenicity. In addition, the P. syringae REP sequence is species-specific with respect to the sequenced genomes. This makes it an exceptional candidate for use as a fingerprint in precise genotyping and epidemiological studies.

P. syringae is an agriculturally important plant pathogen with at least 50 pathovars defined by host specificity. P. syringae enters plant leaves through stomata (1) and produces necrotic lesions that are often surrounded by chlorotic halos. The genome of P. syringae pathovar tomato DC3000 has recently been described (1). It is considered a model for most animal and plant pathogens in the gamma Proteobacteria. In this group, pathogenicity appears to rely on type III secretion systems (TTSS) that inject virulence effector proteins into host cells (1).

Repetitive Extragenic Palindromic (REP) sequences were first described in enterobacteria and later in Pseudomonas putida. We have detected a species-specific repetitive sequence with the features of REP sequences, scattered over the chromosome of P. syringae DC3000. The finding of REP sequences in P. syringae as well confirms that this type of repetitive sequence is broadly present in bacteria. We have characterized these sequences and, based on their features, we have refined the annotation of the P. syringae DC3000 genome.

REP sequences are difficult to detect for three different reasons:
- they change from species to species (they are species-specific);
- the sequence changes slightly from copy to copy within a species (they are imperfect repeats);
- they are only partially palindromic.

These idiosyncratic features explain why special search tools are needed to find them. To detect REP sequences, we used a BLAST-based strategy specifically designed to detect repetitive sequences in the extragenic space of genomes (R. Tobes and E. Pareja, manuscript in preparation).
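
Two of the detection difficulties listed above, imperfect copies and partial palindromy, can be made concrete with a small scoring sketch. The Python below is a hypothetical illustration, not the authors' BLAST-based strategy. `palindrome_score` measures how closely a sequence matches its own reverse complement. `is_imperfect_copy` tests whether a candidate lies within a mismatch threshold of a consensus on either strand. The function names, the default threshold of 2 and the example sequences are assumptions made for illustration; they are not P. syringae REP sequences.

```python
# Hypothetical sketch: crude measures of partial palindromy and imperfect
# repetition for DNA strings. Not the BLAST-based method described in the text.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def palindrome_score(seq: str) -> float:
    """Fraction of positions at which seq equals its own reverse complement.

    1.0 means a perfect DNA palindrome (e.g. GAATTC). Unpaired loop bases and
    the central base of an odd-length sequence usually lower the score, so
    this is only a rough proxy for "partially palindromic".
    """
    if not seq:
        raise ValueError("empty sequence")
    s = seq.upper()
    rc = reverse_complement(s)
    matches = sum(a == b for a, b in zip(s, rc))
    return matches / len(s)


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def is_imperfect_copy(candidate: str, consensus: str, max_mismatches: int = 2) -> bool:
    """True if candidate matches consensus on either strand with at most
    max_mismatches substitutions (the default of 2 is an arbitrary choice)."""
    forward = hamming(candidate, consensus)
    reverse = hamming(reverse_complement(candidate), consensus)
    return min(forward, reverse) <= max_mismatches


if __name__ == "__main__":
    print(palindrome_score("GAATTC"))            # 1.0 (perfect palindrome)
    print(round(palindrome_score("GAATTA"), 2))  # 0.67 (partial palindrome)
    # Toy consensus and copies, invented for illustration only.
    consensus = "GCCGGCCTAC"
    print(is_imperfect_copy("GCCGGCATAC", consensus))                      # True
    print(is_imperfect_copy(reverse_complement("GCCGGCCTAC"), consensus))  # True
    print(is_imperfect_copy("ATATATATAT", consensus))                      # False
```

Because this sketch only counts substitutions, it would miss copies that differ by insertions or deletions. An alignment-based search tolerates that kind of variation.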

The distribution of REP sequences in P. syringae DC3000 resembles the one described for the Escherichia coli and P. putida chromosomes (2). REP sequences are located preferentially in intergenic spaces between convergent genes and are scarce in spaces bounded by divergent genes. Given the number of spaces between convergent genes in the P. syringae chromosome, the REP sequence frequency in these spaces is almost three times the expected frequency. In contrast, the number of REP sequences in spaces bounded by divergent genes is about one third of the expected frequency.

Some REP sequences appear
grouped in clusters. We detected 54 pairs of REP sequences, 10 clusters of three REP sequences, 1 cluster of four, 4 clusters of five, 4 clusters of six, 1 cluster of nine, 2 clusters of ten and 2 clusters of thirteen REP sequences. Within each cluster there are two distinct sequence types: one corresponds to the positively oriented REP sequences and the other to the negatively oriented ones. In addition, the spacers between REP elements fall into two alternating sequence types specific to each cluster. Thus, the DNA fragments that follow positively oriented REP sequences within a cluster share a practically identical sequence, while the fragments following negatively oriented REP sequences share another sequence.

Considering the
conserved palindromy of REP sequences and the conserved arrangement of the clusters, REP sequences probably adopt particular secondary structures that are especially suited to specific recognition by proteins. REP sequences are binding sites for proteins with decisive roles in DNA physiology, such as IHF, HU, gyrase and DNA polymerase I. This links REP sequences to processes as important as DNA packing, DNA supercoiling and replication.

REP sequences are fundamentally
extragenic and maintain a conserved sequence throughout the genome. This conservation, and the roles in which REP sequences are involved, do not seem compatible with an intragenic location. Evolutionary pressure would not be expected to preserve a conserved DNA sequence within different genes.

According to the current annotation of P. syringae
DC3000 genome, 36 REP sequences are intragenic. We analyzed each intragenic REP sequence individually, looking for possible misassignments during the open reading frame (ORF) determination process. To detect annotated genes that are not real genes, we evaluated three criteria:
1. very short gene length;
2. presence of several REP sequences within the gene;
3. no gene with BLAST similarity to the annotated gene can be found.
Applying these criteria, we detected 32 cases in which the intragenic REP sequences corresponded to annotated genes that did not seem to be real genes. The presence of a REP sequence in a gene could also indicate that the gene is shorter than predicted. To determine whether a gene containing a REP sequence was shorter, we analyzed all genes with BLAST similarity to it. If most of them lacked the fragment corresponding to the intragenic REP, we concluded that the ORF was probably shorter. We found five cases in which the genes seemed to be shorter than predicted. In summary, we found data supporting an ORF misprediction for every gene containing a REP sequence. Hence, our analysis allows us to conclude that all P. syringae REP sequences are probably extragenic.

Applying a similar
strategy, this time based on the general absence of REP sequences from intergenic spaces between divergent genes, we analyzed all REP sequences located in this type of space. In all cases but one, one or both of the genes bounding the space encoded hypothetical proteins or proteins from mobile elements. The finding of REP sequences in this type of space can be used to refine genome annotation. In addition, REP sequences located between divergent genes can mark genome sites that may have undergone some kind of genomic plasticity event (recombination, transposition, insertion or inversion).

The annotation of each new genome depends on the
annotations of previously published genomes, because most genes are annotated by sequence similarity. This practice has a multiplicative effect on error propagation. That underlines the importance of high-quality annotation for every gene in every genome, since each annotation will influence future ones. Our analysis of the location of all REP elements in the chromosome of P. syringae DC3000 has allowed us to affirm that REP elements are located in extragenic space. We have used this feature to detect ORF misassignments, thus improving the annotation of P. syringae DC3000. We propose REP sequences as markers of extragenicity that can be useful for gene prediction and for
refinement of the annotation of bacterial genomes.

1. Buell, C.R., Joardar, V., Lindeberg, M., Selengut, J., Paulsen, I.T., Gwinn, M.L., Dodson, R.J., Deboy, R.T., Durkin, A.S., Kolonay, J.F., et al. The complete genome sequence of the Arabidopsis and tomato pathogen Pseudomonas syringae pv. tomato DC3000. Proc. Natl. Acad. Sci. USA 100 (2003) 10181-10186.
2. Aranda-Olmedo, I., Tobes, R., Manzanera, M., Ramos, J.L., Marques, S. Species-specific repetitive extragenic palindromic (REP) sequences in Pseudomonas putida. Nucleic Acids Res. 30 (2002) 1826-1833.