PreDetector: Prokaryotic Regulatory Element Detector
Samuel Hiard1, Sébastien Rigali2, Séverine Colson2, Raphaël Marée1 and Louis Wehenkel1
1 Department of Electrical Engineering and Computer Science & Centre for Biomedical Integrative Genoproteomics (CBIG/GIGA) – University of Liège, Sart-Tilman B28, Liège, Belgium
2 Centre for Protein Engineering – University of Liège, Sart-Tilman B6, Liège, Belgium
Abstract
PreDetector is a stand-alone program written in Java. Its aim is to predict regulatory sites in prokaryotic species. It provides two functionalities.
The first is very similar to Target Explorer [1]. From a set of sequences identified as potential target sites, PreDetector creates a consensus sequence and computes its scoring matrix. The sequence and matrix can be saved to a file and then used to search a selected genome for sequences that are close enough to the consensus sequence. To this end, each locus in the genome is given a score according to the similarity measure defined by the matrix. The output of this functionality is filtered with a cut-off score and then passed directly as input to the second one.
The second functionality starts by fetching the gene positions of the selected species from the NCBI server. The loci scoring above the cut-off are then classified into four classes, and one element may belong to several classes. This gives biologists a clearer view of the sequences they have discovered.
Matrix Generation
When biologists search for a regulation motif, they find several potential sequences. We then need a way to obtain a consensus sequence that averages the potential ones.
The first step would be to align the potential sequences. Target Explorer [1] allows sequences of variable length, but PreDetector does not. It simply takes the sequences "as is" and starts generating the matrix.
The matrix should reflect the fact that a nucleotide with a higher frequency at a given position in the observed set should have a greater impact on the score at that position than nucleotides that are more evenly distributed.
On the other hand, nucleotides with a high expected frequency along the genome should carry less weight, since they are likely to be found anyway; conversely, nucleotides with a low expected frequency should carry more.
So, the weight function for a specific nucleotide in the matrix is the following [1]:
$$
\mathrm{weight}_{i,j} = \ln\left(\frac{(n_{i,j} + p_i)/(N + 1)}{p_i}\right)
$$
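As a quick check of the behaviour described above, the two extreme cases of the weight formula can be written out. This is only a sketch: it assumes that n_{i,j} is a raw count between 0 and N (N being the number of sequences in the set and p_i the expected genome frequency of nucleotide i), which is consistent with the example matrix on this poster.

```latex
% Nucleotide i never observed at position j (n_{i,j} = 0):
\mathrm{weight}_{i,j} = \ln\frac{p_i/(N+1)}{p_i} = -\ln(N+1)

% Nucleotide i observed in every sequence at position j (n_{i,j} = N):
\mathrm{weight}_{i,j} = \ln\frac{(N+p_i)/(N+1)}{p_i} = \ln\frac{N+p_i}{(N+1)\,p_i}
```

The first value does not depend on p_i, while the second grows as p_i shrinks, so a fully conserved nucleotide that is rare in the genome receives the highest weight. For N = 3 the first case gives -ln 4 ≈ -1.39, the minimum value seen in the example matrix.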
Consensus search in genome
When the matrix is computed, it can be used to find similar loci in the genome.
The score for each locus is calculated as the sum of the values that each base of the sequence has, at its position, in the weight matrix.
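The scoring rule above can be sketched in a few lines of Python. This is an illustration, not PreDetector's Java code: the names `WEIGHTS`, `score_locus` and `scan` are hypothetical, the matrix values are the rounded ones from the example matrix on this poster, and the toy sequence is invented.

```python
# Weight matrix copied from the example on this poster (rounded values).
# Keys are nucleotides, list entries are motif positions 1..5.
WEIGHTS = {
    "A": [0.65, -1.39, -1.39, -1.39, -1.39],
    "C": [0.41, 1.39, -1.39, 0.41, 0.41],
    "G": [-1.39, -1.39, 1.39, 0.41, -1.39],
    "T": [-1.39, -1.39, -1.39, 0.08, 0.65],
}
MOTIF_LENGTH = 5


def score_locus(locus):
    """Sum the weight of each base of a locus at its position in the matrix."""
    if len(locus) != MOTIF_LENGTH:
        raise ValueError("locus must have the same length as the matrix")
    return sum(WEIGHTS[base][pos] for pos, base in enumerate(locus))


def scan(sequence):
    """Score every window of the sequence (forward strand only).

    Returns a list of (start_index, locus, score) tuples, 0-based.
    """
    hits = []
    for start in range(len(sequence) - MOTIF_LENGTH + 1):
        locus = sequence[start:start + MOTIF_LENGTH]
        hits.append((start, locus, score_locus(locus)))
    return hits


# Hypothetical toy sequence, not taken from any real genome.
toy_sequence = "TTCCGGCAA"
best = max(scan(toy_sequence), key=lambda hit: hit[2])
print(best[0], best[1], round(best[2], 2))
```

Scoring every window and filtering afterwards keeps the scan independent of the cut-off, so the threshold can be changed without rescanning.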
The four classes
1) Regulatory: The distal is located in the user-specified bounds, and at least one nucleotide is not in a gene
2) Upstream: The distal is facing a start codon and is not in a gene
3) Coding: The distal is in a gene
4) Terminator: At least one nucleotide facing a stop codon, and no start codon
Example: Use the matrix from the matrix example to find similar loci on nucleotides 100–200 of gene X of Drosophila melanogaster.

Id  Strand  Seq    Pos  Score
1   for     CCGGC  38   4.01
2   for     CCGAT  20   2.45
3   rev     AGCGC  29   2.45
4   rev     TCCGG  37   2.21
5   for     TCGTT  58   2.12

(Only the first 5 results are shown here.)
Then, only sequences that have a score greater than a user-defined cut-off score are kept. In this example, we could set the cut-off score at 2.40 and keep only the first three elements.
[Screen shots: Matrix Generation; Search Parameters]
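The scores and the cut-off step of the search example can be reproduced with a short Python sketch. It rests on an assumption that the poster does not state: a hit on the "rev" strand is scored on the reverse complement of the listed sequence. That reading is at least consistent with the five scores in the results table. The names `reverse_complement`, `score`, `HITS` and `CUT_OFF` are hypothetical, and the weights are the rounded values of the example matrix.

```python
# Rounded weight matrix from the example on this poster.
WEIGHTS = {
    "A": [0.65, -1.39, -1.39, -1.39, -1.39],
    "C": [0.41, 1.39, -1.39, 0.41, 0.41],
    "G": [-1.39, -1.39, 1.39, 0.41, -1.39],
    "T": [-1.39, -1.39, -1.39, 0.08, 0.65],
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def score(seq, strand):
    """Score a hit; assumption: "rev" hits are scored on the reverse complement."""
    if strand == "rev":
        seq = reverse_complement(seq)
    return sum(WEIGHTS[base][pos] for pos, base in enumerate(seq))


# (id, strand, sequence, position) rows from the results table.
HITS = [
    (1, "for", "CCGGC", 38),
    (2, "for", "CCGAT", 20),
    (3, "rev", "AGCGC", 29),
    (4, "rev", "TCCGG", 37),
    (5, "for", "TCGTT", 58),
]

CUT_OFF = 2.40  # user-defined threshold used in the example

scored = [(hit_id, round(score(seq, strand), 2)) for hit_id, strand, seq, _ in HITS]
kept = [hit_id for hit_id, value in scored if value > CUT_OFF]
print(scored)
print(kept)
```

With the cut-off at 2.40, hits 4 and 5 (2.21 and 2.12) are dropped and the first three are kept, as in the example.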
In the weight formula:
- n_{i,j} is the observed frequency (count) of nucleotide i at position j
- N is the number of sequences in the set
- p_i is the expected frequency of nucleotide i in the genome
PreDetector in two words
[Diagram labels: Sequence search (ACGT in …AACGTTTTTACGTCCCCACGT…); Score ≥ Threshold; Genes positions (NCBI Server); Classification; Results: Regulatory, Upstream, Coding, Terminator]
Example of matrix
Let's assume that we have experimentally discovered these motifs:

ACGTC
ACGGT
CCGCT

In a species known to have 40% CG, the consensus matrix will then be:

A   0.65  -1.39  -1.39  -1.39  -1.39
C   0.41   1.39  -1.39   0.41   0.41
G  -1.39  -1.39   1.39   0.41  -1.39
T  -1.39  -1.39  -1.39   0.08   0.65
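The example matrix can be recomputed from the three motifs with the weight function ln(((n + p) / (N + 1)) / p). The sketch below is a minimal Python illustration, not PreDetector's own code. It assumes that "40% CG" means p = 0.2 for each of C and G and p = 0.3 for each of A and T, and that n is a raw count. The names `MOTIFS`, `BACKGROUND` and `weight_matrix` are hypothetical.

```python
import math

# The three example motifs from this poster.
MOTIFS = ["ACGTC", "ACGGT", "CCGCT"]

# Assumed expected frequencies for a genome with 40% C+G,
# with C and G (and A and T) taken as equally frequent.
BACKGROUND = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}


def weight_matrix(motifs, background):
    """Return {nucleotide: [weight at each position]} for equal-length motifs."""
    length = len(motifs[0])
    if any(len(motif) != length for motif in motifs):
        raise ValueError("all motifs must have the same length")
    n_seqs = len(motifs)
    matrix = {}
    for base, p in background.items():
        row = []
        for pos in range(length):
            # Number of motifs carrying this base at this position.
            count = sum(1 for motif in motifs if motif[pos] == base)
            row.append(math.log((count + p) / (n_seqs + 1) / p))
        matrix[base] = row
    return matrix


for base, row in weight_matrix(MOTIFS, BACKGROUND).items():
    print(base, " ".join(f"{w:5.2f}" for w in row))
```

Rounded to two decimals, the rows agree with the matrix above. Adding p to the count acts as a pseudocount, so a nucleotide that is never observed at a position still gets a finite weight of -ln(N + 1).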
Classification
When several hits have been found, PreDetector classifies them into four classes: Regulatory, Upstream, Coding and Terminator, allowing multiple classes per element.
To achieve this goal, PreDetector connects to the NCBI server and downloads the gene positions of the species.
The classes are described in detail under "The four classes".
Conclusion
PreDetector can play an important role in automatic regulatory element detection and validation. It could also be upgraded to handle eukaryotic species.
References
1. Target Explorer: an automated tool for the identification of new target genes for a specified set of transcription factors, Alona Sosinsky, Christopher P. Bonin, Richard S. Mann and Barry Honig, Nucleic Acids Research, 2003, Vol. 31, No. 13, 3589-3592