Third report.
G. Pierron.
January 2006.
A. List of known Physarum sequences found in the pilot assay.
B. “Top Ten Dictyo Hits”. Highest scores obtained by comparing translated Physarum
traces to the Dictyo genome.
A. Here is a list of “traces” that correspond to previously known sequences of
Physarum. They are of two types: either a known sequence has been re-sequenced, in which
case close to 100% identity is observed at the nucleotide level, or the trace comes from
another member of a gene family, showing weak homology at the nucleotide level but
sufficient amino acid similarity to be recognized.
The list is probably not exhaustive but should be close to it. Most of these hits were
found by comparing translated traces (6 frames), 500 at a time, against all known proteins
(nr database). This comparison, which required working as a “blast client”, was kindly
performed by Michel Kress in Villejuif, as I was not able to do it myself.
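The six-frame translation mentioned above can be sketched in a few lines of Python. This is only an illustration, not the procedure that was actually run: it assumes the standard nuclear genetic code, and the function names (`reverse_complement`, `translate`, `six_frames`) are hypothetical helpers introduced for this example.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
# Assumption: nuclear sequences only; edited mitochondrial transcripts
# would not be handled correctly by this table.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (N is left as N)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate a DNA sequence codon by codon; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Return the three forward and three reverse translations of a trace."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

Each of the six protein strings would then be compared against the protein database; stop codons appear as `*`, so an exon shows up as a stretch without `*` in one of the frames.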
Some fairly long sequences, like the 4.6-kb topoisomerase II mRNA, are absent from
the traces, while other mRNAs show up repeatedly, the winner being the 1,488 bp
coronin mRNA with 5 overlapping traces, including a pair of “mates” (in blue). These 5
traces reveal at least 5 introns in the genomic DNA sequence of the coronin gene. In theory,
it is possible to amplify by PCR the DNA located between 2 mates and therefore to gain
access to much more information on these loci. Finally, a trace is not always 100% identical
to the known sequence, because the readout of a trace beyond 750-800 bp is less accurate.
You have access to the raw data in the “trace archive window” by selecting “show – trace -
in color”, which displays the peaks in color for the selected sequence together with the
interpretation given by automated base calling. There is even a quality score for each
nucleotide, and obviously some sequences are difficult to interpret. This is the case for
trace 818412910, covering exon 6 of the GTP cyclohydrolase I gene of Ernst Werner et al.;
the differences are probably sequencing errors rather than the strain variability invoked by
Ernst in his Dec. 20 message on this Web site. As far as we know, the LU352 amoebal DNA
sequenced in the pilot assay is similar to the Wisconsin 1 isolate DNA.
Ti (ID)         Physarum genes or mRNA sequences        Identity
818 41 8298     Coronin                                 100% identity nt
818 41 9235     “                                       97% nt
820 21 6828     “                                       100% nt
820 21 7159     “                                       100% nt
818 42 3370     “                                       94% nt
818 421 623     redB                                    100% (nt)
818 41 1404     Myosin-related protein CAA79924         100% identity (nt)
818 42 1477     Transglutaminase                        98% (nt)
818 42 3140     “mate”
818 42 4773     SRPK-like protein                       100% (nt)
820 222 302     “mate”                                  36% aa (!!)
818 42 4683     Gamma tubulin                           100% nt
820 216 6565    Histone H1                              100% nt
818 40 8752     CCT chaperonin alpha subunit            99% nt, 3’ end
818 41 9688     Ubiquitin2 mRNA                         100% nt
820 21 9667     beta-glucosidase                        100% nt
820 21 8718     GTP cyclohydrolase I                    100% nt
820 21 7785     “
818 41 2910     “
818 41 6011     “
818 41 6499     Glutathione reductase promoter          100% nt
818 41 3213     fragmin 60                              100% nt
818 40 9685     Spherulin 2b                            97% nt
818 40 9569     Chaperonin containing TCP-1 epsilon     93% nt
820 22 1730     Actin-fragmin kinase                    90% nt
820 21 8418     hap1 promoter                           83% nt
Gene family members.
Ti (ID)         Protein                                 Identity
8202 22 040     ras1 protein                            50% aa
818 41 5428     Spherulin 1B                            4 repeats (nt) encoding the same 29 aa !!
818 42 3760     Precursor Spherulin 4                   50% aa, not seen as nt
818 41 6174     Physarolisin                            50% identity aa
818 41 0861     Physarolisin                            67% identity aa
820 22 2615     Major plasmodial myosin heavy chain     50% aa
820 22 2025     “mate”
820 22 0578     Major plasmodial myosin heavy chain     40% aa
818 41 9422     “mate”
Mitochondrial DNA (some of the hits).
Ti (ID)         Sequence                                Identity
818 41 8104     Cytochrome oxidase subunit              100% identity nt but 44% identity aa. Editing
818 41 7852     NADH dehydrogenase subunit 4L           “
820 22 0967
818 42 0382     Apocytochrome b
818 41 1999     “
818 42 3493     “
820 22 1483     PhpooMp19
820 21 8035     H+ two-sector ATPase
820 21 7868     ATP synthase subunit 8
98% nt mtDNA
99% nt mtDNA
B. Comparing the Physarum traces, translated in all 6 reading frames, to all known
proteins was somewhat complicated. Comparing the same sequences to a much smaller set of
proteins was easier (see our progress report 1), but still took a lot of time. I chose to
compare the Physarum sequences to the Dictyostelium proteome, as derived from the complete
genome sequence of this organism. I obtained several hundred hits in the form of stretches
of similar amino acids, often discontinuous because of the presence of introns within the
Physarum sequences. Here I provide a list of the “Top Ten Dictyo Hits”: not necessarily the
most interesting proteins, but those with the highest homology scores.
Ti (ID)      Score   E value   Dictyo protein   Comments
818424683    322     9e-89     γ-tubulin        Seq. within one of the longest known Physarum exons (> 600 bp).
                                                Homology not broken by introns = high score.
818408914    301     2e-82     hypothetical     2-oxoglutarate dehydrogenase-like of Arabidopsis
820221031    264     4e-71     hypothetical     Dictyo best homolog, coatomer alpha-subunit-like,
                                                Golgi non-clathrin-coated vesicles.
818419206    259     8e-70     hypothetical     similar to ubiquitin specific protease
820220949    253     6e-58     hypothetical     similar to adaptor protein complex AP-2
818414303    232     9e-61     hypothetical     similar to vacuolar protein sorting…
818414775    225     2e-59     Kinesin 1        49% identity with Dictyo, 65% with mammals
820219459    224     2e-59     RNA polII        core subunit, Dictyo is best homolog.
820218555    207     2e-54     hypothetical     coatomer, Homo sapiens best homolog
818409447    202     1e-52     hypothetical     Dictyo best homolog, putative cleavage and
                                                poly-A specificity factor
These scores are all highly significant but should be taken as relative values. For instance, the
homology of hit number 10 is broken into two pieces by an intron. Removing this intron in silico and
blasting the resulting sequence gives a slightly higher score. Finally, it is obvious that the Dictyo sequence
will be very helpful in annotating the Physarum sequence. This is illustrated below with the trace 82020949
(6th score above), which contains part of a gene coding for a putative vacuolar protein-sorting-associated
protein. Alignment of the translated sequence with the best homolog (Dictyo hypothetical protein XP629242)
reveals a gap that marks an intron. See below. Now, taking the Physarum translated DNA sequence, it is
easy to find the intron/exon junctions and to start making sense of this unknown DNA trace.
Query = 82020949, 892 bp.
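As a rough illustration of the in silico intron removal described above, the following Python sketch lists candidate introns using the canonical GT…AG boundary rule and splices one out. It is a minimal sketch under stated assumptions, not the method used for this report: the length limits are arbitrary placeholders, the function names are hypothetical, and real junctions would be chosen by checking which candidate restores the protein alignment.

```python
def find_candidate_introns(seq, min_len=40, max_len=500):
    """List (start, end) spans that begin with GT and end with AG.

    Coordinates are 0-based and end-exclusive. min_len and max_len are
    assumed bounds on intron length, chosen here only for illustration.
    """
    seq = seq.upper()
    spans = []
    for start in range(len(seq) - 1):
        if seq[start:start + 2] != "GT":
            continue
        last_end = min(start + max_len, len(seq))
        for end in range(start + min_len, last_end + 1):
            if seq[end - 2:end] == "AG":
                spans.append((start, end))
    return spans


def splice_out(seq, start, end):
    """Remove the span [start, end) and join the flanking exons."""
    return seq[:start] + seq[end:]


# Toy example: two short exons around a 14-nt GT...AG intron.
exon1, intron, exon2 = "ATGGCC", "GT" + "A" * 10 + "AG", "TTTTAA"
toy_trace = exon1 + intron + exon2
candidates = find_candidate_introns(toy_trace, min_len=10, max_len=20)
spliced = splice_out(toy_trace, *candidates[0])
```

The spliced sequence can then be translated and compared again with the Dictyo homolog; a candidate is plausible when the two halves of the alignment join in a single reading frame.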
The next, and maybe last, report will be a list of traces containing sequences of genes of interest such as
RNA polymerases, PI-3 kinase, myosins….