Third report. G. Pierron. January 2006.

A. List of Physarum known sequences found in the pilot assay.
B. “Top Ten Dictyo Hits”. Highest scores obtained by comparing translated Physarum traces to the Dictyo genome.

A.

Here is a list of “traces” that correspond to previously known sequences of Physarum. They are of two types: either a known sequence has been re-sequenced, and close to 100% identity is observed at the nucleotide level, or another member of a gene family has been sequenced, showing weak homology at the nucleotide level but sufficient amino acid similarity to be recognized. The list is probably not exhaustive but should be close to it. Most of these hits were found by comparing translated traces (6 frames), 500 at a time, against all known proteins (nr bank). This comparison, which required working as a “blast client”, was kindly performed by Michel Kress in Villejuif, as I was not able to do it myself.

Some fairly long sequences, like the 4.6-kb topoisomerase II mRNA, are absent from the traces, while other mRNAs show up repeatedly, the winner being the 1,488 bp coronin mRNA with 5 overlapping traces, including a pair of “mates” (in blue). These 5 traces reveal at least 5 introns in the genomic DNA sequence of the coronin gene. Theoretically, it is possible to amplify by PCR the DNA located between 2 mates and thereby gain access to much more information on these loci.

Finally, a trace is not always 100% identical to the known sequence, because the readout of a trace beyond 750-800 bp is less accurate. You have access to the real data in the “trace archive window” by selecting “show – trace - in color”, which displays the peaks in color for the selected sequence together with the interpretation given by automated base calling. There is even a quality score for each nucleotide, and obviously some sequences are difficult to interpret. This is the case for trace 818412910, covering exon 6 of the GTP cyclohydrolase I gene of Ernst Werner et al.: the differences are probably sequencing errors rather than the strain variability invoked by Ernst in his Dec. 20 message on this Web site. As far as we know, the LU352 amoebal DNA sequenced in the pilot assay is similar to the Wisconsin 1 isolate DNA.

Ti (ID)         Physarum genes or mRNA sequences

818 41 8298     Coronin, 100% identity nt
818 41 9235     “, 97% nt
820 21 6828     “, 100% nt
820 21 7159     “, 100% nt
818 42 3370     “, 94% nt
818 421 623     redB, 100% (nt)
818 41 1404     Myosin-related protein CAA79924, 100% identity (nt)
818 42 1477     Transglutaminase, 98% (nt)
818 42 3140     “mate”
818 42 4773     SRPK-like protein, 100% (nt), “mate”
820 222 302     36% aa (!!)
818 42 4683     Gamma tubulin, 100% nt
820 216 6565    Histone H1, 100% nt
818 40 8752     CCT chaperonin alpha subunit, 99% nt, 3’ end
818 41 9688     Ubiquitin2 mRNA, 100% nt
820 21 9667     beta-glucosidase, 100% nt
820 21 8718     GTP cyclohydrolase I, 100% nt
820 21 7785     “
818 41 2910     “
818 41 6011     “
818 41 6499     Glutathione reductase promoter, 100% nt
818 41 3213     fragmin 60, 100% nt
818 40 9685     Spherulin 2b, 97% nt
818 40 9569     Chaperonin containing TCP-1 epsilon, 93% nt
820 22 1730     Actin-fragmin kinase, 90% nt
820 21 8418     hap1, 83% nt, promoter

Gene family members.

8202 22 040     ras1 protein, 50% aa
818 41 5428     Spherulin 1B, 4 repeats (nt) encoding the same 29 aa !!
818 42 3760     Precursor Spherulin 4, 50% aa, not seen as nt
818 41 6174     Physarolisin, 50% identity aa
818 41 0861     Physarolisin, 67% identity aa
820 22 2615     Major plasmodial myosin heavy chain
820 22 2025     “mate”
820 22 0578     Major plasmodial myosin heavy chain
818 41 9422     “mate”
                50% aa, 40% aa

Mitochondrial DNA (some of the hits).

818 41 8104     Cytochrome oxidase subunit, 100% identity nt but 44% identity aa. Editing
818 41 7852     NADH dehydrogenase subunit 4L “
820 22 0967
818 42 0382     Apocytochrome b
818 41 1999     “
818 42 3493     “
820 22 1483     PhpooMp19
820 21 8035     H+ two-sector ATPase
820 21 7868     ATP synthase subunit 8
                98% nt mtDNA, 99% nt mtDNA

B.
Comparing the Physarum traces, translated in 6 frames, to all known proteins was somewhat complicated. Comparing the same sequences to a much smaller protein set was easier (see our progress report 1), but still took a lot of time. I chose to compare the Physarum sequences to the Dictyostelium proteome, as derived from the complete genome sequence of this organism. I obtained several hundred hits in the form of stretches of similar amino acids, often discontinuous because of introns within the Physarum sequences. Here I provide a list of the “Top Ten Dictyo Hits”: not necessarily the most interesting proteins, but those with the highest homology scores.

Ti (ID)      Score   E value   Dictyo/Protein           Comments

818424683    322     9e-89     γ-tubulin                Seq. within one of the longest known Physarum exons (> 600 bp). Homology not broken by introns = high score.
818408914    301     2e-82     hypothetical             2-oxoglutarate dehydrogenase-like of Arabidopsis
820221031    264     4e-71     hypothetical             Dictyo best homolog, coatomer alpha-subunit-like, Golgi non-clathrin-coated vesicles.
818419206    259     8e-70     hypothetical             similar to ubiquitin specific protease
820220949    253     6e-58     hypothetical             similar to adaptor protein complex AP-2
818414303    232     9e-61     hypothetical             similar to vacuolar protein sorting…
818414775    225     2e-59     Kinesin 1                49% identity with Dictyo, 65% with mammals
820219459    224     2e-59     RNA polII core subunit   Dictyo is best homolog.
820218555    207     2e-54     hypothetical             coatomer, Homo sapiens best homolog
818409447    202     1e-52     hypothetical             Dictyo best homolog, putative cleavage and poly-A specificity factor

These scores are all highly significant but should be taken as relative values. For instance, the homology of hit number 10 is broken into two pieces by an intron; removing this intron in silico and blasting the resulting sequence gives a slightly higher score. Finally, it is obvious that the Dictyo sequence will be very helpful in annotating the Physarum sequence.
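As a rough illustration of what translating a trace in 6 frames means, here is a minimal Python sketch. It is not the program used for this report (the comparisons were run with BLAST); the function names are hypothetical, and the standard nuclear genetic code is assumed, which would not suit the edited mitochondrial transcripts listed in part A.

```python
from itertools import product

# Standard genetic code (an assumption), built in the usual TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) give 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Return the translations keyed '+1'..'+3' (forward) and '-1'..'-3' (reverse)."""
    frames = {}
    for strand, strand_seq in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


if __name__ == "__main__":
    # Toy sequence, not a real Physarum trace.
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

Each of the six protein strings can then be compared to a protein bank; a stop codon shows up as `*`, so a frame interrupted by an intron typically shows a similarity that stops and resumes further along, possibly in another frame.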
The value of the Dictyo sequence for annotation is illustrated below with the trace 82020949 (6th score above), which contains part of a gene coding for a putative vacuolar protein-sorting-associated protein. Alignment of the translated sequence with the best homolog (Dictyo hypothetical protein XP629242) reveals a gap marking an intron. See below. Starting from the translated Physarum DNA sequence, it is then easy to find the intron/exon junctions and begin to make sense of this unknown DNA trace.

Query = 82020949, 892 bp.

The next, and maybe last, report will be a list of traces containing sequences of genes of interest such as RNA polymerases, PI-3 kinase, myosins…
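Removing an intron in silico, once its junctions have been located from the gap in the alignment, can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the coordinates are supplied by hand (0-based, end-exclusive), the example sequence is a toy and not a Physarum trace, and the GT...AG boundary check assumes canonical splice sites, which this report does not verify for Physarum.

```python
def has_canonical_boundaries(genomic, start, end):
    """True if genomic[start:end] begins with GT and ends with AG (assumed rule)."""
    intron = genomic[start:end].upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def splice_out_introns(genomic, introns):
    """Return the sequence with the given introns removed.

    `introns` is a list of (start, end) pairs, 0-based and end-exclusive,
    which must lie inside the sequence and must not overlap.
    """
    genomic = genomic.upper()
    exons = []
    position = 0
    for start, end in sorted(introns):
        if start < position or start >= end or end > len(genomic):
            raise ValueError(f"invalid or overlapping intron: ({start}, {end})")
        exons.append(genomic[position:start])  # exon preceding this intron
        position = end
    exons.append(genomic[position:])  # last exon
    return "".join(exons)


if __name__ == "__main__":
    # Toy example: two exons separated by a 12-bp intron.
    genomic = "ATGGCC" + "GTAAGTTTTCAG" + "GATTAA"
    print(has_canonical_boundaries(genomic, 6, 18))
    print(splice_out_introns(genomic, [(6, 18)]))
```

The spliced sequence is the one that would be blasted again; as noted for hit number 10, joining the two exons in this way is expected to give a single continuous alignment and a slightly higher score.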