Searching for Mobile Genetic Elements in the Genome of the Tasmanian Devil (Sarcophilus harrisii)

German Lagunas-Robles & Peter Arensburger
California State Polytechnic University, Pomona

Abstract

All eukaryotic genomes contain mobile DNA segments known as transposable elements (TEs), which can transpose between nonhomologous sites. Various diseases, including cancers, as well as changes in gene expression can be associated with TEs, which makes their annotation in genomes important. The Tasmanian devil (Sarcophilus harrisii) population is facing extinction as a result of a transmissible cancer, devil facial tumor disease (DFTD), the origin of which is still poorly understood. Given the possible links between TEs and various cancers, I undertook the annotation of one class of TEs (class II TEs) in the recently sequenced Tasmanian devil genome. The TE component of the Tasmanian devil genome was annotated de novo using a variety of bioinformatics tools. Using this list as well as a previously published Tasmanian devil TE list by Gallus et al. (2015), I used the RepeatMasker program to screen the Tasmanian devil genome for low-complexity DNA sequences and various repeats, including TEs. I wrote custom analysis scripts in the Perl programming language to analyze the results. Sixteen potential transposable elements were identified and scored for their likelihood of being real TEs. In this poster I present a detailed description of my bioinformatics search methodology as well as a summary of the novel TE sequences I discovered.

Introduction

Transposable elements (TEs) make up a significant percentage of the genome in all organisms. These elements are mobile and, if allowed to transpose, can affect the organism's gene expression. When the relationship between TEs and the host genome is examined, TEs can sometimes act as parasites that attempt to replicate (increase their copy number within the host genome) and transpose, either within the genome or to a new genome altogether via a vector species (e.g. viruses; Munoz-Lopez and Garcia-Perez 2010). The movement of a TE into the genome of a different species is known as horizontal transfer.

TEs are classified into two main groups: Class I and Class II. Class I TEs (a.k.a. retrotransposons) can be similar to retroviruses in structure and lifestyle (Munoz-Lopez and Garcia-Perez 2010). Class II TEs (DNA transposons) transpose by a cut-and-paste mechanism. A full-length Class II TE has two Terminal Inverted Repeats (TIRs) flanking the element (Munoz-Lopez and Garcia-Perez 2010). Previous activity by TEs can still be seen in the genome, where remnants of TIRs allow sequences to be classified as potential fragmented TEs. Finding a complete, intact TE is unlikely, but finding remnants of TEs is considerably more likely.

Methods

The objective of this research project was to identify class II transposable elements in the Tasmanian devil genome. The genome used in the analysis was assembled by the Center for Comparative Genomics and Bioinformatics at Pennsylvania State University. RepeatModeler, a program that identifies repeat sequences within a genome, was used to obtain a list of possible DNA transposable element sequences. Using BLAST, a sequence alignment algorithm, this list was compared to a list of transposable element sequences from Repbase (a database of repetitive DNA sequences), with the addition of specific transposable elements identified by Gallus et al. (2015). The BLAST output was parsed with a custom-made Perl script that recorded element names along with their corresponding base pair positions if the e-value (a metric implemented in BLAST for how good the alignment was) was less than or equal to 0.0005. The sequence names and corresponding base pair positions were then formatted into BED format, a flexible format for defining data lines, using a custom-made Perl script.

Using Bedtools (a suite of programs for genome arithmetic; Quinlan and Hall 2010) and the BED-formatted file of sequence names and base pair positions, two files containing the sequences of all potential class II TEs from the Tasmanian devil genome were created: one with each possible TE name as the header followed by its sequence, and a second that recorded the location in the genome and its sequence. These two files were merged into a single FASTA-formatted file using a custom-made Perl script, giving the possible TE name, its location in the genome, and its corresponding sequence. The FASTA file was further filtered to retain only sequences of 100 base pairs or more. This list contained 16 possible transposable element names, and the sequence lengths for each name were graphed individually. The peaks of interest were the highest peaks on each graph (2 to 3 sequences per graph). The sequences at these peaks were later compared to the non-redundant (nr) BLAST database as well as to the Repbase sequences to rank the likelihood of each element being a real TE.

Results

Table 1. Possible transposable element rankings. Significant similarities are the Repbase elements to which each sequence is most similar when aligned.

Possible Element Name    Element Probability                         Significant Similarities (Repbase Database)
rnd-3_family-432_Sh      not a transposable element                  N/A
rnd-4_family-65_Sh       possible fragmented transposable element    Charlie1
rnd-4_family-509_Sh      possible transposable element               hat-1_MeU
rnd-5_family-155_Sh      not a transposable element                  N/A
rnd-5_family-671_Sh      not a transposable element                  N/A
rnd-5_family-1106_Sh     not a transposable element                  N/A
rnd-5_family-1563_Sh     possible transposable element               hat-1_MeU
rnd-6_family-59_Sh       possible fragmented transposable element    Blackjack
rnd-6_family-332_Sh      not a transposable element                  N/A
rnd-6_family-1271_Sh     not a transposable element                  N/A
rnd-6_family-1583_Sh     not likely a transposable element           Blackjack
rnd-6_family-1913_Sh     possible transposable element               Cheshire-2_MD
Charlie1b_Sh             not a transposable element                  N/A
Charlie24_Sh             not likely a transposable element           Charlie24
Mariner1_MD_Sh           possible transposable element               Mariner1_MD
Mariner3_MD_Sh           possible transposable element               Mariner3_MD

Graph 1. Charlie24_Sh sequence lengths (length of sequence vs. sequence number), used to determine which sequences are candidates for the BLAST run.

Graph 2. rnd-4_family-65_Sh sequence lengths (length of sequence vs. sequence number), used to determine which sequences are candidates for the BLAST run.

Graph 3. rnd-6_family-1913_Sh sequence lengths (length of sequence vs. sequence number), used to determine which sequences are candidates for the BLAST run.

Figure 1. Charlie24_Sh BLAST run against the nr database, used to judge whether the possible TE is a strong candidate.

Figure 2. Charlie24_Sh BLAST run against the Repbase database, used to judge whether the possible TE is a strong candidate.

Discussion

After the possible transposable element sequences were compared to the Repbase database, the sequences on the list were still only potential transposable elements. To determine whether each sequence is truly a transposable element, I ran a BLAST comparison against the non-redundant (nr) BLAST database to ensure that the sequences flagged as possible TEs were not sequences that had been previously identified as something else. The possibility that a matching nr sequence itself contains a transposable element must also be considered. Therefore, external sources were consulted in order to decide whether an element is indeed a likely TE.

At first glance, Charlie24_Sh exhibited a relatively low e-value when compared to the Repbase sequence of Charlie24. This suggested that Charlie24_Sh was a possible TE, but when it was run against the nr BLAST database, an MHC I gene came up as the best match to this sequence. It has been found that the devil genome has low complexity of MHC I genes. This might explain the large number of Charlie24_Sh sequences, and therefore Charlie24_Sh might not represent a true TE sequence.

rnd-4_family-65_Sh, a possible TE indicated by the initial RepeatModeler run, was significantly similar to the Charlie1 sequence when compared to the Repbase database. When it was compared to the nr BLAST database, it was similar to human chromosomes 7 and 8 (two sequences were compared). Upon further examination, one of the two chromosomes was annotated for transposable elements, one of which was Charlie1. This qualified rnd-4_family-65_Sh as a possible transposable element.

rnd-6_family-1913_Sh, a possible TE indicated by the initial RepeatModeler run, exhibited a relatively low e-value when compared to the Repbase sequence of Cheshire-2_MD. When rnd-6_family-1913_Sh was compared to the nr BLAST database, a sequence in Macropus eugenii, the tammar wallaby, matched this sequence. Previous annotation of the tammar wallaby genome did not indicate any TEs. There is a strong possibility that this is in fact a TE that was passed through the marsupial lineage by way of horizontal transfer.

Further studies and analyses should be done to expand on these findings. One finding of particular interest is the possibility of horizontal transfer between marsupials.

References

Munoz-Lopez, M., and J. L. Garcia-Perez. 2010. "DNA Transposons: Nature and Applications in Genomics." Current Genomics 11 (2): 115–128.
Quinlan, A. R., and I. M. Hall. 2010. "BEDTools: A Flexible Suite of Utilities for Comparing Genomic Features." Bioinformatics 26 (6): 841–842. doi:10.1093/bioinformatics/btq033.

Photo Credit: Bonorong Wildlife Sanctuary
Photo Credit: Amie Hindson, Healesville Sanctuary
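
The two filtering steps described in the Methods (keeping BLAST hits with an e-value of 0.0005 or less and writing them as BED lines, then keeping only sequences of 100 base pairs or more) can be sketched as follows. This is a minimal Python illustration, not the Perl scripts used for the poster. It assumes that BLAST was run with tabular output (`-outfmt 6`) and that the TE library sequence is the query and the genome scaffold is the subject; the poster states neither. The function names `blast_hits_to_bed` and `filter_fasta_by_length` are hypothetical.

```python
E_VALUE_CUTOFF = 0.0005  # threshold stated in the Methods
MIN_LENGTH = 100         # minimum sequence length stated in the Methods


def blast_hits_to_bed(lines, cutoff=E_VALUE_CUTOFF):
    """Turn BLAST tabular hits into BED lines, keeping hits at or below the cutoff.

    Assumes the 12 standard -outfmt 6 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    bed = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        element, scaffold = fields[0], fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        evalue = float(fields[10])
        if evalue > cutoff:
            continue
        # BLAST coordinates are 1-based and inclusive; BED is 0-based, half-open.
        start = min(sstart, send) - 1
        end = max(sstart, send)
        # A hit on the reverse strand is reported with sstart > send.
        strand = "+" if sstart <= send else "-"
        bed.append(f"{scaffold}\t{start}\t{end}\t{element}\t0\t{strand}")
    return bed


def filter_fasta_by_length(fasta_text, min_length=MIN_LENGTH):
    """Return (header, sequence) pairs whose sequence has at least min_length bases."""
    records = []
    header, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return [(h, s) for h, s in records if len(s) >= min_length]
```

A BED file produced this way could then be passed to Bedtools to extract the corresponding genome sequences, and the resulting FASTA text filtered with `filter_fasta_by_length` before plotting sequence lengths per element name.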