ANNOTATION AND ANALYSIS OF NEWLY DISCOVERED MYCOBACTERIOPHAGE GENOMES

Carla De Los Santos, David Homan, Jose Morales, Erica Shepard, Yu-Chen Hwang, Janine Ilagan, John-Paul Donohue, Patricia Chan, Todd Lowe, Grant Hartzog

Abstract

Viruses that infect bacteria (bacteriophage) are the most abundant and genetically diverse DNA-containing entities on the planet. Analysis of phage genomes may reveal novel DNA sequences and novel protein domains, and may provide insights into the biology of the host. We are analyzing two novel mycobacteriophage, Firecracker and Dori, which were isolated on the UCSC campus using Mycobacterium smegmatis as the viral host. After multiple rounds of plaque purification, we performed electron microscopy and observed that Dori has a typical siphoviral morphology and that Firecracker has an unusual cylindrical morphology. The Dori and Firecracker genomes were sequenced using a combination of next-generation technologies. Following assembly of the sequence data for Dori, we obtained a single large contig of 64,613 base pairs. The Firecracker genome is 71,341 base pairs, has defined ends with 4 base pair 3' overhangs, and has a large number of short sequence repeats. Using the gene prediction programs Glimmer, GeneMark, tRNAscan-SE and Aragorn, we identified 93 protein-encoding genes in the Dori genome and 126 genes in the Firecracker genome. Although many mycobacteriophage genomes include tRNA genes, neither the Dori nor the Firecracker genome appears to carry structural RNA genes. BLAST searches indicate that phage Dori's genome sequence is distinct from those of previously sequenced mycobacteriophage genomes. Firecracker is very similar to a previously identified phage, Corndog, and together they define a new class of mycobacteriophage. We have also determined that Dori is a temperate phage: we have isolated Dori lysogens, identified repressor and integrase genes in the Dori genome, and identified and verified the attP and attB sites used by Dori.
Introduction

The Mycobacteriophage Revolution

Bacteriophage played a central role in the early development of molecular biology. For the past 30 years, however, they have been largely ignored in favor of other model systems. Recently, bacteriophage research has once again become an area of productive investigation. Phage are recognized as the most abundant and genetically diverse self-replicating organisms on the planet; with the rise of antibiotic-resistant strains of bacteria, their potential as antibacterial agents is receiving serious attention; and they are now also recognized as a particularly productive platform for science education.

The National Genomics Research Initiative (NGRI) of the Howard Hughes Medical Institute is sponsoring a phage-based research initiative for undergraduates at a diverse set of universities around the country. Its goals are to increase the quality of science education and to increase the recruitment and retention of students in the sciences. The first project sponsored by the NGRI is a phage-hunting course developed by Graham Hatfull at the University of Pittsburgh. In this class, students isolate novel bacteriophages using Mycobacterium smegmatis as a host. They purify and characterize their phage, and then one or more is sequenced by the class. During the course of this project, many mycobacteriophage genomes have been characterized and added to existing databases; this success has driven the production of a mycobacteriophage-specific database that contains all the mycobacteriophage genomes identified to date, including many as-yet-unpublished genomic sequences.

The current mycobacteriophage genomes have been grouped into clusters using a program called SplitsTree, which compares the gene phamilies found in phages as well as possible alternative phylogenetic relationships between them [1]. The number of clusters has grown from the original clusters A-F to A-O, and 9 new phages have been discovered that do not belong to any of those clusters; these phages have been placed in new clusters and are considered singletons within them.

As a result of the information now available on mycobacteriophage and of technological advances, new ways of using mycobacteriophage as research tools have also emerged, such as the BRED technology proposed by the Hatfull group. Bacteriophage Recombineering of Electroporated DNA (BRED) is a technique that uses the recombineering system of bacteriophages that express RecE and RecT homologs. This technology has proven useful in the construction of unmarked deletions of both essential and non-essential genes, in-frame internal deletions, point and nonsense mutations, gene tags, and specific insertions of genes from other organisms [2].

Mycobacteriophage Background

Bacteriophages are viruses that infect bacteria. They are the most abundant DNA-containing entities on the planet and are a major source of genetic diversity. Bacteriophages can infect their host and be propagated in two different ways, through a lysogenic or a lytic life cycle. During the lytic cycle, the bacteriophage attaches to a host cell and inserts its DNA. The virus then uses the host replication machinery to replicate its genome and produce many new phages, eventually bursting and killing its host. During the lysogenic life cycle, the bacteriophage enters the host and integrates itself into the genome through recombination.
For recombination to occur, the phage must have an attachment site (attP), located near the integrase gene whose product catalyzes the event, that pairs with the attachment site of the bacterial host (attB). Once recombination has occurred and the phage genome has been integrated into the host genome, the phage remains dormant until the infected host is perturbed, triggering the virus to reenter the lytic cycle; this process is known as induction. During induction, the phage removes itself from the host genome through a process called excision and enters the lytic cycle.

Mycobacteriophage Dori & Mycobacteriophage Firecracker

For almost two years, we have focused on characterizing the novel mycobacteriophage genomes of Dori and Firecracker. The phage samples were collected from the UCSC campus by students taking the NGRI-sponsored course Bio21L, Environmental Phage Genomics, which was taught by Professors Grant Hartzog and Manny Ares. Dori was originally isolated and named by Ericka Shepard, and Firecracker was isolated and named by Jose Morales and David Homan. We have examined both genomes through a combination of microbiology, bioinformatics, computational biology, and next-generation sequencing technologies. In this thesis, I describe the sequencing, assembly, annotation and bioinformatic analyses of these phage.

Results

Electron Microscopy of Phage Dori

High-titer plate lysates of phages Dori and Firecracker were spotted onto carbon-coated copper grids, stained with uranyl acetate, and visualized by electron microscopy. This revealed that Dori has a Siphoviridae morphology with a long flexible tail (fig. 1).

Obtaining DNA of Bacteriophage Dori

To obtain the bacteriophage DNA necessary for sequencing, we extracted DNA from a previously prepared high-titer plate lysate of phage Dori. This yielded DNA at 63.4 ng/µl, which was subsequently sequenced at UCSC using both 454 and SOLiD technologies.

Sequencing Analysis of Bacteriophage Dori

The 454 and SOLiD sequencing data were assembled separately using the Newbler and Minimus sequence assemblers, respectively. These assembled sequences were also reassembled using Minimus. This analysis yielded a single contig, which we viewed using the program Hawkeye. We observed a few nucleotide uncertainties in the 454 sequence data and a pile-up of reads around the 21 kb region of the genome (fig. 2). The 454 sequence was then compared to the SOLiD sequence, which was viewed with a program called Tablet; the two sequences agreed, and the region containing the pile-up was identical in both. The region containing the 21 kb pile-up was also sequenced using Sanger chemistry at the Berkeley Sequencing Center, and the results agreed with both the 454 and SOLiD sequences. We further analyzed this region with Gepard to look for possible repeats in the genome, but none were found. Thus neither Sanger sequencing nor Gepard gave any indication that repeats cause the 21 kb pile-up observed in the Dori genome. One potential explanation for this unusual over-representation of sequence reads is preferential PCR amplification of the genomic DNA during preparation of the samples for 454 sequencing. Once sequence ambiguities were resolved, a final FASTA file of the complete Dori genome was generated.

Annotating the Dori Genome

We used the DNA Master software package and its associated gene prediction programs, Glimmer and GeneMark, to annotate protein-coding genes in phage Dori. Starting with the gene predictions generated by Glimmer and GeneMark, we examined the protein-encoding capacity of the Dori genome. We identified open reading frames on both the positive and negative strands of the genome and annotated a total of 93 protein-coding genes in phage Dori.
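
As an illustration of what identifying open reading frames on both strands involves, the following is a minimal Python sketch of a naive six-frame ORF scan. It is not how Glimmer or GeneMark work (both rely on statistical models of coding sequence), and it is not the procedure used in this study. The function names, the `min_codons` threshold, the choice of ATG/GTG/TTG as candidate starts, and the toy sequence are assumptions made for the example.

```python
# Naive six-frame ORF scan (illustrative only; not Glimmer or GeneMark).
COMPLEMENT = str.maketrans("ACGT", "TGCA")
START_CODONS = {"ATG", "GTG", "TTG"}   # assumed candidate bacterial start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_codons):
    """Return (start, end) pairs for ORFs on one strand.

    start is the 0-based index of the first start codon in the ORF;
    end is exclusive and includes the stop codon.
    """
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Count codons from the start codon up to, not including, the stop.
                if (i - start) // 3 >= min_codons:
                    found.append((start, i + 3))
                start = None
    return found


def find_orfs(genome, min_codons=30):
    """Scan both strands; report coordinates on the forward strand."""
    genome = genome.upper()
    n = len(genome)
    hits = [("+", s, e) for s, e in orfs_in_strand(genome, min_codons)]
    for s, e in orfs_in_strand(reverse_complement(genome), min_codons):
        hits.append(("-", n - e, n - s))
    return sorted(hits, key=lambda h: h[1])


# Toy sequence (hypothetical): ATG, four GCA codons, then a TAA stop.
demo = "CC" + "ATG" + "GCA" * 4 + "TAA" + "GG"
demo_hits = find_orfs(demo, min_codons=3)
```

On a real genome, the candidate ORFs from such a scan would still need the manual review of start sites and database evidence that the annotation process relies on.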

BLAST was used to determine whether Dori was similar to any other sequenced bacteriophage and whether it belonged to any of the existing clusters of mycophages. The BLAST results did not show similarity to the entire genome of any other bacteriophage in the database, so Dori could not be clustered with any of the existing groups of phages; Dori is considered a singleton in the phage database. Although no extensive similarity to other phage genomes was found, Dori shared homology with other hypothetical phage proteins, and some of the structural genes were identified using BLASTP. Many genes in the Dori genome also shared high homology with protein-coding genes in bacteria. The predicted open reading frames were then further annotated by considering different possible translation start sites (typically upstream of the predicted start). Criteria used to select these alternative starts included maximizing gene length, a bias against starts that create more than a 4 base pair overlap with the adjacent gene, a strong Shine-Dalgarno sequence, and starts that gave better BLASTP hits or alignments to related genes. The map of the Dori genome is shown in Figure 3.

Identifying the Ends of the Dori Genome

The assembled genome sequence for Dori did not show clearly defined ends, raising the possibilities that either we did not have a complete genome sequence for Dori or that it is a circularly permuted virus. We generated a restriction map for the Dori genome and used it to select enzymes that cut near the ends of the genome sequence in our assembly. We predicted that if the Dori genome were circular, the resulting digest, analyzed by gel electrophoresis, would contain only one band instead of multiple fragments. We used lambda phage DNA cut with Bst EII both as a size marker and as a control for identifying possible cohesive ends in the Dori genome. Lambda phage contains compatible cohesive ends that form stable hybrids at room temperature.
However, these will melt when heated, separating the joined end fragments of the Bst EII-digested lambda DNA into two fragments. We therefore loaded lambda phage DNA digested with Bst EII and Dori DNA digested with various enzymes, each with or without a prior heating step. When we digested Dori with Eco RI, we observed an extra, unexpected band of about 8 kb, which would correspond to a 25 kb band instead of the predicted 32.6 kb band in our Eco RI digest (fig. 4). Restriction digests using the enzymes Sac I, Kpn I, Bcl I, Xma I, and Sma I yielded similar results, with extra and missing bands. The occurrence of extra and missing bands could indicate that Dori uses headful packaging as its terminating technique; because the digests do not always agree with those predicted for a circular genome, these results also imply that the Dori genome is circularly permuted [4]. Sanger sequencing of the possible ends of the Dori genome showed that there were indeed no defined ends, consistent with a circularly permuted genome; to verify this prediction further, Southern blot analysis is currently in progress.

The att Site in Bacteriophage Dori

During annotation of the Dori genome, a BLASTP search showed that one of the predicted protein-coding genes was homologous to a bacteriophage integrase. We then performed a BLASTN search using the DNA sequence of the integrase gene together with the sequence around it and obtained a match to M. smegmatis at a tyrosine tRNA site. We generated a figure aligning the query sequence with the match, highlighting the perfect matches and underlining the tRNA portion of the M. smegmatis sequence (fig. 5). Using this figure and the length of the sequence, we predicted the lengths of the fragments that would result from a recombination event.
The following fragment lengths were predicted for amplification of the recombinant fragments and of the original bacterial and bacteriophage fragments before recombination:

Primer set    Fragment length (base pairs)
B1→B2         569
P1→P2         314
P1→B2         491
B1→P2         392

To amplify the predicted recombinant fragments as well as the bacterial and phage fragments, we obtained genomic DNA from Dori and M. smegmatis. Cultures of M. smegmatis were grown, and a DNA prep kit was used to obtain genomic DNA. We also prepared DNA from a Dori lysogen of M. smegmatis. The samples were then amplified with the appropriate primer combinations: bacterial or phage primer pairs for the pure samples, and mixed phage and bacterial primer pairs for the infected samples. Gel analysis showed bands around the predicted sizes, indicating that this is the att site for bacteriophage Dori.

Annotating the Firecracker Genome

The Firecracker genome was sequenced using 454 sequencing, with primer-directed Sanger sequencing used to finish the sequence. The genome was assembled and then annotated as described above for Dori. A total of 126 protein-coding genes were predicted. No tRNA genes were observed. The genomic sequence of Firecracker was very similar to the genome of phage Corndog. A few genes were assigned functions based on homology to proteins found in BLASTP searches. These included the phage tail protein and a terminase large subunit protein. Other genes with predicted functions are noted in figure 6.

Genome Browser of Phages Dori and Firecracker

Genome browsers were produced for both Dori and Firecracker by Patricia Chan in the Lowe Lab and by John-Paul Donohue in the Ares Lab.

Mycobacteriophage Dori: http://microbes.ucsc.edu/cgi-bin/hgTracks?org=Mycobacterium+phage+Dori&position=chr:10001-35000
Mycobacteriophage Firecracker: http://microbes.ucsc.edu/cgi-bin/hgTracks?org=Mycobacterium+phage+Firecracker&position=chr:10001-35000
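
The fragment lengths predicted for the Dori att site PCR (B1→B2, P1→P2, P1→B2, B1→P2) can be checked for internal consistency. The Python sketch below assumes a simple model that is not stated in the text: each product is the distance from its forward primer to the crossover point plus the distance from the crossover point to its reverse primer. Under that assumption, the two pre-recombination products and the two recombinant products must have the same total length, and swapping the forward primer must shift the product by the same amount whichever reverse primer is used. The function and dictionary names are hypothetical.

```python
# Consistency check on predicted att-site PCR product sizes.
# Assumed model: product length = (forward primer to crossover)
#                               + (crossover to reverse primer).
PREDICTED_BP = {
    ("B1", "B2"): 569,  # bacterial site before integration
    ("P1", "P2"): 314,  # phage site before integration
    ("P1", "B2"): 491,  # recombinant junction
    ("B1", "P2"): 392,  # recombinant junction
}


def check_recombination_sizes(sizes):
    """Return the totals and primer offsets implied by the assumed model."""
    parental_total = sizes[("B1", "B2")] + sizes[("P1", "P2")]
    recombinant_total = sizes[("P1", "B2")] + sizes[("B1", "P2")]
    # Difference between B1 and P1 arms, measured with either reverse primer.
    offset_with_b2 = sizes[("B1", "B2")] - sizes[("P1", "B2")]
    offset_with_p2 = sizes[("B1", "P2")] - sizes[("P1", "P2")]
    return {
        "parental_total": parental_total,
        "recombinant_total": recombinant_total,
        "offset_with_b2": offset_with_b2,
        "offset_with_p2": offset_with_p2,
        "consistent": (parental_total == recombinant_total
                       and offset_with_b2 == offset_with_p2),
    }


report = check_recombination_sizes(PREDICTED_BP)
```

With the values listed above, both totals come to 883 bp and both offsets to 78 bp, so the four predictions are mutually consistent under this model. The check says nothing about whether the bands observed on the gel match them.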

Simple Sequence Repeats in the Firecracker Genome

We used a dot-plot program, Gepard, and a repeated-sequence motif finder, MEME, to determine whether either Dori or Firecracker contains repetitive DNA elements. We observed a highly repeated sequence in the Firecracker genome using Gepard, and using MEME we found that this repeat is a 17 base pair palindrome. The palindromic sequence contains a 3 nucleotide central region that is not palindromic, which may imply a stem-loop structure in Firecracker RNA (fig. 7). The motif also overlaps with a repeat of the sequence "TGGGGGTGTTCGGTTTCCGAACAG", which occurs 22 times in the genome. This sequence is located specifically between the end of one gene and the beginning of the next, and it is found mostly on the positive strand (16 of 22 occurrences). Other repeats were identified in the 70 kb region of the Firecracker genome. A square-like pattern of repeats was observed in the Gepard output, and some of the predicted repeats lie in the region between genes 125 and 126. Because this region shows no sequence homology to known genes, no protein-coding gene was predicted there, leaving a large gap between those two genes.

Materials and Methods

Phage Titer Assay for Phage Dori

We performed a phage titer assay for phage Dori to determine how much phage, and at what concentration, would be enough to produce a web pattern. A web pattern is necessary when collecting the filtrate of pure phage so that we obtain the maximum amount of phage, and thus the maximum amount of DNA, for sequencing and other experiments.

Phage Dilutions

Three different dilutions were made from the undiluted filtrate: 10^-2, 10^-3, and 10^-4. These dilutions were prepared in microcentrifuge tubes by adding 10 µl of the undiluted phage filtrate to 100 µl of phage buffer (PB) to produce a 10^-2 dilution; 10 µl were then taken from this dilution and added to 100 µl of PB to create a 10^-4 dilution, and then again to create a 10^-6 dilution.
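
Returning to the Firecracker repeat quoted in the Simple Sequence Repeats section, the structure described there (a 17 base pair palindrome with a 3 nucleotide non-palindromic center) can be illustrated directly on the quoted 24-nucleotide sequence. The Python sketch below scans for two 7-nucleotide arms that are reverse complements of each other, separated by a 3-nucleotide spacer. The arm and spacer lengths are assumptions chosen to add up to 17; the function names are hypothetical; and the hit is not claimed to be identical to the motif reported by MEME.

```python
# Scan a sequence for inverted repeats: arm + spacer + reverse-complement arm.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Repeat sequence quoted in the text for the Firecracker genome.
FIRECRACKER_REPEAT = "TGGGGGTGTTCGGTTTCCGAACAG"


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, arm=7, spacer=3):
    """Return (0-based start, matched sequence) for every window in which
    the left arm equals the reverse complement of the right arm."""
    total = 2 * arm + spacer
    hits = []
    for i in range(len(seq) - total + 1):
        left = seq[i:i + arm]
        right = seq[i + arm + spacer:i + total]
        if left == reverse_complement(right):
            hits.append((i, seq[i:i + total]))
    return hits


hits = find_inverted_repeats(FIRECRACKER_REPEAT)
for start, match in hits:
    # Show the arms and the central spacer separately.
    print(start, match[:7], match[7:10], match[10:])
```

In the quoted sequence this finds a single 17-nucleotide window, TGTTCGG TTT CCGAACA, whose 7-nucleotide arms can base-pair with each other around a TTT center, the arrangement expected for a stem-loop.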

Culture Tube Preparation

Three culture tubes were used, one for each dilution, each containing 0.5 ml of Mycobacterium smegmatis culture. 10 µl of each dilution was added to a separate culture tube, and the tubes were left undisturbed at room temperature for 10 minutes.

Plating

4.5 ml of heated top agar was mixed into each of the three culture tubes and pipetted onto an agar plate, making sure that the top agar-phage mixture was distributed evenly so as to produce a smooth top layer covering the entire surface of the plate. The plates were left at room temperature to solidify for 30-60 minutes and then placed upside down in a 37 °C incubator for 24 hours.

Plaque Titer Assay

The individual plaques present on the plates containing the dilutions of phage Dori were measured using a ruler, and the following equation was used:

Plaques needed for web pattern = (Area of plate) / (Area of plaque)

Since only the 10^-2 dilution showed any plaque formation, this was the sample used to predict the amount of phage necessary for a web pattern. The area of the plate was calculated to be 6082.12 mm² and the area of a plaque to be 0.79 mm², which gives roughly 7,700 plaques for a web pattern.

Isolation of Genomic DNA

To purify DNA from phage Dori, a high-titer phage lysate was treated with RNase and DNase I. The nuclease-treated phage were precipitated with polyethylene glycol, and phage DNA was purified using a Promega Wizard DNA prep kit according to the manufacturer's directions. The concentration of the purified DNA, measured using a NanoDrop device, was 63.4 ng/µl. Ligation and PCR reactions were performed using commercially obtained enzymes following the manufacturer's directions.

Verification of Bacteriophage Terminating Technique

To verify the suggested terminating technique used by Dori, we performed a series of restriction digests. For each digest, we picked the restriction enzymes based on the fragment sizes predicted by restriction-mapping software.
We aimed to obtain end fragments that were 2-7 kb long so that the resulting fragments could be easily identified.

Eco RV & Bgl II Restriction Digest

A restriction digest was performed using the Eco RV and Bgl II restriction enzymes and their appropriate buffers to identify possible cohesive ends in the Dori genome. Bst EII-digested lambda DNA (Bst EII/λ) was used as a control, because it contains cohesive (cos) ends whose behavior differs between heated and unheated samples, and as a size marker. The samples were run on a 6% agarose gel containing EtBr and were observed at a half run and a full run. All samples were run both heated and unheated.

Eco RI, Sac I, & Sph I Restriction Digest

A restriction digest was performed to explain the heavier bands present in the previous restriction digest with Bgl II and Eco RV. The restriction enzymes Eco RI, Sac I, and Sph I, along with Bst EII/λ, were used, this time without heating any of the samples. The samples were run on a 6% agarose gel containing EtBr; the EtBr staining made the lower half of the gel difficult to see. In the top half of the gel, we noticed that Eco RI yielded an unpredicted band around the 8 kb region.

Kpn I & Bgl II Restriction Digest

Restriction digests using Kpn I, Bgl II, and Bst EII/λ were run with their appropriate buffers on a 6% agarose gel containing EtBr. Kpn I yielded a doublet and a singlet band around the 8 kb region that were not predicted by the restriction map, while the predicted 2.7 kb end fragment was not present in the gel. Bgl II also yielded an extra, unaccounted-for band around the 13 kb region.

Isolation of Genomic DNA from M. smegmatis and M. smegmatis Infected with Dori

DNA from M. smegmatis alone and from infected M. smegmatis was isolated using the Qiagen DNeasy Tissue Kit, following the manufacturer's protocol for gram-positive bacteria.

Amplification of Dori att Site

A BLAST search using the predicted tyrosine integrase gene in the Dori genome was used to search for the integration site in its host, M. smegmatis. Microsoft Word was used to align the resulting match and identify the areas of similarity. The resulting information was used to calculate the possible sizes of the recombination fragments of M. smegmatis infected by Dori. Four primers were used in PCR analysis, two for the bacterial genome and two for the mycobacteriophage genome. These primers were then arranged into four possible combinations in order to amplify the att site from M. smegmatis alone, a pure bacteriophage sample, and a sample that contained infected M. smegmatis. Six 20 µl reactions were performed, including a control with no template and a sample with infected M. smegmatis and bacterial primers only, with the specific settings required for the enzyme-containing Mango Mix. The results were run on a 1% TA + EtBr gel using a 100 bp marker and Bst EII/λ.

Discussion

Since the discovery and characterization of bacteriophage lambda, our understanding of bacteriophage mechanisms and life cycles has enabled researchers to engineer phage-based tools for use in biological research and has provided new insights into bacterial evolution, genetics and physiology. Upon annotating the genome of phage Dori, we discovered that it contains many genes with bacterial homologs. We hypothesize that these were acquired through horizontal gene transfer. Explaining the importance of these genes in phage Dori will require future experiments involving mutagenesis or gene knockouts. The RecE-like gene found in Dori could imply that the genome also contains a RecT-like gene that is not currently annotated in the BLAST database; with this in mind, we may be able to use Dori for BRED if either a RecT-like protein is present or other factors are sufficient for a recombineering system based on the RecE-like protein alone.

The Firecracker genome annotation did not reveal many bacterial gene homologs; however, the genome shared almost complete similarity with that of phage Corndog.
Both genomes now make up cluster O of the mycophages. The Firecracker genome also contains a large number of sequence repeats, which we analyzed using Gepard and MEME. One repeat occurs throughout the Firecracker genome, and its orientation corresponds to the orientation of the underlying genes. This observation is reminiscent of "stoperators", repressor-binding sites that occur in a gene-specific orientation throughout the genomes of Bxb1, L5 and other cluster A phages. Curiously, we have not yet identified a protein with a DNA-binding domain in the Firecracker genome. Further analysis will be directed at identifying DNA- (or RNA-) binding proteins that may recognize this sequence. A second set of repeats in the Firecracker genome is restricted to the 3' end of the genome. The potential function or origin of these repeats is obscure and will need further analysis. Finally, our initial analysis indicates that the Firecracker repeats are also found in Corndog. The relationship between the repeat structures of these phages deserves a more careful analysis.

Although research focusing on bacteriophage genomics has increased, the rapid evolutionary rate and the abundance of bacteriophage continue to make this area of research a frontier.

Figures

Fig. 1: Electron micrographs of phages Dori and Firecracker. (Left) Dori shows an icosahedral head and a long, flexible tail. (Right) Firecracker shows a cylindrical head and a long tail (scale bar = 100 nanometers).

Fig. 2: Sequence coverage of assembled 454 sequencing reads of phage Dori. The 454 sequencing results were viewed using the Hawkeye viewer from the AMOS suite. Although the average read coverage is only ~20x, the region around 21 kb shows ~200-fold coverage.

Fig. 3: Annotated genome of Bacteriophage Dori. The genes labeled in green are transcribed from left to right; genes in red are transcribed in the opposite direction. Genes with homologs or domains of known function are labeled.
The presence of integrase and phage antirepressor genes suggests that Dori may be able to form lysogens.

Fig. 4: Restriction digest of Bacteriophage Dori. To find the ends of the Dori genome, determine whether they were cohesive, and verify the phage's packaging technique, we performed restriction digests using the following enzymes, heated and unheated (lanes in order from right to left): Bst EII/λ heated, Bst EII/λ, Eco RV heated, Eco RV, Bgl II heated, and Bgl II.

Fig. 5: Predicted att site in the Dori genome. The highlighted region represents the similarities between Dori and M. smegmatis at the predicted att site. The att site has been confirmed by PCR analysis.

Fig. 6: Annotated genome of Bacteriophage Firecracker. The genes labeled in green are transcribed from left to right; genes in red are transcribed in the opposite direction. Genes with homologs or domains of known function are labeled. The genome has high similarity to that of phage Corndog.

Fig. 7: Predicted palindrome motif for Firecracker. The motif was found by MEME sequence analysis of the Firecracker genomic sequence. The motif occurs more than 30 times in the Firecracker genome.

References

1. Hatfull GF, et al. 2006. Exploring the mycobacteriophage metaproteome: phage genomics as an educational platform. PLoS Genet. 2:e92.
2. Marinelli LJ, Hatfull GF, et al. 2008. BRED: a simple and powerful tool for constructing mutant and recombinant bacteriophage genomes. PLoS ONE. 3:e3957.
3. Summer EJ. 2009. Preparation of a phage DNA fragment library for whole genome shotgun sequencing. Methods Mol Biol. 502:27-46.
4. Casjens SR, Gilcrease EB. 2009. Determining DNA packaging strategy by analysis of the termini of the chromosomes in tailed-bacteriophage virions. Methods Mol Biol. 502:91-111.
5. Käser M, et al. 2009. Optimized method for preparation of DNA from pathogenic and environmental mycobacteria. Appl Environ Microbiol. 75:414-418.
6. Hatfull GF, et al. 2010. Comparative genomic analysis of 60 Mycobacteriophage genomes: genome clustering, gene acquisition, and gene size. J Mol Biol. 397:119-43.
7. Pope WH, et al. 2011. Expanding the diversity of mycobacteriophages: insights into genome architecture and evolution. PLoS ONE. 6:e16329.