International Conference on Molecular Systems Biology 2009

Ubiquitous Internal Gene Duplication in Eukaryotes and Intron Creation

Xiang Gao, Michael Lynch
Department of Biology, Indiana University, Bloomington, IN 47405, USA
E-Mail: [email protected], [email protected]

Duplication of genomic segments provides a primary resource for the origin of evolutionary novelties. However, most previous studies have focused on duplications of complete protein-coding genes, and little is known about the significance of duplications that are entirely internal to genes (referred to hereafter as internal gene duplications). We use mathematical models to study this process of genome evolution quantitatively.

Our comparative analysis of six fully sequenced genomes (Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, and Oryza sativa) reveals that internal gene duplication is a widespread steady-state birth/death process. It occurs at a high frequency, similar to that of complete gene duplications (0.001 to 0.013 duplications per gene per million years), such that 8-17% of the genes in a genome have duplicated intronic and exonic regions.

More importantly, internal duplications lead directly to the creation of new architectural features, such as spliceosomal introns, a hallmark of eukaryotic gene structure. At least 7-28% of the genes that contain an internal duplication have acquired novel introns, either because a prior intron or exon has been duplicated or, more commonly, because a spatial change has activated a latent splice site. The likely mechanism of intron creation is therefore the activation of cryptic splice sites following sequence rearrangement during internal duplication. cDNA evidence supports half of these new introns.

These results strongly suggest a major evolutionary role for internal gene duplications in the origin of genomic novelties, particularly as a mechanism for intron gain.
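The phrase "steady-state birth/death process" can be made concrete with the simplest constant-birth, linear-death model. This is only an illustrative sketch: the abstract does not state the model actually used, and the symbols N (number of internal duplicates per gene), B (birth rate per gene per million years), and \delta (loss rate per duplicate per million years) are assumptions introduced here.

```latex
\frac{dN}{dt} = B - \delta N
\quad\Longrightarrow\quad
N(t) = N^{*} + \bigl(N(0) - N^{*}\bigr)\, e^{-\delta t},
\qquad
N^{*} = \frac{B}{\delta},
\qquad
t_{1/2} = \frac{\ln 2}{\delta}
```

Under these assumptions the duplicate count relaxes toward the equilibrium N^{*} regardless of its starting value, and t_{1/2} is the half-life of an individual duplicate. The abstract reports only birth rates (B between 0.001 and 0.013), so no numerical value of N^{*} can be derived from it alone.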
International Conference on Molecular Systems Biology 2009

Genome-Wide Estimation of Nucleotide Diversity and Disequilibrium Coefficients from Single Heterozygous Diploid Genome

Xiang Gao, Michael Lynch
Department of Biology, Indiana University, Bloomington, IN 47405, USA
E-Mail: [email protected], [email protected]

Studies in molecular population genetics typically rely on assays of moderate numbers of individuals at a small number of loci, and are therefore accompanied by high sampling variance. High-throughput genomic sequencing methods yield unprecedented power for reliably estimating important population-genetic parameters such as nucleotide diversity and linkage disequilibrium. For random-mating populations, the population-wide average nucleotide diversity can be obtained from massive numbers of largely unlinked sites in a fully sequenced genome, and the correlation of heterozygosity among linked sites can provide insight into spatial patterns of genomic disequilibrium.

However, high-throughput sequencing methods also raise two substantial challenges in sequence analysis: how to (1) account for the binomial sampling of parental alleles at nucleotide sites with low coverage and (2) eliminate bias from sequencing errors. To minimize the effects of both problems, we have developed a maximum-likelihood (ML) method for generating nearly unbiased, minimum-sampling-variance estimates of nucleotide heterozygosity and of the pattern of decay of linkage disequilibrium with physical distance.

We have applied our method to infer these parameters from the genomes of Placozoa (representing a simple animal) and human (representing a complex animal). Nucleotide diversity at 4-fold synonymous sites, intron sequences, and intergenic regions (0.0071) was higher than at 2-fold synonymous sites (0.0049) and in UTRs (0.0045). As predicted, nonsynonymous sites maintained the lowest nucleotide diversity (0.0019), and the estimated sequencing error rates were similar across the different classes of sites (0.0045-0.0057). In addition, the heterozygosity and sequencing error rates estimated from genome sites with different sequencing coverage (5- to 11-fold) are consistent, indicating that the ML method is robust with respect to sequencing coverage.

Linkage associations were estimated from pairs of nucleotide sites at all distances from 1 to 1000 bp. The decay of linkage association with increasing physical distance followed a negative exponential curve, from which the recombination rates in the different species were computed.

Our results demonstrate that our maximum-likelihood method can provide accurate estimates of a wide array of population-genetic parameters from whole-genome analyses of single diploid organisms. Unlike traditional population-genetic methods, which rely on time-consuming experiments and limited data, our method can rapidly extract information central to our understanding of evolution from massive genome sequences.
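The two challenges named above (binomial sampling of parental alleles and sequencing error) can be illustrated with a deliberately simplified likelihood. The sketch below is not the authors' method, whose details the abstract does not give. It assumes that each read at a site is scored as matching or not matching a reference base, that a site is heterozygous with probability pi and otherwise homozygous for the reference base, that reads at heterozygous sites sample either parental allele with probability 1/2, and that errors matter only at homozygous sites, where each read is miscalled with probability eps. The function names and the grid search (standing in for a proper optimizer) are hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def binom_pmf(k, n, p):
    """Probability of k successes in n Bernoulli(p) trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def log_likelihood(counts, pi, eps):
    """Log-likelihood of the simplified two-component mixture.

    counts maps (n, k) -> number of sites with n reads, k of them
    non-reference. Homozygous sites give k ~ Binomial(n, eps);
    heterozygous sites give k ~ Binomial(n, 1/2).
    """
    total = 0.0
    for (n, k), sites in counts.items():
        p_site = (1 - pi) * binom_pmf(k, n, eps) + pi * binom_pmf(k, n, 0.5)
        total += sites * math.log(p_site)
    return total


def ml_estimate(counts, step=0.0005, pi_max=0.05, eps_max=0.02):
    """Grid search for the (pi, eps) pair with the highest log-likelihood."""
    best_ll, best_pi, best_eps = -math.inf, None, None
    for i in range(1, round(pi_max / step) + 1):
        for j in range(1, round(eps_max / step) + 1):
            pi, eps = i * step, j * step
            ll = log_likelihood(counts, pi, eps)
            if ll > best_ll:
                best_ll, best_pi, best_eps = ll, pi, eps
    return best_pi, best_eps


def simulate(num_sites, coverage, pi, eps, seed=0):
    """Draw read counts per site under the same simplified model."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_sites):
        p = 0.5 if rng.random() < pi else eps
        k = sum(rng.random() < p for _ in range(coverage))
        counts[(coverage, k)] += 1
    return counts


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the abstract.
    data = simulate(num_sites=50_000, coverage=8, pi=0.01, eps=0.005)
    pi_hat, eps_hat = ml_estimate(data)
    print(f"pi_hat={pi_hat:.4f}  eps_hat={eps_hat:.4f}")
```

Sites are pooled by their (coverage, non-reference count) pair, so the likelihood is evaluated once per distinct pair rather than once per site. A fuller treatment would also model errors at heterozygous sites and all four nucleotides.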