International Conference on Molecular Systems Biology 2009

Ubiquitous Internal Gene Duplication in Eukaryotes and Intron Creation

Xiang Gao, Michael Lynch
Department of Biology, Indiana University, Bloomington, IN 47405, USA

Duplication of genomic segments provides a primary resource for the origin of evolutionary novelties. However, most previous studies have focused on duplications of complete protein-coding genes, while little is known about the significance of duplications that lie entirely within genes (hereafter, internal gene duplications). We use mathematical models to study this genome-evolution process quantitatively. Our comparative analysis of six fully sequenced genomes (Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, and Oryza sativa) reveals that internal gene duplication is a widespread, steady-state birth/death process that occurs at a high rate, similar to that of complete gene duplication (0.001 to 0.013 duplications/gene/My), such that 8-17% of the genes in a genome have duplicated intronic and exonic regions. More importantly, internal duplications lead directly to the creation of new architectural features, such as spliceosomal introns, a hallmark of eukaryotic gene structure. At least 7-28% of the genes that contain an internal duplication have acquired novel introns, either because a prior intron or exon has been duplicated or, more commonly, because a spatial change has activated a latent splice site. The likely mechanism of intron creation is the activation of cryptic splice sites following the sequence rearrangements produced by internal duplication. cDNA evidence supports half of these new introns. These results strongly suggest a major evolutionary role for internal gene duplication in the origin of genomic novelties, particularly as a mechanism for intron gain.
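The abstract describes internal gene duplication as a steady-state birth/death process. One way to make that concrete is sketched below, under assumptions that are ours rather than the authors': new internal duplications arise in each gene at a constant rate b per My, and each existing duplication is lost independently at a rate d per My. This immigration-death process has a Poisson stationary distribution with mean b/d, so the fraction of genes carrying at least one internal duplication is 1 - exp(-b/d). The abstract reports birth rates but no loss rate, so d is a free parameter here; the function names and the rate values are hypothetical and chosen only for illustration.

```python
import math
import random


def steady_state_mean(birth_rate: float, loss_rate: float) -> float:
    """Expected number of internal duplications per gene at equilibrium.

    Assumed model (illustrative): duplications arise at a constant rate
    birth_rate per gene per My, and each existing duplication is lost
    independently at loss_rate per My. The stationary distribution of this
    immigration-death process is Poisson with mean birth_rate / loss_rate.
    """
    if birth_rate < 0 or loss_rate <= 0:
        raise ValueError("need birth_rate >= 0 and loss_rate > 0")
    return birth_rate / loss_rate


def fraction_with_duplication(birth_rate: float, loss_rate: float) -> float:
    """Fraction of genes with at least one internal duplication, 1 - P(0)."""
    return 1.0 - math.exp(-steady_state_mean(birth_rate, loss_rate))


def implied_loss_rate(birth_rate: float, fraction: float) -> float:
    """Loss rate consistent with a birth rate and an equilibrium fraction."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return birth_rate / -math.log(1.0 - fraction)


def simulate_gene(birth_rate: float, loss_rate: float, t_max: float,
                  rng: random.Random) -> int:
    """Gillespie simulation of one gene's duplication count, starting at 0."""
    if birth_rate <= 0:
        raise ValueError("simulation needs birth_rate > 0")
    t, n = 0.0, 0
    while True:
        total = birth_rate + n * loss_rate  # combined event rate
        t += rng.expovariate(total)          # waiting time to the next event
        if t > t_max:
            return n
        if rng.random() < birth_rate / total:
            n += 1  # a new internal duplication arises
        else:
            n -= 1  # one existing duplication is lost


if __name__ == "__main__":
    b, d = 0.005, 0.05  # hypothetical rates per My, for illustration only
    rng = random.Random(42)
    counts = [simulate_gene(b, d, 200.0, rng) for _ in range(20000)]
    simulated = sum(1 for n in counts if n > 0) / len(counts)
    print(f"analytic fraction:  {fraction_with_duplication(b, d):.4f}")
    print(f"simulated fraction: {simulated:.4f}")
```

With the hypothetical values b = 0.005 and d = 0.05 per My, the analytic equilibrium fraction is 1 - e^(-0.1), about 9.5%, and the seeded simulation should land close to it. `implied_loss_rate` runs the same relation in reverse: under these assumptions, an observed birth rate together with an observed fraction of genes carrying duplications implies a loss rate.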
International Conference on Molecular Systems Biology 2009

Genome-Wide Estimation of Nucleotide Diversity and Disequilibrium Coefficients from a Single Heterozygous Diploid Genome

Xiang Gao, Michael Lynch
Department of Biology, Indiana University, Bloomington, IN 47405, USA

Studies in molecular population genetics typically rely on assays of moderate numbers of individuals at small numbers of loci, an approach subject to high sampling variance. High-throughput genome sequencing provides unprecedented power for reliably estimating important population-genetic parameters such as nucleotide diversity and linkage disequilibrium. For random-mating populations, the population-wide average nucleotide diversity can be estimated from the very large numbers of largely unlinked sites in fully sequenced genomes, and the correlation of heterozygosity among linked sites can provide insight into spatial patterns of genomic disequilibrium. However, high-throughput sequencing also raises two substantial challenges for sequence analysis: how to (1) account for the binomial sampling of parental alleles at low-coverage nucleotide sites, and (2) eliminate bias from sequencing errors. To minimize the effects of both problems, we have developed a maximum-likelihood (ML) method that yields nearly unbiased, minimum-sampling-variance estimates of nucleotide heterozygosity and of the decay of linkage disequilibrium with physical distance. We have applied the method to infer these parameters from the genomes of Placozoa (representing a simple animal) and human (representing a complex animal). Heterozygosity at 4-fold synonymous sites, intronic sequences, and intergenic regions (0.0071) was higher than at 2-fold synonymous sites (0.0049) and UTRs (0.0045).
As predicted, nonsynonymous sites had the lowest nucleotide diversity (0.0019), while the estimated sequencing error rates were similar across site classes (0.0045-0.0057). In addition, the estimates of heterozygosity and sequencing error rate were consistent across sites with different sequencing coverage (5- to 11-fold), indicating that the ML method is robust with respect to coverage. Linkage associations were estimated from pairs of nucleotides at all distances from 1 to 1000 bp. Linkage association decayed with increasing physical distance following a negative exponential function, from which the recombination rates of the different species were computed. Our results demonstrate that these maximum-likelihood methods can provide accurate estimates of a wide array of population-genetic parameters from whole-genome analyses of single diploid organisms. Unlike traditional approaches to population genetics, which rely on time-consuming experiments and limited data, our method can rapidly extract information central to our understanding of evolution from massive genome sequences.
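The abstract names the two problems the ML method must handle (binomial sampling of parental alleles at low coverage, and sequencing error) but does not give the likelihood itself. The sketch below is a simplified single-genome model in that spirit, written under our own assumptions and not the authors' implementation: each site is heterozygous with probability pi; all 4 homozygous and 6 heterozygous genotypes are equally likely a priori; each read copies either parental allele with probability 1/2; and each read is miscalled as each of the other three bases with probability eps/3. The parameters pi and eps are then estimated jointly by a grid search over the total log-likelihood. The function names, the grid, the coverage, and the simulated "true" values are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import combinations

N_BASES = 4  # A, C, G, T encoded as 0..3


def site_likelihood(counts, pi, eps):
    """Likelihood of read counts (n_A, n_C, n_G, n_T) at one site.

    Assumed model: heterozygous with probability pi; uniform priors over the
    4 homozygous and 6 heterozygous genotypes; each read samples either
    parental allele with probability 1/2 and is miscalled as each other base
    with probability eps/3. The multinomial coefficient is the same under
    every genotype and is omitted.
    """
    hom = 0.0
    for i in range(N_BASES):
        p = [eps / 3] * N_BASES
        p[i] = 1 - eps
        hom += math.prod(p[k] ** counts[k] for k in range(N_BASES))
    het = 0.0
    for i, j in combinations(range(N_BASES), 2):
        p = [eps / 3] * N_BASES
        p[i] = p[j] = 0.5 - eps / 3  # (1 - eps)/2 + (eps/3)/2
        het += math.prod(p[k] ** counts[k] for k in range(N_BASES))
    return (1 - pi) * hom / 4 + pi * het / 6


def ml_estimate(sites, pi_grid, eps_grid):
    """Grid-search ML estimates of (pi, eps) from per-site read counts."""
    # The likelihood is symmetric in base labels, so sites with the same
    # sorted count pattern contribute identically and can be pooled.
    patterns = Counter(tuple(sorted(c)) for c in sites)
    best_ll, best = -math.inf, None
    for pi in pi_grid:
        for eps in eps_grid:
            ll = sum(m * math.log(site_likelihood(c, pi, eps))
                     for c, m in patterns.items())
            if ll > best_ll:
                best_ll, best = ll, (pi, eps)
    return best


def simulate_sites(n_sites, pi, eps, coverage, rng):
    """Simulate read counts for one diploid genome under the same model."""
    sites = []
    for _ in range(n_sites):
        a = rng.randrange(N_BASES)
        b = a
        if rng.random() < pi:  # heterozygous site: choose a distinct allele
            b = rng.choice([x for x in range(N_BASES) if x != a])
        counts = [0] * N_BASES
        for _ in range(coverage):
            base = a if rng.random() < 0.5 else b  # binomial allele sampling
            if rng.random() < eps:                  # sequencing error
                base = rng.choice([x for x in range(N_BASES) if x != base])
            counts[base] += 1
        sites.append(counts)
    return sites


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical true values and 8x coverage, for illustration only.
    sites = simulate_sites(20000, pi=0.01, eps=0.005, coverage=8, rng=rng)
    pi_grid = [k / 1000 for k in range(1, 31)]
    eps_grid = [k / 1000 for k in range(1, 21)]
    pi_hat, eps_hat = ml_estimate(sites, pi_grid, eps_grid)
    print(f"pi_hat = {pi_hat:.3f}, eps_hat = {eps_hat:.3f}")
```

Pooling sites by their sorted count pattern is valid only because of the uniform-prior assumption, which makes the likelihood invariant to relabeling the bases. A model with unequal base composition would need the full count vector. A plain grid search keeps the example dependency-free; a real analysis would more likely use a numerical optimizer and report sampling variances.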
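The abstract reports that linkage association decayed with physical distance following a negative exponential. As a minimal sketch of one way to fit such a curve, assume a model r(d) = a * exp(-c * d) with strictly positive values r, and fit it by ordinary least squares on ln r versus d. This log-linear fit is our choice for illustration and is not necessarily the authors' procedure. How the decay constant c is converted into a recombination rate depends on a population-genetic model that the abstract does not specify, so that step is omitted. The function name and the synthetic data are hypothetical.

```python
import math


def fit_exponential_decay(distances, values):
    """Fit values ~ a * exp(-c * distance) by least squares on log(values).

    Returns (a, c). Requires at least two distinct distances and strictly
    positive values, because the logarithm of each value is taken.
    """
    if len(distances) != len(values) or len(distances) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    if any(v <= 0 for v in values):
        raise ValueError("log-linear fit requires positive values")
    ys = [math.log(v) for v in values]
    n = len(distances)
    mean_x = sum(distances) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in distances)
    if sxx == 0:
        raise ValueError("distances must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


if __name__ == "__main__":
    # Synthetic, noise-free data spanning 1-1000 bp; a = 0.3 and
    # c = 0.004 per bp are arbitrary illustrative values.
    ds = list(range(1, 1001, 10))
    rs = [0.3 * math.exp(-0.004 * d) for d in ds]
    a_hat, c_hat = fit_exponential_decay(ds, rs)
    print(f"a = {a_hat:.4f}, c = {c_hat:.5f} per bp, "
          f"half-decay distance = {math.log(2) / c_hat:.0f} bp")
```

Fitting on the log scale weights small correlations at long distances as heavily as large ones at short distances, and it cannot use zero or negative estimates. With noisy real data, a nonlinear least-squares fit on the original scale, possibly with a background term, may be preferable.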
