International Conference on Molecular Systems Biology 2009
Ubiquitous Internal Gene Duplication in Eukaryotes and Intron Creation
Xiang Gao, Michael Lynch
Department of Biology, Indiana University
Bloomington, IN 47405, USA
___________________________________________________________________
Duplication of genomic segments provides a primary resource for the origin of
evolutionary novelties. However, most previous studies have focused on
duplications of complete protein-coding genes, and little is known about the
significance of duplications that are entirely internal to genes (hereafter
referred to as internal gene duplications). We use mathematical models to study
this process of genome evolution quantitatively. Our comparative analysis of six
fully sequenced genomes (Homo sapiens, Mus musculus, Drosophila melanogaster,
Caenorhabditis elegans, Arabidopsis thaliana, and Oryza sativa) reveals that
internal gene duplication is a widespread steady-state birth/death process,
occurring at a frequency similar to that of complete gene duplications (0.001 to
0.013 duplications per gene per million years), such that 8-17% of the genes in
a genome have duplicated intronic and exonic regions.
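To make the phrase "steady-state birth/death process" concrete, the following is a minimal sketch under a deliberately simple two-state assumption: a gene gains an internal duplication at rate B and loses it at rate D, so the fraction f of genes carrying a duplication obeys df/dt = B(1 - f) - Df and settles at B/(B + D). This two-state model, the function names, and the numerical values of B and D are illustrative assumptions, not the models or estimates used in the study.

```python
def steady_state_fraction(birth_rate, death_rate):
    """Equilibrium fraction of genes carrying an internal duplication
    under the assumed two-state model: f* = B / (B + D)."""
    return birth_rate / (birth_rate + death_rate)


def simulate_fraction(birth_rate, death_rate, total_myr, dt=0.1, start=0.0):
    """Euler integration of df/dt = B * (1 - f) - D * f over total_myr."""
    f = start
    for _ in range(int(round(total_myr / dt))):
        f += dt * (birth_rate * (1.0 - f) - death_rate * f)
    return f


if __name__ == "__main__":
    # Hypothetical rates per gene per million years, chosen only for
    # illustration; they are not values reported in the abstract.
    B = 0.005
    D = 0.05
    print("analytic steady state:", steady_state_fraction(B, D))
    print("after 500 Myr        :", simulate_fraction(B, D, 500.0))
```

Under these assumptions the simulated fraction approaches the analytic value regardless of the starting point, which is what a steady state means here: gains and losses balance.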
More importantly, internal duplications lead directly to the creation of new
architectural features, such as spliceosomal introns, a hallmark of eukaryotic gene
structure. At least 7-28% of the genes that contain an internal duplication have
acquired novel introns, either because a prior intron or exon has been duplicated or,
more commonly, because a spatial change has activated a latent splice site. The
likely mechanism of intron creation is therefore the activation of cryptic splice
sites by the sequence rearrangements that accompany internal duplication. cDNA
evidence supports half of these new introns. These results strongly suggest a major
evolutionary role for internal gene duplications in the origin of genomic novelties,
particularly as a mechanism for intron gain.
International Conference on Molecular Systems Biology 2009
Genome-Wide Estimation of Nucleotide Diversity and Disequilibrium
Coefficients from Single Heterozygous Diploid Genome
Xiang Gao, Michael Lynch
Department of Biology, Indiana University
Bloomington, IN 47405, USA
___________________________________________________________________
Studies in molecular population genetics typically rely on assays of moderate
numbers of individuals at a small number of loci, and therefore suffer from high
sampling variance. High-throughput genomic sequencing methods offer unprecedented
power for reliably estimating important population-genetic parameters such as
nucleotide diversity and linkage disequilibrium. For random-mating populations,
the population-wide average nucleotide diversity can be obtained from the massive
numbers of largely unlinked sites in a fully sequenced genome, and the
correlation of heterozygosity among linked sites can provide insight into spatial
patterns of genomic disequilibrium. However, high-throughput sequencing
methods also raise two substantial challenges in sequence analysis: how to (1)
account for the binomial sampling of parental alleles at low-coverage nucleotide
sites and (2) eliminate bias from sequencing errors. To minimize the effects of both
problems, we have developed a maximum-likelihood (ML) method for generating nearly
unbiased, minimum-sampling-variance estimates of nucleotide heterozygosity
and of the pattern of decay of linkage disequilibrium with physical distance.
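The two challenges named above can be illustrated with a small sketch. It is a simplified stand-in, not the authors' estimator: it assumes only two alleles per site, a known reference allele, and a single error rate, and it fits the heterozygosity and the error rate jointly by expectation-maximization on a two-component binomial mixture. All function names and the simulated parameter values are hypothetical.

```python
import random
from collections import Counter


def simulate_sites(num_sites, coverage, het, err, rng):
    """Simulate (coverage, non-reference read count) tallies per site.

    Assumed model: a site is heterozygous with probability `het`; reads
    at a heterozygous site show the non-reference allele with
    probability 0.5 (binomial sampling of the two parental alleles),
    and at a homozygous site with probability `err` (sequencing error).
    """
    counts = Counter()
    for _ in range(num_sites):
        p_alt = 0.5 if rng.random() < het else err
        k = sum(1 for _ in range(coverage) if rng.random() < p_alt)
        counts[(coverage, k)] += 1
    return counts


def fit_het_and_error(counts, het=0.05, err=0.01, iterations=200):
    """EM estimates of heterozygosity and error rate for the mixture."""
    for _ in range(iterations):
        het_weight = hom_reads = hom_alt_reads = 0.0
        total = 0
        for (n, k), num in counts.items():
            # The binomial coefficient is shared by both components,
            # so it cancels in the posterior and is omitted.
            like_het = het * 0.5 ** n
            like_hom = (1.0 - het) * err ** k * (1.0 - err) ** (n - k)
            resp = like_het / (like_het + like_hom)  # P(heterozygous | data)
            het_weight += num * resp
            hom_reads += num * (1.0 - resp) * n
            hom_alt_reads += num * (1.0 - resp) * k
            total += num
        het = het_weight / total
        # Clamp to keep the error rate strictly inside (0, 0.5).
        err = min(max(hom_alt_reads / hom_reads, 1e-9), 0.499)
    return het, err


if __name__ == "__main__":
    rng = random.Random(0)
    data = simulate_sites(20000, 8, het=0.01, err=0.005, rng=rng)
    print(fit_het_and_error(data))
```

Treating each site as a mixture is what lets low-coverage sites contribute: a heterozygous site that happens to show only one allele is not miscounted as homozygous with certainty, and a lone erroneous read is not counted as heterozygosity.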
We have applied our method to infer these parameters from the genomes of
Placozoa (representing a simple animal) and human (representing a complex
animal). Heterozygosity at 4-fold synonymous sites, intron sequences, and
intergenic regions (0.0071) was higher than at 2-fold
synonymous sites (0.0049) and UTRs (0.0045). As predicted, nonsynonymous sites
showed the lowest nucleotide diversity (0.0019), and the sequencing
error rates estimated for the different site classes were uniform (0.0045-0.0057).
In addition, the heterozygosity and sequencing error rate estimated from genome
sites with different sequencing coverage (5- to 11-fold) are consistent, indicating
the robustness of the ML method with respect to sequencing coverage. Linkage
associations were estimated from pairs of nucleotides at all possible distances,
ranging from 1 to 1000 bp. The decay of linkage association with increasing
physical distance followed a negative exponential curve, from which the
recombination rates in the different species were computed.
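As a rough illustration of fitting a negative exponential to the decay of linkage association with distance, the sketch below fits association(d) = a * exp(-b * d) by least squares on the log scale. The function name, the synthetic data, and the log-linear fitting choice are assumptions made for illustration; the abstract does not say how the curve was fitted or how the decay constant was converted into a recombination rate, so that step is left out.

```python
import math


def fit_exponential_decay(distances, associations):
    """Fit association = a * exp(-b * distance) by ordinary least
    squares on log(association). Requires strictly positive values.
    Returns the pair (a, b)."""
    xs = list(distances)
    ys = [math.log(v) for v in associations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


if __name__ == "__main__":
    # Synthetic, noise-free data over 1 to 1000 bp (hypothetical values).
    dists = list(range(1, 1001, 50))
    assoc = [0.8 * math.exp(-0.004 * d) for d in dists]
    a, b = fit_exponential_decay(dists, assoc)
    print("a =", a, "b =", b)
```

With real, noisy estimates a weighted or nonlinear fit may be preferable, since taking logarithms inflates the influence of small associations at long distances.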
Our results demonstrate that our maximum-likelihood method can provide accurate
estimates of a wide array of population-genetic parameters from whole-genome
analyses of single diploid organisms. Unlike traditional methods in
population genetics, which rely on time-consuming experiments and
limited data, our method can rapidly extract information central to our
understanding of evolution from massive genome sequences.