Arabidopsis
Gene families with more than 5 genes are more common in plants than in animals.
[Figure: percentage of genes by number of genes per family (1, 2, 3-5, >5) in human, yeast, fruit fly, nematode, rice and Arabidopsis; adapted from Lockton S, Gaut BS. 2005. Trends Genet 21: 60-65.]

Alternative splicing (AS) is more common in animals than in plants.
[Figure: AS in Arabidopsis and rice; sources: Boue S, et al. 2003. BioEssays 25: 1031-1034; Iida K, et al. 2004. Nucleic Acids Res 32: 5096-5103; Kikuchi S, et al. 2003. Science 301: 376-379.]

Duplications occur on every length scale: individual genes (a tandem duplication is one in which a gene and its duplicate are adjacent), multi-gene segments of a chromosome, and entire genomes. For example, the wild progenitors of wheat are diploid (2n), whereas cultivated wheats include a tetraploid (4n, pasta) and a hexaploid (6n, bread).

Synteny refers to two or more genes found in the same order and orientation on the chromosomes of related species.
[Figure: polyploidy (whole genome duplication) events among plants, dicots and monocots; adapted from Blanc G, Wolfe KH. 2004. Plant Cell 16: 1667-1678; Paterson AH, et al. 2004. Proc Natl Acad Sci USA 101: 9903-9908.]

Phylogeny of the favored plants: there is extensive synteny among the Gramineae, but essentially none between the Gramineae and Arabidopsis.
[Figure: phylogeny of sorghum, maize, barley, wheat, rice and Arabidopsis; Gramineae divergence 55~70 Mya; monocot-dicot divergence 170~235 Mya.]

The duplication history of rice. Every cDNA-defined gene is assigned a duplication category using the methods of Yu J, et al. 2005. PLoS Biol 3: e38.
1. The analysis relies entirely on 19,079 full-length cDNAs; had we used predicted genes instead, many of the duplications would have been missed.
2. A homolog pair is a cDNA and its TblastN match (i.e. comparisons are done at the amino acid level against the genome translated in all 6 reading frames), at an expectation-value cutoff of 1E-7 and requiring that >50% be aligned. Note that the TblastN match is not necessarily expressed itself.
3. If a gene has any homologs at all, the mean (median) number of homologs is 40 (5).
4. Multiple duplications are difficult to analyze, so only the cDNAs with one and only one homolog are considered.

Rice shows ONE whole genome duplication, a recent segmental duplication, and many individual gene duplications.
[Figure: timeline of duplication birth and death for whole genome, recent segmental and individual gene duplications.]

Ancient whole genome duplication (WGD) in rice: 18 pairs of duplicated segments cover 65.7% of the rice genome; higher-order homologs are used to backfill the established trend lines.
[Figure: rice-rice comparison dot plot of segmental duplications across chromosomes Chr01-Chr12; axes in Mb.]

The plot becomes uninterpretable if cDNAs with more than one homolog are used: in rice, the mean (median) number of homologs per duplicated gene is 40 (5).
[Figure: rice-rice comparison using all homologs, chromosomes Chr01-Chr12; axes in Mb.]

The unmarked trend along the diagonal comes from tandem gene duplications; there were NO segmental duplications within a chromosome.
[Figure: rice-rice comparison of Rice Chr01 (Mb), with tandem and background duplications marked; chromosomes Chr01-Chr12.]

Computing molecular clocks and indicators of evolutionary selection:
Ka = non-synonymous changes per available (non-synonymous) site
Ks = synonymous changes per available (synonymous) site
Normalizing by available sites corrects for the fact that 76% of possible substitutions (438 of the 576 single-nucleotide changes across the 64 codons) encode a different amino acid.
Ka/Ks < 1 is evidence of purifying selection.
Ka/Ks = 1 is evidence of no selection (pseudogene).
Ka/Ks > 1 is evidence of adaptive selection.
Mean Ka/Ks is 0.20 in primates and 0.14 in rodents.

From neutral substitution rate to time since divergence of species (Kumar S, Hedges SB. 1998. Nature 392: 917-920): because species 1 and species 2 each accumulate substitutions independently after splitting from their common ancestor, the time since divergence equals the distance between species 1 and species 2 divided by twice the neutral substitution rate, T = K / (2r). Neutral substitution rates vary among genes and evolutionary lineages, but on average they are 2.2 × 10^-9 substitutions per site per year for mammals and 6.5 × 10^-9 for the Gramineae.

17 of the 18 segments are attributable to a whole genome duplication shortly before the Gramineae divergence.
[Figure: Ks distributions (substitutions per silent site, Ks from K-Estimator) for rice-rice segmental duplications (higher-order homologs) and rice-rice tandem duplications (two TblastN hits allowed).]
The timing of the WGD relative to the Gramineae divergence is based on observed syntenies, not on Ks.

Background duplications have a Ks signature like that of tandem duplications, except that they are more ancient.
[Figure: Ks distributions for rice-rice tandem duplications (two TblastN hits allowed) and rice-rice background duplications (one and only one homolog), Ks from K-Estimator.]
A peak at zero Ks followed by exponential decay indicates an ongoing duplication process.

Duplicated genes undergo periods of relaxed selection and are usually silenced within 4~17 million years.
[Figure: a progenitor gene is duplicated; one copy is left alone and the other is free to change during a post-duplicative 'transient' of 4~17 million years of relaxed selection, ending in reduced expression, a novel function, or eventual death.]
Hypothesis introduced by Lynch M, Conery JS. 2000. Science 290: 1151; with details in Lynch M, Conery JS. 2003. J Struct Funct Genomics 3: 35.

The rice analysis succeeded only because the duplication is not too old.
When the duplication is old: an analysis from yeast comparing related genomes with and without the duplication (Kellis M, et al. 2004. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428: 617-624).
When the duplication is extremely new: an analysis from human (Bailey JA, et al. 2002. Recent segmental duplications in the human genome. Science 297: 1003-1007).

Proof of whole genome duplication in Saccharomyces cerevisiae by comparison to the sequence of Kluyveromyces waltii.
[Figure: after duplication, mutation and gene death, genes from sister segments interleave in comparison to K. waltii; gene and regional correspondences with K. waltii reveal the ancient whole genome duplication in S. cerevisiae.]

Identifying recent segmental duplications in the human assembly: whole genome shotgun (WGS) reads from Celera are aligned to the map-based genome from the IHGSC, and recent segmental duplications are detected as similarity and read-depth anomalies.

Patterns of intra-chromosomal and inter-chromosomal duplication.
[Figure: recent segmental duplications of length >10 kb and identity >95%; intra-chromosomal (blue lines) and inter-chromosomal (red bars) duplications; unique regions flanked by intra-chromosomal duplications (gold bars) are hot spots for genomic disorders.]

Recent segmental duplications in the IHGSC and Celera genomes: the proportion of Celera aligned bases falls rapidly as identity exceeds 97% or length exceeds 15 kb, but the total sequence lost is still only 2%~3%.

NB: a search of the map-based rice genome revealed no segmental duplications of recent origin (Yu J, et al. 2006. Trends Plant Sci 11: 387-391).

“Although it is clear that the detailed clone-ordered approach is superior in the resolution of segmental duplications, it would be unrealistic to propose that the sequencing community should abandon whole-genome-shotgun based approaches. These are the most efficient cost-effective means of capturing the bulk of the euchromatic sequence.” Evan E. Eichler (21 October 2004)
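The homolog-pair rules used for rice (a TblastN match at an expectation-value cutoff of 1E-7 with >50% aligned, then keeping only cDNAs with one and only one homolog) can be sketched as a small filter. This is an illustrative sketch, not the pipeline of Yu et al.: the `Hit` record, its field names, the reading of ">50% aligned" as the fraction of the query covered, and the optional `self_loci` map used to drop a cDNA's match to its own locus are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, median


@dataclass(frozen=True)
class Hit:
    """One TblastN hit of a cDNA against the genome (hypothetical record)."""
    cdna_id: str
    target_locus: str
    evalue: float
    aligned_fraction: float  # assumed: fraction of the query aligned, 0..1


def homolog_pairs(hits, max_evalue=1e-7, min_aligned=0.5, self_loci=None):
    """Map each cDNA to the set of loci passing both thresholds.

    self_loci (hypothetical) maps a cDNA to its own genomic locus so that
    the trivial self-match is not counted as a homolog.
    """
    self_loci = self_loci or {}
    homologs = defaultdict(set)
    for h in hits:
        if self_loci.get(h.cdna_id) == h.target_locus:
            continue  # skip the cDNA's match to its own locus
        if h.evalue <= max_evalue and h.aligned_fraction > min_aligned:
            homologs[h.cdna_id].add(h.target_locus)
    return dict(homologs)


def single_homolog_cdnas(homologs):
    """Keep cDNAs with one and only one homolog, the interpretable cases."""
    return {c: next(iter(t)) for c, t in homologs.items() if len(t) == 1}


def homolog_count_summary(homologs):
    """Mean and median number of homologs among cDNAs that have any."""
    counts = [len(t) for t in homologs.values()]
    return mean(counts), median(counts)
```

For example, if `c1` hits `locA` with E = 1e-20 (80% aligned) and `locB` with E = 1e-3, only `locA` survives, so `c1` counts as a single-homolog cDNA. Whether the original cutoffs were inclusive is not stated; the comparisons here are one reasonable choice.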
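The 76% figure behind the available-site correction for Ka and Ks can be checked by brute force: take every codon, make each of its 9 possible single-nucleotide changes, and count how many change the encoded amino acid. This sketch assumes the standard genetic code (NCBI translation table 1) and treats a change to or from a stop codon as changing the amino acid; with all 64 codons that gives 438 of 576, and restricting to the 61 sense codons gives 415 of 549, both about 76%.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons in the order
# TTT, TTC, TTA, TTG, TCT, ..., GGG, with '*' marking stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def count_substitutions(include_stops=True):
    """Return (amino-acid-changing, total) single-nucleotide substitutions.

    With include_stops=False, stop codons are skipped as starting codons,
    but a sense codon mutating to a stop still counts as a change.
    """
    changing = total = 0
    for codon, aa in CODE.items():
        if not include_stops and aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue  # not a substitution
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += 1
                if CODE[mutant] != aa:
                    changing += 1
    return changing, total


if __name__ == "__main__":
    changing, total = count_substitutions()
    print(f"{changing} of {total} = {changing / total:.0%}")
```

Running it prints `438 of 576 = 76%`, which is why 576 (64 codons × 9 changes) is the denominator consistent with the 76% figure.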
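The Ka/Ks rules of thumb in the molecular-clock section translate directly into a small classifier. An estimated ratio is almost never exactly 1, so this sketch adds a tolerance band `tol` around 1; that parameter and its default of 0.05 are assumptions not found in the original rules, and a real analysis would use a statistical test rather than a fixed band.

```python
def classify_selection(ka, ks, tol=0.05):
    """Return (Ka/Ks, interpretation) following the rules of thumb.

    tol is a hypothetical band around 1 that is treated as "no selection".
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio Ka/Ks")
    ratio = ka / ks
    if ratio < 1 - tol:
        label = "purifying selection"
    elif ratio > 1 + tol:
        label = "adaptive selection"
    else:
        label = "no selection (pseudogene)"
    return ratio, label
```

For instance, `classify_selection(0.02, 0.1)` returns a ratio of about 0.2, the primate average, labelled `"purifying selection"`.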
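The divergence-time relation T = K / (2r), with the average neutral rates quoted for mammals and the Gramineae, is a one-line computation; the inverse, K = 2rT, shows what distance to expect after a given time. The Ks value of 0.1 in this sketch is made up for illustration, and the rice WGD itself was dated by synteny rather than by Ks.

```python
# Average neutral substitution rates quoted in the text
# (substitutions per site per year).
NEUTRAL_RATES = {"mammals": 2.2e-9, "Gramineae": 6.5e-9}


def divergence_time(k, rate):
    """Years since two sequences diverged, T = K / (2r).

    The factor 2 appears because both lineages accumulate substitutions
    independently after the split from their common ancestor.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return k / (2 * rate)


def expected_distance(years, rate):
    """Inverse relation: neutral distance K = 2rT accumulated after T years."""
    return 2 * rate * years


if __name__ == "__main__":
    ks = 0.1  # hypothetical Ks for a duplicated gene pair
    t = divergence_time(ks, NEUTRAL_RATES["Gramineae"])
    print(f"Ks = {ks} -> about {t / 1e6:.1f} million years")
```

This prints `Ks = 0.1 -> about 7.7 million years`. With the same Gramineae rate, the 4~17 million-year post-duplication transient corresponds to Ks of roughly 0.05 to 0.22 (`expected_distance(4e6, 6.5e-9)` is 0.052 and `expected_distance(17e6, 6.5e-9)` is 0.221).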
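The human analysis calls a segmental duplication recent when the copies are longer than 10 kb and more than 95% identical, and then separates intra-chromosomal from inter-chromosomal events. Assuming pairwise alignments have already been computed, those two thresholds and the split can be sketched as follows; the `Alignment` record is hypothetical, and the sketch does not reproduce the WGS read-depth detection of Bailey et al.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Pairwise alignment between two genomic intervals (hypothetical record)."""
    chrom_a: str
    chrom_b: str
    length_bp: int
    identity: float  # fraction of identical aligned bases, 0..1


def recent_segmental_duplications(alignments, min_length_bp=10_000,
                                  min_identity=0.95):
    """Keep alignments >10 kb and >95% identity, labelled by chromosome type."""
    kept = []
    for aln in alignments:
        if aln.length_bp > min_length_bp and aln.identity > min_identity:
            kind = ("intra-chromosomal" if aln.chrom_a == aln.chrom_b
                    else "inter-chromosomal")
            kept.append((aln, kind))
    return kept
```

A 12 kb, 97%-identical alignment within chr1 is kept as intra-chromosomal, while an 8 kb alignment fails the length threshold and a 90%-identical one fails the identity threshold.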