Download ab initio - Ware Lab

Sequence and Analysis of the Maize B73 Genome Doreen Ware1,2, Joshua Stein1, Apurva Narechania1, Shiran Pasternak1, Linda McMahan1, Chengzhi Liang1, Wei Zhao1, Sharon Wei1, William Spooner1,, Ben Faga1, and The Maize Genome Sequencing Consortium3 1 Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY11724, USA 2 USDA-ARS NAA Plant, Soil & Nutrition Laboratory Research Unit, USA 3 Genome Sequencing Center, Washington University, St. Louis, MO 63108, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724; Arizona Genomics Institute, University of Arizona, Tucson, AZ 85721; and Iowa State University, Ames, IA 50011 Maize Genome Gene Densities Summary Maize Accelerated Region Synteny Analysis From its domestication 8,000 years ago in Central America to its position today as the world’s leading harvested grain, Zea mays has played an important role in human civilization, providing food, animal feed, and biofuel. Maize also enjoys a long and distinguished history as a model organism owing to its rich diversity and tractable genetics. The complete sequence of the maize genome would therefore propel advances in basic research as well as agriculture and other industries. Sorghum The Maize Genome Sequencing Consortium was launched with a three-year grant from NSF to produce a complete sequence of the maize (B73) genome. At 2.5 Gb, the maize genome rivals mammalians in terms of size, and is six times larger than rice, owing to its high content of retrotransposable elements. To meet the challenge of producing an assembled sequence we took a BAC-by-BAC approach, selecting a minimal tiling path of clones from a 20X fingerprint map. Now in its third year, the project has produced complete sequences of 15,200 BAC clones comprising approximately 2 billion non-redundant bases, all available via GenBank. Annotation of this first draft, using both ab initio gene prediction and evidence-based approaches, gives preliminary estimates of gene numbers, many of which produce alternative transcripts. Comparison to rice, and a detailed analysis of a 22 Mb contig on chromosome 4, reveals that the maize genome has been largely shaped by its history of tetraploidization, subsequent rearrangement and duplicate gene loss. Gene annotations and comparative maps generated by this project are available at the Gramene Genome Browser (maizesequence.org). Mean gene densities across chromosomes Chromosome 1 2 3 4 5 6 7 8 9 10 Evidence-Genes 15.5(9.9) 16(12.3) 13.9(10.7) 13.7(11.8) 14.6(12.4) 15.8(11.4) 15.1(11.2) 16.2(11.8) 16.2(11.2) 15.8(11.6) *Densities calculated given 500Kb windows. *Standard deviations provided in parentheses. Count 58511 64716 254608 66525 • Genes were called on a freeze of the maize data containing 14,042 BACs. • WH: with homology (alignment to non-TE) • NH: no homology • TE: Alignment to TE’s in a curated DB • Prediction on masked sequence with Gramene Ensembl using same and cross species evidence • Evidence-based genes and transposons called on Maize BACs were projected to the maize FPC map to illustrate contiguous, chromosome-level gene frequency. • Virtual Core Bins were generated via IBM2 anchors to the FPC map. • The boxed area corresponds to the maize accelerated region. This region appears to be gene rich relative to the rest of the genome. 22 mb Sorghum • The maize accel region contains syntenic blocks to rice chr2 and sorghum chr4 • Maize: max gap between NETS 100,000 residues; min NET size 5000 residues. • Rice and sorghum: max NET gap 50,000 residues; min NET size 2000 residues. • Syntenic blocks are defined in two steps. First, NETS are grouped if the distance between them is smaller than twice the max gap parameter and there are no NETS breaking the synteny. Second, these groups are arranged into syntenic blocks up to 30 times the max gap parameter with two synteny breaking groups allowed. • The rice assembly is complements of TIGR (version 5), and early access to the sorghum assemblies complements of JGI. Maize Sorghum Maize Gene Level Synteny With Rice Survey of Retroelements Repeat Class† Nucleotides* Percent Class I retroelements 1,725,195,428 76.03 Ty1/copia-like elements 541,469,024 23.86 Ty3/gypsy-like elements 1,035,938,047 45.65 LINES 8,758,596 0.39 SINES 183,194 0.01 Other retroelements 138,846,567 6.12 Class II DNA transposons 38,170,464 1.68 hAT superfamily 2,987,960 0.13 CACTA superfamily 13,098,567 0.58 Mutator 4,220,844 0.19 Tourist-like MITEs 780,447 0.03 Other MITEs 3,120,948 0.14 Other DNA transposons 13,961,698 0.62 Simple repeats 625,223 0.03 High-copy-number genes 3,369,366 0.15 Other repeats 17,054,679 0.75 Total repeats 1,784,415,160 78.64 † Based on RepeatMasker using MIPS REcat library and classification database. *Based on 14,042 BAC clones having 2,269,061,581 bp of sequence • Class I retroelements are the most abundant in the genome. • Class II DNA transposons, simple repeats, and other repeats are far less abundant • 78% of the maize BACs sequence is repetitive. • The high gene density of the accelerated region (22.1 evidence-genes/500kb) is shown in further detail. • Clusters of Gramene Genes (GeneBuilder) and Fgenesh Models (ab initio) seem to mirror each other, a trend that is more apparent at 1 Mbase magnification. 40 grande 3.3% 35 30 1.0% 1.4% 3.0% prem1 4.3% cinful 4.5% milt • Retro elements comprise 76% of tekay the genome sequence. • DNA transposes, Sirs, and other xilon repeats comprise less than 3%. • ji and huck families together grande occupy 24% of the genome prem1 sequence. Rice chr 2 (29.0 – 35.8 Mb) Alignments to proteins in NR 68 genes • Inversion associated with a deletion resulting in an unmatched span of 68 genes in rice. • 2.7 -fold expansion in length of region in maize. • 476/961 (49.5%) maize genes are syntenic. • 380/1150 (33.0%) rice genes are syntenic. • Suggestive of extensive gene movement as well as loss in maize. Maize Accelerated Region Duplication Cereal sequence alignments 1800 1600 cinful zeon 5.0% 25 20 Maize 22 Mb region giepum 1.5% other 1.8% other 11.2% opie 15 8.6% zeon other gypsy huck Phred quality scores of BAC sequence giepum huck 11.9% ji 5 11.9% opie Mathematically defined repeats (20-mer freq’s in a 0.45X WGS maize library) ji 0 Ty3/gypsy Ty1/copia 1400 1200 1000 800 600 400 200 other copia 10 Blastz-NET Alignments Percent BAC Sequence (%) 45 milt tekay xilon Non-syntenic deletion MIPS repeats Retroelement Composition 50 Syntenic BACs at MaizeSequence.org Gene Predictions • FgenesH (ab initio) • Gramene Genes (evidence-based) • Ensembl tracks are configurable and data sets can be toggled on and off with user preference. 0 1 2 3 4 5 6 Rice Sorghum Inversion • Maize-sorghum and Maize-rice synteny illustrates two large scale inversions, one on Maize Chr4 (the accelerated region), and the other on Sorghum Chr4. 1 mb Rice Maize Inversion Transposons 67(21.2) 64.7(20.5) 65.3(21.1) 70.6(21.7) 64.5(21.2) 67.5(21.5) 65.1(22.2) 67.7(19.9) 60.6(21.8) 69.1(21.8) Survey of Gene Statistics Type WH NH TE Evidence Maize 7 8 9 10 Maize Chromosome • Rice Chr2 from positions 29MB to 36MB aligns to Maize Chromosomes 4 and 5 in equal measure indicating a duplication event. Alignments were made to maize BAC-contigs and mapped to Chromosomes 4 and 5 using the FPC map. • The majority of Chr4 hits were on FPC ctg182, corresponding to the accelerated region. The majority of NETS on Chr5 were on contigs 250, 251, 253, and 254 in agreement with marker based studies. PLoS Genet. 2007 Jul 20;3(7):e123 Rice

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download ab initio - Ware Lab