Solanum lycopersicum Chromosome 4 Mapping and Finishing Update
Wellcome Trust Medical Photographic Library
SRC-UK and Wellcome Trust Sanger Institute
SOL Korea – September 2007

Tomato Physical Map
– BACs are selected for sequencing on chromosome 4 using the physical map assembled in FPC.
– The map has been assembled using fingerprinted clones from 2 BAC libraries.
– Extending and gap-filling clones are identified using end sequences.
– Clones are fingerprinted, entered in FPC and overlaps checked before being selected for sequencing.

Tomato BAC libraries
Library   No. of clones   Average insert   Genome equivalents   Fingerprints
LE_HBa    129,024         117 kb           15X                  88,000 (AGI)
SL_MboI   52,992          135 kb           7X                   43,000 (WTSI)
SL_EcoI   72,264          95-100 kb        7X

Map Coverage – Chromosome 4
– Chromosome 4 is represented by 45 FPC contigs that cover approximately 22.2 Mb, estimated from fingerprints (5 bands/kb).
– 40 clones have been selected to extend original contigs based on clone end sequence matches.
– All contigs are anchored to the chromosome by SGN chromosome 4 markers.
– FISH (H. de Jong, Wageningen) has confirmed the placement of some contigs on chromosome 4, but may refute placement of >= 7 contigs.
– Confirmation of chromosome 4 contigs is high priority.
– 142 markers are missing out of the 907 SGN chromosome 4 markers from the current FPC build.
– Overgo probes are being used to screen the BAC libraries. They may identify ~47 additional clones.
– The Syngenta marker data will also be used for identifying additional BACs.

FISH Data
– Confirmation of chromosome location
– Verification of contig and marker placement
– Assessment of heterochromatin & euchromatin distribution
FISH performed by S. B. Chang at Prof. S. Stack's Laboratory, University of Colorado, USA.
This image demonstrates:
– LE_HBa114C15 on short arm
– LE_HBa308B7 on heterochromatin/centromere border
– LE_HBa20F17 on long arm

Chromosome 4 – Distribution of contigs
[Figure: chromosome 4 with mapped markers and FISH-confirmed contigs. Contigs shown: ctg503, ctg5014, ctg5716, ctg5252, ctg15, ctg1189, ctg1406, ctg916, ctg5711, ctg1795]
This shows that clones for sequencing have been selected from seed contigs along the length of the chromosome, including those selected from putative heterochromatic regions to try to assess the boundary domains.

Distribution of Chromosome 4 Contigs
[Figure: Chr4 with mapped markers and FISH-confirmed contigs. Legend: euchromatin, heterochromatin, centromere. Markers: TG485, T0635, T0954, T1322, CT_At5g37360, T1068, TG287, P74, P41, TG163. Contigs: ctg503, ctg5014, ctg5252, ctg15, ctg1189, ctg1406, ctg5716, ctg916, ctg5711, ctg1795]

Analysed BACs and number of gene models
BAC          Gene models
bTH8H22      4
bTH36C23     2
bTH50I18     3
bTH114C15    2
bTH308B7     0
bTH198L24    0
bTH31H5      1
bTH132O11    3
bTH53M2      5
bTH59M16     7

The ten contigs shown are from the current 45 FPC contigs on chr4. The number of gene models was obtained from the gene prediction training set.

Sequence plots
– Sequence Plot of ctg916 (euchromatin)
– Sequence Plot of ctg5711 (euchromatin)
– Sequence Plot of ctg15 (heterochromatic/euchromatic boundary region), followed by the same plot with greyscale adjusted to view repeat features
– Sequence Plot of ctg5014 (near centromere), followed by the same plot with greyscale adjusted to view repeat features

TPF File
Tile Path Format file – tab delimited flat file
GAP        type-3           ?
?          LE_HBa-24G5      ctg145
CT990489   LE_HBa-20F17     ctg145
GAP        type-3           ?
CT990488   LE_HBa-114C15    ctg5716
?          SL_MboI-143K21   ctg5716
GAP        type-3           ?
?          LE_HBa-147F16    ctg5014
CT990558   LE_HBa-308B7     ctg5014
GAP        type-3           ?
CT990624   LE_HBa-27G19     ctg15
CT476825   LE_HBa-198L24    ctg15
CT573298   LE_HBa-119A16    ctg15
CT485992   LE_HBa-31H5      ctg15

AGP File
Accessioned Golden Path – tab delimited flat file
Order and alignment of Phase 3 finished accessions
chr4   1        50000    1    N   50000        clone    no
chr4   50001    100000   2    N   50000        clone    no
chr4   100001   150000   3    N   50000        contig   no
chr4   150001   200000   4    N   50000        clone    no
chr4   200001   360432   5    F   CT476825.1   1        160432   +
chr4   360433   370113   6    F   CT573298.1   2001     11681    +
chr4   370114   532277   7    F   CT485992.1   2001     164164   +
chr4   532278   582277   8    N   50000        contig   no
chr4   582278   632277   9    N   50000        clone    no
chr4   632278   682277   10   N   50000        contig   no
Gaps and unfinished clones are entered as 50,000 bp sections to more accurately represent the chromosome in each build.

AGP View on SGN

PseudoGoldenPath analysis for Contig Extension and Gap Closure
– A PGP viewer is being developed to visualise sequence alignments and contig positioning.
– Contains finished and unfinished sequence.
– Unfinished clones are represented as sequence contigs.
– Unmasked BES aligned to PGP sequence using ssaha2.
– Parameters, e.g. minimum percentage identity = 95%, minimum of 60% of the end sequence found.
– Map gaps are assigned an arbitrary 5 kb size.
– Clone candidates for contig extension are checked with BLAST and fingerprinted.
– Aim to incorporate other data such as markers.

Closing the Map using PGP
[Figure labels: Sequenced clones, MAP GAP]
– Bridging clones are identified from BES alignments to sequence.
– 53 clone extensions have been identified, including 5 merges with previously unplaced contigs.
– 2 merges of chromosome 4 contigs have also been made.

Extender from Fosmid Library
[Figure label: Potential Extender]
– Fosmid end sequences deposited by Cornell have been aligned to chromosome 4 sequence.
– A copy of the fosmid library has been received at WTSI and ~50,000 clones will be end sequenced by December and the sequences deposited in the Ensembl / NCBI Trace repositories.

WTSI Tomato Clone Pipeline
Pipeline Stage       Number of BACs
Subcloning           34
Shotgun              21
Assembly Start       7
Auto-prefinishing    3
Finishing            11
QC Checking          4
Finished             63
Total                143
HTGS phases shown alongside the table: Phase 1, Phase 2, Phase 3

Chromosome 4 Sequence Generated
Total Sequence Available               10,666,227 bp
Total Unique Sequence                  10,633,995 bp
Total amount of Finished Sequence       7,543,322 bp

Summary of Progress on Chromosome 4
– 45 map contigs have been built on chromosome 4.
– Clone end sequence alignments visualised with the PGP viewer are being used to extend contigs and close gaps.
– ~100,000 fosmid end sequences will be generated by end 2007.
– 10.6 Mb of sequence has been generated, of which 7.5 Mb are finished.
– All sequence assemblies >2 kb are deposited in HTGS divisions of EMBL/GenBank/DDBJ.

Acknowledgements
Wellcome Trust Sanger Institute: Jane Rogers, Sean Humphray, Clare Riddle and Mapping Core Group, Karen McLaren and Finishing Team 46, Stuart McLaren and Pre-finishing Team 58, Christine Lloyd and QC Team 57, Karen Oliver, Matt Jones, Carol Scott
Imperial College London: Gerard Bishop, Daniel Buchan, James Abbott, Sarah Butcher
University of Nottingham: Graham Seymour
Scottish Crop Research Institute: Glenn Bryan

FUNDING

Cornell University: Lukas Mueller, Jim Giovannoni
MIPS/IBI Institute for Bioinformatics: Klaus Mayer, Remy Bruggmann
FISH Resources: Stephen Stack Group (Colorado), Hans de Jong (Wageningen)
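
Worked example: checking the AGP rows
The AGP rows shown on the AGP File slide can be checked for internal consistency: each part should start where the previous one ended, and the length of each gap or component should equal the chromosome span it occupies. The Python sketch below is illustrative only and is not part of the WTSI pipeline. It assumes the usual AGP column layout (object, start, end, part number, type, then either gap length, gap type and linkage, or component ID, component start, component end and orientation), and the function names check_agp and finished_bases are hypothetical.

```python
# AGP rows as transcribed from the slide; real AGP files are tab-delimited.
AGP_TEXT = "\n".join("\t".join(row) for row in [
    ("chr4", "1", "50000", "1", "N", "50000", "clone", "no"),
    ("chr4", "50001", "100000", "2", "N", "50000", "clone", "no"),
    ("chr4", "100001", "150000", "3", "N", "50000", "contig", "no"),
    ("chr4", "150001", "200000", "4", "N", "50000", "clone", "no"),
    ("chr4", "200001", "360432", "5", "F", "CT476825.1", "1", "160432", "+"),
    ("chr4", "360433", "370113", "6", "F", "CT573298.1", "2001", "11681", "+"),
    ("chr4", "370114", "532277", "7", "F", "CT485992.1", "2001", "164164", "+"),
    ("chr4", "532278", "582277", "8", "N", "50000", "contig", "no"),
    ("chr4", "582278", "632277", "9", "N", "50000", "clone", "no"),
    ("chr4", "632278", "682277", "10", "N", "50000", "contig", "no"),
])


def parse_rows(text):
    """Yield (start, end, part, kind, fields) for each non-empty, non-comment line."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        yield int(fields[1]), int(fields[2]), fields[3], fields[4], fields


def check_agp(text):
    """Return a list of problems; an empty list means the rows are consistent."""
    problems = []
    next_start = 1
    for start, end, part, kind, fields in parse_rows(text):
        span = end - start + 1
        if start != next_start:
            problems.append(f"part {part}: starts at {start}, expected {next_start}")
        if kind == "N":
            # Gap row: column 6 holds the gap length.
            length = int(fields[5])
        else:
            # Component row: columns 7 and 8 hold the component start and end.
            length = int(fields[7]) - int(fields[6]) + 1
        if length != span:
            problems.append(f"part {part}: length {length} does not match span {span}")
        next_start = end + 1
    return problems


def finished_bases(text):
    """Sum the chromosome span covered by finished (type F) components."""
    return sum(end - start + 1
               for start, end, _part, kind, _fields in parse_rows(text)
               if kind == "F")


if __name__ == "__main__":
    print(check_agp(AGP_TEXT))
    print(finished_bases(AGP_TEXT))
```

Run on the ten rows from the slide, the check reports no problems, which is a useful sanity test when gaps and unfinished clones are entered as fixed 50,000 bp placeholders.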