COSMIC: Annotating cancer genomes

What is COSMIC?
[Diagram labels: Published data, Phenotype, Sanger CGP data, Genotype]

4 classes of mutation
(column headings not preserved)
cDNA point mutations        X  X  X
Fusion genes                X  X  -
Whole genome annotations    X  X  -
CNV                         X  X  -

cDNA point mutations
Small intragenic mutations putatively affecting the protein product of a single gene:
- Nonsense
- Missense
- Inframe Ins / Del
- Frameshift
- Complex replacement

Gene-specific mutation spectrum
COSMIC core: The Histogram Page (TP53)
[Figure: point mutation histogram. Panel labels: Complex mutations, Insertions/Deletions, Domain structures, cDNA (CDS) scale, Mutation counts/frequencies, By tumour primary site]

Fused Genes
TMPRSS2 / ERG fusions in 39% of prostate tumours

Copy Number Variation (CNV)
Examining cancer aneuploidy using SNP microarrays
Chr. 8 amplification of the MYC oncogene in the NCI-H2171 lung tumour
[Figure labels: MYC, 10n, Diploid, LOH, Allele A, Allele B; x-axis: Genome Position (Mb)]

Whole genome Solexa paired end sequencing
Examining tumours for genomic rearrangements:
- Fragment genomic DNA to ~500bp fragments
- Ligate adapters
- Surface bind
- Amplify
- 35 x scanned single-nucleotide sequencing reactions, in pairs
- Align approx. 52 million sequences to the reference genome
- Select pairs that do not map ~500bp apart (<> 500bp)
- Capillary sequence across selected regions to define the exact breakpoint
[Figure: a ~500 bps fragment with 35 bps sequenced at each end, one end on Chromosome 9 and the other on Chromosome 22]

Tandem duplication: Chr 4
[Figure: copy number (0 to 4) against genomic location (90 to 97 Mb). Labels: 1st pair-end, 2nd pair-end; coordinates 93850265, 94571168, 94571167; gene GRID2; exon labels 1, 2, Exons 3-10, Exons 3-10, 11-16]

Inverted duplication
[Figure: copy number (0 to 4) against genomic location (50 to 56 Mb). Labels: Paired read 1, Paired read 2; coordinates 54692994, 53155288, 53161366, 53127640; gene RAD51C; 10 other genes duplicated]

2:12 Fusion gene
NCI-H2171: Chr 12 (- strand), Chr 2 (+ strand)
[Figure: copy number (0 to 8) against genomic location (1.50 to 2.50 Mb). Coordinates 28984744, 1775177; junction sequence ....CAACAGT GAGTAT.....; CACNA2D4 Exon 36; WDR43 Intron 3; exon labels 34, 35, 36, 4, 5]
CACNA2D4-WDR43 fusion gene

Amplicon breakpoint detection
[Figure: copy number (0 to 50) across two Chr 8 regions, 61.8Mb to 64.5Mb and 127.6Mb to 129.1Mb. Genes: GGH, YTHDF3, CHD7, RLBP1L1, ASPH, FAM77D, TTPA; FAM84B, MYC, PVT1. Gel lanes: ladder, NCI-H2171, NCI-BL2171; 100bp marker]
PVT1-CHD7 fusion gene

Breast tumour summary
8 breast cancers now fully analysed:
- 888 somatic rearrangements
- 36 fusion genes (18 IN FRAME)
- 78 internally rearranged genes (39 IN FRAME)
- 17 potential promoter fusions
Currently whole-genome-screening 94 tumours from: Lung, Skin, Kidney, Pancreas....

Summarising whole-genome mutation data
[Figure labels: Chromosome, References, COSMIC 'classic' mutations, CNV map, Intrachromosomal rearrangements, Interchromosomal rearrangements, Navigation]
Further navigation: selection of genome positions or mutation types

Rearrangement mutations / breakpoints
Each rearrangement can have a number of breakpoints:
- Simple deletions may present only 1 breakpoint
- Rearrangements involving sequence fragments or compound amplifications can present many
Examples:
- A t(12:8) translocation with 2 chromosome 12 "shards" at the interface.
- An amplification of a t(12:8) translocation; compound mitotic amplification events create multiple related breakpoints.

A known tumour-promoting mutation dataset
- COSMIC displays all mutations, not just those of known oncogenic potential.
- But COSMIC's cell-line resequencing project is examining 50 known cancer genes through 800 cell lines.
- All of the mutations found are manually scrutinised to exclude potential passenger mutations or SNPs.
- This makes it a very useful test dataset for mutation prediction software.
[Figure label: Confirmed Oncogenic]

COSMIC future
- Map mutations to Uniprot co-ords with Pfam, integrating into both websites (& distribute via DAS)
- Finalise structural rearrangement ontology & nomenclature
- Improve mining of rearrangement data; navigation by genomic positions & gene footprints
- Import enormous rearrangement & non-coding mutation datasets

Cosmic Page Impressions (PI) by Week
[Chart: page impressions per week; y-axis: Page Impressions, 0 to 600,000; x-axis: week, labelled 2004 to 2008]

Summary
COSMIC is about to incorporate whole-genome Solexa sequencing results
- Integrate all oncogenic mutation types:
  Point mutations
  Fusion genes
  Copy Number Variants
  Genomic rearrangements
- Annotation:
  HGVS-style summarisation
  Ensembl annotated breakpoint detail
  Uniprot integration
- Adding meaning to the dataset:
  Increasingly important as the quantity & range of data soars
  Mutation consequence: does the variant promote cancer?
  Software (CanPredict, SIFT etc?)
  A known positive oncogenic mutation dataset for testing these
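
One way a known positive dataset could be used to test prediction software is to measure how many of the confirmed oncogenic mutations a predictor flags. The sketch below is only an illustration: the `predict` callable, the `sensitivity` helper and the mutation labels are hypothetical, and no real CanPredict or SIFT interface is used. A positives-only dataset gives sensitivity; it cannot give specificity, which needs known neutral variants.

```python
def sensitivity(known_oncogenic, predict):
    """Fraction of known oncogenic mutations that `predict` flags.

    known_oncogenic: list of mutation identifiers (hypothetical labels here).
    predict: callable returning True when a mutation is predicted deleterious.
    """
    if not known_oncogenic:
        raise ValueError("need at least one known oncogenic mutation")
    hits = sum(1 for mutation in known_oncogenic if predict(mutation))
    return hits / len(known_oncogenic)


# Placeholder identifiers standing in for manually curated mutations.
curated = ["mut_A", "mut_B", "mut_C", "mut_D"]

# Toy stand-in for a real predictor: flags only two of the four.
flagged_by_tool = {"mut_A", "mut_C"}


def toy_predict(mutation):
    return mutation in flagged_by_tool


print(sensitivity(curated, toy_predict))
```

With these placeholder inputs the toy predictor recovers two of the four curated mutations.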
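
The read-pair selection step in the paired end sequencing workflow (keep pairs that do not map about 500bp apart) can be sketched as follows. This is a minimal illustration, not the pipeline used for the project: the `ReadPair` type, the 100bp tolerance and the example coordinates are all assumptions, and read orientation, which a real caller would also check to detect inversions, is ignored.

```python
from dataclasses import dataclass

EXPECTED_INSERT_BP = 500  # fragment size stated in the workflow
TOLERANCE_BP = 100        # assumed tolerance; not given in the slides


@dataclass(frozen=True)
class ReadPair:
    """Mapped start positions of the two reads from one fragment."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def is_discordant(pair, expected=EXPECTED_INSERT_BP, tolerance=TOLERANCE_BP):
    """True if the pair maps to two chromosomes or at an unexpected distance."""
    if pair.chrom1 != pair.chrom2:
        return True  # candidate interchromosomal rearrangement
    distance = abs(pair.pos2 - pair.pos1)
    return abs(distance - expected) > tolerance


def select_discordant(pairs):
    """Keep only the pairs worth following up by capillary sequencing."""
    return [pair for pair in pairs if is_discordant(pair)]


# Made-up coordinates for illustration only.
pairs = [
    ReadPair("4", 1_000, "4", 1_500),     # ~500bp apart: concordant
    ReadPair("4", 1_000, "4", 721_000),   # far apart on one chromosome
    ReadPair("9", 1_000, "22", 2_000),    # ends on different chromosomes
]
candidates = select_discordant(pairs)
print(len(candidates))
```

Here the first pair is discarded and the other two are kept as candidate rearrangements.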