COSMIC: Annotating cancer genomes.
What is COSMIC?
Published data | Phenotype | Sanger CGP data | Genotype
4 classes of mutation (X = present, - = absent):
cDNA point mutations: X X X
Fusion genes: X X -
Whole genome annotations: X X -
CNV: X X -
cDNA point mutations
Small intragenic mutations putatively affecting the protein product of a single gene:
- Nonsense
- Missense
- In-frame insertions/deletions
- Frameshift
- Complex replacement
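The classes above follow from two questions: does the change alter the length of the coding sequence by a non-multiple of three (frameshift), and, for a single-base substitution, what happens to the affected codon? A minimal sketch, assuming a reference CDS string, a 1-based CDS position and ref/alt alleles (empty string for an insertion or deletion); the function name is hypothetical and treating every other multi-base change as "complex" is a simplifying assumption.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ("*" = stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_coding_change(ref_cds: str, pos: int, ref: str, alt: str) -> str:
    """Hypothetical helper: map a small CDS change to a point-mutation class."""
    start = pos - 1  # 0-based index of the first affected base
    if ref_cds[start:start + len(ref)] != ref:
        raise ValueError("reference allele does not match the CDS")

    if len(ref) == 1 and len(alt) == 1:
        # Single-base substitution: translate the affected codon before and after.
        codon_start = (start // 3) * 3
        ref_codon = ref_cds[codon_start:codon_start + 3]
        offset = start - codon_start
        alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
        ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
        if ref_aa == alt_aa:
            return "Synonymous"  # silent change, not one of the listed classes
        return "Nonsense" if alt_aa == "*" else "Missense"

    length_change = len(alt) - len(ref)
    if length_change % 3 != 0:
        return "Frameshift"
    if ref == "" or alt == "":
        return "In-frame ins/del"
    # Assumption: any other multi-base change counts as a complex replacement.
    return "Complex replacement"


cds = "ATGGCCTGGTAA"  # toy sequence, not a real gene
print(classify_coding_change(cds, 9, "G", "A"))   # Nonsense (TGG -> TGA)
print(classify_coding_change(cds, 4, "GCC", ""))  # In-frame ins/del
```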
Gene-specific mutation spectrum
COSMIC core: the histogram page (TP53)
Point mutation histogram
[Figure: histogram on a cDNA (CDS) scale showing complex mutations, insertions/deletions, domain structures, and mutation counts/frequencies by tumour primary site]
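The histogram page boils down to two aggregations: mutations counted along the CDS, and a frequency per tumour primary site. A minimal sketch, assuming hypothetical (sample_id, cds_position, primary_site) records and a per-site count of samples tested; defining frequency as mutated samples over tested samples is an assumption, not a statement of how COSMIC computes it.

```python
from collections import Counter, defaultdict


def codon_histogram(records, bin_size=1):
    """Count mutations per bin of `bin_size` codons; keys are each bin's first codon."""
    counts = Counter()
    for _sample, cds_pos, _site in records:
        codon = (cds_pos - 1) // 3 + 1
        counts[(codon - 1) // bin_size * bin_size + 1] += 1
    return dict(sorted(counts.items()))


def site_frequencies(records, samples_tested):
    """Fraction of tested samples per primary site carrying at least one mutation."""
    mutated = defaultdict(set)
    for sample, _cds_pos, site in records:
        mutated[site].add(sample)
    return {site: len(mutated[site]) / n for site, n in samples_tested.items() if n > 0}


# Made-up records for illustration only.
records = [("s1", 524, "breast"), ("s2", 524, "lung"), ("s3", 743, "lung"), ("s3", 818, "lung")]
print(codon_histogram(records))                             # {175: 2, 248: 1, 273: 1}
print(site_frequencies(records, {"breast": 4, "lung": 8}))  # {'breast': 0.25, 'lung': 0.25}
```

Counting distinct samples rather than mutations keeps a sample with two hits (s3 above) from inflating its site's frequency.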
Fusion genes
TMPRSS2/ERG fusions in 39% of prostate tumours
Copy Number Variation (CNV)
Examining cancer aneuploidy using SNP microarrays
Chromosome 8 amplification of the MYC oncogene in the NCI-H2171 lung tumour cell line
[Figure: allele A and allele B copy number against genome position (Mb), marking MYC at ~10n, diploid regions and LOH]
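With SNP arrays, each SNP gives one signal per allele; their sum tracks total copy number, and the loss of one allele marks LOH. A minimal per-SNP sketch, assuming allele A and B signals are already normalised to copy number (about 1 + 1 for a normal heterozygous SNP); the function name and both thresholds are illustrative assumptions.

```python
import math


def call_snp(allele_a: float, allele_b: float, amp_cn: float = 5.0, loh_minor: float = 0.2):
    """Hypothetical per-SNP call: (total copy number, B-allele fraction, labels)."""
    total = allele_a + allele_b
    b_allele_freq = allele_b / total if total > 0 else math.nan
    calls = []
    if total == 0:
        calls.append("homozygous deletion")
    else:
        if total >= amp_cn:
            calls.append("amplified")
        if min(allele_a, allele_b) < loh_minor:
            calls.append("LOH")  # one allele effectively absent
    return total, b_allele_freq, calls


print(call_snp(1.0, 1.0))  # (2.0, 0.5, [])
print(call_snp(9.8, 0.1)[2])  # ['amplified', 'LOH']
```

A germline homozygous SNP looks identical to LOH at a single position, so in practice LOH is called over runs of SNPs that are heterozygous in the matched normal.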
Whole-genome Solexa paired-end sequencing
Examining tumours for genomic rearrangements:
- Fragment genomic DNA into ~500 bp fragments
- Ligate adapters
- Bind fragments to the surface
- Amplify
- Sequence 35 bases from each end of every fragment (35 cycles of scanned single-nucleotide sequencing reactions, in pairs)
- Align approx. 52 million sequences to the reference genome
- Select pairs whose ends do not map ~500 bp apart
- Capillary-sequence across the selected regions to define the exact breakpoint
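The selection step above keeps pairs whose ends do not sit the expected ~500 bp apart, in the expected orientation, on one chromosome. A minimal sketch, assuming a standard inward-facing paired-end library ("+" read at the lower coordinate, "-" read at the higher), 35 bp reads and a hypothetical 150 bp tolerance; `ReadPair` and `is_discordant` are illustrative names, not part of any pipeline named in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    chrom1: str
    pos1: int      # leftmost mapped coordinate of read 1
    strand1: str   # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def is_discordant(pair: ReadPair, fragment: int = 500, tolerance: int = 150, read_len: int = 35) -> bool:
    if pair.chrom1 != pair.chrom2:
        return True  # ends on different chromosomes: candidate translocation
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    observed = right_pos + read_len - left_pos  # implied fragment length
    if abs(observed - fragment) > tolerance:
        return True  # too far apart (or too close)
    return (left_strand, right_strand) != ("+", "-")  # unexpected orientation


pairs = [
    ReadPair("9", 1_000, "+", "22", 5_000, "-"),  # interchromosomal
    ReadPair("4", 1_000, "+", "4", 1_465, "-"),   # ~500 bp fragment, normal
]
candidates = [p for p in pairs if is_discordant(p)]
```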
[Figure: a ~500 bp fragment with 35 bp sequenced at each end; one end maps to chromosome 9, the other to chromosome 22]
Tandem duplication, chromosome 4
[Figure: copy number (0 to 3) against genomic location (90 to 97 Mb); 1st and 2nd pair-ends at coordinates 93,850,265, 94,571,167 and 94,571,168, within GRID2; rearranged GRID2 structure: exons 1-2, exons 3-10, exons 3-10 (duplicated), exons 11-16]
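Once a pair is known to be discordant, the strands of its two ends hint at the type of rearrangement. A minimal sketch using a convention common for inward-facing paired-end libraries (outward-facing ends suggest a tandem duplication, same-strand ends an inversion); the function name is hypothetical, and real callers cluster many supporting pairs before assigning a class.

```python
def classify_discordant_pair(chrom1, pos1, strand1, chrom2, pos2, strand2):
    """Hypothetical helper; assumes the pair has already been flagged as discordant."""
    if chrom1 != chrom2:
        return "interchromosomal"
    (_, left_strand), (_, right_strand) = sorted([(pos1, strand1), (pos2, strand2)])
    if left_strand == right_strand:
        return "inversion"            # both ends on the same strand
    if (left_strand, right_strand) == ("-", "+"):
        return "tandem duplication"   # ends point away from each other
    return "deletion"                 # inward-facing but too far apart


print(classify_discordant_pair("4", 1_000, "-", "4", 700_000, "+"))  # tandem duplication
```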
Inverted duplication
[Figure: copy number (0 to 4) against genomic location (50 to 56 Mb); paired reads 1 and 2 at coordinates 53,127,640, 53,155,288, 53,161,366 and 54,692,994; RAD51C and 10 other genes duplicated]
2:12 fusion gene, NCI-H2171
[Figure: junction between chromosome 12 (- strand) and chromosome 2 (+ strand) at breakpoint coordinates 28,984,744 and 1,775,177, junction sequence ....CAACAGT GAGTAT.....; copy number against genomic location (Mb) for CACNA2D4 (exon 36) and WDR43 (intron 3)]
CACNA2D4-WDR43 fusion gene
Amplicon breakpoint detection
[Figure: copy number across two amplified regions, 61.8-64.5 Mb (GGH, YTHDF3, CHD7, RLBP1L1, ASPH, FAM77D, TTPA) and 127.6-129.1 Mb (FAM84B, MYC, PVT1); gel lanes: ladder (100 bp), NCI-H2171, NCI-BL2171]
PVT1-CHD7 fusion gene
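The edges of high-copy segments are where amplicon breakpoints are sought. A minimal sketch that merges consecutive windows above a copy-number threshold into segments, assuming hypothetical (start, end, copy_number) windows sorted by position on one chromosome and an illustrative threshold of 4 copies.

```python
def amplified_segments(windows, threshold=4.0):
    """Merge adjacent windows with copy number >= threshold into (start, end, peak) segments."""
    segments = []  # each entry: [start, end, peak_copy_number]
    for start, end, copy_number in windows:
        if copy_number < threshold:
            continue
        if segments and segments[-1][1] == start:
            segments[-1][1] = end  # window abuts the current segment: extend it
            segments[-1][2] = max(segments[-1][2], copy_number)
        else:
            segments.append([start, end, copy_number])
    return [tuple(s) for s in segments]


windows = [(0, 100, 2), (100, 200, 8), (200, 300, 9), (300, 400, 2), (400, 500, 6)]
print(amplified_segments(windows))  # [(100, 300, 9), (400, 500, 6)]
```

Each segment boundary is only a candidate breakpoint region; the exact junction still has to be confirmed experimentally.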
Breast tumour summary
8 breast cancers now fully analysed:
- 888 somatic rearrangements
- 36 fusion genes (18 in frame)
- 78 internally rearranged genes (39 in frame)
- 17 potential promoter fusions
Currently screening the whole genomes of 94 tumours from lung, skin, kidney, pancreas....
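Whether a fusion is "in frame" depends only on codon phase at the join. A minimal sketch, assuming both breakpoints fall in introns (so whole exons are joined) and that the last coding base kept from the 5' partner and the first coding base kept from the 3' partner are known in 1-based CDS coordinates; the function name is hypothetical, and promoter/UTR fusions are out of scope. The same test applies to internally rearranged genes, where both parts come from one gene.

```python
def fusion_in_frame(last_cds_base_5p: int, first_cds_base_3p: int) -> bool:
    # Bases of an incomplete codon left at the end of the 5' part...
    phase_5p = last_cds_base_5p % 3
    # ...must equal the position within its codon at which the 3' part resumes.
    phase_3p = (first_cds_base_3p - 1) % 3
    return phase_5p == phase_3p


print(fusion_in_frame(300, 151))  # True: 300 ends a codon and 151 starts one
```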
Summarising whole-genome mutation data
- Chromosome references
- COSMIC 'classic' mutations
- CNV map
- Intrachromosomal rearrangements
- Interchromosomal rearrangements
Further navigation: select genome positions or mutation types
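Navigating by genome position and mutation type needs a fast "what lies in this window?" lookup. A minimal sketch, assuming hypothetical (chrom, pos, type, label) records, each indexed by a single position, with binary search over per-chromosome sorted lists; the class name is illustrative.

```python
import bisect
from collections import defaultdict


class MutationIndex:
    def __init__(self, records):
        by_chrom = defaultdict(list)
        for chrom, pos, mtype, label in records:
            by_chrom[chrom].append((pos, mtype, label))
        self._rows = {c: sorted(rows) for c, rows in by_chrom.items()}
        self._positions = {c: [r[0] for r in rows] for c, rows in self._rows.items()}

    def query(self, chrom, start, end, mtype=None):
        """Return rows with start <= pos <= end, optionally restricted to one mutation type."""
        positions = self._positions.get(chrom, [])
        lo = bisect.bisect_left(positions, start)
        hi = bisect.bisect_right(positions, end)
        rows = self._rows.get(chrom, [])[lo:hi]
        return [r for r in rows if mtype is None or r[1] == mtype]


index = MutationIndex([
    ("8", 100, "point", "m1"),
    ("8", 250, "rearrangement", "r1"),
    ("8", 400, "point", "m2"),
    ("12", 250, "point", "m3"),
])
print(index.query("8", 200, 450, mtype="point"))  # [(400, 'point', 'm2')]
```

A rearrangement really spans two or more positions; a fuller index would store each breakpoint separately or use an interval tree.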
Rearrangement mutations / breakpoints
Each rearrangement can have a number of breakpoints:
- Simple deletions may present only 1 breakpoint.
- Rearrangements involving sequence fragments or compound amplifications can present many:
  - a t(8;12) translocation with 2 chromosome 12 "shards" at the interface;
  - an amplification of a t(8;12) translocation, where compound mitotic amplification events create multiple related breakpoints.
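Because one rearrangement can own several breakpoints, it is natural to store breakpoints as children of a rearrangement record. A minimal sketch of such a data structure; `Junction`, `Rearrangement` and all coordinates are hypothetical, with each junction joining two genomic positions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Junction:
    """One breakpoint: the join between two genomic positions."""
    chrom_a: str
    pos_a: int
    chrom_b: str
    pos_b: int


@dataclass
class Rearrangement:
    rearrangement_id: str
    junctions: list[Junction] = field(default_factory=list)

    def breakpoint_count(self) -> int:
        return len(self.junctions)

    def chromosomes(self) -> set[str]:
        return {c for j in self.junctions for c in (j.chrom_a, j.chrom_b)}


# A simple deletion: one junction joining the two ends of the deleted segment.
deletion = Rearrangement("del-1", [Junction("4", 10_000, "4", 25_000)])

# A translocation with two chromosome 12 shards at the interface:
# chr8 -> shard 1 -> shard 2 -> chr12, i.e. three junctions.
translocation = Rearrangement("t-1", [
    Junction("8", 1_000, "12", 50_000),
    Junction("12", 52_000, "12", 90_000),
    Junction("12", 91_000, "12", 10_000),
])
```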
A known tumour-promoting mutation dataset
COSMIC displays all mutations, not just those of known oncogenic potential.
However, COSMIC's cell-line resequencing project is examining 50 known cancer genes across 800 cell lines.
All mutations found are manually scrutinised to exclude potential passenger mutations and SNPs.
This makes it a very useful test dataset for mutation-prediction software.
[Figure: confirmed oncogenic mutations]
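A curated set of confirmed oncogenic mutations lets a prediction tool be scored directly. A minimal sketch with placeholder mutation IDs; note that a positives-only set measures sensitivity alone, and specificity would also need a trusted negative set.

```python
def sensitivity(known_positive, predicted_damaging):
    """Fraction of known oncogenic mutations the tool flags, plus the ones it missed."""
    known = set(known_positive)
    if not known:
        raise ValueError("need at least one known positive mutation")
    detected = known & set(predicted_damaging)
    missed = sorted(known - detected)
    return len(detected) / len(known), missed


score, missed = sensitivity(
    known_positive={"m1", "m2", "m3", "m4"},
    predicted_damaging={"m1", "m2", "m3", "x9"},
)
print(score, missed)  # 0.75 ['m4']
```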
COSMIC future
- Map mutations to UniProt coordinates with Pfam, integrating them into both websites (and distributing them via DAS)
- Finalise a structural-rearrangement ontology and nomenclature
- Improve mining of rearrangement data: navigation by genomic position and gene footprint
- Import enormous rearrangement and non-coding mutation datasets
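Mapping a mutation onto protein domains starts with converting its CDS position into a residue number. A minimal sketch, assuming the transcript's CDS translates exactly into the UniProt sequence used (isoform differences would need an alignment step); the domain list and function names are hypothetical.

```python
def cds_to_residue(cds_pos: int) -> int:
    """1-based CDS position -> 1-based amino-acid position."""
    return (cds_pos - 1) // 3 + 1


def domains_at(residue: int, domains):
    """Names of (name, start, end) domains that contain the residue."""
    return [name for name, start, end in domains if start <= residue <= end]


domains = [("DomainA", 100, 200), ("DomainB", 150, 300)]  # hypothetical domains
residue = cds_to_residue(524)
print(residue, domains_at(residue, domains))  # 175 ['DomainA', 'DomainB']
```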
[Figure: COSMIC page impressions (PI) by week, 2/2/08 to 2/8/12; y-axis 0 to 600,000 page impressions]
Summary
COSMIC is about to incorporate whole-genome Solexa sequencing results.
- Integrate all oncogenic mutation types:
  - Point mutations
  - Fusion genes
  - Copy number variants
  - Genomic rearrangements
- Annotation:
  - HGVS-style summarisation
  - Ensembl-annotated breakpoint detail
  - UniProt integration
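HGVS-style summarisation turns a variant record into a compact standard string. A minimal sketch covering only a few common HGVS forms (cDNA substitution, deletion, insertion, protein substitution); it is an illustration of the notation, not COSMIC's own code, and the function names are hypothetical.

```python
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val", "*": "Ter",
}


def c_substitution(pos: int, ref: str, alt: str) -> str:
    return f"c.{pos}{ref}>{alt}"


def c_deletion(start: int, end: int) -> str:
    return f"c.{start}del" if start == end else f"c.{start}_{end}del"


def c_insertion(after_pos: int, inserted: str) -> str:
    # HGVS names an insertion by its two flanking positions.
    return f"c.{after_pos}_{after_pos + 1}ins{inserted}"


def p_substitution(pos: int, ref_aa: str, alt_aa: str) -> str:
    if ref_aa == alt_aa:
        return f"p.{THREE_LETTER[ref_aa]}{pos}="  # synonymous change
    return f"p.{THREE_LETTER[ref_aa]}{pos}{THREE_LETTER[alt_aa]}"


print(c_substitution(524, "G", "A"), p_substitution(175, "R", "H"))  # c.524G>A p.Arg175His
```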
- Adding meaning to the dataset, increasingly important as the quantity and range of data soar:
  - Mutation consequence: does the variant promote cancer?
  - Software (CanPredict, SIFT, etc.?)
  - A known positive oncogenic mutation dataset for testing these