Tumor Genome Sequencing
Xiaole Shirley Liu
STAT115, STAT215, BIO298, BIST520
Cancer
• Cancer will affect 1 in 2 men and 1 in 3 women in
the United States, and the number of new cases of
cancer is set to nearly double by the year 2050.
• Cancer is a genetic disease caused by mutations
in the DNA
• Clinically tumors can look the same but most
differ genetically.
Mutations in the Tumor Genome
• Help us identify important genes for
tumorigenesis and cancer progression
• Drivers – a.k.a. gatekeepers; mutations that cause
and accelerate cancers
• Passengers – accidental by-products of
thwarted DNA-repair mechanisms
• Recurrent mutations on genes or pathways are
likely drivers
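The recurrence idea can be sketched in a few lines of Python. This is a minimal illustration, not a published driver-detection method: the function name `recurrent_genes`, the `min_samples` cutoff, and the toy calls are all assumptions made up for this example. It counts, for each gene, how many distinct samples carry at least one mutation.

```python
from collections import defaultdict

def recurrent_genes(mutations, min_samples=2):
    """Return {gene: n_samples} for genes mutated in >= min_samples distinct samples.

    mutations: iterable of (sample_id, gene) pairs; repeated hits in the
    same sample are counted once.
    """
    samples_by_gene = defaultdict(set)
    for sample, gene in mutations:
        samples_by_gene[gene].add(sample)
    counts = {gene: len(samples) for gene, samples in samples_by_gene.items()}
    # Most recurrent first; ties broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {gene: n for gene, n in ranked if n >= min_samples}

# Toy calls, invented for illustration only.
toy_calls = [
    ("S1", "TP53"), ("S1", "TTN"), ("S2", "TP53"), ("S2", "KRAS"),
    ("S3", "KRAS"), ("S3", "TP53"), ("S3", "TP53"), ("S4", "TTN"),
]
recurrent = recurrent_genes(toy_calls, min_samples=2)
print(recurrent)
```

A raw count like this ignores gene length and background mutation rate, so long genes can look recurrent without being drivers.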
High Throughput Driver Detection
• Differential gene expression
• Copy number aberration (CNA) or variation
(CNV) using CGH, tiling or SNP arrays
Comparative genomic hybridization (CGH)
GISTIC
• G-score: combines the frequency of occurrence and the amplitude of the
aberration
• Statistical significance evaluated by permutation
• FDR adjustment for multiple hypothesis testing
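The three steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the GISTIC implementation: the function names, the `threshold` of 0.1, the within-sample marker shuffle used as the null, and the toy copy-number matrix are all hypothetical choices for this example. The FDR step uses the standard Benjamini-Hochberg procedure.

```python
import random

def g_scores(cn, threshold=0.1):
    """cn: list of samples, each a list of log2 copy-number ratios per marker.
    G at a marker = sum of amplitudes over samples where amplitude > threshold,
    which reflects both how often and how strongly the marker is gained."""
    n_markers = len(cn[0])
    return [sum(s[m] for s in cn if s[m] > threshold) for m in range(n_markers)]

def permutation_pvalues(cn, n_perm=200, threshold=0.1, seed=0):
    """Null distribution: shuffle marker positions within each sample."""
    rng = random.Random(seed)
    observed = g_scores(cn, threshold)
    null = []
    for _ in range(n_perm):
        shuffled = [rng.sample(s, len(s)) for s in cn]
        null.extend(g_scores(shuffled, threshold))
    # +1 correction keeps p-values strictly positive.
    return [(1 + sum(1 for g in null if g >= obs)) / (1 + len(null))
            for obs in observed]

def benjamini_hochberg(pvals):
    """Return BH-adjusted q-values in the original order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    q = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):          # walk from largest p to smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        q[i] = running_min
    return q

# Toy data (invented): 6 samples x 20 markers.
n_samples, n_markers = 6, 20
toy_cn = [[0.0] * n_markers for _ in range(n_samples)]
for s in range(5):
    toy_cn[s][5] = 1.0                          # shared gain at marker 5
for s in range(n_samples):
    toy_cn[s][(2 * s + 7) % n_markers] = 0.8    # one private gain per sample

g = g_scores(toy_cn)
pvals = permutation_pvalues(toy_cn)
qvals = benjamini_hochberg(pvals)
print(g.index(max(g)), qvals[5])
```

In this toy matrix only the shared gain at marker 5 should stand out; the private gains have G-scores that the shuffled null reproduces easily.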
Two Major Cancer Genome Projects
• TCGA: The Cancer Genome Atlas
– US funded
– ~20 cancer types * a few hundred tumor samples each
– Genome, transcriptome, DNA methylome, proteomics
– Rigorous tumor sample QC, consistent profiling platform
• ICGC: International Cancer Genome
Consortium
– 11 countries
– 20 cancer types * 500 tumor samples each
Different Sequencing Approaches
• Capture-seq ($400-600)
– Could focus on well-known mutations
• Exome-seq ($700-2K)
– All the exons in genes; promoters and LncRNA genes?
• RNA-seq ($500-2K)
– Expression and mutations together, miss anything?
• Whole genome sequencing ($3-4K)
– Majority of mutations non-coding, function unknown
– Better at detecting structural changes (translocations,
fusions)
– Cost-vs-benefit balance
MAF and VCF Formats
• VCF (Variant Call Format, common in GWAS) and MAF (Mutation Annotation Format, used by TCGA)
• Both can annotate somatic mutations and germline
variants
• Tab delimited text file
• VCF columns: CHROM, POS, ID (SNP id, gene symbol, or ENTREZ
gene id), REF (reference sequence), ALT (altered sequence),
QUAL (quality score), FILTER (PASS, or the failed filters, e.g. “q10;s50”:
quality <=10, <=50% of samples have data here), INFO
(allele counts, total counts, number of samples with data,
somatic or not, validated, etc.)
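To make the column layout concrete, here is a minimal sketch of parsing the eight fixed VCF columns from one tab-delimited record. The function name `parse_vcf_line`, the returned dictionary layout, and both example records are assumptions made up for illustration; real pipelines would normally use an established VCF library rather than hand parsing.

```python
def parse_vcf_line(line):
    """Parse the eight fixed columns of one VCF data line into a dict."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]

    # INFO is a ';'-separated list of KEY=VALUE pairs or bare flags.
    info_dict = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        info_dict[key] = value if sep else True   # bare flag -> True

    return {
        "CHROM": chrom,
        "POS": int(pos),
        "ID": None if vid == "." else vid,
        "REF": ref,
        "ALT": alt.split(","),                    # several ALT alleles allowed
        "QUAL": None if qual == "." else float(qual),
        "FILTER": [] if flt in ("PASS", ".") else flt.split(";"),
        "passed": flt == "PASS",
        "INFO": info_dict,
    }

# Two invented records: one passing somatic call, one failing two filters.
good = parse_vcf_line("1\t12345\t.\tC\tT\t50\tPASS\tNS=1;DP=60;SOMATIC")
bad = parse_vcf_line("1\t23456\t.\tG\tA,T\t3\tq10;s50\tNS=2;DP=11")
print(good["passed"], good["INFO"])
print(bad["passed"], bad["FILTER"])
```

INFO values are kept as strings here; converting them to numbers depends on the types declared in the file's header, which this sketch does not read.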
GATK
• https://www.broadinstitute.org/gatk/guide/best-practices
FASTQ -> BAM
BAM -> VCF
Annotate
Example of a Cancer Genome Mutations Profile
• Circos Plot: how messed up a cancer genome is
Total alterations affecting protein-coding genes in selected tumors
Vogelstein et al, Science 2013
Somatic Mutation Frequency in 3K Tumor-Normal Pairs
• Typical tumors: median 45 mutations / tumor
• More mutations in tumors of tissues exposed to the outside environment (e.g. skin, lung)
Mutation Rate Heterogeneity
• Mutation rate correlated with replication timing,
gene expression, and gene length
• Tumor evolution and selection
TS vs Oncogenes, GoF vs LoF
• Tumor suppressors vs oncogenes
• Gain of Function (GoF) or Loss of Function
(LoF) mutations
– Phenotypes
• How to tell?
– From mutation patterns
– From expression patterns
– Functional studies
• Some genes can be both TS and oncogenes
Hallmarks of Cancer
Mutual Exclusivity and Co-occurrence
• Most cancers have >=2 sequential mutations
developed over many years.
• Mutations in different pathways can co-occur in
the same cancer, whereas genes in the same
pathway are rarely mutated in the same sample.
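One common way to test a gene pair for mutual exclusivity or co-occurrence is a one-sided Fisher's exact test on the 2x2 table of mutated and unmutated samples. The sketch below is an illustration under that assumption, not a method named in these slides; the function names and the toy mutation vectors are invented for this example.

```python
from math import comb

def hypergeom_pmf(k, n_a, n_b, n):
    """P(exactly k samples carry both mutations) when n_a and n_b of n
    samples are mutated in genes A and B independently of each other."""
    return comb(n_a, k) * comb(n - n_a, n_b - k) / comb(n, n_b)

def exclusivity_test(mut_a, mut_b):
    """mut_a, mut_b: 0/1 lists over the same samples.
    Returns (n_both, p_exclusive, p_cooccur) from one-sided Fisher tests."""
    n = len(mut_a)
    n_a, n_b = sum(mut_a), sum(mut_b)
    both = sum(1 for a, b in zip(mut_a, mut_b) if a and b)
    lo, hi = max(0, n_a + n_b - n), min(n_a, n_b)   # feasible overlap range
    p_excl = sum(hypergeom_pmf(k, n_a, n_b, n) for k in range(lo, both + 1))
    p_co = sum(hypergeom_pmf(k, n_a, n_b, n) for k in range(both, hi + 1))
    return both, p_excl, p_co

# Toy cohort of 20 samples (invented).
gene_a = [1] * 8 + [0] * 12            # mutated in samples 0-7
gene_b = [0] * 8 + [1] * 8 + [0] * 4   # mutated in samples 8-15, never with A
gene_c = [1] * 8 + [0] * 12            # mutated in exactly the same samples as A

both_ab, p_excl_ab, p_co_ab = exclusivity_test(gene_a, gene_b)
both_ac, p_excl_ac, p_co_ac = exclusivity_test(gene_a, gene_c)
print(both_ab, p_excl_ab)   # small p_excl suggests mutual exclusivity
print(both_ac, p_co_ac)     # small p_co suggests co-occurrence
```

With many gene pairs tested, these p-values would also need multiple-testing correction.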
How Much Should We Sequence?
• Need ~200 patients for a 20% mutation rate, ~550
patients for 10%, and ~1200 patients for a 5% mutation rate.
• Most driver mutations have been found; the pressing
need in basic cancer research is to study their
function
• Biggest surprise: mutations on chromatin
regulators
– > 50% new and strong cancer driver genes
– Oncogenes: DNMT3A, IDH1
– Tumor Suppressor: MLL, ATRX, ARID1A, SNF5
– Both: EZH2
Resources
• MSKCC cBioPortal
– Graphical interface for experimental biologists
• Broad Firehose
– API for accessing processed TCGA data
• UCSC CGHub
– API for accessing raw and processed cancer data
• Sanger COSMIC
– Catalog of Somatic Mutations in Cancer
• Many also provide software tools
Summary
• Different sequencing approaches
• Different mutation types and distributions
• Gain or loss of function mutations
• Tumor suppressor vs oncogenes
• Cancer pathways or hallmarks
• Mutation co-occurrence and mutual exclusivity
• How to study the functions of the mutations?
Acknowledgement
• Aleksandar Milosavljevic
• John Pack
• Cheng Li
• Xujun Wang