Tumor Genome Sequencing
Xiaole Shirley Liu
STAT115, STAT215, BIO298, BIST512

Cancer
• Cancer will affect 1 in 2 men and 1 in 3 women in the United States, and the number of new cases of cancer is set to nearly double by the year 2050.
• Cancer is a genetic disease caused by mutations in the DNA
• Clinically, tumors can look the same, but most differ genetically.

Different Sequencing Approaches
• Capture-seq ($400-600)
  – Can focus on well-known mutations
• Exome-seq ($700-2K)
  – All the exons in genes; promoters and lncRNA genes?
• RNA-seq ($500-2K)
  – Expression and mutations together; does it miss anything?
• Whole genome sequencing ($3-4K)
  – Majority of mutations are non-coding, function unknown
  – Better at detecting structural changes (translocations, fusions)
  – Cost-vs-benefit balance

Two Major Cancer Genome Projects
• TCGA: The Cancer Genome Atlas (US)
  – > 30 cancer types and > 10K tumor samples
  – Primary tumors, fewer death events
  – Genome, transcriptome, DNA methylome, proteomics
  – Rigorous tumor sample QC, consistent profiling platform
• ICGC: International Cancer Genome Consortium (11 countries)
  – 20 cancer types * 500 tumor samples each

Tumor Gene Expression
• Microarrays or RNA-seq
• Data analysis?
• Differential expression between cancer and normal
• Cluster the tumor samples into sub-types
  – Consensus clustering: sampling genes or tumors, get robust clustering
• Predict patient outcome (survival or recurrence)

Break

Survival Analysis
• Do patients receiving the treatment live longer?
• Are smokers more likely to have cancer recurrence?
• Censored data: the value of a measurement or observation is only partially known
  – Some patients left the study
  – Study concluded

Survival Without Censoring

Survival With Censoring

Kaplan Meier Curve
• More individuals in each group, better separation of the groups, better p-value

Log Rank Test

More Variables
• 50-signature?
• Logistic regression:
  – Estimate odds ratio: ratio of proportions
  – Linear combination of all the genes to separate outcome (0, 1).
• Cox Regression
  – Estimate hazard ratio: ratio of incidence rates
  – Models the effect of covariates on the hazard rate but leaves the baseline hazard rate unspecified

Use Cox Regression to Separate Two Groups by Gene Signature

Caution About Gene Signature’s Predictive Power

Break

Mutations in the Tumor Genome
• Help us identify important genes for tumorigenesis and cancer progression
• Drivers
  – a.k.a. gatekeepers, mutations that cause and accelerate cancers
• Passengers
  – Accidental by-products and thwarted DNA-repair mechanisms
• Recurrent mutations on genes or pathways are likely drivers

High Throughput Driver Detection
• Differential gene expression
• Copy number aberration (CNA) or variation (CNV) using CGH, tiling or SNP arrays

Comparative genomic hybridization (CGH)

GISTIC
• G-score: frequency of occurrence and the amplitude of the aberration
• Statistical significance evaluated by permutation
• FDR adjustment for multiple hypothesis testing

GATK
• https://www.broadinstitute.org/gatk/guide/best-practices
• FASTQ -> BAM, BAM -> VCF, Annotate

MAF and VCF Formats
• VCF (GWAS format) and MAF (TCGA format)
• Both can annotate somatic mutations and germline variants
• Tab-delimited text file
• Columns:
  – CHROM
  – POS
  – ID (SNP id, gene symbol, or ENTREZ gene id)
  – REF (reference sequence)
  – ALT (altered sequence)
  – QUAL (quality score)
  – FILTER (PASS, or failed filters such as “q10;s50”: quality <= 10; <= 50% of samples have data here)
  – INFO (allele counts, total counts, number of samples with data, somatic or not, validated, etc.)

Example of a Cancer Genome Mutations Profile
• Circos Plot: how messed up a cancer genome is

Total alterations affecting protein-coding genes in selected tumors
(Vogelstein et al, Science 2013)

Somatic Mutation Frequency in 3K Tumor-Normal Pairs
• Typical tumors: median 45 mutations / tumor
• More mutations in tumors of tissues that face the outside environment

Break
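The column layout from the MAF and VCF Formats slide can be made concrete with a small parser. This is a minimal sketch in plain Python, not the GATK or any library implementation: the names `VcfRecord`, `parse_info`, and `parse_vcf_line` are hypothetical, the two records are made-up illustrations, and only the eight fixed columns named on the slide are handled (header lines and per-sample genotype columns are ignored).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class VcfRecord:
    """One data line of a VCF file, restricted to the eight fixed columns."""
    chrom: str
    pos: int
    id: str
    ref: str
    alt: List[str]
    qual: Optional[float]
    filters: List[str]              # empty list means the record passed (PASS)
    info: Dict[str, Union[str, bool]]


def parse_info(field: str) -> Dict[str, Union[str, bool]]:
    """Split the INFO column into key/value pairs; bare keys become flags."""
    info: Dict[str, Union[str, bool]] = {}
    if field == ".":
        return info
    for item in field.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse one tab-delimited VCF data line (not a '#' header line)."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, filt, info = cols[:8]
    return VcfRecord(
        chrom=chrom,
        pos=int(pos),
        id=vid,
        ref=ref,
        alt=alt.split(","),                       # several ALT alleles are comma-separated
        qual=None if qual == "." else float(qual),
        filters=[] if filt in ("PASS", ".") else filt.split(";"),
        info=parse_info(info),
    )


# Made-up records for illustration only (not real variant calls).
passing = parse_vcf_line("20\t14370\trs0000001\tG\tA\t29\tPASS\tNS=3;DP=14;SOMATIC")
failing = parse_vcf_line("20\t17330\t.\tT\tA,C\t3\tq10;s50\tNS=1;DP=11")

print(passing.filters, passing.info)
print(failing.filters, failing.alt)
```

Representing FILTER as a list of failed filter names keeps the "PASS vs q10;s50" distinction from the slide explicit: an empty list means the call passed every filter.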
TS vs Oncogenes, GoF vs LoF
• Tumor suppressors vs oncogenes
• Gain of Function (GoF) or Loss of Function (LoF) mutations
  – Phenotypes
• How to tell?
  – From mutation patterns
  – From expression patterns
  – Functional studies
• Some genes can be both TS and oncogenes

Mutation Rate Heterogeneity
• Mutation rate correlated with replication timing, gene expression, and gene length
• Tumor evolution and selection
(Lawrence et al, Nat 2013)

Recurrent Mutations
• Known
• Novel, with clear cancer association
• Novel
(Lawrence et al, Nat 2014)

How Much Should We Sequence?
• Need ~200 patients for 20% mutation rate, ~550 patients for 10%, ~1200 patients for 5% mutation rate.
• Most driver mutations have been found; there is a pressing need in basic cancer research to study their function
• Biggest surprise: mutations on chromatin regulators
  – > 50% new and strong cancer driver genes
  – Oncogenes: DNMT3A, IDH1
  – Tumor Suppressor: MLL, ATRX, ARID1A, SNF5
  – Both: EZH2
• Sequencing metastasized or drug-resistant tumors might yield insights on tumor progression

Resources
• MSKCC CBioPortal
  – GUI for experimental biologists
• Broad FireHose
  – API for accessing processed TCGA data
• UCSC CGHub
  – API for accessing raw and processed cancer data
• Sanger COSMIC
  – Catalog of Somatic Mutations in Cancer
• Many also provide software tools

Summary
• Different sequencing approaches
• Gene Expression, tumor sub-typing
• Survival analysis: KM vs Cox Regression
• Different mutation types and distributions
• Gain or loss of function mutations
• Tumor suppressor vs oncogenes

Acknowledgement
• Aleksandar Milosavljevic
• Kristin Sainani
• Linda Staub & Alexandros Gekenidis
• Yin Bun Cheung, Paul Yip
• John Pack
• Cheng Li
• Xujun Wang
• Peng Jiang
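Supplementary sketch for the survival analysis slides: the Kaplan-Meier curve with censored data can be computed in a few lines. This is a minimal illustration of the standard product-limit estimator, not code from the lecture; the function name `kaplan_meier` and the toy follow-up data are assumptions made for the example. At each time with at least one observed event, the running survival estimate is multiplied by (1 - events / number at risk); censored patients leave the risk set without lowering the curve.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each observed event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after t.
        n_at_risk -= n_leaving
    return curve


# Toy data (hypothetical): seven patients, three of them censored.
toy_times = [2, 3, 3, 5, 8, 8, 12]
toy_events = [1, 1, 0, 1, 0, 1, 0]

for t, s in kaplan_meier(toy_times, toy_events):
    print(f"t={t}: S(t)={s:.3f}")
```

With no censoring the estimate reduces to the plain fraction of patients still alive at each time, which is the contrast drawn by the "Survival Without Censoring" and "Survival With Censoring" slides.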