* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
Download Combining Whole-exome and RNA-Seq Data Improves the Quality
Cell-free fetal DNA wikipedia , lookup
Genomic library wikipedia , lookup
Neuronal ceroid lipofuscinosis wikipedia , lookup
Cancer epigenetics wikipedia , lookup
Minimal genome wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
Gene expression programming wikipedia , lookup
Biology and consumer behaviour wikipedia , lookup
Epigenetics of neurodegenerative diseases wikipedia , lookup
Genomic imprinting wikipedia , lookup
No-SCAR (Scarless Cas9 Assisted Recombineering) Genome Editing wikipedia , lookup
Koinophilia wikipedia , lookup
Saethre–Chotzen syndrome wikipedia , lookup
Pathogenomics wikipedia , lookup
BRCA mutation wikipedia , lookup
Genome (book) wikipedia , lookup
Gene expression profiling wikipedia , lookup
Metagenomics wikipedia , lookup
Population genetics wikipedia , lookup
Site-specific recombinase technology wikipedia , lookup
Genome evolution wikipedia , lookup
Microevolution wikipedia , lookup
Oncogenomics wikipedia , lookup
Frameshift mutation wikipedia , lookup
Combining whole-exome and RNA-Seq data improves the quality of PDX mutation profiles 1 1 1 Manuel Landesfeind , Bruno Zeitouni , Anne-Lise Peille , Vincent Vuaroqueaux 1 Oncotest - Charles River, Am Flughafen 12-14, 79108 Freiburg, Germany 1 2 Introduction Sequenom / Sanger Here we investigated whether our WES analysis correctly detected mutations previously identified by Sanger sequencing. Afterwards, we compared mutation analyses carried out at the genomic and transcriptomic levels and investigated the observed similarities and discrepancies. Figure 1: Distribution of cancer types in the available PDX collection by lineage (n=339). Data from Oncotest Compendium v1.6. Kidney (25) Stomach (33) Pancreas (45) HUMAN-SPECIFIC MAPPING DATA (BAM) RNA-Seq QC Table 1: Filtering thresholds for NGS mutation analysis. Variant calling Filtering parameter Variant annotation 50 Variant filtering 25 Lung Breast Pancreas Melanoma Stomach Alternative allele coverage min. 3 reads Alternative allele frequency min. 5% Functional impact Kidney Others MUTATIONS: SNPs and INDELs (VCF) Total “moderate” or “high” Number of tools reporting variant (WES only) Figure 3: PDX-specific mutation analysis pipeline for WES and RNASeq data. • On average, less than 14% of all mutations are detected by both WES and RNA-Seq analyses (Fig. 8). • Main reasons for not retrieving a mutation are : • Non-expression of gene in RNA-Seq • Mono-allelic expression of wild-type in RNA-Seq • Low coverage in WES • Proportion of genes with mutation detected at both levels reached 54.6% for cancer-related genes (Fig. 9). Discrepancies between WES and RNA-Seq for these cancer-related genes are due to similar causes as for the overall observation (Fig. 10). Figure 7: Boxplot for proportions of mutations types reported by WES and RNA-Seq analyses (n=92 per mutation type and data type). Y-axis limits differ between plots due to the high proportion of missense mutations. Mutation rate WES 16 12 40 8 4 Filtered in WES (2.8% / 84.6%) 30 Wild-type allele only (0.5% / 15.4%) Covered in WES (3.3% / 12.2%) Low coverage in WES (23.9% / 87.7%) 20 RNA-Seq specific (27.2%) 10 Breast (10) Colon (47) SCLC (5) WES 100 RNA-Seq 25 20 75 15 50 10 25 5 0 0 Missense Codon Codon deletion insertion Frame shift Splice site Common (13.8%) NSCLC (40) Figure 6: Mutation rate detected in WES and RNA-Seq data across analyzed cancer types. The mutation rate is higher in WES than in RNA-Seq for 71 models. The mutation rate is calculated from mutations found in target regions of WES and in expressed transcripts of RNA-Seq respectively. Top-right figure displays linear regression analysis. Proportion of mutations (%) • The proportion of mutation types is not different between WES and RNA-Seq (Fig. 7). 30 50 Mutation rate (per Mb) • Mutation rate detected in WES is usually higher than detected by RNA-Seq (Fig. 6). 20 Mutation rate RNA-Seq • Good correlation between mutation rates (number of mutations per Mb) detected by WES and RNA-Seq analysis (Spearman correlation coefficient = 0.79). 10 Start lost Nonsense Stop lost Figure 8: Comparison of overlap of mutations detected in WES and RNA-Seq analyses. Percentages are averaged for 92 single model analyses. For each category, the first percentage (bold font) specifies the fraction of all exonic mutations while the second percentage specifies the proportion with respect to the category above. WES specific (58.9%) • Our WES analysis retrieved > 95% of these mutations (Fig. 4) • The 4.4% mutations missed by WES are mainly due to pipeline filtering or not being targeted by the enrichment kit (Fig. 5) Filtered in RNA-Seq (1.3% / 7.4%) Covered in RNA-Seq (17.6% / 29.9%) • 509 additional mutations were reported by WES at positions not analyzed by Sanger sequencing • Regarding the genomic regions analyzed by WES and Sanger, a high overall detection quality is achieved by our pipeline (Tab. 2) Table 2: Quality metrics for WES mutation detection. min. 10 bases Sensitivity Precision 95.6% 99.8% Common RNA-Seq specific WES specific APC TP53 KRAS PIK3CA NOTCH1 TGFBR2 ALK SMAD4 PTEN NOTCH2 ERBB4 BRAF ERBB3 EGFR ATM MSH6 MLH1 Gene expressed (7.6% / 18.5%) Homozygous in WES (0.5% / 2.8%) Sanger 510 Colon 478 22 No coverage at exon (3) Position of mutation in STK11 not targeted by Agilent Kit v1 38mb (6) Conclusions Analysis of cancer-related genes using our WES pipeline accurately identified mutations that were previously detected by Sanger sequencing. Breast WES analysis investigated larger genomic ranges and thereby generated more comprehensive mutation profiles. Only a small proportion of genomic mutations are retrieved on the transcriptomic level mainly due to nonexpression of gene or expression of wild-type allele. RNA-Seq analysis detected mutations in genomic regions that are poorly enriched in WES. TP53 KEAP1 SMARCA4 U2AF1 KRAS RB1 CDKN2A NF1 ALK STK11 PTEN ERBB2 BRAF ARID1A KDR NFE2L2 MET MAG KIF5B ERBB4 SLC34A2 SETD2 ERBB3 Figure 10: Pie chart showing distribution of reasons for not retrieving mutations in cancerrelated genes using RNA-Seq data. Data from WES specific mutations exposed in Fig. 9. Figure 5: Pie chart Filtered (8) showing distribution of reasons for not retrieving mutations in WES. 5 KMT2C TP53 FLG BRCA1 LRP2 ATM RYR3 RYR2 MAP3K1 LRP1 GATA3 ETV6 ERBB2 Low coverage in RNA-Seq (41.3% / 70.1%) Gene not expressed (33.7% / 81.5%) WES KRAS mutations not found in pancreatic models with high mouse stroma content (5) Our results demonstrate the importance of transcript mutation analysis which is closer to the protein level and enhances understanding of cancer biology. NSCLC Mono-allelic expression of wild-type (15.8% / 89.8%) Figure 4: Venndiagram of 500 mutations from Sanger and 988 mutations from WES. • WES analysis reported one additional mutation that was not confirmed by Sanger sequencing max. 5 nucleotides Figure 9: Mutation landscape of lineage-specific cancerrelated genes for all analyzed models. Color indicates the analysis method by which the mutated gene was reported. For 201 genes, mutations are found on both transcriptome and genome. 141 genes are mutated on genome level only and 27 genes on transcriptome level only. Non-expressed genes are marked by “x”. Evaluation of WES mutation detection pipeline • We investigated as reference data, 25 genes in 284 models containing 500 mutations assessed by Sanger sequencing all 3 Median length of mapped segments showing alternative (RNA-Seq only) Comparison of mutation profiles betw een WES and RNA -Seq data RNA-Seq max. 2 times Homopolymer run length (RNA-Seq only) Figure 2: Proportion of PDX models characterized by different methods. Data from Oncotest Compendium v1.6. WES Threshold value Found in resequencing project 0 Colon Breast (17) Read Mapping on Mouse mm10 genome Removal of mouse reads 75 Lung (74) Melanoma (22) 4 Profiled models (%) Others (64) WES Read Mapping on Human hg19 genome Methods: PDX-specific bioinformatics pipelines (Fig. 3) comprises mapping of paired-end reads to human and mouse reference genomes using BWA and TopHat for WES and RNA-Seq data respectively. Reads that map better to mouse are removed. Human-specific duplicate reads are discarded. After base quality score recalibration, WES Variant detection is performed utilizing GATKLite, Samtools, Freebayes. Samtools only for RNA-Seq data. All variants are annotated by SnpEff to predict their impact at the protein level. The filtering of variants is based on the variant coverage statistics, the predicted effect in the protein and by discarding known polymorphisms from population centric annotation databases such as dbSNP, Hapmap, or 1000 genomes (Table 1). QC SEQUENCING DATA (FASTQ) 100 Colon (59) 3 Mutation profiling technologies and data analysis Materials: • Mutation screening using Sequenom Oncocarta panels 1-3 targeting mutations in 39 genes, validation of mutations by Sanger sequencing in 25 genes in 284 models • WES data for 300 models, DNA extraction using Agilent SureSelect All Human Exon kits (v1, v4, or v5) and sequencing with expected average-of-coverage of 100X • 92 models analyzed with RNA-Seq at minimum of 50M paired-end reads Patient-derived xenograft tumor models (PDX) are an important tool for anticancer agent testing. Oncotest has developed a unique collection of models covering all major lineages (Fig. 1). In the context of targeted therapy development, an accurate and comprehensive molecular characterization is crucial for model selection and for biomarker identification. We recently analyzed our PDX collection by whole-exome sequencing (WES) and RNASeq for the identification of mutations at both the genomic and transcriptomic levels. 1 SCLC Filtered (9) Mono-allelic expression of wild-type (58) Gene not expressed (41) Low coverage but gene expressed (33) We believe that the combined molecular characterization profiles from WES and RNA-Seq data increase the quality of our PDX collection and aid the selection of models for cancer therapy studies as well as for biomarker investigations.