Detection of structural variants and copy number alterations in cancer: from computational strategies to the discovery of chromothripsis in neuroblastoma

Valentina Boeva1,2,3, Bruno Zeitouni1,2,3, Tatiana Popova1,2,3, Kevin Bleakley1,2,3, Andrei Zinovyev1,2,3, Jean-Philippe Vert1,2,3, Isabelle Janoueix-Lerosey3,4, Olivier Delattre3,4 and Emmanuel Barillot1,2,3

E-mail: [email protected]

1, 2 Inserm U900, 75248 Paris, France; Mines ParisTech, Fontainebleau, F-77300 France
3 Institut Curie, 26, rue d’Ulm, 75248 Paris, France
4 Inserm U830, 75248 Paris, France

Introduction

Many studies that apply deep sequencing to cancer genomes need to calculate copy number profiles (CNPs) and predict regions of gain and loss. Two obstacles are frequent in the analysis of cancer genomes: the absence of an appropriate normal-tissue control sample, and possible polyploidy. We therefore developed Control-FREEC [1, 2], which automatically detects Copy Number Alterations (CNAs), with or without a control dataset, as well as Loss of Heterozygosity (LOH) regions.

For mate-pair/paired-end mapping (PEM) data, one can complement the information about CNAs (i.e., the output of Control-FREEC) with the predictions of Structural Variants (SVs) made by another tool that we developed, SVDetect [3]. Here we used a combination of Control-FREEC and SVDetect (http://bioinfo-out.curie.fr/projects/freec/sv.html) on neuroblastoma samples to (1) refine the coordinates of CNAs using PEM data and (2) improve confidence in calling true positive rearrangements (particularly in ambiguous satellite/repetitive regions).
CNA & LOH detection (FREEC)

• Detection of CNA regions
• Detection of LOH regions
• Possibility to work without a control sample
• Possibility to set tumor ploidy
• Automatic window size selection
• Use of mappability information
• Evaluation of, and adjustment for, contamination of tumor samples by normal cells
• Possibility to work with exome data
• Possibility to cross the output with the output of SVDetect

Window size selection

$W = L / (T \cdot CV^2)$, where $L$ is the genome length, $T$ is the total number of reads, and $CV$ is the user-defined Coefficient of Variation.

Calculation of the dependency function "RC vs GC-content" or "RC sample vs RC control"

To find the best polynomial fit of the read count (RC) dependency (shown in black in panels A-D), we first initialize the polynomial's parameters (using the median value of RC for each GC-content). We then optimize the polynomial's parameters by iteratively selecting the data points related to P-copy regions and making a least-squares fit on them.

Segmentation

Segmentation is done by a LASSO-based algorithm suggested by Harchaoui and Lévy-Leduc (2008).

Adjustment for possible contamination by normal cells

Control-FREEC uses the following formula to evaluate the fraction of contaminating normal cells $p$, and then to correct the copy number profiles: $NRC_i \approx E_i + (1 - E_i)\,p$, where $NRC_i$ is the normalized read count in window $i$ and $E_i$ is the expected ratio in window $i$.

Calculation of BAF profiles

Annotation of B allele frequency profiles using a Gaussian mixture model fit.

[Figure: B allele frequency and normalized copy number profiles.]

Results and graphical visualization

1. List of gains and losses with assigned copy numbers
2. Visualization in R
3. Creation of different file format outputs for graphical visualization: Circos, UCSC Genome Browser (BedGraph)

Detection of SVs (SVDetect)

For mate-pair/paired-end mapping (PEM) data, one can complement the information about copy number changes (i.e., the output of FREEC) with the predictions of structural variants (SVs) made by SVDetect [3].
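The poster does not spell out how CNA calls and SV calls are matched to each other. As a minimal sketch of one plausible approach, assume both outputs have already been parsed into simple records; the `CNA` and `SVBreakpoint` classes, the `refine_cna` function, and the `tolerance` parameter below are hypothetical names for illustration and are not part of Control-FREEC or SVDetect. Since read-count-based CNA boundaries are only resolved to roughly one window, each boundary is snapped to the nearest SV breakpoint on the same chromosome if one lies within the tolerance:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CNA:
    """A copy number segment (hypothetical record, not a FREEC type)."""
    chrom: str
    start: int
    end: int
    copy_number: int


@dataclass(frozen=True)
class SVBreakpoint:
    """One breakpoint of a predicted SV (hypothetical record)."""
    chrom: str
    pos: int


def refine_cna(cna: CNA, breakpoints: List[SVBreakpoint],
               tolerance: int) -> Tuple[CNA, bool, bool]:
    """Snap each CNA boundary to the nearest SV breakpoint within `tolerance` bp.

    Returns the refined segment and two flags telling whether the start and
    the end boundary were each supported by a breakpoint.
    """
    def snap(coord: int) -> Tuple[int, bool]:
        # Candidate breakpoints: same chromosome, close enough to the boundary.
        near = [b.pos for b in breakpoints
                if b.chrom == cna.chrom and abs(b.pos - coord) <= tolerance]
        if not near:
            return coord, False  # keep the window-level coordinate
        return min(near, key=lambda p: abs(p - coord)), True

    start, start_supported = snap(cna.start)
    end, end_supported = snap(cna.end)
    return CNA(cna.chrom, start, end, cna.copy_number), start_supported, end_supported


# Illustrative data only; these coordinates are made up.
gain = CNA("chr1", 1_000_000, 2_000_000, 3)
svs = [SVBreakpoint("chr1", 1_012_345),
       SVBreakpoint("chr1", 5_000_000),
       SVBreakpoint("chr2", 2_000_100)]
refined, start_ok, end_ok = refine_cna(gain, svs, tolerance=50_000)
```

In this toy input the start boundary moves to the nearby chr1 breakpoint, while the end boundary keeps its original coordinate because the only close breakpoint is on another chromosome. The same matching could in principle be read in the other direction, treating SV predictions with no supporting copy number change as lower-confidence calls.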
Automatic intersection of Control-FREEC and SVDetect outputs allows one to:

• Refine coordinates of CNAs using PEMs
• Filter out false predictions of SVDetect (often in ambiguous satellite/repetitive regions)

SVDetect [3] is a tool that allows the user to:

• Identify candidate SVs by clustering discordant PEMs
• Predict the type of an SV using the PEM signature
• Filter out PEMs inconsistent with the main signature of the predicted SV
• Compare SVs predicted for different samples
• Create different file format outputs for graphical visualization of predicted SVs

[Figure: Illustrations of read signatures for SV type prediction (implemented in SVDetect [3]). Panels: intra-chromosomal SVs; inter-chromosomal SVs.]

Discovery of chromothripsis in neuroblastoma

We investigated somatic rearrangements in two neuroblastoma cell lines and two primary tumors using paired-end sequencing of mate-pair libraries.

[Figure: Neuroblastoma cell lines. Panels: CLB-GA; CLB-RE.]

[Figure: Primary neuroblastoma tumors with chromothripsis.] Circos representation of SVs predicted by SVDetect and confirmed by the CNAs identified by Control-FREEC. (A-C) NB1141, (D-F) NB1142. (A, D) whole genome view, (B, E) zoom on chromothripsis, (C, F) copy number profile for chr1 of NB1141 and chr6 of NB1142.

[1] Control-free calling of copy number alterations in deep-sequencing data using GC-content normalization. Boeva, V., et al. Bioinformatics, 2011; 27(2):268-9. http://bioinfo-out.curie.fr/projects/freec/
[2] Control-FREEC: a tool for assessing copy number and allelic content using next generation sequencing data. Boeva, V., et al. Bioinformatics, 2012; 28(3):423-5.
[3] SVDetect: a bioinformatic tool to identify genomic structural variations from paired-end next-generation sequencing data. Zeitouni, B., et al. Bioinformatics, 2010; 26:1895-1896. http://svdetect.sourceforge.net