Detection of structural variants and copy number alterations in cancer:
from computational strategies to the discovery of chromothripsis in
neuroblastoma
Valentina Boeva (1,2,3), Bruno Zeitouni (1,2,3), Tatiana Popova (1,2,3), Kevin Bleakley (1,2,3), Andrei Zinovyev (1,2,3), Jean-Philippe Vert (1,2,3), Isabelle Janoueix-Lerosey (3,4), Olivier Delattre (3,4) and Emmanuel Barillot (1,2,3)
E-mail: [email protected]
1 Inserm U900, 75248 Paris, France
2 Mines ParisTech, Fontainebleau, F-77300 France
3 Institut Curie, 26, rue d’Ulm, 75248 Paris, France
4 Inserm U830, 75248 Paris, France
Introduction
In many studies that apply deep sequencing to cancer genomes, one has to calculate copy number profiles (CNPs) and predict regions of gain and loss. Two obstacles are frequent in the analysis of cancer genomes: the absence of an appropriate control sample from normal tissue, and possible polyploidy. We therefore developed Control-FREEC [1,2], which automatically detects Copy Number Alterations (CNAs), with or without a control dataset, as well as Loss of Heterozygosity (LOH) regions.
For mate-pair/paired-end mapping (PEM) data, one can complement the information about CNAs (i.e., the output of Control-FREEC) with the predictions of Structural Variants (SVs) made by another tool that we developed, SVDetect [3].
Here we used a combination of Control-FREEC and SVDetect (http://bioinfo-out.curie.fr/projects/freec/sv.html) on neuroblastoma samples to (1) refine the coordinates of CNAs using PEM data and (2) improve confidence in calling true positive rearrangements (particularly in ambiguous satellite/repetitive regions).
CNA & LOH detection (FREEC)
• Detection of CNA regions
• Detection of LOH regions
• Possibility to work without a control sample
• Possibility to set tumor ploidy
• Automatic window selection
• Use of mappability information
• Evaluation of, and adjustment for, contamination of tumor samples by normal cells
• Possibility to work with exome data
• Possibility to cross the output with the output of SVDetect
Window size selection
$W = L / (T \cdot CV^2)$,
where L = genome length, T = total number of reads, CV = user-defined Coefficient of Variation.
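As a rough illustration of the window size formula, the sketch below evaluates W = L / (T · CV^2) directly. The function name and the example numbers are assumptions made for illustration only; they are not taken from Control-FREEC. If read counts per window are assumed to be Poisson-distributed, this choice of W corresponds to about 1/CV^2 reads per window on average.

```python
def window_size(genome_length, total_reads, cv):
    """Window size W = L / (T * CV^2), rounded to the nearest base pair.

    genome_length: L, genome length in bp
    total_reads:   T, total number of reads
    cv:            user-defined coefficient of variation
    """
    if genome_length <= 0 or total_reads <= 0 or cv <= 0:
        raise ValueError("all arguments must be positive")
    return round(genome_length / (total_reads * cv ** 2))


# Hypothetical example: a 3 Gb genome, 30 million reads, CV = 0.05.
example_w = window_size(3_000_000_000, 30_000_000, 0.05)
```

A smaller CV asks for less noisy windows and therefore yields larger windows; more reads allow smaller windows at the same CV.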
Calculation of BAF profiles
Calculation of dependency function “RC vs GC-content” or “RC sample vs RC control”
To find the best polynomial fit of read count (RC) against GC-content (shown in black in panels A-D), we first initialize the polynomial's parameters from the median RC value for each GC-content. We then optimize the parameters by iteratively selecting the data points that belong to P-copy regions (regions whose copy number equals the ploidy P) and making a least-squares fit on them.
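The iterative fit described above can be sketched as follows. This is a minimal illustration, not Control-FREEC's implementation: the function names, the GC binning to two decimals, the polynomial degree, and the rule for deciding that a window belongs to a P-copy region (observed RC within a fixed relative `tolerance` of the current fit) are all assumptions.

```python
from statistics import median


def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients c such that y ~ c[0] + c[1]*x + ... + c[degree]*x**degree.
    """
    n = degree + 1
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (a[i][n] - rest) / a[i][i]
    return coeffs


def polyval(coeffs, x):
    return sum(c * x ** k for k, c in enumerate(coeffs))


def fit_rc_vs_gc(gc, rc, degree=3, tolerance=0.25, n_iter=10):
    """Fit RC as a polynomial of GC-content using presumed P-copy windows only."""
    # Initialization: fit through the median RC of each GC-content bin.
    by_bin = {}
    for g, r in zip(gc, rc):
        by_bin.setdefault(round(g, 2), []).append(r)
    levels = sorted(by_bin)
    coeffs = polyfit(levels, [median(by_bin[b]) for b in levels], degree)
    for _ in range(n_iter):
        kept = []
        for g, r in zip(gc, rc):
            expected = polyval(coeffs, g)
            # Assumed selection rule: a window is treated as P-copy when its
            # RC lies within `tolerance` of the currently expected RC.
            if expected > 0 and abs(r / expected - 1.0) < tolerance:
                kept.append((g, r))
        if len(kept) <= degree:
            break  # too few points left for a meaningful refit
        coeffs = polyfit([g for g, _ in kept], [r for _, r in kept], degree)
    return coeffs
```

Dividing each window's RC by the fitted value at its GC-content then gives a normalized read count in which windows at the main ploidy are centered near 1.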
Adjustment for possible contamination by normal cells
Control-FREEC uses the following formula to evaluate the fraction p of contaminating normal cells, and then to correct the copy number profiles:
$NRC_i \approx E_i + (1 - E_i)\,p$,
where $NRC_i$ is the normalized read count in window i and $E_i$ is the expected ratio in window i.
Segmentation
Segmentation is done by a LASSO-based algorithm suggested by Harchaoui and Lévy-Leduc (2008).
Annotation of B allele frequency profiles using Gaussian mixture model fit
Results and graphical visualization
1. List of gains and losses with assigned copy numbers
2. Visualization in R
[Figure: B allele frequency and normalized copy number profiles]
3. Creation of different file format outputs for graphical visualization: Circos, UCSC Genome Browser (BedGraph)
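To make the contamination formula given earlier concrete (NRC_i ≈ E_i + (1 - E_i) p), the sketch below simply inverts it algebraically: solving for p gives p ≈ (NRC_i - E_i) / (1 - E_i), and solving for E_i gives E_i ≈ (NRC_i - p) / (1 - p). How Control-FREEC actually estimates p is not described here, so taking the median over windows, as well as the function names, are assumptions for illustration.

```python
from statistics import median


def estimate_contamination(nrc, expected):
    """Estimate the normal-cell fraction p from NRC_i ~ E_i + (1 - E_i) * p.

    nrc:      observed normalized read counts per window
    expected: expected ratios E_i per window (assumed to be known)
    Windows with E_i == 1 carry no information about p and are skipped.
    """
    estimates = [(n - e) / (1.0 - e)
                 for n, e in zip(nrc, expected) if abs(1.0 - e) > 1e-9]
    if not estimates:
        raise ValueError("no window with an expected ratio different from 1")
    return median(estimates)


def correct_ratio(nrc_i, p):
    """Recover the ratio expected without contamination: E_i ~ (NRC_i - p) / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return (nrc_i - p) / (1.0 - p)
```

For instance, with p = 0.3 a window whose uncontaminated ratio is 0.5 would be observed at about 0.65, and `correct_ratio(0.65, 0.3)` brings it back to about 0.5.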
Detection of SVs (SVDetect)
For mate-pair/paired-end mapping (PEM) data, one can complement the information about copy number changes (i.e., the output of Control-FREEC) with the predictions of structural variants (SVs) made by SVDetect [3]. Automatic intersection of Control-FREEC and SVDetect outputs allows one to:
• Refine coordinates of CNAs using PEMs
• Filter out false predictions of SVDetect (often in ambiguous satellite/repetitive regions)
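The two uses of the intersection listed above can be sketched on simplified data. The tuple layouts, the function names, and the distance threshold `max_dist` are hypothetical; they do not reflect the actual Control-FREEC or SVDetect file formats or the matching rule used by the tools.

```python
def refine_cnas(cnas, breakpoints, max_dist):
    """Snap CNA boundaries to the nearest SV breakpoint within max_dist bp.

    cnas:        list of (chrom, start, end, copy_number)
    breakpoints: list of (chrom, position) taken from predicted SVs
    """
    def snap(chrom, pos):
        near = [b for c, b in breakpoints
                if c == chrom and abs(b - pos) <= max_dist]
        return min(near, key=lambda b: abs(b - pos)) if near else pos

    return [(c, snap(c, s), snap(c, e), cn) for c, s, e, cn in cnas]


def supported_breakpoints(cnas, breakpoints, max_dist):
    """Keep SV breakpoints that lie within max_dist bp of a CNA boundary."""
    boundaries = [(c, x) for c, s, e, _ in cnas for x in (s, e)]
    return [(c, b) for c, b in breakpoints
            if any(c == bc and abs(b - x) <= max_dist for bc, x in boundaries)]
```

Note that a filter of this kind only retains SVs that change copy number; balanced rearrangements such as inversions leave no CNA boundary and would need separate handling.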
SVDetect [3] is a tool that allows the user to:
• Identify candidate SVs by clustering discordant PEMs
• Predict the type of an SV using the PEM signature
• Filter out PEMs inconsistent with the main signature of the predicted SV
• Compare SVs predicted for different samples
• Create different file format outputs for graphical visualization of predicted SVs
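The first two steps in the list above (clustering discordant pairs and reading an SV type off their signature) can be illustrated with a deliberately simplified sketch. It is not SVDetect's algorithm: the `Pair` record, the greedy clustering, and the signature rules are assumptions, and the rules assume a paired-end library in which a concordant pair maps as "+" then "-" on the same chromosome with `pos1 <= pos2`. Mate-pair libraries have a different expected orientation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    chrom1: str
    pos1: int
    strand1: str
    chrom2: str
    pos2: int
    strand2: str


def is_discordant(p, mean_insert, sd_insert, n_sd=3):
    """A pair is discordant if chromosome, orientation or distance is unexpected."""
    if p.chrom1 != p.chrom2:
        return True
    if (p.strand1, p.strand2) != ("+", "-"):
        return True
    return abs((p.pos2 - p.pos1) - mean_insert) > n_sd * sd_insert


def classify(p, mean_insert):
    """Simplified signature rules (assumed, for paired-end data only)."""
    if p.chrom1 != p.chrom2:
        return "translocation"
    if p.strand1 == p.strand2:
        return "inversion"
    if (p.strand1, p.strand2) == ("-", "+"):
        return "duplication"
    return "deletion" if p.pos2 - p.pos1 > mean_insert else "insertion"


def cluster_pairs(pairs, window):
    """Greedily group pairs with the same signature whose two ends are both close."""
    clusters = []
    for p in sorted(pairs, key=lambda q: (q.chrom1, q.chrom2, q.pos1)):
        for cluster in clusters:
            last = cluster[-1]
            same_signature = (
                (last.chrom1, last.chrom2, last.strand1, last.strand2)
                == (p.chrom1, p.chrom2, p.strand1, p.strand2))
            if (same_signature and abs(p.pos1 - last.pos1) <= window
                    and abs(p.pos2 - last.pos2) <= window):
                cluster.append(p)
                break
        else:
            clusters.append([p])
    return clusters
```

In practice one would keep only clusters supported by several pairs, which is one simple way to discard pairs that do not agree with the main signature of a candidate SV.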
[Figure: illustrations of read signatures for SV type prediction (implemented in SVDetect [3]); panels: intra-chromosomal SVs, inter-chromosomal SVs]
Discovery of chromothripsis in neuroblastoma
We investigated somatic rearrangements in two neuroblastoma cell lines and two primary tumors using paired-end sequencing of mate-pair libraries.
[Figure: neuroblastoma cell lines CLB-GA and CLB-RE]
Primary neuroblastoma tumors with chromothripsis
Circos representation of SVs predicted by SVDetect and confirmed by the CNAs identified by Control-FREEC. (A-C) NB1141, (D-F) NB1142. (A, D) whole genome view, (B, E) zoom on chromothripsis, (C, F) copy number profile for chr1 of NB1141 and chr6 of NB1142.
[1] Boeva, V., et al. Control-free calling of copy number alterations in deep-sequencing data using GC-content normalization. Bioinformatics, 2011; 27(2):268-9. http://bioinfo-out.curie.fr/projects/freec/
[2] Boeva, V., et al. Control-FREEC: a tool for assessing copy number and allelic content using next generation sequencing data. Bioinformatics, 2012; 28(3):423-5.
[3] Zeitouni, B., et al. SVDetect - a bioinformatic tool to identify genomic structural variations from paired-end next-generation sequencing data. Bioinformatics, 2010; 26:1895-1896. http://svdetect.sourceforge.net