WG5: e-Technologies
Martin Dugas, Jaakko Hollmen
EuGESMA: European Genomics and Epigenomics Study on MDS and AML

Goals of WG5
• Central support for data analysis, management and interpretation
• Research into novel methods for integration of clinical and molecular data, initially with respect to the analysis of microarray data
• Integrative data analysis will be performed in collaboration with expert biostatisticians in the field
• Development of data management and analysis systems for various chip platforms, such as gene expression profiling (Affymetrix), SNP arrays, array CGH, ChIP-on-chip, microRNA data, epigenetic profiling, proteomic data, high-throughput sequencing
• Application of standard biometric procedures (e.g. survival analysis) to data from AML and MDS trials

Expected outcomes of WG5
• Harmonization of data from multiple centres that may be using different chip array types and platforms
• Integration of molecular data from mRNA, miRNA, epigenetic, SNP and CGH studies via an interactive and dynamic interface driven through mutation, cytogenetic and outcome parameters
• Identification of target genes and pathways for the development and testing of novel therapeutic drugs, molecules and agents
• Generation and frequent updating of the Action-specific website (via the website coordinator)

Agenda
• Javier de las Rivas: Tools for integrative analysis of Affymetrix microarray data (expression and copy number) and a method to build a leukemia multiclass predictor based on transcriptomic profiling
• Jaakko Hollmen: Modeling DNA copy number amplification patterns in human cancers
• Cesare Furlanello: Recent material on biomarker stability from predictive classifiers
• Silvio Bicciato: Genomic data integration with specific application to myelopoiesis
• Lara Nonell: Microarray data analysis and integration approach, with an overview of ongoing and upcoming leukemia/MDS projects
• Andrea Zangrando: MLL rearrangements in pediatric acute lymphoblastic and myeloblastic
leukemias: MLL-specific and lineage-specific signatures
• Lucjan Wyrwicz: Functional annotation of gene lists in microarray studies

Relevance of e-Technologies: the data explosion continues
• Affymetrix GeneChip 2.0plus: ~40,000 probe sets
• Affymetrix SNP-Chip 6.0: ~1 million SNPs
• ChIP-Seq: ~5 million sequence reads
• Whole-genome sequencing: 3 billion base pairs
[Ley TJ, et al. DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature 2008; 456(7218):66-72]

What is the challenge?
Tools to integrate analyses of microarray data: integrative genomics
Huge amounts of data per array + hundreds of samples
[Javier de las Rivas]

We address the challenge
Tools to integrate analyses of microarray data: integrative genomics
Open methods for integrated / combined data mining and data analysis
1. Expression arrays: Affymetrix Human_Exon_1.0, measuring genes, exons, miRNAs and ncRNAs at once on an “omic” scale
GATE = Genomic and Transcriptomic Explorer, includes probes mapping to loci
http://bioinfow.dep.usal.es/xgate/
De novo mapping of all oligo probes from Affymetrix expression microarrays
[Javier de las Rivas]

A stable snapshot of a highly unstable condition
[Lucjan Wyrwicz]

COST – WG5 Antwerp, March 2009
Approach for microarray data analyses
University of Padova [Andrea Zangrando]

SAM results: SAM comparisons

ID   Comparison                 Constant part   Variable part   UP / down (first group)   UP / down (second group)   Total
L1   ALL/MLL(-) vs ALL/MLL(+)   Phenotype       Translocation   1013 / 740                740 / 1013                 1753
L3   AML/MLL(-) vs AML/MLL(+)   Phenotype       Translocation   155 / 555                 555 / 155                  710
L2   ALL/MLL(-) vs AML/MLL(-)   Translocation   Phenotype       754 / 1378                1378 / 754                 2132
L4   ALL/MLL(+) vs AML/MLL(+)   Translocation   Phenotype       379 / 601                 601 / 379                  980

Common signature: 379 (translocation-specific, L1 and L3), 622 (phenotype-specific, L2 and L4)

SAM
results for paired comparisons between the considered subgroups. The translocation-specific signature was obtained by matching deregulated probe sets from the L1 and L3 comparisons, the phenotype-specific signature from the L2 and L4 comparisons.
[Andrea Zangrando]

Data analysis
Genotyping
• One sample
• Population studies
Typical workflow: Quality analysis, Normalization, CNP, Copy Number/LOH, SNP Genotyping, Interpretation
[Lara Nonell]

Profiles of DNA copy number amplification
[Jaakko Hollmen]

SODEGIR: single sample analysis
Defines regions with concomitant alterations of gene copy number (CN) and gene expression (GE) in single samples (SODEGIR)

Status              Alteration   q-value   Score
SODEGIR deleted     CN loss      = 0       ≤ quantile(d_CNgj, 0.1)
                    GE down      ≤ 0.05    ≤ quantile(d_GEgj, 0.1)
SODEGIR amplified   CN gain      = 0       ≥ quantile(d_CNgj, 0.9)
                    GE up        ≤ 0.05    ≥ quantile(d_GEgj, 0.9)
[Silvio Bicciato]

GeneAnnot custom-CDFs www-
[Silvio Bicciato]

Concern about reproducibility of scientific results and the need for replication
Repeatability: Nature Genetics (NG) study
In a leading journal, a multi-institutional study of papers about gene expression profiling found:
• Inability to reproduce the analysis: > 50%
• Partial reproduction: 1/3
• Perfect reproduction: 11%
1.
Editorial: “four teams of analysts treated the findings of a number of microarray papers published in the journal in 2005–2006 as their gold standard and attempted to replicate a sample of the analyses conducted on each of them, with frankly dismal results.”, Nature Genetics, Feb 2009
[Cesare Furlanello]

The List Stability Indicator [Jurman, 2008]
• Based on the algebraic theory of metrics on symmetric groups
• A list can be represented as an element of the permutation group S_p
• Key concept: Canberra distance between two ranked lists (of equal length)
• Given a set of ranked gene lists (ranking given by the classifier), the indicator is defined as the mean of all the pairwise distances
• Hoeffding theorem: the distances are (asymptotically) normally distributed
For this study:
• Study variability of endpoints, effect of swapping
• Theory extended to manage gene lists of different length

Canberra distance: two views
• Complete: Canberra distance is measured over the set of features given for the endpoint (all probes in the platform)
• Core: consider only distances between features in candidate gene lists
[Cesare Furlanello]

The FDA MAQC-II project
Reaching consensus on the “best practices” (Data Analysis Protocol, DAP) in developing and validating microarray-based predictive models (classifiers) for clinical and preclinical applications. Reliable and robust predictive models are essential to realize the promises of personalized medicine. Recommendations on the development and validation of classifiers are put forward through the MAQC-II.
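
As an illustration of the List Stability Indicator slides, the following is a minimal sketch of the Canberra distance between two ranked lists and of the mean pairwise distance over a set of lists. It assumes equal-length lists containing the same features, with ranks starting at 1; the function names are hypothetical, and any normalization or extension to lists of different length used in [Jurman, 2008] is not reproduced here.

```python
from itertools import combinations


def rank_positions(ranked_list):
    """Map each feature to its 1-based rank in the list."""
    return {feature: pos for pos, feature in enumerate(ranked_list, start=1)}


def canberra_distance(list_a, list_b):
    """Canberra distance between two rankings of the same features.

    Assumption: both lists are permutations of the same feature set.
    """
    pos_a = rank_positions(list_a)
    pos_b = rank_positions(list_b)
    if len(pos_a) != len(list_a) or len(pos_b) != len(list_b):
        raise ValueError("ranked lists must not contain duplicates")
    if pos_a.keys() != pos_b.keys():
        raise ValueError("ranked lists must contain the same features")
    # Sum of |rank difference| / (rank sum) over all features.
    return sum(
        abs(pos_a[f] - pos_b[f]) / (pos_a[f] + pos_b[f]) for f in pos_a
    )


def list_stability(ranked_lists):
    """Mean of all pairwise Canberra distances (unnormalized sketch)."""
    pairs = list(combinations(ranked_lists, 2))
    if not pairs:
        raise ValueError("at least two ranked lists are required")
    return sum(canberra_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    lists = [["a", "b", "c"], ["a", "b", "c"], ["c", "b", "a"]]
    print(list_stability(lists))
```

A lower value indicates more similar rankings; identical lists give a distance of 0. Because the rank sum is in the denominator, disagreements near the top of the lists weigh more than disagreements near the bottom.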
Synergy with the FDA Voluntary eXploratory Data Submission (VXDS) program: regulatory review of microarray pharmacogenomic data to develop a biomarker qualification process (guidance for industry)
• September 2008: 60 organizations, 36 data analysis teams, 18,200 models generated on 13 endpoints
(1) Understand the behavior of various prediction rules and gene selection methods that may be applied to microarray data sets to generate predictors of clinical outcomes;
(2) Identify and characterize sources of variability in multi-gene prediction
Participant organizations: government agencies, manufacturers of microarray platforms, microarray service providers, academic laboratories
[Cesare Furlanello]

WG5 - topics
• Quality
  – experimental design
  – data (batch effects!)
  – analysis, prediction
• Data analysis plans => reproducibility of data analysis => best practice in genomic and epigenomic data analysis
• Stability of gene signatures

WG5 - topics (continued)
• Data integration in translational research => need for data analysis platforms, especially CN + GEP
• (Semi-)Automated gene signature analysis
• Multiclass predictive models
• Mapping of microarray data
• Informatics for next generation sequencing
• Training in bioinformatics
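
The SODEGIR criteria from Silvio Bicciato's slide (CN q-value = 0, GE q-value ≤ 0.05, scores beyond the 0.1 or 0.9 quantile) could be sketched roughly as follows for a single sample. This is only an illustration under stated assumptions: the quantiles are taken over all genes of the sample, a simple linear-interpolation quantile is used, and the names `quantile` and `call_sodegir` are hypothetical rather than part of the original method's software.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty sequence (assumed scheme)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    idx = q * (len(ordered) - 1)
    lo = int(idx)
    hi = min(lo + 1, len(ordered) - 1)
    frac = idx - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def call_sodegir(cn_scores, cn_qvalues, ge_scores, ge_qvalues, ge_q_cutoff=0.05):
    """Label each gene of one sample as 'deleted', 'amplified' or 'none'.

    Assumption: the score thresholds are the 0.1 and 0.9 quantiles of the
    scores across all genes in the sample.
    """
    cn_low, cn_high = quantile(cn_scores, 0.1), quantile(cn_scores, 0.9)
    ge_low, ge_high = quantile(ge_scores, 0.1), quantile(ge_scores, 0.9)
    calls = []
    for cn, cn_q, ge, ge_q in zip(cn_scores, cn_qvalues, ge_scores, ge_qvalues):
        # Both data types must pass their q-value criterion.
        significant = cn_q == 0 and ge_q <= ge_q_cutoff
        if significant and cn <= cn_low and ge <= ge_low:
            calls.append("deleted")      # CN loss together with GE down
        elif significant and cn >= cn_high and ge >= ge_high:
            calls.append("amplified")    # CN gain together with GE up
        else:
            calls.append("none")
    return calls


if __name__ == "__main__":
    scores = [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5]
    cn_q = [0] * 11
    ge_q = [0.01] * 11
    print(call_sodegir(scores, cn_q, scores, ge_q))
```

Requiring both the CN and the GE criterion for the same gene is what makes the call "concomitant"; a gene that passes only one of the two is left unlabeled.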