CCRC Cancer Conference
November 8, 2015
www.bioinformatics.ca

The ICGC Data Portal
Part 1: Data submission, processing and release

CCRC Workshop 2015 – Module 2 bioinformatics.ca

ICGC Data Release Cycle
[Diagram: a time axis showing Release 1 and Release 2. Labels for each release: Data files, Open, Submission and Validation, Sign off, Data Annotation & ETL, Portal Release]

Data Types Submitted
• To the Data Coordination Center (DCC)
– Simple somatic mutations and germline variants
– Copy number somatic mutations and germline variants
– Structural somatic mutations and germline variants
– DNA methylation
– Gene expression (RNA-Seq, microarrays)
– Protein expression
– miRNA
– Exon junctions
• To the European Genome-phenome Archive (EGA) and CGHub
– Raw sequencing data (FASTQ, BAM)

Data Validation at Submission

Data Annotations & ETL Pipeline
• Annotations
– Mutation frequencies
– Mutation consequences
  • Protein changes and their consequences for genes & transcripts (e.g. amino acid substitution, frameshift, nonsense-mediated decay, etc.)
– Mutation functional impact
  • High impact mutation prediction by FATHMM
– Gene Sets: Gene Ontology terms, Reactome Pathways, Cancer Gene Census
• ETL data processing pipeline
– Annotations and data are transformed and indexed using Elasticsearch to support highly integrated search

The ICGC Data Portal
Part 2: Portal feature highlights

ICGC Data Portal
https://dcc.icgc.org
• Major functional sections
• Quick keyword search

[Screenshot callouts]
• Top 20 mutated genes with high functional impact SSMs in selected cancer projects
• Facets
• Simple somatic mutation rate per donor across selected cancer projects

Project Entity Page
• Filter on high impact mutations
ALSO
• Most frequent mutations
• Most affected donors
• Publications

Gene Entity Page
• Frequencies by cancer projects
• Mutations
• Pfam domains for all transcripts

Reactome Pathway Entity Page

Mutation Entity Page
• Permanent ID across releases
• View the mutation in Genome Viewer
• Consequences for all transcripts

Genome Viewer

[Screenshot callouts]
• Current filters
• Donors, mutated genes and mutations found simultaneously
• Search data of interest by applying filters at Donor, Gene, and/or Mutation
• Save the current donors
• Export table
• Download data files for filtered donors only
• Search for donor files in external repositories (e.g. raw data)
• Facets: filter + count

Customized saved donor, gene and mutation sets
Analyses:
• Enrichment Analysis
• Phenotype Comparison
• Set Operation

File filters: Repository, Data Type, Experimental Strategy, File format, Access

Acknowledgment
• Principal Investigator – Vincent Ferretti
• Project Manager – Francois Gerthoffert
• Lead bioinformatician – Junjun Zhang
• Software Architect and Tech Lead – Bob Tiernay
• Business Analyst – Phuong-My Do
• Software Developer
– Dusan Andric
– Terry Lin
– Michael Moncada
– Vitalii Slobodianyk

The ICGC Data Portal
Part 3: Live demo
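
The "Facets: filter + count" and "Set Operation" features named in the slides can be illustrated with a small sketch. This is not the portal's implementation or API: the donor records, field names, and helper functions below are all hypothetical, and the portal itself is described as running on an Elasticsearch index rather than in-memory lists. The sketch only shows the idea of narrowing donors with filters, counting the remaining donors per facet value, and combining saved donor sets.

```python
from collections import Counter

# Hypothetical toy donor records; IDs, fields and values are illustrative only.
DONORS = [
    {"id": "D1", "project": "P-A", "gender": "female"},
    {"id": "D2", "project": "P-A", "gender": "male"},
    {"id": "D3", "project": "P-B", "gender": "female"},
    {"id": "D4", "project": "P-B", "gender": "male"},
]


def apply_filters(donors, filters):
    """Keep donors whose value for every filtered field is in the allowed set."""
    return [
        d for d in donors
        if all(d[field] in allowed for field, allowed in filters.items())
    ]


def facet_counts(donors, field):
    """Count how many of the given donors fall under each value of a field."""
    return dict(Counter(d[field] for d in donors))


def donor_set(donors):
    """Turn a list of donor records into a saved set of donor IDs."""
    return {d["id"] for d in donors}


# Filter, then count what remains per project (the "filter + count" idea).
female = apply_filters(DONORS, {"gender": {"female"}})
print(facet_counts(female, "project"))

# Set operations on two saved donor sets.
project_a = apply_filters(DONORS, {"project": {"P-A"}})
print(donor_set(female) & donor_set(project_a))  # intersection
print(donor_set(female) | donor_set(project_a))  # union
print(donor_set(female) - donor_set(project_a))  # difference
```

With real data, the same pattern would apply at the Donor, Gene, and Mutation levels, with the counts recomputed each time a filter changes.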