NCRI Cancer Conference, November 1, 2015, www.bioinformatics.ca

The ICGC Data Portal
Part 1: Data submission, processing and release
NCRI Workshop 2015, bioinformatics.ca

ICGC Data Release Cycle
Figure: timeline of successive releases (Release 1, Release 2) over time. Each release passes through Submission and Validation (submission open), Sign off, Data Annotation & ETL, and Portal Release with data files.

Data Types Submitted
• To the Data Coordination Center (DCC)
  – Simple somatic and germline mutations
  – Somatic copy number variations
  – Somatic structural mutations
  – Methylation
  – Gene expression (RNA-seq, arrays)
  – Protein expression
  – miRNA
  – Exon junctions
• To the European Genome-phenome Archive (EGA) and CGHub
  – Raw sequencing data (FASTQ, BAM)

Data Validation at Submission

Data Annotation & ETL Pipeline
• Annotation
  – Mutation frequencies
  – Mutation gene consequences
    • Amino acid changes and their consequences for all genes and transcripts (e.g. frameshift)
  – Mutation functional impact
  – Gene Ontology terms, Reactome pathways, Cancer Gene Census
  – Germline mutation masking
• ETL pipeline
  – Annotated data indexed in an Elasticsearch cluster of 16 nodes

The ICGC Data Portal
Part 2: Portal feature highlights

ICGC Data Portal
• Top 20 mutated genes with high-functional-impact simple somatic mutations (SSMs) in selected cancer projects
• Simple somatic mutation rate per donor across selected cancer projects

Project Entity Page
Also:
• Most frequent mutations
• Most affected donors
• Publications
• Filter on high-impact mutations

Gene Entity Page
• Frequencies by cancer project
• Pfam domains for all transcripts

Reactome Pathway Entity Page

Mutation Entity Page
• Permanent ID across releases
• Consequences for all transcripts

Genome Viewer

• Current filters
• Affected donors, mutated genes and mutations found simultaneously
• Search data of interest by applying filters at the Donor, Gene and/or Mutation level
• Download data files for filtered donors only
• Export table
• Search for donor files in external repositories (e.g. raw data)

Customized saved donor, gene and mutation sets
Analyses:
• Enrichment Analysis
• Phenotype Comparison
• Set Operation

File filters: Repository, Data Type, Experimental Strategy, File Format, Access

Acknowledgment
• Principal Investigator – Vincent Ferretti
• Project Manager – Francois Gerthoffert
• Lead Bioinformatician – Junjun Zhang
• Software Architect and Tech Lead – Bob Tiernay
• Business Analyst – Phuong-My Do
• Software Developers
  – Dusan Andric
  – Terry Lin
  – Michael Moncada
  – Vitalii Slobodianyk

The ICGC Data Portal
Part 3: Live demo
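The annotation step in Part 1 lists "mutation frequencies" among its outputs. The sketch below assumes that a mutation's frequency in a project is the fraction of that project's donors who carry it. The record layout (donor, project, mutation ID tuples), the function name and the IDs are hypothetical. This is not the DCC's actual data model or pipeline code.

```python
from collections import defaultdict


def mutation_frequencies(records, donors_per_project):
    """Return {(project, mutation_id): fraction of project donors carrying it}.

    records: iterable of (donor_id, project_code, mutation_id) tuples
             (hypothetical layout, for illustration only).
    donors_per_project: {project_code: total number of donors in that project}.
    """
    carriers = defaultdict(set)
    for donor, project, mutation in records:
        # A set counts each donor at most once per mutation.
        carriers[(project, mutation)].add(donor)
    return {
        key: len(donors) / donors_per_project[key[0]]
        for key, donors in carriers.items()
    }


# Made-up example records.
records = [
    ("DO1", "PROJ-A", "MU1"),
    ("DO2", "PROJ-A", "MU1"),
    ("DO2", "PROJ-A", "MU1"),  # duplicate observation, counted once
    ("DO3", "PROJ-A", "MU2"),
    ("DO9", "PROJ-B", "MU1"),
]
freqs = mutation_frequencies(records, {"PROJ-A": 4, "PROJ-B": 2})
print(freqs[("PROJ-A", "MU1")])  # 0.5
```

Keying on (project, mutation) lets the same mutation carry a different frequency in each cancer project, which matches the per-project frequencies shown on the gene and mutation pages.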
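The search slide in Part 2 says that affected donors, mutated genes and mutations are "found simultaneously" once filters are applied at the Donor, Gene and/or Mutation level. One way to model this, under our own assumptions, is to combine all active filters with a logical AND over flattened donor/gene/mutation observations. All three result lists are then derived from the same filtered rows. The `Observation` record, its `impact` labels and the `search` helper are hypothetical illustrations, not the portal's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One mutation seen in one donor (hypothetical flattened record)."""
    donor_id: str
    project: str
    gene: str
    mutation_id: str
    impact: str  # hypothetical labels such as "High" or "Low"


def search(observations, donor_filter=None, gene_filter=None, mutation_filter=None):
    """Apply donor-, gene- and mutation-level filters together (logical AND).

    Returns sorted lists of affected donors, mutated genes and mutations,
    all taken from the same filtered rows so the three views stay consistent.
    """
    active = [f for f in (donor_filter, gene_filter, mutation_filter) if f is not None]
    rows = [o for o in observations if all(f(o) for f in active)]
    donors = sorted({o.donor_id for o in rows})
    genes = sorted({o.gene for o in rows})
    mutations = sorted({o.mutation_id for o in rows})
    return donors, genes, mutations


# Made-up observations.
obs = [
    Observation("DO1", "PROJ-A", "GENE_A", "MU1", "High"),
    Observation("DO2", "PROJ-A", "GENE_B", "MU2", "High"),
    Observation("DO3", "PROJ-B", "GENE_A", "MU3", "Low"),
]
donors, genes, mutations = search(
    obs,
    donor_filter=lambda o: o.project == "PROJ-A",
    mutation_filter=lambda o: o.impact == "High",
)
print(donors, genes, mutations)
```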
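"Set Operation" is listed among the analyses available on saved donor, gene and mutation sets. Assuming a saved set is simply a collection of IDs, one plausible model splits the union of the chosen sets into disjoint Venn-diagram regions, and a user then combines whichever regions they want. The helper `venn_regions` and the donor IDs are hypothetical. This is not the portal's implementation.

```python
from itertools import product


def venn_regions(named_sets):
    """Split the union of the given sets into disjoint Venn regions.

    Each key is a tuple of booleans, one per input set in insertion order,
    saying whether members are inside (True) or outside (False) that set.
    Regions with no members, including the all-False one, are left out.
    """
    names = list(named_sets)
    universe = set().union(*named_sets.values())
    regions = {}
    for membership in product([True, False], repeat=len(names)):
        if not any(membership):
            continue  # "outside every set" has no members within the union
        members = {
            item
            for item in universe
            if all((item in named_sets[name]) == inside
                   for name, inside in zip(names, membership))
        }
        if members:
            regions[membership] = members
    return regions


# Two made-up saved donor sets.
regions = venn_regions({"A": {"DO1", "DO2", "DO3"}, "B": {"DO2", "DO3", "DO4"}})
# Combining selected regions: donors in A or B but not both.
in_exactly_one = regions[(True, False)] | regions[(False, True)]
print(sorted(in_exactly_one))
```

Working with disjoint regions rather than only the binary operators means any union, intersection or difference of the input sets is a union of some of these regions.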
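"Enrichment Analysis" on a saved gene set asks whether pathways or annotations are over-represented in that set. A common approach, which we assume here rather than take from the slides, is a one-sided hypergeometric test per pathway. Benjamini-Hochberg correction is then applied across pathways. The function names and the toy numbers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: genes in the background universe (N)
    successes:  universe genes annotated to the pathway (K)
    draws:      genes in the user's set (n)
    k:          genes in the user's set that are annotated to the pathway
    """
    upper = min(successes, draws)
    hits = sum(
        # math.comb returns 0 when the second argument exceeds the first.
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(population, draws)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy universe of 10 genes, 4 in a pathway; the user's set has 3 genes, 2 in the pathway.
p = hypergeom_upper_tail(k=2, population=10, successes=4, draws=3)
print(p)  # (6*6 + 4*1) / 120 = 1/3
```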