Canadian Bioinformatics Workshops
www.bioinformatics.ca

Module 3: Pathway and Network Analysis
Lincoln Stein

Classes of Gene Set Analysis
DAVID, GSEA, Reactome FI network, PARADIGM
Khatri et al. PLOS Comp Bio. 8:1 2012

Module 3 bioinformatics.ca

Limitations of Gene Set Enrichment Analysis
• Many possible gene sets: diseases, molecular function, biological process, cellular compartment, pathways...
• Gene sets are heavily overlapping; need to sort through lists of enriched gene sets!
• “Bags of genes” obscure regulatory relationships among them.

Pathway Databases
• Advantages:
  – Usually curated.
  – Biochemical view of biological processes.
  – Cause and effect captured.
  – Human-interpretable visualizations.
• Disadvantages:
  – Sparse coverage of genome.
  – Different databases disagree on boundaries of pathways.

[Figure: KEGG]

Reactome
• Hand-curated pathways in human.
• Rigorous curation standards: every reaction traceable to primary literature.
• Automatically-projected pathways to non-human species.
• 22 species; 1112 human pathways; 5078 proteins.
• Features:
  – Google-map style reaction diagrams with overlays;
  – Find pathways containing your gene list;
  – Calculate gene overrepresentation in pathways;
  – Find corresponding pathways in other species.
• Open access.

[Figure: Reactome]

[Figure: Pathway Commons]

Pathway Colorization
• Main feature offered by all pathway databases.
• Upload a gene list.
• Database calculates an enrichment score on each pathway and displays ranked list.
• Browse into pathways of interest; download colorized pictures.

[Figures: Example from Reactome]

Networks
• Pathways capture only the “well understood” portion of biology.
• Networks cover less well understood relationships:
  – Genetic interactions
  – Physical interactions
  – Coexpression
  – GO term sharing
  – Adjacency in pathways

Network Databases
• Can be built automatically or via curation.
• Popular sources of curated networks:
  – BioGRID: curated interactions from literature; 529,000 genes, 167,000 interactions.
  – IntAct: curated interactions from literature; 60,000 genes, 203,000 interactions.
  – MINT: curated interactions from literature; 31,000 genes, 83,000 interactions.

Uncurated Interaction Sources
• Text mining approaches
  – Computationally extract gene relationships from text, such as PubMed abstracts.
  – Much faster than hand curation.
  – Not perfect:
    • Problems recognizing gene names. Is hedgehog a gene or a species?
    • Natural language processing is difficult.
  – Popular resources:
    • iHOP
    • PubGene

Uncurated Interaction Sources
• Experimental techniques
  – Yeast two-hybrid (Y2H) protein interactions.
  – Protein complex pulldowns/mass spec.
  – Genetic screens, such as synthetic lethals, enhancer/suppressor screens.
  – NOT perfect:
    • Y2H interactions take proteins out of their natural context; physical interaction != biological interaction.
    • Protein complex pulldowns are plagued by “sticky” proteins such as actin.
    • Genetic screens are highly sensitive to genetic background (“network effects”).

Integrative Approaches
• Combine multiple sources of evidence to increase accuracy.
• Simple example:
  – “Party hubs” are Y2H interactions filtered for partners that share the same temporal-spatial location.
• Complex example:
  – Combine multiple sources of curated and uncurated evidence.

Example: Reactome FI Network
Curated Human Data – Version 35.
• 5078 proteins
• 4166 reactions
• 3870 complexes
• 1112 pathways
• Only ~25% of genome!
Goal: add a “corona” of uncurated interaction data around a scaffold of curated pathway data.

Expanding Reactome’s Coverage
[Figure: data-integration diagram with the following labels]
• Curated Pathways
• Uncurated Information
• human PPI
• PPI inferred from fly, worm & yeast
• PPI from text mining
• GeneWays
• Gene co-expression
• CellMap
• TRED
• GO annotation on biological processes
• Protein domain-domain interactions
• Naïve Bayes Classifier
• Annotated Functional Interactions
• Predicted Functional Interactions
Wu et al. (2010) Genome Biology

Integrated Functional Interaction (FI) Network
• 10,956 proteins (9,542 genes).
• 209,988 FIs.
• ~50% coverage of genome.
• False-positive rate < 1%.
• False-negative rate ~80%.
[Figure: 5% of network shown here]

Active Network Extraction & Analysis
[Figure: workflow diagram with the following labels]
• Reactome Functional Interaction network
• Add linker genes
• Extract mutated, overexpressed, underexpressed, expanded/deleted genes
• Disease subnetwork
• Apply community clustering algorithms
• Disease “modules”
• Hypothesis generation
• Sample classification
• Disease gene prediction

Pancreatic Cancer Module Map (43 Cases)
Christina Yung
[Figure: module map with the following module labels]
• Transcription & translation
• p53, SMAD, TGFβ, TNF signaling
• Cell cycle
• Zinc fingers
• Transcription
• Wnt & Cadherin signaling
• Heterotrimeric G-protein signaling
• Ca2+ signaling
• Hedgehog signaling
• Rho GTPase signaling
• KRAS, MAPK signaling
• Integrin signaling
Legend, non-silent mutations:
• blue: in primary tumour only
• green: in xenograft only
• red: in primary & xenograft

Glioblastoma stem cells (GSC)
Irina Kalatskaya in collaboration with Peter Dirks lab (SickKids)

Glioblastoma Stem Cell Network
[Figure: network with the following module labels]
• complement
• HOX
• TP53/RB1/JUN/SP1
• GLI2
• collagen
• Beta-catenin
• IL-1
• FGF
• Ribosomal proteins
• BMP
• GPCR
• Small Rho proteins
• CREB1

Network Classification of Disease
• Traditional: Associate active genes with clinical behavior to create gene-based prognostic signatures.
• Limitations: Too many genes reduce statistical power.
• New idea: Look for associations between active modules and clinical behavior.

Using the Reactome FI Network to Find a Breast Cancer Survival Signature
Guanming Wu
1. Expression analysis on tumours from multiple patients
2. Disease Module Map
3. Principal component analysis of modules
4. Correlate principal components with clinical parameters

Module-Based Signatures of Breast Cancer Survival
• NEJM: van de Vijver et al. 2002
  – 295 samples, ~12,000 genes
  – Event: death
• GSE4922: Ivshina et al. Cancer Res. 2006
  – 249 samples, ~13,000 genes
  – Event: recurrence or death

Building the Network
• Built based on the NEJM data set
  – 27 modules selected based on size cutoff 7 and average correlation cutoff 0.25.
• Validated using GSE4922.

PC Analysis Identifies Module 2 as Explaining Much of Variation in Survival

Same Signature Predicts Survival in Independent Data Set

And Three More Data Sets as Well…

Module 2: Kinetochore + Aurora B Signaling

Integration of Multiple Data Sets
• Experimental samples can be interrogated many ways:
  – RNA expression
  – Genome/exome sequencing
  – Copy number changes/loss of heterozygosity
  – shRNA knockdown screens
• Integrate multiple functional data types using network/pathway relationships?

PARADIGM
Vaske, Benz et al. Bioinformatics 26:i237 2010
Factor graph: directed graph connecting genes; each gene is activated, inactivated, or unchanged in a single patient.

PARADIGM: The Bad News
• Distributed in source code form only
  – Requires several third-party math/graph libraries (all open source).
  – I have not gotten it to compile yet!
• No documentation.
• No repositories of formatted pathway data.
• No examples of converting experimental data into input files.

Take Home Messages
• Pathway/network analysis can provide context to altered gene lists.
• Pathway/network analysis differs greatly in complexity, power, and usability:
  – SIMPLE: Pathway diagram colorization
  – MODERATE: Reactome FI network extraction
  – COMPLEX: PARADIGM
• This type of analysis is a work in progress, but promises the ability to integrate data across many dimensions.

URLs
KEGG – www.genome.jp/kegg
Biocarta – www.biocarta.com
WikiPathways – www.wikipathways.org
Reactome – www.reactome.org
NCI/PID – pid.nci.nih.gov
Ingenuity – www.ingenuity.com
Pathway Commons – www.pathwaycommons.org/pc/
PARADIGM – http://sbenz.github.com/Paradigm/

BioGRID – www.thebiogrid.org
IntAct – www.ebi.ac.uk/intact
MINT – mint.bio.uniroma2.it
iHOP – www.ihop-net.org/UniPub/iHOP
PubGene – www.pubgene.org
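
Worked example: pathway overrepresentation

The take-home messages list pathway diagram colorization as the simplest analysis, and it depends on a per-pathway enrichment score. The slides do not say which statistic any particular database uses, so what follows is only a minimal sketch under one common assumption: an upper-tail hypergeometric test. The function names, the toy pathways, and the universe size of 20 genes are hypothetical and are not taken from Reactome or any other resource.

```python
from math import comb


def overrepresentation_pvalue(universe_size, pathway_size, list_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe_size: number of genes that could have been selected
    pathway_size:  number of universe genes annotated to the pathway
    list_size:     number of genes in the uploaded list
    overlap:       number of list genes that fall in the pathway
    """
    total = comb(universe_size, list_size)
    upper = min(pathway_size, list_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def rank_pathways(gene_list, pathways, universe_size):
    """Score every pathway and return (name, hits, p-value), smallest p first."""
    genes = set(gene_list)
    scored = []
    for name, members in pathways.items():
        members = set(members)
        hits = len(genes & members)
        p = overrepresentation_pvalue(universe_size, len(members), len(genes), hits)
        scored.append((name, hits, p))
    return sorted(scored, key=lambda row: row[2])


# Hypothetical toy data: gene symbols are placeholders, not real annotations.
toy_pathways = {
    "toy_cell_cycle": {"A", "B", "C", "D"},
    "toy_signaling": {"E", "F", "G", "H", "I"},
}

for name, hits, p in rank_pathways(["A", "B", "C", "E"], toy_pathways, 20):
    print(f"{name}: {hits} hits, p = {p:.4f}")
```

Because many pathways are tested at once and gene sets overlap heavily, raw p-values like these would normally be adjusted for multiple testing before the ranked list is interpreted; that step is left out here to keep the sketch short.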