Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Canadian Bioinformatics Workshops www.bioinformatics.ca In collaboration with Cold Spring Harbor Laboratory & New York Genome Center Module #: Title of Module 3 Module 3 Lab Cytoscape & Enrichment Map Jüri Reimand Network Analysis Workflow Collect genomics data (e.g. mRNA expression) Normalize and score (e.g. compute differential expression) Visualize and Identify interesting pathways and networks Generate gene list Drill down to understand molecular mechanism Learn about underlying cellular mechanism using pathway and network analysis Publish model explaining data • A specific example of this workflow: • Cline, et al. “Integration of biological networks and gene expression data using Cytoscape”, Nature Protocols, 2, 2366-2382 (2007). Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca Pathways vs Networks - Detailed, high-confidence consensus - Biochemical reactions - Small-scale, fewer genes - Concentrated from decades of literature - Simplified cellular logic, noisy - Abstractions: directed, undirected - Large-scale, genome-wide - Constructed from omics data integration Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca Networks • Represent relationships of biological molecules – Physical, regulatory, genetic, functional interactions • Useful for discovering relationships in large data sets – Better than tables in Excel • Visualize multiple data types together – Discover interesting patterns • Network analysis – Finding sub-networks with certain properties (densely connected, coexpressed, frequently mutated, clinical characteristics) – Finding paths between nodes (or other network “motifs”) – Finding central nodes in network topology (“hub” genes) Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca Network basics (i): nodes and edges A simple mapping • one molecule - node • one interaction - edge A more realistic mapping • Cell localization • cell cycle • cell type • taxonomy • Physiologically relevant • Edges - diverse relationships Critical: what do nodes and edges mean? Network basics (i): nodes and edges Node (molecule) A simple mapping • one molecule - node • one interaction - edge A more realistic mapping • Cell localization • cell cycle • cell type • taxonomy • Physiologically relevant • Edges - diverse relationships Critical: what do nodes and edges mean? • • • • • • Gene Protein Transcript Drug MicroRNA … Edge (interaction) • • • • • • Genetic interaction Physical protein interaction Co-expression Signaling interaction Metabolic reaction DNA-binding Directed or undirected? Network Representations Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca Network basics (ii) layout and visual attributes Use layout to interpret network Local relationships Guilt-by-association Dense clusters Global relationships Visualise multiple types of data Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca Automatic network layout Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca Force-directed network layout • Force-directed: nodes repel and edges pull • Good for up to 500 nodes – Bigger networks give hairballs - reduce number of edges • Advice: try force directed first, or hierarchical for tree-like networks • Tips for better looking networks – Manually adjust layout – Load network into a drawing program (e.g. Illustrator) and adjust labels Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca force directed layout 1 http://sydney.edu.au/engineering/it/~aquigley/avi/spring.avi 3 Module 3 Lab: Cytoscape and Enrichment Map 2 4 bioinformatics.ca Dealing with ‘hairballs’: zoom or filter MKK1 MKK2 SLT2 Zoom Focus PKC (Cell Wall Integrity) Wsc1/2/3 WSC2 Mid2 WSC3 MID2 SLG1 SWI4 SWI6 RLM1 Bni1 Polarity Synthetic Lethal Transcription Factor Regulation Protein-Protein Interaction RHO1 Rho1 PKC Cell Wall Integrity Pkc1 PKC1 BNI1 Bck1 BCK1 MKK1 Mkk1/2 MKK2 SLT2 Up Regulated Gene Expression Slt2 Down Regulated Gene Expression Swi4/6 Rlm1 SWI4 SWI6 RLM1 Attribute color Arrow shape Visual Features • Node and edge attributes – Represent properties of genes, interactions – Text (string), integer, float, Boolean, list Edge shape Node shape • Visual attributes – Node, edge visual properties – Colour, shape, size, borders, opacity... Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca What Have We Learned? • • • • • • Networks help discover relationships in large data sets Networks help integrate several datasets or types Important to understand meaning of nodes and edges Avoid hairballs by focusing analysis Automatic layout is required to visualize networks Visual attributes enable multiple types of data to be shown at once – useful to see their relationships Module 37Lab: Cytoscape and Enrichment Map Part II bioinformatics.ca Cytoscape - Network Visualization Cytoscape is • an open source software platform • for visualizing complex networks and integrating these with any type of attribute data. • a lot of apps are available for various kinds of problem domains, including bioinformatics, social network analysis, and semantic web. Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca Manipulate Networks Automatic Layout Filter/Query Interaction Database Search The Cytoscape App Store http://apps.cytoscape.org Pathway analysis Gene expression analysis Complex detection Literature mining Network motif search Pathway comparison Introduction to Cytoscape (3.1.0) save your session save image Results panel Control panel Table panel Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca Navigate through the network (3.1.0) NETWORK move the blue square to navigate through the network Module 37Lab: Cytoscape and Enrichment Map Part II bioinformatics.ca Try different Cytoscape layouts (3.1.0) Circular Layout Module 37Lab: Cytoscape and Enrichment Map Part II bioinformatics.ca yFiles Circular Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca yFiles Organic Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca Change visual Features (3.1.0) STYLE Colors Shapes Sizes Brushes Transparency … Module 37Lab: Cytoscape and Enrichment Map Part II bioinformatics.ca Visual Style Load “Your Favorite Network” File > Import > Network > File … Visual Style Load “Your Favorite Expression” Dataset File > Import > Table > File … Visual Style Map expression values to node colours using a continuous mapper Visual Style Expression data mapped to node colours Fine-tuning network layout • Move, zoom/pan, rotate, scale, align Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca Create Subnetwork Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca Create Subnetwork Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca Network Filtering Select nodes with at least 20 interactions Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca Interaction Database Search Query the BioGrid database for interactions of the cancer gene KRAS What have we learned • Cytoscape is a useful, free software tool for network visualisation and analysis • Provides basic network features – Automated layouts – Mapping of genomic attributes to visual attributes • Apps are available to extend functionality to diverse analyses Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca Enrichment Map • A Cytoscape app to visualize and interpret results of pathway enrichment analysis • Enriched pathways are visualized as networks • Edges connect pathways with many shared genes pathway network Module 3 Lab: Cytoscape and Enrichment Map node (pathway) edge: # of overlapping genes bioinformatics.ca GO.id GO:0042330 GO:0006935 GO:0002460 GO:0002250 GO:0002443 GO:0019724 GO:0030099 GO:0002252 GO:0050764 GO:0050766 GO:0002449 GO:0019838 GO:0051258 GO:0005789 GO:0016064 GO:0007507 GO:0009617 GO:0030100 GO:0002526 GO:0045807 GO:0002274 GO:0008652 GO:0050727 GO:0002253 GO:0002684 GO:0050778 GO:0019882 GO:0002682 GO:0050776 GO:0043086 GO:0006909 GO:0002573 GO:0006959 GO:0046649 GO:0030595 GO:0006469 GO:0051348 GO:0007179 GO:0005520 GO:0042110 GO:0002455 GO:0005830 GO.name p.value covercover.rat Deg.mdn Deg.iqr taxis 2.18E-06 23 0.056930693 54.94499375 9.139238998 chemotaxis 2.18E-06 23 0.060209424 54.94499375 9.139238998 adaptive immune response based on somatic recombination 7.10E-05 25 0.111111111 57.32306955 16.97054864 adaptive immune response 7.10E-05 25 0.111111111 57.32306955 16.97054864 leukocyte mediated immunity 0.000419328 23 0.097046414 58.27890582 15.58333739 B cell mediated immunity 0.000683758 20 0.114285714 57.84161096 15.03496347 myeloid cell differentiation 0.000691589 24 0.089219331 62.22171598 10.35284833 immune effector process 0.000775626 31 0.090116279 58.27890582 23.86214773 regulation of phagocytosis 0.000792138 8 0.2 53.54786293 5.742849971 positive regulation of phagocytosis 0.000792138 8 0.216216216 53.54786293 5.742849971 lymphocyte mediated immunity 0.00087216 22 0.101851852 57.84161096 16.13171132 growth factor binding 0.000913285 15 0.068181818 83.0405088 10.58734852 protein polymerization 0.00108876 17 0.080952381 57.97543252 17.31639968 endoplasmic reticulum membrane 0.001178198 18 0.036072144 64.02284752 12.05209158 immunoglobulin mediated immune response 0.001444464 19 0.113095238 58.27890582 15.58333739 heart development 0.001991562 26 0.052313883 84.02538284 18.60761304 response to bacterium 0.002552999 10 0.027173913 52.75249873 23.23104637 regulation of endocytosis 0.002658555 11 0.099099099 56.38041132 16.02486889 acute inflammatory response 0.002660742 24 0.103004292 57.80098769 24.94311116 positive regulation of endocytosis 0.002903401 9 0.147540984 54.94499375 6.769909171 myeloid leukocyte activation 0.002969661 7 0.077777778 54.94499375 16.07042339 amino acid biosynthetic process 0.003502921 7 0.017241379 45.19797271 31.18248579 regulation of inflammatory response 0.004999055 7 0.084337349 54.94499375 7.737346076 activation of immune response 0.00500146 23 0.116161616 60.29679989 18.41103376 positive regulation of immune system process 0.006581245 27 0.111570248 60.29679989 22.05051447 positive regulation of immune response 0.006581245 27 0.113924051 60.29679989 22.05051447 antigen processing and presentation 0.007244488 7 0.029661017 54.94499375 16.58797889 regulation of immune system process 0.007252134 29 0.099656357 61.05645008 22.65935206 regulation of immune response 0.007252134 29 0.102112676 61.05645008 22.65935206 negative regulation of enzyme activity 0.008017022 9 0.040723982 53.28031076 17.48904224 phagocytosis 0.008106069 10 0.080645161 55.66270253 12.47536747 myeloid leukocyte differentiation 0.008174948 10 0.092592593 62.86577216 9.401887596 humoral immune response 0.008396095 16 0.044568245 55.05654091 18.94209565 lymphocyte activation 0.009044401 29 0.059917355 61.92213317 21.03553355 leukocyte chemotaxis 0.009707319 7 0.101449275 56.33116709 6.945510559 negative regulation of protein kinase activity 0.010782155 7 0.046357616 52.22863516 12.58524145 negative regulation of transferase activity 0.010782155 7 0.04516129 52.22863516 12.58524145 transforming growth factor beta receptor signaling pathw 0.012630825 13 0.071038251 83.49440788 12.63256309 insulin-like growth factor binding 0.012950071 9 0.097826087 81.41963394 7.528247832 T cell activation 0.013410548 20 0.064516129 59.77891783 26.06174863 humoral immune response mediated by circulating immunogl 0.016780163 10 0.125 54.70766244 14.2572143 cytosolic ribosome (sensu Eukaryota) 0.016907351 8 0.01843318 61.68933284 7.814673781 Ad Zoom of CNS-Development Cell projection Neuron migration organization Cell morphogenesis Cerebral cortex cell migration Cell Motility (stricter cluster) Neurite development CNS neuron differentiation Brain development Axonogenesis CNS development Projection neuron axonogenesis N re of Creating an Enrichment Map with data from g:Profiler 1. Go to g:Profiler website - http://biit.cs.ut.ee/gprofiler/ . 2. Select and copy all genes in the tutorial file MCF7_24hr_topgenes.txt in the Query box 3. In Options, check Significant only, No electronic GO annotations 4. Set the Output type to Generic Enrichment Map (TAB) 5. Show advanced options 6. Set Max size of functional category to 1000 and Min size to 5. 7. Set Min size of gene list and functional category overlap Q&T to 2. 8. Set Significance threshold to Benjamini-Hochberg FDR . 9. Choose GO biological process, molecular function, KEGG and Reactome from the color legend. 10. Download g:Profiler data as gmt: name 11. Click on g:Profile! to run the analysis 12. Download the result file: Download data in Generic Enrichment Map (GEM) format Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca g:Profiler files for Enrichment Map • Two files are required: A. A. Tabular text file with enriched processes&pathways B. GMT file with all processes&pathways and associated genes B. Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca TAB file – Enrichment Map Process ID Process name P-value, FDR Phenotype positive 1; negative -1 Genes common to input list and process/pathway Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca GMT file – Enrichment Map http://biit.cs.ut.ee/gprofiler/gmt/gprofiler_hsapiens.NAME.gmt.zip Process ID Process name Module 3 Lab: Cytoscape and Enrichment Map Gene list bioinformatics.ca Getting the Enrichment Map app A. Cytoscape App Store http://apps.cytoscape.org/apps/enrichmentmap B. In Cytoscape software Apps > App manager > Search > Install Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca Constructing an Enrichment Map A. Set input file format to “Generic” A. B. C. (format of g:Profiler results) B. Select GMT file from file system (this file contains gene lists of pathways) C. Select enrichments file from file system D. D. Set equal P-value and Q-value cutoffs (because g:Profiler only provides corrected P-values=Q-values) Click Build ! Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca Enrichment Map – a network of pathways Nodes – pathways, processes, functions Edges – between pathways with many shared genes Node size – genes in pathway Node color – enrichment strength Edge weight – genes shared Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca Enrichment Map – a network of pathways What are major functional themes? Apoptosis Proliferation Protein degradation Mitochondrial processes Tissue development Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca Edge definition determines granularity • Edges between pathways defined by % of common genes • Increasing edge stringency reveals finer granularity of functional themes Mitochondrial processes Module 3 Lab: Cytoscape and Enrichment Map Protein degradation Apoptosis bioinformatics.ca Mapping attributes to network Node labels mapped to pathway IDs by default Pathway IDs Module 3 Lab: Cytoscape and Enrichment Map Pathway names bioinformatics.ca Compare two pathway enrichment analyses g:Profiler analyses of MCF7_12hr_topgenes.txt and MCF7_24hr_topgenes.txt 1st dataset maps to node fillings 2nd dataset maps to node edges Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca Mapping further gene sets to Enrichment Map E.g. which processes associate to known cancer genes? CANCER_GENES B. Select GMT file from file system (this file contains gene lists used in original Enrichment Map) B. Select another GMT file from file system (this file contains gene lists mapped on top of original Enrichment Map) Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca Post-Analysis: New edges link further gene lists to existing pathways Purple edges – pathway genes in this analysis are significantly enriched in known cancer genes Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca What have we learned? • Enrichment map uses network visualization to summarize results of pathway enrichment analysis • Edges connect pathways with shared genes, and edge stringency determines granularity of maps • Two sets of pathways can be compared • Post-analysis links further gene sets to existing map Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca Tips for publishable Enrichment Maps • Manually curate clusters of connected pathways as “functional themes” – e.g. by picking most general process/pathway in the group – Check genes involved • Review small groups and singleton nodes – Many are probably part of larger groups – safe to remove – Some interesting singletons may be the only representatives! • Assign further visual attributes using your omics data • Export as PDF and use graphics software (AI) for finalizing • Sometimes less is more Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca Cytoscape Tips & Tricks • Network views – When you open a large network, no view is shown by default – To improve interactive performance, Cytoscape has the concept of “Levels of Detail” • Some visual attributes will only be apparent when you zoom in • The levels for various attributes can be changed in the preferences • To see what things will look like at full detail: – ViewShow Graphics Details Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca Cytoscape Tips & Tricks • Sessions – Sessions save pretty much everything: • • • • Networks Properties Visual styles Screen sizes – Saving a session on a large screen may require some resizing when opened on your laptop Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca Cytoscape Tips & Tricks • Memory – – – – – Cytoscape uses lots of it Doesn’t like to let go of it An occasional restart when working with large networks helps Destroy views when you don’t need them Java doesn’t have a good way to get the memory right at start • Since version 2.7, Cytoscape does a much better job at “guessing” good default memory sizes than previous versions Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca Cytoscape Tips & Tricks • CytoscapeConfiguration directory – In your user/home directory – Your defaults and any apps downloaded from the app store will go here – Sometimes, if things get really messed up, deleting (or renaming) this directory can give you a “clean slate” Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca We are on a Coffee Break & Networking Session Module 3 Lab: Cytoscape and Enrichment Map bioinformatics.ca