Canadian Bioinformatics Workshops
www.bioinformatics.ca

Module 6: Gene Function Prediction
Quaid Morris
http://morrislab.med.utoronto.ca

Module 6 bioinformatics.ca

Outline
• Functional interaction networks
• Concepts in gene function prediction:
  – Guilt-by-association
  – Gene recommender systems
• Scoring interactions by guilt-by-association
• GeneMANIA
• GeneMANIA demo
• Explanation of network weighting schemes
• STRING

Using genome-wide data in the lab
[Figure labels: Protein domain similarity network; Protein-protein interaction data; Genetic interaction data; Microarray expression data; “?!?”]

Two types of function prediction
• “What does my gene do?”
  – Goal: determine a gene’s function based on who it interacts with: “guilt-by-association”
• “Give me more genes like these”
  – e.g. find more genes in the Wnt signaling pathway, find more kinases, find more members of a protein complex

Guilt-by-association principle
[Figure: microarray expression data (genes by conditions) turned into a co-expression network. Cluster labels: cell cycle (CDC3, CLB4, CDC16) and protein degradation (RPT1, RPN3, RPT6); unknown genes UNK1 and UNK2. Source: Eisen et al. (PNAS 1998)]
A useful reference: Fraser AG, Marcotte EM. A probabilistic view of gene function. Nat Genet. 2004 Jun;36(6):559-64

“What does my gene do?”
[Figure: report of a GeneMANIA search. Input: query list plus network and profile data. Method: gene recommender system, then enrichment analysis. Output: network image. Functions legend: small GTPase mediated signal transduction; query genes. Networks legend: co-expression, co-localization, genetic interactions, physical interactions, predicted, shared protein domains, other. Genes shown include CDC42, BEM1, RGA1, RSR1, BZZ1, SLM4 and YGR151C.]

Recommender Systems
• Memphis, Knoxville, Nashville…
  – Chattanooga, Morristown
• Memphis, Alexandria, Cairo…
  – Luxor, Giza, Aswan

“Give me more genes like these”
[Figure: report of a GeneMANIA search (www.genemania.org/print). Created on: 12 June 2013 07:18:01. Last database update: 19 July 2012 20:00:00. Application version: 3.1.2. Input: query list plus network and profile data. Method: gene recommender system. Output: network image. Functions legend: muscle contraction; cyclic nucleotide metabolic process; query genes. Networks legend: co-expression, co-localization, genetic interactions, pathway, physical interactions, shared protein domains. Genes shown include NOS1, NOS2, NOS3, GUCY1A2, GUCY1A3, GUCY1B3, NPR1, NPR2, PDE4A, PDE4B, PDE4D and PDE7A. Search results generated by the GeneMANIA algorithm (genemania.org)]

Demo of GeneMANIA

GeneMANIA: Selecting networks I
• Click links to select all, zero or a predefined (default) set of networks

GeneMANIA: Selecting networks II
• Click check boxes to select all (or no) networks of that type.
• Fraction indicates # of networks selected out of total available (for this organism).

GeneMANIA: Selecting networks III
• Click on network type to view list of networks (of that type) in right panel
• Click on check box to select (or deselect) network
• Click on network name to expand entry to get more information on network. HTML link points to PubMed abstract

Query-independent composite networks
• Pre-combine networks, e.g. by simple addition or by pre-determined weights
[Figure: genetic network (e.g. Tong et al. 2001) + co-complexed network (e.g. Jeong et al 2002) + co-expression network = composite network. Cell cycle genes: CDC27, CDC23, APC11. DNA repair genes: RAD54, XRS2, MRE11. Unknown genes: UNK1, UNK2.]

Composite networks: One size doesn’t fit all
• Gene function could be a/the:
  – Biological process,
  – Biochemical/molecular function,
  – Subcellular/cellular localization,
  – Regulatory targets,
  – Temporal expression pattern,
  – Phenotypic effect of deletion.
• Some networks may be better for some types of gene function than others

Two rules for network weighting
Relevance: The network should be relevant to predicting the function of interest
• Test: Are the genes in the query list more often connected to one another than to other genes?
Redundancy: The network should not be redundant with other datasets (particularly a problem for co-expression)
• Test: Do the two networks share many interactions?
• Caveat: Shared interactions also provide more confidence that the interaction is real.

Solution: Query-specific weights
[Figure: the same genetic (e.g. Tong et al. 2001), co-complexed (e.g. Jeong et al 2002) and co-expression networks, each multiplied by a query-specific weight (w1, w2, w3) before being summed into the composite network. Example weights shown: 54%, 33%, 13%.]

Network weighting schemes I
• By default, GeneMANIA decides between the GO-dependent and query-specific weighting schemes based on the size of your list.
• We recommend using the default scheme in most cases
• Click radio button to change the network weight scheme

Network weighting schemes II
- GO-based weighting assigns network weights based on how well the networks reproduce patterns of GO co-annotations (“Are genes that interact in the network more likely to have the same annotation?”),
- Can choose any of the three hierarchies,
- Ignores query list when assigning network weight.

Network weighting schemes III
• Can force query list based weighting by selecting this option
• Select these and either all networks or all data types get the same weight

Scoring nodes by guilt-by-association
• Query list (“positive examples”): MCA1, CDC48, CPR3, TDH2
• Nodes are scored from high to low
• Two main algorithms:
  – Direct neighborhood
  – Label propagation
[Figure: the same network (CDC48, MCA1, CPR3, TDH2) scored by each of the two algorithms.]

Node scoring algorithm details
• Direct neighbour node score depends on:
  – Strength of links to query genes
  – # of query gene neighbors
• GeneMANIA label propagation node score depends on:
  – Strength of links and # of query gene neighbors
  – # of shared neighbors with positive examples
  – Whether or not node is in a cluster of nodes with query gene(s) (i.e. # of shared neighbours with query genes)
  – Take home: allows indirect links to query genes to impact scores, so often brings up clusters of nodes

Label propagation example
[Figure: network before and after label propagation.]

Three parts of GeneMANIA:
• A large, automatically updated collection of interaction networks.
• A query algorithm to find genes and networks that are functionally associated with your query gene list.
• An interactive, client-side network browser with extensive link-outs

GeneMANIA data sources
- Gene ID mappings from Ensembl and Ensembl Plants
- Network/gene descriptors from Entrez-Gene and PubMed
- Gene annotations from Gene Ontology, GOA, and model org. databases
- Network data sources shown include iRefIndex and Interologs + some organism-specific datasets (click around to see what’s available)

Gene identifiers
• All unique identifiers within the selected organism, e.g.:
  – Entrez-Gene ID
  – Gene symbol
  – Ensembl ID
  – UniProt (primary)
  – also, some synonyms & organism-specific names
• We use the Ensembl database for gene mappings (but we mirror it once every 3 months, so sometimes we are out of date)

Current status
• Seven organisms:
  – Human, mouse, yeast, worm, fly, A. thaliana, rat
• ~1,250 networks (about 50% co-expression, 35% physical interaction)
• Web network browser

Cytoscape plugin
http://www.genemania.org/plugin/

QueryRunner
[Figure: plot with axis label “Area under the curve”.]
- Runs GO function prediction from the command line.
- Does cross-validation to assess predictive performance of a set of networks
- Can assess “added predictive value of new data” (Michaut et al, in press)
[Figure legend: genetic interaction networks.]

STRING: http://string-db.org/

STRING results
[Screenshots of STRING results.]

GeneMANIA vs STRING
• STRING (2003-present)
  – Large organism coverage
  – Protein focused
  – Uses eight pre-computed networks
  – Heavy use of phylogeny to infer functional interactions, also contains text mining derived interactions
  – Uses “direct interaction” to score nodes
  – Link weights are “Probability of functional interaction”
• GeneMANIA webserver (2010-present)
  – Covers 6 major model organisms (but can add more with plugin)
  – Gene focused
  – Thousands of networks, weights are not pre-computed, can upload your own network
  – Relies heavily on functional genomic data: so has genetic interactions, phenotypic info, chemical interactions
  – Allows enrichment analysis
  – Uses “label propagation” to score nodes

GeneMANIA future directions
• Other organisms (next: E. coli, zebrafish)
• Non-coding genes (miRNAs!)
• Regulatory networks (ChIP, RNA-protein, miRNA-mRNAs)
• More phenotypic information (OMIM)
• Orthology mapping for inferring interologs

GeneMANIA URLs
Main site (stable but still fun): http://www.genemania.org
Beta site (new and edgy but possibly unreliable): http://beta.genemania.org
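
Worked sketch: direct neighbour vs propagation scoring

The two node scoring ideas from this module (direct neighbour scoring and label propagation on a weighted composite network) can be tried out on a toy network. The sketch below is only an illustration under stated assumptions, not GeneMANIA's implementation: the function names are hypothetical, the edge strengths and the network weights (0.6 and 0.4) are invented, and the propagation step is a simplified iterative neighbour-averaging scheme that stands in for GeneMANIA's label propagation algorithm. Gene names are borrowed from the slides.

```python
# Toy illustration only. Gene names come from the slides; every edge strength
# and network weight below is invented for the example.

def combine(networks, weights):
    """Weighted sum of networks; each network maps (gene_a, gene_b) -> strength."""
    composite = {}
    for network, weight in zip(networks, weights):
        for (a, b), strength in network.items():
            edge = tuple(sorted((a, b)))  # undirected: store each pair once
            composite[edge] = composite.get(edge, 0.0) + weight * strength
    return composite


def adjacency(composite):
    """Turn an edge dict into {gene: {neighbour: strength}}."""
    adj = {}
    for (a, b), strength in composite.items():
        adj.setdefault(a, {})[b] = strength
        adj.setdefault(b, {})[a] = strength
    return adj


def direct_neighbour_scores(adj, query):
    """Score = summed strength of links to query genes (non-query genes only)."""
    return {
        gene: sum(s for nbr, s in nbrs.items() if nbr in query)
        for gene, nbrs in adj.items()
        if gene not in query
    }


def propagation_scores(adj, query, alpha=0.8, n_iter=100):
    """Simplified propagation (an assumption, not GeneMANIA's solver):
    f <- (1 - alpha) * y + alpha * strength-weighted mean of neighbour scores."""
    y = {gene: 1.0 if gene in query else 0.0 for gene in adj}
    f = dict(y)
    for _ in range(n_iter):
        f = {
            gene: (1 - alpha) * y[gene]
            + alpha * sum(s * f[nbr] for nbr, s in nbrs.items()) / sum(nbrs.values())
            for gene, nbrs in adj.items()
        }
    return f


# Two made-up networks over four genes; UNK2 touches only UNK1.
coexpression = {("CDC48", "UNK1"): 1.0, ("UNK1", "UNK2"): 1.0}
physical = {("CDC48", "MCA1"): 1.0, ("MCA1", "UNK1"): 0.5}
query = {"CDC48", "MCA1"}  # the "positive examples"

adj = adjacency(combine([coexpression, physical], weights=[0.6, 0.4]))
direct = direct_neighbour_scores(adj, query)
propagated = propagation_scores(adj, query)

for gene in sorted(direct):
    print(gene, round(direct[gene], 3), round(propagated[gene], 3))
```

In this toy network UNK2 has no direct link to a query gene, so its direct neighbour score is zero, while the propagation scheme still gives it a positive score through UNK1. That is the "indirect links impact scores" behaviour described under node scoring algorithm details.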