Gene ontology and pathways
Ståle Nygård
[email protected]
Bioinformatics Core Facility, Oslo University Hospital/University of Oslo

So: here you are

Gene lists
• Long list of differentially expressed genes
• Possibly hundreds of papers describing the functions of the genes
• Misleading names
• Different names in different organisms

Genes seldom operate on their own
– Genes are by nature not independent. Biologically related genes will often show expression changes together.
– Trends supported by several genes in a group give more power to statistical tests than a test for an individual gene.
– We need predefined groups of biologically related genes to help process our list for systematic changes.

Ontologies
• Gene Ontology (GO)
• Sequence Ontology (SO) (sequence features)
• Phenotype and Trait Ontology (PATO)
• Taxon (NCBI)
• Anatomy (Penn)
• Disease (ICD9)
• Developmental stage (multiple sources)

Gene Ontology (GO)
• Why Gene Ontology?
– Produce a controlled vocabulary describing aspects of molecular biology that can be applied to all organisms.
– Facilitate communication between people and organizations.
– Improve interoperability between systems.

Goal of GO Consortium (http://www.geneontology.org/)
• Produce a controlled vocabulary describing aspects of molecular biology that can be applied to all organisms.
• Describe gene products using vocabulary terms (annotation).
• Develop tools:
– to query and modify the vocabularies and annotations

How does GO work?
What information might we want to capture about a gene product?
• What does the gene product do?
• Why does it perform these activities?
• Where does it act?

The Gene Ontology (GO)
– Molecular function:
• Gene product at biochemical level.
– Biological process:
• Cellular events to which the gene product contributes.
– Cellular component:
• Location or complex of gene/protein.
Molecular Function
• activities or “jobs” of a gene product
– insulin binding
– insulin transport activity

Biological Process
• a commonly recognized series of events
– cell division

Cellular Component
• where a gene product acts

Content of GO
• Molecular Function: 8,731 terms
• Biological Process: 19,022 terms
• Cellular Component: 2,737 terms
• Total: 30,490 terms
• Obsolete terms: 1,434
As of May 2010

GO Annotation
• Association between a gene product and applicable GO terms
• Provided by member databases. Collaborating databases annotate their gene products (or genes) with GO terms, providing references and indicating what kind of evidence is available to support the annotations.
• Made by manual or automated methods.

GO Annotation
• Database object: gene or gene product
• GO term ID
• Evidence supporting annotation
• Reference – publication or computational method

Overrepresentation of GO terms
• We have a subset of genes
– List of differentially expressed genes
– List of genes that cluster together
• Which biological processes do these genes take part in?
• Is a particular biological process represented by more genes in the subset than would be expected by chance?

Gene Ontology Tools
• eGON (from NTNU, www.genetools.no)
• GSEA
• DAVID
• EASE
• TopGO
• GOstat
• + many more

Question: which cellular biological processes occur?
[Figure: human fibroblasts, 24 h time course, thymidine-block release; time axis 0 to 24 hours]

Questions
• What is the function of up-regulated genes?
• What is the function of down-regulated genes?
[Figure: time axis 0 to 24 hours]
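The over-representation question can be sketched as a one-sided hypergeometric test: given how many genes on the array carry a GO term, how likely is it to see at least the observed number of them in the gene list by chance? This is a minimal illustration, not the method used by any of the tools listed. The function name and all counts below are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_term, n_selected, n_overlap):
    """One-sided hypergeometric p-value, P(X >= n_overlap).

    n_universe: genes on the array (the background)
    n_term:     background genes annotated with the GO term
    n_selected: genes in the list of interest
    n_overlap:  genes in the list that carry the GO term
    """
    max_overlap = min(n_term, n_selected)
    total = comb(n_universe, n_selected)
    # Sum the probabilities of every outcome at least as extreme as observed.
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts, for illustration only (not taken from the slides).
p = hypergeom_enrichment_p(n_universe=20000, n_term=300,
                           n_selected=150, n_overlap=10)
print(f"p-value for the term: {p:.3g}")
```

In practice the test is repeated for every GO term, so the resulting p-values need a multiple-testing correction before they are ranked.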
Results: GO terms for up- and down-regulated genes
• 173 genes up-regulated at 0-4 hours, compared to all genes on the array
• 146 genes down-regulated at 0-4 hours, compared to all genes on the array
• Terms ordered by significance
[Figure: human fibroblasts, 24 h time course, thymidine-block release; time axis 0 to 24 hours. Term labels in the figure: homeostasis, lipid transport, cell adhesion, chemotaxis, amino acid metabolism, response to stress, lipid metabolism, cell signaling, S-phase, ion transport, apoptosis, cell cycle arrest]

Biological pathways

Types of pathways
• Metabolic pathways
– convert raw materials from the environment into value-added products and recycle or dispose of intracellular materials
• Signaling pathways
– convert a mechanical/chemical stimulus to a cell into a specific cellular response
• Regulatory pathways
– alter the output of the genetic program through transcriptional and translational regulation
• Signaling, regulatory and metabolic events are often linked
[Figure: Signaling, Regulatory, Metabolic]

Types of pathway representations
• Cartoons
– Textbooks
– BioCarta
• Circuit diagrams
– KEGG
– Reactome
– GeneRIFs
• Computational networks
– SBML models
– Transcription factor networks

KEGG
• A large collection of signaling, metabolic and regulatory pathways
• Organised by separate pathways with hand-drawn diagrams
• Academic (freely available)
• The pathways can be used to look for overrepresentation or enrichment
• Can be used to visually check for pathness or direction

TGF-beta signalling pathway

Same pathway in BioCarta

GO vs. Pathways
GO
• Overview
• Can handle a large number of genes
• Many genes annotated
• Every gene considered on its own
Pathways
• Detail view
• Focused sets of genes
• Scattered data sources
• Focuses on interactions between genes

Network construction
• Information about established pathways (e.g. in KEGG) is not at all complete
• Pathways interact and depend on context
• An alternative approach to using established pathways is to construct networks from the data.
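One simple way to construct a network from the data is to connect genes whose expression profiles are strongly correlated. The sketch below assumes Pearson correlation and an arbitrary cutoff of 0.9. The gene names and expression values are toy data, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def correlation_network(expression, threshold=0.9):
    """Return edges (gene1, gene2, r) for gene pairs with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Toy expression profiles across four samples (made-up values).
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [1.0, -1.0, 1.0, -1.0],
}

for g1, g2, r in correlation_network(toy_expression):
    print(f"{g1} -- {g2} (r = {r:.2f})")
```

With few samples per group, correlations are noisy, which is one reason to combine them with prior knowledge from interaction databases.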
Network construction
• Networks can be inferred from
– correlation in the data (recall gene clustering) and/or
– interaction databases:
• Protein-protein interactions: BioGRID, IntAct, DIP, HPRD ++
• Transcription factor databases: TRANSFAC, JASPAR ++
• Literature: PubGene

Network construction: case study
[Figure: WT AB, CXCR5 KO AB]
Mice with the chemokine receptor CXCR5 knocked out develop dilated hypertrophy after banding of the aorta.

Microarray study
• WT SHAM (n=3)
• KO SHAM (n=3)
• WT AB (n=4)
• KO AB (n=4)
Aim of study: find the molecular mechanism behind the altered phenotype of the heart.

Network construction using prior knowledge
This method constructs a network of interacting genes based on literature-reported interactions, protein-protein interactions and correlations in the data.

Results
KO AB vs KO SHAM
• Fmod - fibromodulin. May regulate TGF-beta activities by sequestering TGF-beta into the extracellular matrix.
• Fn1 - fibronectin 1. Extracellular matrix glycoprotein that binds to membrane-spanning receptor proteins called integrins.
• Cxcl13 - B lymphocyte chemoattractant.
• Tgfb2 - transforming growth factor, beta 2. Extracellular glycosylated protein.
• Thbs4 - thrombospondin 4.
• Col14a1 - collagen, type XIV, alpha 1.
• Lox - lysyl oxidase. Extracellular copper enzyme that initiates the crosslinking of collagens and elastin.
• Thbs1 - thrombospondin 1. Adhesive glycoprotein that mediates cell-to-cell and cell-to-matrix interactions.
• Spp1 - secreted phosphoprotein 1. Cytokine. Probably important to cell-matrix interaction.
The method finds a cluster of differentially expressed genes localized to the extracellular matrix.

Conclusion
• GO is the world map of molecular biology
• Pathways provide more detailed information
• Network construction using interaction databases can reveal information beyond classical pathways

Questions?