Gene ontology and pathways
Ståle Nygård
[email protected]
Bioinformatics Core Facility,
Oslo University Hospital/University of Oslo
So: here you are
Gene lists
• Long list of
differentially
expressed genes
• Possibly hundreds
of papers
describing the
functions of the
genes
• Misleading names
• Different names in
different organisms
Genes seldom operate on their
own
-Genes are by nature not independent.
Biologically related genes will often show
expression changes together
-Trends supported by several genes in a
group give more statistical power than a
test for an individual gene
-Need predefined groups of biologically
related genes to help process our list for
systematic changes.
Ontologies
• Gene Ontology (GO)
• Sequence Ontology (SO) (sequence
features)
• Phenotype and Trait Ontology (PATO)
• Taxon (NCBI)
• Anatomy (Penn)
• Disease (ICD9)
• Developmental stage (multiple sources)
Gene Ontology (GO)
• Why Gene Ontology?
– Produce a controlled vocabulary describing
aspects of molecular biology, that can be
applied to all organisms.
– Facilitate communication between people and
organizations.
– Improve interoperability between systems.
Goal of GO Consortium
(http://www.geneontology.org/)
• Produce a controlled vocabulary
describing aspects of molecular biology,
that can be applied to all organisms.
• Describe gene products using vocabulary
terms (annotation).
• Develop tools:
– to query and modify the vocabularies and
annotations
How does GO work?
What information might we want to capture
about a gene product?
• What does the gene product do?
• Why does it perform these activities?
• Where does it act?
The Gene Ontology (GO)
– Molecular function:
• Gene product at biochemical level.
– Biological process:
• Cellular events to which the gene product
contributes.
– Cellular component:
• Location or complex of gene/protein.
Molecular Function
• activities or “jobs” of a gene product
Insulin binding
Insulin transport activity
Biological Process
• a commonly recognized series of events
cell division
Cellular Component
• where a gene product acts
Content of GO (as of May 2010)
• Molecular Function: 8,731 terms
• Biological Process: 19,022 terms
• Cellular Component: 2,737 terms
• Total: 30,490 terms
• Obsolete terms: 1,434
GO Annotation
• Association between gene product and
applicable GO terms
• Provided by member databases. Collaborating
databases annotate their gene products (or genes)
with GO terms, providing references and indicating
what kind of evidence is available to support the
annotations.
• Made by manual or automated methods.
GO Annotation
• Database object: gene or gene product
• GO term ID
• Evidence supporting annotation
• Reference
– publication or computational method
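The four fields of an annotation can be sketched as a small record type. This is only an illustration: the gene identifiers and references are hypothetical placeholders, and the class and function names are assumptions made for this sketch, not part of any GO tool. The evidence codes IDA (inferred from direct assay) and IEA (inferred from electronic annotation) are standard GO evidence codes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GOAnnotation:
    db_object: str  # gene or gene product identifier (hypothetical below)
    go_id: str      # GO term ID
    evidence: str   # evidence code, e.g. IDA (experimental) or IEA (automated)
    reference: str  # publication or computational method


# Hypothetical example records; only the GO IDs and evidence codes are real.
annotations = [
    GOAnnotation("geneA", "GO:0051301", "IDA", "PMID:placeholder-1"),
    GOAnnotation("geneA", "GO:0007049", "IEA", "automated-pipeline"),
    GOAnnotation("geneB", "GO:0007049", "IDA", "PMID:placeholder-2"),
]


def genes_by_term(records, manual_only=False):
    """Group gene identifiers by GO term ID, optionally skipping IEA records."""
    groups = {}
    for rec in records:
        if manual_only and rec.evidence == "IEA":
            continue
        groups.setdefault(rec.go_id, set()).add(rec.db_object)
    return groups
```

Grouping annotations by term in this way yields the predefined gene groups that the analysis of a gene list relies on; filtering on the evidence code lets one restrict the groups to manually curated annotations.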
Overrepresentation of GO terms
• We have a subset of genes
– List of differentially expressed genes
– List of genes that cluster together
• Which biological processes do these
genes take part in?
• Is there an over-representation of the
number of genes belonging to a particular
biological process, compared to what
could be expected?
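One common way to make "compared to what could be expected" concrete is a one-sided hypergeometric test. The sketch below is an assumption about how such a test might be written, not the method of any particular tool listed in these slides; the function and parameter names are made up for illustration.

```python
from math import comb


def enrichment_p_value(n_array, n_term, n_list, n_overlap):
    """One-sided hypergeometric p-value for over-representation.

    n_array   -- number of genes on the array (the background)
    n_term    -- background genes annotated with the GO term
    n_list    -- genes in our list (e.g. differentially expressed)
    n_overlap -- genes in our list annotated with the GO term

    Returns P(X >= n_overlap) when n_list genes are drawn at random
    from the array without replacement.
    """
    upper = min(n_term, n_list)
    total = comb(n_array, n_list)
    tail = sum(
        comb(n_term, k) * comb(n_array - n_term, n_list - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10,000 genes on the array, 200 annotated with the
# term, 173 genes in the list, 12 of them annotated with the term.
# By chance one would expect about 173 * 200 / 10000 = 3.46 such genes.
p = enrichment_p_value(10_000, 200, 173, 12)
```

Because one such test is run per GO term, the resulting p-values would normally be corrected for multiple testing before terms are ranked by significance.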
Gene Ontology Tools
• eGON (from NTNU, www.genetools.no)
• GSEA
• DAVID
• EASE
• TopGO
• GOstat
• + many more
Question:
which cellular biological processes occur?
[Figure: human fibroblasts, 24 h time course after thymidine-block release; time axis 0 to 24 hours]
Questions
what is the function of up-regulated genes?
what is the function of down-regulated genes?
[Figure: human fibroblasts, 24 h time course after thymidine-block release; time axis 0 to 24 hours]
173 genes up-regulated 0-4 hours compared to all
genes on the array
Ordered by significance:
146 genes down-regulated 0-4 hours compared to
all genes on the array
homeostasis
lipid transport
cell adhesion
chemotaxis
amino acid metabolism
response to stress
lipid metabolism
cell signaling
S-phase
ion transport
apoptosis
cell cycle arrest
apoptosis
[Figure: human fibroblasts, 24 h time course after thymidine-block release; time axis 0 to 24 hours]
Biological pathways
Types of pathways
• Metabolic pathways
– convert raw materials from the environment
into value-added products and recycle or
dispose of intracellular materials
• Signaling pathways
– convert mechanical/chemical stimulus to a cell
into a specific cellular response
• Regulatory pathways
– alter the output of the genetic program
through transcriptional and translational
regulation
• Signaling, regulatory and metabolic events are
often linked
[Figure: linked Signaling, Regulatory and Metabolic pathways]
Types of pathway representations
• Cartoons
– Textbooks
– BioCarta
• Circuit diagrams
– KEGG
– Reactome
– GeneRIFs
• Computational networks
– SBML models
– Transcription factor
networks
KEGG
• A large collection of signaling, metabolic
and regulatory pathways
• Organised by separate pathways with
hand drawn diagrams
• Academic (freely available)
• The pathways can be used to look for
overrepresentation or enrichment
• Can be used to visually check for pathness or direction
TGF-beta signalling pathway
Same pathway in Biocarta
GO vs. Pathways
GO:
• Overview
• Can handle a large number of genes
• Many genes annotated
• Every gene considered on its own
Pathways:
• Detail view
• Focused sets of genes
• Scattered data sources
• Focuses on interactions between genes
Network construction
• Information about established pathways
(e.g. in KEGG) is far from complete
• Pathways interact and depend on context
• An alternative approach to using
established pathways is to construct
networks from the data.
Network construction
• Networks can be inferred from
– correlation in the data (recall gene clustering)
and/or
– interaction databases:
• Protein-protein interactions: BioGRID,
IntAct, DIP, HPRD and others
• Transcription factor databases:
TRANSFAC, JASPAR and others
• Literature: PubGene
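Inferring edges from correlation in the data can be sketched as follows. This is a minimal illustration under stated assumptions: Pearson correlation, an arbitrary threshold of 0.9, and made-up gene names and expression values; the slides do not specify which correlation measure or cutoff to use.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation information
    return sxy / sqrt(sxx * syy)


def correlation_edges(expression, threshold=0.9):
    """Connect gene pairs whose absolute correlation reaches the threshold."""
    edges = set()
    for g1, g2 in combinations(sorted(expression), 2):
        if abs(pearson(expression[g1], expression[g2])) >= threshold:
            edges.add((g1, g2))
    return edges


# Hypothetical expression profiles across four samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [5.0, 1.0, 4.0, 2.0],
}
edges = correlation_edges(expression)
```

With only a handful of samples per group, correlations are noisy, which is one reason to combine them with interaction databases rather than rely on them alone.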
Network construction: case study
[Figure panels: WT AB, CXCR5 KO AB]
Mice with the chemokine receptor CXCR5
knocked out develop dilated hypertrophy
after banding of the aorta (AB).
Microarray study
Groups: WT SHAM (n=3), KO SHAM (n=3), WT AB (n=4), KO AB (n=4)
Aim of study: Find the molecular mechanism
behind the altered phenotype of the heart.
Network construction using prior
knowledge
This method constructs a network of
interacting genes based on literature
reported interactions, protein-protein
interactions and correlations in the data.
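The idea of combining the edge sources can be sketched as below. This is not the method used in the case study, whose details are not given in these slides; it is a simplified assumption that takes the union of prior-knowledge edges and correlation edges, keeps only edges between differentially expressed genes, and reports connected clusters. All gene names are hypothetical placeholders.

```python
def build_network(prior_edges, correlation_edges, de_genes):
    """Union of both edge sources, restricted to differentially expressed genes."""
    network = {}
    for a, b in set(prior_edges) | set(correlation_edges):
        if a in de_genes and b in de_genes:
            network.setdefault(a, set()).add(b)
            network.setdefault(b, set()).add(a)
    return network


def connected_components(network):
    """Return the clusters of the network as a list of gene sets."""
    seen, components = set(), []
    for start in sorted(network):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal from the start gene
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(network[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical inputs: literature or protein-protein edges, correlation
# edges, and the set of differentially expressed genes.
prior = [("g1", "g2"), ("g3", "g4")]
correlated = [("g2", "g5"), ("g6", "g7")]
de_genes = {"g1", "g2", "g5", "g6", "g7"}

clusters = connected_components(build_network(prior, correlated, de_genes))
```

Each resulting cluster is a candidate module to inspect, for example by checking whether its genes share a cellular component or biological process.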
Results
KO AB vs KO SHAM
• FMOD - fibromodulin: …may regulate TGF-beta activities by sequestering TGF-beta into the extracellular matrix
• Fn1 - fibronectin 1: extracellular matrix glycoprotein that binds to membrane-spanning receptor proteins called integrins
• CXCL13: B lymphocyte chemoattractant
• Tgfb2 - transforming growth factor, beta 2: extracellular glycosylated protein
• Thbs4 - thrombospondin 4
• Col14a1 - collagen, type XIV, alpha 1
• Lox - lysyl oxidase: extracellular copper enzyme that initiates the crosslinking of collagens and elastin
• Thbs1 - thrombospondin 1: adhesive glycoprotein that mediates cell-to-cell and cell-to-matrix interactions
• Spp1 - secreted phosphoprotein 1: cytokine; probably important to cell-matrix interaction
The method finds a cluster of differentially expressed
genes localized to the extracellular matrix
Conclusion
• GO is the world map of molecular biology
• Pathways provide more detailed
information
• Network construction using interaction
databases can reveal information beyond
classical pathways
Questions?