Canadian Bioinformatics Workshops
www.bioinformatics.ca
Module 3
Pathway and Network Analysis
Lincoln Stein
Classes of Gene Set Analysis
DAVID
GSEA
Reactome FI network
PARADIGM
Khatri et al. PLOS Comp Bio. 8:1 2012
Module 3
bioinformatics.ca
Limitations of Gene Set Enrichment
Analysis
• Many possible gene sets – diseases, molecular
function, biological process, cellular compartment,
pathways...
• Gene sets are heavily overlapping; need to sort
through lists of enriched gene sets!
• “Bags of genes” obscure regulatory relationships
among them.
Pathway Databases
• Advantages:
– Usually curated.
– Biochemical view of biological processes.
– Cause and effect captured.
– Human-interpretable visualizations.
• Disadvantages:
– Sparse coverage of genome.
– Different databases disagree on boundaries of pathways.
KEGG
Reactome
• Hand-curated pathways in human.
• Rigorous curation standards – every reaction traceable to
primary literature.
• Automatically-projected pathways to non-human species.
• 22 species; 1112 human pathways; 5078 proteins.
• Features:
– Google-map style reaction diagrams with overlays;
– Find pathways containing your gene list;
– Calculate gene overrepresentation in pathways;
– Find corresponding pathways in other species.
• Open access.
Reactome
Pathway Commons
Pathway Colorization
• Main feature offered by all pathway databases.
• Upload a gene list
• Database calculates an enrichment score on each
pathway and displays ranked list.
• Browse into pathways of interest; download
colorized pictures.
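The slides do not say which enrichment score a given database uses. A common choice for this kind of over-representation analysis is a one-sided hypergeometric test, sketched below under that assumption. The function names and the toy gene identifiers are hypothetical and are not taken from any database's API.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_pathway, n_list, n_overlap):
    """P(overlap >= n_overlap) when n_list genes are drawn at random from a
    universe of n_universe genes, n_pathway of which belong to the pathway."""
    upper = min(n_pathway, n_list)
    total = comb(n_universe, n_list)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_list - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def rank_pathways(gene_list, pathways, universe):
    """Return (pathway name, overlap size, p-value) tuples, most enriched first."""
    universe = set(universe)
    genes = set(gene_list) & universe
    results = []
    for name, members in pathways.items():
        members = set(members) & universe
        overlap = genes & members
        p = hypergeom_enrichment_p(len(universe), len(members), len(genes), len(overlap))
        results.append((name, len(overlap), p))
    return sorted(results, key=lambda r: r[2])


# Toy data (hypothetical): 20 genes, two pathways of 5 genes each.
universe = [f"g{i}" for i in range(20)]
pathways = {"Pathway A": universe[:5], "Pathway B": universe[10:15]}
gene_list = ["g0", "g1", "g2", "g10"]
ranked = rank_pathways(gene_list, pathways, universe)
for name, n_overlap, p in ranked:
    print(f"{name}: overlap={n_overlap}, p={p:.4f}")
```

In practice the p-values would also need multiple-testing correction, since many overlapping pathways are tested at once.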
Example from Reactome
Example from Reactome
Networks
• Pathways capture only the “well understood” portion
of biology.
• Networks cover less well understood relationships:
– Genetic interactions
– Physical interaction
– Coexpression
– GO term sharing
– Adjacency in pathways
Network Databases
• Can be built automatically or via curation.
• Popular sources of curated networks:
– BioGRID – Curated interactions from literature; 529,000
genes, 167,000 interactions.
– IntAct – Curated interactions from literature; 60,000 genes,
203,000 interactions.
– MINT – Curated interactions from literature; 31,000 genes,
83,000 interactions.
Uncurated Interaction Sources
• Text mining approaches
– Computationally extract gene relationships from text, such
as PubMed abstracts.
– Much faster than hand curation.
– Not perfect:
• Problems recognizing gene names. Is hedgehog a gene or a
species?
• Natural language processing is difficult.
– Popular resources:
• iHOP
• PubGene
Uncurated Interaction Sources
• Experimental techniques
– Yeast two-hybrid (Y2H) protein interactions.
– Protein complex pulldowns/mass spec.
– Genetic screens, such as synthetic lethals,
enhancer/suppressor screens.
– NOT perfect
• Y2H takes proteins out of their natural context;
physical interaction != biological interaction.
• Protein complex pulldowns plagued by “sticky” proteins such as
actin.
• Genetic screens highly sensitive to genetic background (“network
effects”).
Integrative Approaches
• Combine multiple sources of evidence to increase
accuracy.
• Simple example:
– “Party hubs” are Y2H interactions that have been filtered
for those partners that share the same temporal-spatial
location.
• Complex example:
– Combine multiple sources of curated and uncurated
evidence.
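The simple example above amounts to a filter over interaction pairs. A minimal sketch follows, assuming each protein carries a set of (compartment, stage) annotations. The function name, the protein identifiers, and the annotations are all hypothetical toy values.

```python
def filter_by_colocalization(interactions, annotations):
    """Keep interaction pairs whose two partners share at least one
    (compartment, stage) annotation; unannotated proteins never match."""
    kept = []
    for a, b in interactions:
        shared = annotations.get(a, set()) & annotations.get(b, set())
        if shared:
            kept.append((a, b))
    return kept


# Toy Y2H pairs and annotations (hypothetical).
y2h = [("P1", "P2"), ("P1", "P3"), ("P2", "P4")]
where_when = {
    "P1": {("nucleus", "S phase")},
    "P2": {("nucleus", "S phase"), ("cytoplasm", "G1")},
    "P3": {("membrane", "G1")},
    "P4": {("cytoplasm", "G1")},
}
filtered = filter_by_colocalization(y2h, where_when)
print(filtered)
```

Here P1–P3 is dropped because the two proteins are never annotated to the same place at the same time.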
Example: Reactome FI Network
Curated Human Data – Version 35.
5078 proteins
4166 reactions
3870 complexes
1112 pathways
Only ~25% of genome!
Goal: add a “corona” of
uncurated interaction data
around scaffold of curated
pathway data.
Expanding Reactome’s Coverage
[Diagram. Labels: Curated Pathways; Uncurated Information; human PPI; PPI inferred from fly, worm & yeast; PPI from text mining; GeneWays; Gene co-expression; CellMap; TRED; GO annotation on biological processes; Protein domain-domain interactions; Naïve Bayes Classifier; Annotated Functional Interactions; Predicted Functional Interactions.]
Wu et al. (2010) Genome Biology
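To illustrate how a naïve Bayes classifier can combine several evidence sources into one score for a candidate protein pair, here is a minimal sketch with binary features and Laplace smoothing. It is not the classifier from Wu et al. The three features, the training pairs, and the function names are hypothetical.

```python
from math import exp, log


def train_naive_bayes(examples, n_features):
    """examples: list of (tuple of 0/1 features, bool label).
    Returns the prior P(interaction) and per-class feature likelihoods."""
    counts = {True: 0, False: 0}
    ones = {True: [0] * n_features, False: [0] * n_features}
    for features, label in examples:
        counts[label] += 1
        for i, value in enumerate(features):
            ones[label][i] += value
    # Laplace smoothing keeps every probability strictly between 0 and 1.
    likelihood = {
        label: [(ones[label][i] + 1) / (counts[label] + 2) for i in range(n_features)]
        for label in (True, False)
    }
    prior = (counts[True] + 1) / (len(examples) + 2)
    return prior, likelihood


def predict_fi_probability(features, prior, likelihood):
    """Posterior probability that a pair is a functional interaction,
    treating the evidence sources as conditionally independent."""
    log_pos, log_neg = log(prior), log(1 - prior)
    for i, value in enumerate(features):
        p_pos, p_neg = likelihood[True][i], likelihood[False][i]
        log_pos += log(p_pos if value else 1 - p_pos)
        log_neg += log(p_neg if value else 1 - p_neg)
    return 1 / (1 + exp(log_neg - log_pos))


# Hypothetical features: (human PPI, co-expressed, shared GO biological process).
training = [
    ((1, 1, 0), True), ((1, 0, 1), True), ((1, 1, 1), True),
    ((0, 0, 0), False), ((0, 1, 0), False), ((0, 0, 0), False), ((0, 0, 1), False),
]
prior, likelihood = train_naive_bayes(training, n_features=3)
p_all = predict_fi_probability((1, 1, 1), prior, likelihood)
p_none = predict_fi_probability((0, 0, 0), prior, likelihood)
print(f"all evidence: {p_all:.3f}, no evidence: {p_none:.3f}")
```

A pair would be called a predicted functional interaction when its score exceeds a chosen threshold. Raising the threshold lowers the false positive rate at the cost of more false negatives.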
Integrated Functional Interaction (FI) Network
• 10,956 proteins (9,542 genes).
• 209,988 FIs.
• ~50% coverage of genome.
• False (+) rate < 1%
• False (-) rate ~80%
5% of network shown here
Active Network Extraction & Analysis
[Workflow diagram: Reactome Functional Interaction network → Extract mutated, overexpressed, underexpressed, expanded/deleted genes → Add linker genes → Disease subnetwork → Apply community clustering algorithms → Disease “modules”. Uses: Hypothesis generation; Sample classification; Disease gene prediction.]
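The extraction workflow can be sketched on a toy network as below. Two simplifications are assumptions of this sketch, not the method from the slides: a "linker" is taken to be any unaltered gene adjacent to at least two altered genes, and connected components stand in for a real community clustering algorithm. All gene names and function names are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency map from a list of (gene, gene) edges."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def extract_subnetwork(edges, altered_genes):
    """Return (nodes, linkers, edges) of the subnetwork spanned by altered
    genes plus linker genes that touch at least two altered genes."""
    adj = build_adjacency(edges)
    altered = set(altered_genes) & set(adj)
    linkers = {g for g in adj if g not in altered and len(adj[g] & altered) >= 2}
    nodes = altered | linkers
    sub_edges = [(a, b) for a, b in edges if a in nodes and b in nodes]
    return nodes, linkers, sub_edges


def connected_modules(nodes, edges):
    """Connected components, used here as a crude stand-in for community clustering."""
    adj = build_adjacency(edges)
    seen, modules = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            gene = stack.pop()
            if gene in component:
                continue
            component.add(gene)
            stack.extend(adj[gene] - component)
        seen |= component
        modules.append(component)
    return modules


# Toy functional interaction network and altered gene list (hypothetical).
fi_edges = [("A", "L"), ("L", "B"), ("B", "C"), ("D", "E"), ("E", "X"), ("X", "Y")]
altered = ["A", "B", "C", "D", "E"]
nodes, linkers, sub_edges = extract_subnetwork(fi_edges, altered)
modules = connected_modules(nodes, sub_edges)
print(linkers, modules)
```

Gene L joins the subnetwork because it bridges altered genes A and B. Gene X touches only one altered gene and is left out.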
Pancreatic Cancer Module Map (43 Cases)
Non-silent mutations
• blue – in primary tumour only
• green – in xenograft only
• red – in primary & xenograft
[Figure. Module labels: Transcription & translation; p53, SMAD, TGFβ, TNF signaling; Cell cycle; Zinc fingers; Transcription; Wnt & Cadherin signaling; Heterotrimeric G-protein signaling; Ca2+ signaling; Hedgehog signaling; Rho GTPase signaling; KRAS, MAPK signaling; Integrin signaling.]
Christina Yung
Glioblastoma stem cells (GSC)
Irina Kalatskaya in collaboration with Peter Dirks lab (SickKids)
Glioblastoma Stem Cell Network
complement
HOX
TP53/RB1/JUN/SP1
GLI2
collagen
Beta-catenin
IL-1
FGF
Ribosomal proteins
BMP
GPCR
Small Rho proteins
CREB1
Network Classification of Disease
• Traditional: Associate active genes with clinical
behavior to create gene-based prognostic signatures.
• Limitations: Too many genes reduce statistical
power.
• New idea: Look for associations between active
modules and clinical behavior.
Using the Reactome FI Network to Find a Breast
Cancer Survival Signature
[Workflow diagram: Expression analysis of tumours from multiple patients → Disease Module Map → Principal component analysis on modules → Correlate principal components with clinical parameters.]
Guanming Wu
Module-Based Signatures of Breast
Cancer Survival
• NEJM: van de Vijver et al. 2002
– 295 Samples, ~12,000 genes
– Event: death
• GSE4922: Ivshina et al. Cancer Res. 2006
– 249 Samples, ~13,000 genes
– Event: recurrence or death
Building the Network
• Built based on the NEJM data set
– 27 modules selected based on size cutoff 7 and average
correlation cutoff 0.25.
• Validated using GSE4922.
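The module selection step can be sketched as a filter on size and average correlation. Two readings are assumptions of this sketch: "size cutoff 7" is taken as at least 7 genes, and "average correlation" as the mean pairwise Pearson correlation of expression across all member genes. Function names and expression values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def average_correlation(genes, expression):
    """Mean Pearson correlation over all gene pairs in a module."""
    pairs = list(combinations(sorted(genes), 2))
    if not pairs:
        return 0.0
    return sum(pearson(expression[a], expression[b]) for a, b in pairs) / len(pairs)


def select_modules(modules, expression, min_size=7, min_avg_corr=0.25):
    """Keep modules that are large enough and internally co-expressed."""
    return [
        m for m in modules
        if len(m) >= min_size and average_correlation(m, expression) >= min_avg_corr
    ]


# Toy expression profiles over four samples (hypothetical values).
expression = {
    "g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [1, 3, 5, 7],
    "h1": [1, 2, 3, 4], "h2": [4, 3, 2, 1], "h3": [1, 2, 3, 4],
    "k1": [1, 2, 3, 4], "k2": [2, 3, 4, 5],
}
candidate_modules = [{"g1", "g2", "g3"}, {"h1", "h2", "h3"}, {"k1", "k2"}]
# A smaller size cutoff is used here only because the toy modules are tiny.
selected = select_modules(candidate_modules, expression, min_size=3)
print(selected)
```

The second toy module fails the correlation cutoff because one gene is anti-correlated with the other two. The third fails the size cutoff.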
PC Analysis Identifies Module 2 as Explaining Much of
Variation in Survival
Same Signature Predicts Survival in
Independent Data Set
And Three More Data Sets as Well…
Module 2: Kinetochore + Aurora B Signaling
Integration of Multiple Data Sets
• Experimental samples can be interrogated many ways:
– RNA expression
– Genome/exome sequencing
– Copy number changes/loss of heterozygosity
– shRNA knockdown screens
• Integrate multiple functional data types using
network/pathway relationships?
PARADIGM
Vaske, Benz et al. Bioinformatics 26:i237 2010
Factor graph: directed graph connecting genes; each gene is activated, inactivated, or unchanged in a single patient.
PARADIGM: The Bad News
• Distributed in source code form only
– Requires several third-party math/graph libraries (all open
source).
– I have not gotten it to compile yet!
• No documentation.
• No repositories of formatted pathway data.
• No examples of converting experimental data into input
files.
Take Home Messages
• Pathway/network analysis can provide context to altered
gene lists.
• Pathway/network analysis differs greatly in complexity,
power, and usability:
– SIMPLE: Pathway diagram colorization
– MODERATE: Reactome FI network extraction
– COMPLEX: PARADIGM
• This type of analysis is a work in progress, but promises
the ability to integrate data across many dimensions.
URLs
KEGG – www.genome.jp/kegg
Biocarta – www.biocarta.com
WikiPathways – www.wikipathways.org
Reactome – www.reactome.org
NCI/PID – pid.nci.nih.gov
Ingenuity – www.ingenuity.com
Pathway Commons – www.pathwaycommons.org/pc/
PARADIGM – http://sbenz.github.com/Paradigm/
URLs
BioGRID – www.thebiogrid.org
IntAct – www.ebi.ac.uk/intact
MINT – mint.bio.uniroma2.it
iHOP – www.ihop-net.org/UniPub/iHOP
PubGene – www.pubgene.org