Gene Set Enrichment Analysis
(Chapters 13-14)
Motivation
• Differential expression of individual probes focuses on genes with extreme changes
  – adjusting for multiple testing is often conservative and does not change the rank of the DE genes
  – introduces an artificial distinction: differentially expressed or not
  – genes with small to moderate changes will not be detected
Motivation
• Gene set enrichment analysis focuses on sets of genes whose expression levels are coordinately changed
• Gene sets are defined a priori (KEGG, GO, MSigDB)
• Original idea applied to expression data (Mootha et al. 2003, Nat Genet 34, 267-273)
• Helps biological interpretation
• Greater power to detect smaller changes
Gene Set Enrichment Analysis
1. Identify a priori biologically interesting gene sets
2. Pre-process and quality assess expression data as usual
3. Non-specific filtering – remove probes that cannot possibly be interesting
4. Combine and assess “signals” from several probes
   – GSEA approach
   – HyperG approach
A priori gene sets
• KEGG, Gene Ontology, MSigDB, protein interaction partners, chromosome bands…
• There are many possible gene sets, all biologically motivated
A priori gene sets
• A gene set is merely a grouping of genes
• Groups do not need to be exhaustive or disjoint
• Groupings do not need to be completely right
• GSEA and hyperG approaches rely on averaging to help adjust for mistakes
Non-specific filtering
• Exclude genes that cannot be interesting
  – limited variation across all samples
  – no annotation
  – pathways with only few members
• Annotation is linked to Entrez gene IDs
  – several probe IDs linked to same Entrez gene ID
  – need some kind of filtering (random, most significant)
• Must not use criteria to be used in analysis, e.g., must not filter on expression in biological pathway of interest
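The filtering rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `nonspecific_filter`, the variance threshold, and the choice to keep the most variable probe per Entrez gene ID are hypothetical and not taken from the slides. None of the criteria look at the group labels, in line with the last bullet.

```python
from statistics import pvariance

def nonspecific_filter(expr, probe_to_entrez, min_variance=0.05):
    """Return the probe IDs kept after non-specific filtering.

    expr: dict probe ID -> list of expression values across all samples
    probe_to_entrez: dict probe ID -> Entrez gene ID (missing or None = no annotation)
    min_variance: hypothetical threshold for "limited variation"
    """
    best = {}  # Entrez gene ID -> (variance, probe ID)
    for probe, values in expr.items():
        gene = probe_to_entrez.get(probe)
        if gene is None:
            continue  # no annotation
        var = pvariance(values)
        if var < min_variance:
            continue  # limited variation across all samples
        # Several probes may map to one gene: keep the most variable one
        # (an assumption; the slides also mention random selection).
        if gene not in best or var > best[gene][0]:
            best[gene] = (var, probe)
    return sorted(probe for _, probe in best.values())

expr = {
    "p1": [1.0, 1.0, 1.0, 1.0],  # constant, removed
    "p2": [0.0, 2.0, 0.0, 2.0],  # same gene as p3, less variable
    "p3": [0.0, 4.0, 0.0, 4.0],
    "p4": [1.0, 3.0, 1.0, 3.0],  # not annotated, removed
}
probe_to_entrez = {"p1": "100", "p2": "200", "p3": "200"}
print(nonspecific_filter(expr, probe_to_entrez))  # ['p3']
```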
Alternative Statistical Approaches
HyperG approach: Is a functional gene set (e.g. a GO term, KEGG pathway etc.) overrepresented among the DE genes?
GSEA approach: Is a functional gene set “differentially expressed”?
GSEA approach
• Compute a test statistic for each probe
• Idea is to calculate a statistic that meaningfully contrasts expression levels between groups, e.g. a t-statistic for each probe
• Test statistic should be scale- and sample-size independent
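As a minimal sketch of the per-probe statistic, the following computes a two-sample t-statistic for each probe. The pooled-variance (equal variance) form and the names `t_statistic` and `per_probe_t` are assumptions made for illustration; the slides only say "t-statistic". Dividing the mean difference by its standard error is what makes the value unit-free.

```python
from math import sqrt
from statistics import mean, variance

def t_statistic(group_a, group_b):
    """Two-sample t-statistic with pooled variance (assumes equal variances).

    Each group needs at least two values, and the pooled variance must be
    non-zero (constant probes are assumed to have been filtered out).
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))

def per_probe_t(expr, labels):
    """expr: dict probe -> list of values; labels: 0/1 group label per sample."""
    stats = {}
    for probe, values in expr.items():
        a = [v for v, g in zip(values, labels) if g == 1]
        b = [v for v, g in zip(values, labels) if g == 0]
        stats[probe] = t_statistic(a, b)
    return stats

expr = {"probe1": [2.0, 4.0, 1.0, 3.0], "probe2": [5.0, 6.0, 1.0, 2.0]}
labels = [1, 1, 0, 0]
print(per_probe_t(expr, labels))
```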
GSEA approach
• Calculate an appropriate summary statistic of the test statistics in each gene set
• Assess the significance of the summary statistic
GSEA approach based on t-stat
• Each t-statistic follows a t-distribution
• The sum of the t-statistics is approximately Normal
• The sum standardized by the square root of the number of genes |K| in the set, z_K, is approximately Normal with mean 0 and variance 1.
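Written out, the standardized sum is z_K = (sum of t_k over genes k in K) / sqrt(|K|). A minimal sketch follows; the function name `z_score` is hypothetical, and the sketch assumes that |K| counts only the genes of the set that actually have a t-statistic, for example after filtering.

```python
from math import sqrt

def z_score(t_stats, gene_set):
    """z_K: sum of the t-statistics of the genes in K, divided by sqrt(|K|).

    t_stats: dict gene -> t-statistic
    gene_set: iterable of gene IDs; genes without a statistic are ignored
    """
    members = [g for g in gene_set if g in t_stats]
    if not members:
        raise ValueError("gene set has no genes with a test statistic")
    return sum(t_stats[g] for g in members) / sqrt(len(members))

t_stats = {"a": 1.0, "b": 2.0, "c": -1.0, "d": 2.0}
print(z_score(t_stats, ["a", "b", "c", "d"]))  # 2.0
```

Four moderate t-statistics, none of which would stand out alone, combine into a z_K of 2.0.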
GSEA approach based on t-stat
• Strong assumptions, e.g. about independence of t-statistics and normality of z_K
• Assess significance of summary statistic using permutation
  – permute sample labels
  – permute gene labels
         Array1   Array2   Array3   Array4   Array5   …
Gene1     0.46     0.30     0.55     3.4      2.1     …
Gene2    -1.3      0.01     1.2      2.3      0.4     …
Gene3     0.23    -0.88     2.3      3.3      0.7     …
Gene4     1.87     0.66    -3.4     -1.4      0.88    …
Gene5     …        …        …        …
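Permuting sample labels means shuffling which arrays (columns) belong to which group and recomputing the summary statistic each time. The sketch below is one possible version under stated assumptions: it uses a pooled-variance t-statistic, a two-sided comparison, and the common (hits + 1) / (permutations + 1) estimate. The names `z_K` and `permutation_p_value` are hypothetical, and permuting gene labels is not shown.

```python
import random
from math import sqrt
from statistics import mean, variance

def t_stat(a, b):
    """Pooled-variance two-sample t-statistic (assumed form)."""
    pooled = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (len(a) + len(b) - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / len(a) + 1 / len(b)))

def z_K(expr, labels, gene_set):
    """Sum of per-gene t-statistics in the set, divided by sqrt(|K|)."""
    ts = []
    for gene in gene_set:
        a = [v for v, g in zip(expr[gene], labels) if g == 1]
        b = [v for v, g in zip(expr[gene], labels) if g == 0]
        ts.append(t_stat(a, b))
    return sum(ts) / sqrt(len(ts))

def permutation_p_value(expr, labels, gene_set, n_perm=1000, seed=0):
    """Two-sided permutation p-value for z_K, shuffling the sample labels."""
    rng = random.Random(seed)
    observed = abs(z_K(expr, labels, gene_set))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # group sizes stay the same, assignment changes
        if abs(z_K(expr, shuffled, gene_set)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy data: rows are genes, columns are six arrays (three per group).
expr = {
    "Gene1": [10.0, 11.0, 12.0, 1.0, 2.0, 3.0],
    "Gene2": [5.0, 6.5, 7.0, 0.5, 1.0, 2.5],
}
labels = [1, 1, 1, 0, 0, 0]
print(permutation_p_value(expr, labels, ["Gene1", "Gene2"]))
```

With only six arrays there are just 20 distinct ways to split them into two groups of three, so the p-value cannot fall much below 0.1 however strong the signal is.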
GSEA - Overlapping Gene Sets
Fig 13.3
• Number of genes in common / number of genes in smaller gene set
• Test effect of overlapping gene sets using linear models
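The overlap measure in the first bullet is straightforward to compute. A minimal sketch, with hypothetical function names and toy gene sets:

```python
from itertools import combinations

def overlap_index(set_a, set_b):
    """Number of genes in common divided by the size of the smaller set."""
    a, b = set(set_a), set(set_b)
    smaller = min(len(a), len(b))
    if smaller == 0:
        raise ValueError("gene sets must be non-empty")
    return len(a & b) / smaller

def pairwise_overlap(gene_sets):
    """gene_sets: dict name -> set of genes; returns the index for every pair."""
    return {
        (n1, n2): overlap_index(gene_sets[n1], gene_sets[n2])
        for n1, n2 in combinations(sorted(gene_sets), 2)
    }

gene_sets = {"X": {"a", "b"}, "Y": {"b", "c"}, "Z": {"d"}}
print(pairwise_overlap(gene_sets))
```

An index of 1 means the smaller set is entirely contained in the larger one.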
Exercises GSEA approach
• Gene Set Enrichment Analysis using KEGG: exercises 13.1-13.5
• Determining significance of summary test statistics using permutations: exercises 13.6-13.7
• Identifying and assessing the effects of overlapping gene sets: exercises 13.13-13.14
hyperG approach
• Compute a test statistic for each probe
• Idea is to calculate a statistic that meaningfully contrasts expression levels between groups, e.g. a t-statistic for each probe
• Test statistic should be scale- and sample-size independent
hyperG approach
• Define gene universe (e.g. genes on array, genes expressed)
• Determine interesting genes (e.g. differentially expressed genes, genes located in specific genomic region)
• Test whether the interesting genes are overrepresented/underrepresented in each category
• HyperG/Fisher exact test
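For over-representation, the test asks how likely it is to see at least the observed number of category members among the interesting genes if these were drawn at random from the universe. A minimal sketch using only the standard library follows; the function name `hyperg_p_value` and the toy gene IDs are hypothetical. The upper-tail hypergeometric probability computed here equals the one-sided Fisher exact test p-value for the corresponding 2x2 table.

```python
from math import comb

def hyperg_p_value(universe, interesting, category):
    """P(X >= observed overlap) under the hypergeometric distribution."""
    universe = set(universe)
    interesting = set(interesting) & universe  # ignore genes outside the universe
    category = set(category) & universe
    N, n, K = len(universe), len(interesting), len(category)
    k = len(interesting & category)
    # Sum the probabilities of overlaps at least as large as the observed one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)

universe = [f"g{i}" for i in range(10)]
category = {"g0", "g1", "g2"}      # e.g. one GO term or KEGG pathway
interesting = {"g0", "g1", "g5"}   # e.g. the DE genes
print(hyperg_p_value(universe, interesting, category))  # 22/120, about 0.183
```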
hyperG approach
Positions of members of G in list: 1, 3, 4, …, 101, …
Do members of G have significantly small ranks?
hyperG approach
Define cutoff and count number of genes in G below and above cutoff
Use Fisher’s exact test to find out if the distribution deviates from expectation
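The counting step can be sketched as follows: split the ranked list at the cutoff and tabulate members and non-members of G on each side. The function names are hypothetical, and the expected count shown is the usual one under independence of rank and membership. The resulting 2x2 table is what Fisher's exact test is applied to.

```python
def cutoff_table(ranked_genes, gene_set, cutoff):
    """2x2 counts for a ranked list split into the top `cutoff` genes and the rest.

    Rows: top of the list, rest of the list.
    Columns: members of G, non-members of G.
    """
    members = set(gene_set)
    top, rest = ranked_genes[:cutoff], ranked_genes[cutoff:]
    in_top = sum(g in members for g in top)
    in_rest = sum(g in members for g in rest)
    return [[in_top, len(top) - in_top],
            [in_rest, len(rest) - in_rest]]

def expected_in_top(table):
    """Expected number of G members in the top part if rank and membership were unrelated."""
    (a, b), (c, d) = table
    return (a + b) * (a + c) / (a + b + c + d)

ranked = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]  # most extreme first
G = {"g1", "g3", "g4", "g8"}
table = cutoff_table(ranked, G, cutoff=4)
print(table)                   # [[3, 1], [1, 3]]
print(expected_in_top(table))  # 2.0
```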
Conditional hyperG approach
Takes into account the tree structure of GO by:
1. Start in a leaf (no children)
2. If the leaf/child is significant, remove its genes from the parent before testing the parent
…
Continue until the root
(Figure: GO tree with a “leaf” node and the “root” node labeled)
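The bottom-up procedure can be sketched as a post-order walk over the term tree. This is one reading of the steps above, not a reference implementation: it assumes a strict tree (each term has one parent), keeps the gene universe fixed, and removes from every ancestor the genes of all significant terms below it. The names `conditional_hyperg`, `children`, and `annotation`, and the toy terms, are hypothetical.

```python
from math import comb

def hyperg_p(N, K, n, k):
    """Upper-tail hypergeometric probability P(X >= k)."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)

def conditional_hyperg(term, children, annotation, universe, interesting,
                       alpha=0.05, results=None):
    """Test `term` after its children.

    Returns (results, claimed): p-values per term, and the genes belonging to
    significant terms at or below `term`.
    """
    if results is None:
        results = {}
    claimed = set()
    for child in children.get(term, []):
        _, child_claimed = conditional_hyperg(
            child, children, annotation, universe, interesting, alpha, results)
        claimed |= child_claimed
    # Remove genes already explained by significant descendants, then test.
    term_genes = (annotation[term] & universe) - claimed
    selected = interesting & universe
    p = hyperg_p(len(universe), len(term_genes), len(selected), len(term_genes & selected))
    results[term] = p
    if p < alpha:
        claimed |= term_genes
    return results, claimed

universe = {f"g{i}" for i in range(1, 21)}
children = {"root": ["A", "B"], "A": ["A1"]}
annotation = {
    "A1": {"g1", "g2", "g3", "g4"},
    "A": {"g1", "g2", "g3", "g4", "g5"},
    "B": {"g6", "g7", "g8"},
    "root": {"g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8", "g9"},
}
interesting = {"g1", "g2", "g3", "g4"}
results, _ = conditional_hyperg("root", children, annotation, universe, interesting)
print(results)
```

In this toy example the leaf A1 is significant. Its parent A would also look significant in an unconditional test, but once the genes of A1 are removed it has nothing left to explain.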
Exercises hyperG approach
• Defining gene universe: exercises 14.1-14.5
• Determining interesting genes and performing hyperG test: exercises 14.6-14.7
• Conditional hyperG test: exercise 14.10