Gene Set Enrichment Analysis (Chapters 13-14)

Motivation
• Differential expression analysis of individual probes focuses on genes with extreme changes
  – adjusting for multiple testing is often conservative and does not change the ranking of the DE genes
  – it introduces an artificial distinction: differentially expressed or not
  – genes with small to moderate changes will not be detected
• Gene set enrichment analyses focus on sets of genes whose expression levels change in a coordinated way
• Gene sets are defined a priori (KEGG, GO, MSigDB)
• The original idea was applied to expression data (Mootha et al. 2003, Nat Genet 34, 267-273)
• Helps biological interpretation
• Greater power to detect smaller changes

Gene Set Enrichment Analysis
1. Identify a priori biologically interesting gene sets
2. Pre-process and quality-assess expression data as usual
3. Non-specific filtering: remove probes that cannot possibly be interesting
4. Combine and assess “signals” from several probes
   – GSEA approach
   – HyperG approach

A priori gene sets
• KEGG, Gene Ontology, MSigDB, protein interaction partners, chromosome bands, …
• There are many possible biologically motivated gene sets
• A gene set is merely a grouping of genes
• Groups do not need to be exhaustive or disjoint
• Groupings do not need to be completely right
• The GSEA and hyperG approaches rely on averaging to help adjust for mistakes

Non-specific filtering
• Exclude genes that cannot be interesting:
  – limited variation across all samples
  – no annotation
  – pathways with only a few members
• Annotation is linked to Entrez Gene IDs:
  – several probe IDs can be linked to the same Entrez Gene ID
  – some kind of filtering is needed to keep one probe per gene (random, most significant)
• Filtering must not use criteria that will be used in the analysis, e.g., it must not filter on expression in the biological pathway of interest

Alternative Statistical Approaches
• HyperG approach: Is a functional gene set (e.g. a GO term, KEGG pathway, etc.) overrepresented among the DE genes?
• GSEA approach: Is a functional gene set “differentially expressed”?

GSEA approach
• Compute a test statistic for each probe
• The idea is to calculate a statistic that meaningfully contrasts expression levels between groups, e.g. a t-statistic for each probe
• The test statistic should be independent of scale and sample size
• Calculate an appropriate summary of the test statistics in each gene set
• Assess the significance of the summary statistic

GSEA approach based on t-stat
• Under the null hypothesis, the t-statistics follow a t-distribution
• The sum of the t-statistics in a gene set is approximately Normal
• The sum standardized by the square root of the number of genes |K| in the set, $z_K = \frac{1}{\sqrt{|K|}} \sum_{k \in K} t_k$, is approximately Normal with mean 0 and variance 1 under the null hypothesis
• This relies on strong assumptions, e.g. independence of the t-statistics and normality of $z_K$
• Assess the significance of the summary statistic using permutation:
  – permute sample labels
  – permute gene labels

          Array1   Array2   Array3   Array4   Array5   …
  Gene1    0.46     0.30     0.55     3.4      2.1     …
  Gene2   -1.3      0.01     1.2      2.3      0.4     …
  Gene3    0.23    -0.88     2.3      3.3      0.7     …
  Gene4    1.87     0.66    -3.4     -1.4      0.88    …
  Gene5    …        …        …        …

GSEA: Overlapping Gene Sets (Fig 13.3)
• Overlap = number of genes in common / number of genes in the smaller gene set
• Test the effect of overlapping gene sets using linear models

Exercises: GSEA approach
• Gene Set Enrichment Analysis using KEGG: exercises 13.1-13.5
• Determining the significance of summary test statistics using permutations: exercises 13.6-13.7
• Identifying and assessing the effects of overlapping gene sets: exercises 13.13-13.14

hyperG approach
• As in the GSEA approach, compute a test statistic for each probe (e.g. a t-statistic) that meaningfully contrasts expression levels between groups and is independent of scale and sample size
• Define the gene universe (e.g. genes on the array, expressed genes)
• Determine the interesting genes (e.g. differentially expressed genes, genes located in a specific genomic region)
• Test whether the interesting genes are overrepresented or underrepresented in each category
• HyperG / Fisher’s exact test
• Positions of members of gene set G in the ranked list: 1, 3, 4, …, 101, …
• Do members of G have significantly small ranks?
• Define a cutoff and count the number of genes in G below and above the cutoff
• Use Fisher’s exact test to find out whether this distribution deviates from expectation

Conditional hyperG approach
Takes into account the tree structure of GO:
1. Start at a leaf (a term with no children)
2. If a leaf/child term is significant, remove its genes from the parent before testing the parent
3. Continue until the root

Exercises: hyperG approach
• Defining the gene universe: exercises 14.1-14.5
• Determining interesting genes and performing the hyperG test: exercises 14.6-14.7
• Conditional hyperG test: exercise 14.10
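The non-specific filtering slide can be illustrated with a small Python sketch. This is not the chapter's own code: the function name `nonspecific_filter`, the variance threshold, and the toy probe data are hypothetical, and sample variance is only one possible measure of "limited variation across all samples". The filter looks at expression across all samples and never at group labels, in line with the rule that filtering must not use criteria from the analysis; the optional "most significant" choice of probe uses a per-probe test statistic, as the slide allows.

```python
import random
from statistics import variance


def nonspecific_filter(expr, probe_to_entrez, min_variance, scores=None, seed=0):
    """Return Entrez Gene ID -> the single probe kept for that gene.

    expr: probe ID -> expression values across ALL samples (group labels unused)
    probe_to_entrez: probe ID -> Entrez Gene ID, or None if the probe is unannotated
    min_variance: hypothetical threshold for "limited variation across all samples"
    scores: optional probe ID -> test statistic; if given, keep the probe with the
            largest |score| per gene ("most significant"), else pick one at random
    """
    rng = random.Random(seed)
    probes_by_gene = {}
    for probe, values in expr.items():
        entrez = probe_to_entrez.get(probe)
        if entrez is None:
            continue  # no annotation
        if variance(values) < min_variance:
            continue  # limited variation across all samples
        probes_by_gene.setdefault(entrez, []).append(probe)

    # Several probe IDs may map to one Entrez Gene ID: keep exactly one per gene.
    kept = {}
    for entrez, probes in probes_by_gene.items():
        if scores is not None:
            kept[entrez] = max(probes, key=lambda p: abs(scores[p]))
        else:
            kept[entrez] = rng.choice(probes)
    return kept


expr = {
    "p1": [1.0, 2.0, 3.0, 4.0],
    "p2": [1.0, 1.1, 0.9, 1.0],  # nearly constant: filtered out
    "p3": [5.0, 1.0, 5.0, 1.0],  # same Entrez Gene ID as p1
    "p4": [2.0, 3.0, 2.0, 3.0],  # unannotated: filtered out
}
probe_to_entrez = {"p1": "100", "p2": "200", "p3": "100", "p4": None}
print(nonspecific_filter(expr, probe_to_entrez, 0.5, scores={"p1": 2.5, "p3": -4.0}))
# prints {'100': 'p3'}
```

Other criteria from the slide, such as dropping pathways with only a few members, would be applied to the gene sets rather than to individual probes.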
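The standardized sum described on the "GSEA approach based on t-stat" slides, together with a sample-label permutation test, can be sketched in plain Python. The sketch assumes two groups of arrays coded 1 and 0 and a pooled-variance two-sample t-statistic per gene; the helper names (`t_statistic`, `gene_t_stats`, `z_k`, `sample_permutation_test`) and the toy data are hypothetical. Only the sample-label permutation is shown; permuting gene labels would instead compare $z_K$ with the values obtained for random gene sets of the same size.

```python
import math
import random
from statistics import mean, variance


def t_statistic(group_a, group_b):
    """Two-sample t-statistic with a pooled variance estimate."""
    na, nb = len(group_a), len(group_b)
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled * (1 / na + 1 / nb))


def gene_t_stats(expr, labels):
    """expr: gene -> values per array; labels: 1 or 0 per array (the two groups)."""
    stats = {}
    for gene, values in expr.items():
        a = [v for v, lab in zip(values, labels) if lab == 1]
        b = [v for v, lab in zip(values, labels) if lab == 0]
        stats[gene] = t_statistic(a, b)
    return stats


def z_k(t_stats, gene_set):
    """z_K: sum of the t-statistics of the genes in K, divided by sqrt(|K|)."""
    members = [g for g in gene_set if g in t_stats]
    if not members:
        raise ValueError("gene set has no genes with a test statistic")
    return sum(t_stats[g] for g in members) / math.sqrt(len(members))


def sample_permutation_test(expr, labels, gene_set, n_perm=1000, seed=0):
    """Observed z_K and a two-sided p-value from shuffling the sample labels."""
    rng = random.Random(seed)
    observed = z_k(gene_t_stats(expr, labels), gene_set)
    perm = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)  # group sizes stay fixed; gene-gene correlation is kept
        if abs(z_k(gene_t_stats(expr, perm), gene_set)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: 3 vs 3 arrays; G1 and G2 are higher in group 1.
expr = {
    "G1": [5.1, 5.3, 4.9, 1.0, 1.2, 0.8],
    "G2": [3.2, 3.6, 3.1, 2.0, 2.4, 1.9],
    "G3": [0.5, 0.7, 0.2, 0.6, 0.3, 0.4],
}
labels = [1, 1, 1, 0, 0, 0]
z, p = sample_permutation_test(expr, labels, ["G1", "G2"], n_perm=200)
print(f"z_K = {z:.2f}, permutation p = {p:.3f}")
```

Permuting sample labels keeps the correlation between genes in a set intact, which is one reason to prefer it over the Normal approximation when the independence assumption is doubtful. With only 3 vs 3 arrays there are just 20 distinct label assignments, so the permutation p-value cannot become very small.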
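The overlap measure on the "Overlapping Gene Sets" slide (genes in common divided by the size of the smaller gene set) is short enough to show directly. The function names and the three gene sets below are hypothetical; this only computes the measure and does not fit the linear models mentioned on the slide.

```python
def overlap_coefficient(set_a, set_b):
    """Number of genes in common / number of genes in the smaller gene set."""
    a, b = set(set_a), set(set_b)
    smaller = min(len(a), len(b))
    if smaller == 0:
        raise ValueError("both gene sets must be non-empty")
    return len(a & b) / smaller


def overlap_matrix(gene_sets):
    """Pairwise overlap coefficients for a dict of set name -> genes."""
    names = sorted(gene_sets)
    return {(x, y): overlap_coefficient(gene_sets[x], gene_sets[y])
            for x in names for y in names}


pathways = {  # hypothetical gene sets
    "P1": {"g1", "g2", "g3", "g4"},
    "P2": {"g3", "g4"},
    "P3": {"g5", "g6", "g7"},
}
m = overlap_matrix(pathways)
print(m[("P1", "P2")], m[("P1", "P3")])  # prints 1.0 0.0
```

A value of 1.0 means the smaller set is entirely contained in the larger one, so any enrichment signal in P2 will also contribute to P1.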
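The hyperG test from the slides can be written out with the hypergeometric distribution: with a universe of N genes, n interesting genes and a category of m genes, the probability of seeing at least k category members among the interesting genes is the one-sided Fisher’s exact test p-value for overrepresentation. The sketch below uses only `math.comb`; the names `hyperg_upper_tail` and `hyperg_test`, the ranked list, and the cutoff of 5 are hypothetical. Testing for underrepresentation would use the lower tail instead.

```python
from math import comb


def hyperg_upper_tail(k, universe_size, category_size, n_interesting):
    """P(X >= k), X = category members among n_interesting genes drawn
    without replacement from a universe of universe_size genes."""
    hi = min(category_size, n_interesting)
    hits = sum(comb(category_size, i) * comb(universe_size - category_size, n_interesting - i)
               for i in range(k, hi + 1))
    return hits / comb(universe_size, n_interesting)


def hyperg_test(universe, interesting, category):
    """Overrepresentation test for one category, restricted to the gene universe."""
    universe = set(universe)
    interesting = set(interesting) & universe
    category = set(category) & universe
    k = len(interesting & category)
    return k, hyperg_upper_tail(k, len(universe), len(category), len(interesting))


# Ranked gene list (most significant first); a cutoff defines the interesting genes.
ranked = [f"G{i}" for i in range(20)]
cutoff = 5
interesting = ranked[:cutoff]
category = {"G0", "G1", "G2", "G3", "G10"}
k, p = hyperg_test(ranked, interesting, category)
print(k, p)
```

Here 4 of the 5 category members fall above the cutoff, and the p-value is (C(5,4)·C(15,1) + C(5,5)·C(15,0)) / C(20,5) = 76/15504 ≈ 0.0049. Restricting both the interesting genes and the category to the universe matters: the choice of universe changes N and m, and therefore the p-value.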
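The conditional hyperG steps (start at the leaves, remove genes of significant children before testing the parent, continue to the root) can be sketched as a recursive walk over a GO-like graph. This is a simplified illustration under stated assumptions, not the exact algorithm of any Bioconductor package: term annotations are assumed to already include genes from descendant terms, removed genes are dropped only from the parent's gene set (not from the universe), and genes claimed by a significant term are propagated up to all of its ancestors, which is our reading of "continue until root". The function names, the significance level `alpha`, and the three-term GO fragment are hypothetical.

```python
from math import comb


def upper_tail(k, universe_size, category_size, n_interesting):
    """Hypergeometric P(X >= k): category members among the interesting genes."""
    hi = min(category_size, n_interesting)
    hits = sum(comb(category_size, i) * comb(universe_size - category_size, n_interesting - i)
               for i in range(k, hi + 1))
    return hits / comb(universe_size, n_interesting)


def conditional_hyperg(term_genes, children, universe, interesting, alpha=0.01):
    """Test terms leaves-first; genes of significant descendants are removed from
    a term before it is tested. Returns term -> (count, p-value).

    term_genes: term -> genes annotated to it (including via its descendants)
    children:   term -> list of child terms (terms without an entry are leaves)
    """
    universe = set(universe)
    interesting = set(interesting) & universe
    n_universe, n_interesting = len(universe), len(interesting)
    results = {}
    claimed = {}  # term -> genes claimed by significant terms in its subtree

    def visit(term):
        if term in results:  # GO is a DAG: a term may be reached from several parents
            return
        removed = set()
        for child in children.get(term, []):
            visit(child)  # children are always tested before their parent
            removed |= claimed[child]
        genes = (term_genes[term] & universe) - removed
        k = len(genes & interesting)
        p = upper_tail(k, n_universe, len(genes), n_interesting)
        results[term] = (k, p)
        claimed[term] = (removed | genes) if p < alpha else removed

    for term in term_genes:
        visit(term)
    return results


universe = [f"g{i}" for i in range(20)]
interesting = ["g0", "g1", "g2", "g3", "g4"]
term_genes = {  # hypothetical three-term GO fragment
    "root": {"g0", "g1", "g2", "g3", "g4", "g10", "g11", "g12", "g13"},
    "leafA": {"g0", "g1", "g2", "g3"},
    "leafB": {"g10", "g11", "g12"},
}
children = {"root": ["leafA", "leafB"]}
for term, (k, p) in conditional_hyperg(term_genes, children, universe, interesting).items():
    print(term, k, round(p, 4))
```

In this toy example leafA is significant (p = 16/15504 ≈ 0.001), so its four genes are removed before root is tested. Root then contains only one interesting gene and gets p = 12501/15504 ≈ 0.81, whereas an unconditional test of root would give 126/15504 ≈ 0.008. The signal is attributed to the most specific term instead of being reported again for its parent.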