Working with enriched gene sets in R
Peter Svensson, Micheline Giphart-Gassler, Harry Vrieling

P-values of genes
• Starting with a vector of p-values from
  – t.test(irradiated, control)
  – wilcox.test(irradiated, control)
  – lm(formula, data)

Distribution of p-values
• two-tailed

Distribution of p-values
• one-tailed

Distribution of p-values
• Proportion of unchanged genes, π0
  library(qvalue)
• (Storey & Tibshirani 2001)
  qvalue(pvals)$pi0

Annotation
• Annotation of the genes available from Bioconductor
  – MetaData for commercial arrays
  – AnnBuilder for homemade arrays
  – Unigene name, code, symbol, Entrez Gene, GO terms, KEGG pathways, PubMed IDs...

Gene Set Enrichment Analysis
• Mootha et al., Nat Genet. 2003, 34:267
• Use gene sets defined by GO terms, KEGG terms, names containing 'kinase', or genes that cluster together
• Make a vector with one value per gene, where N is the number of genes and G the number of genes in the group
  – all not in group: -sqrt(G/(N-G))
  – all in group: sqrt((N-G)/G)

Running sum
• The sum of the values in the vector will be 0
• Plot the running sum
• The peak is at p = 0.1

GSEA
• The enrichment score can be used to determine the importance of a gene set.
• Permutation technique to get significance.

Hypergeometric probability
• Used in dChip and DAVID.
• Input is
  – # genes in the gene set (n), # genes on array (n+m)
  – # selected genes in the gene set (x), # selected genes (N)
• dhyper() gives the density

Selecting genes
• Have to set a threshold, p0, for the p-values; genes with p < p0 are selected
• p0 = 0.001 is not informative
• p0 = 0.1
• at the maximum of the peak
• dissect(pvals)
  – (BMC Bioinformatics, to appear)
• Will get a p-value
• Tested 4000 GO terms, so correction for multiple testing is needed
  p.adjust(pvals, "fdr")
• Look at significant terms, p < 0.001

Cisplatin data
• Mouse embryonic stem cells exposed to various doses (low, medium and high). Harvested at 0 < t < 24
• Low doses, early time points
  – Few genes changed
  – Few pathways changed
• Indications of what will come

Preprocessing
• For internal use at www.medgencentre.nl/pla
• Not updated
• Code for working with widgets, defining MIAME-compliant objects, AffyBatch (exprSet), doing tests, building linear models, correlation tests, GSEA
• Updating together with Agata Meglicz. It will be improved soon.

Demonstration
cdf = "hgu133a"
source("gsea.R")
gsea()
dissectGUI()
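
Sketch: per-gene p-values and FDR correction

The "P-values of genes" and "Selecting genes" slides can be illustrated with a minimal base-R sketch. The data are simulated, and the matrix layout (genes in rows, samples in columns) and the names n_genes and selected are assumptions made for illustration; this is not the authors' code.

```r
# Simulated expression data: rows are genes, columns are samples.
# All values are made up for illustration.
set.seed(1)
n_genes    <- 200
control    <- matrix(rnorm(n_genes * 4), nrow = n_genes)
irradiated <- matrix(rnorm(n_genes * 4), nrow = n_genes)
# Shift the first 20 genes so that some genes are truly changed.
irradiated[1:20, ] <- irradiated[1:20, ] + 3

# One two-sided t-test per gene, collected into a vector of p-values.
pvals <- sapply(seq_len(n_genes), function(i) {
  t.test(irradiated[i, ], control[i, ])$p.value
})

# Benjamini-Hochberg adjustment ("fdr" is an alias for "BH" in p.adjust).
padj <- p.adjust(pvals, method = "fdr")

# Select genes below a threshold p0 on the raw p-values.
p0 <- 0.1
selected <- which(pvals < p0)
```

The adjusted values in padj are never smaller than the raw p-values, so a cut-off on padj is the stricter of the two.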
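
Sketch: estimating the proportion of unchanged genes

The slides obtain π0 from qvalue(pvals)$pi0. To show the idea behind that number without the package, here is a rough base-R sketch using a single fixed cut-off lambda. The fixed lambda = 0.5 and the simulated mixture of p-values are assumptions made here; the qvalue package by default fits over a range of lambda values, so its estimate will generally differ.

```r
set.seed(7)
# Simulated p-values: 800 unchanged genes (uniform on [0, 1]) and
# 200 changed genes (skewed towards 0). Purely illustrative.
pvals <- c(runif(800), rbeta(200, 0.2, 5))

# P-values above lambda come almost only from unchanged genes, and those
# are uniform, so their share divided by (1 - lambda) estimates pi0.
lambda  <- 0.5
pi0_hat <- min(1, mean(pvals > lambda) / (1 - lambda))
```

With 80% of the simulated genes unchanged, pi0_hat should land near 0.8, up to sampling noise.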
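
Sketch: running sum and enrichment score

The vector described on the "Gene Set Enrichment Analysis" and "Running sum" slides can be written out in a few lines of base R. The function names running_sum, enrichment_score and permutation_pvalue are hypothetical, and the permutation step shuffles gene-set membership, which is a simplification chosen here; the slides do not say which permutation scheme was used.

```r
# Running-sum statistic for one gene set (after Mootha et al. 2003).
# `in_set` is a logical vector over genes already ordered by p-value
# (most significant first); TRUE marks membership in the gene set.
running_sum <- function(in_set) {
  N <- length(in_set)   # number of genes
  G <- sum(in_set)      # number of genes in the set
  stopifnot(G > 0, G < N)
  steps <- ifelse(in_set, sqrt((N - G) / G), -sqrt(G / (N - G)))
  cumsum(steps)
}

# Enrichment score: the maximum of the running sum.
enrichment_score <- function(in_set) max(running_sum(in_set))

# Permutation p-value: shuffle set membership and recompute the score.
permutation_pvalue <- function(in_set, n_perm = 1000) {
  observed <- enrichment_score(in_set)
  null <- replicate(n_perm, enrichment_score(sample(in_set)))
  (sum(null >= observed) + 1) / (n_perm + 1)
}

set.seed(42)
# Toy example: 100 ranked genes, a set of 10 concentrated near the top.
in_set <- rep(FALSE, 100)
in_set[c(1, 2, 3, 5, 8, 9, 12, 15, 40, 77)] <- TRUE
rs <- running_sum(in_set)
es <- enrichment_score(in_set)
p  <- permutation_pvalue(in_set)
```

The last element of rs is zero up to rounding, which matches the statement that the values in the vector sum to 0. Calling plot(rs, type = "l") shows where the peak lies along the ranked gene list.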
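
Sketch: hypergeometric probability for one gene set

The inputs listed on the "Hypergeometric probability" slide map onto R's dhyper() and phyper() as sketched below. The counts are invented for illustration. Note that R's own argument names m and n do not match the slide's n and m, so descriptive variable names are used here instead.

```r
# Illustrative counts (made up):
set_size   <- 40     # genes in the gene set        (n on the slide)
array_size <- 1000   # genes on the array           (n + m on the slide)
n_selected <- 100    # selected genes               (N on the slide)
x          <- 10     # selected genes in the set    (x on the slide)

# dhyper(x, m, n, k): m = genes in the set, n = genes not in the set,
# k = number of selected genes. Gives P(X = x).
density <- dhyper(x, set_size, array_size - set_size, n_selected)

# Over-representation p-value: P(X >= x), i.e. the upper tail.
p_enrich <- phyper(x - 1, set_size, array_size - set_size, n_selected,
                   lower.tail = FALSE)
```

Under random selection the expected overlap is 100 * 40 / 1000 = 4 genes, so observing 10 is the kind of excess the tail probability p_enrich quantifies.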