* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download A global test for groups of genes
Saethre–Chotzen syndrome wikipedia , lookup
Genetic engineering wikipedia , lookup
Pathogenomics wikipedia , lookup
Essential gene wikipedia , lookup
Quantitative trait locus wikipedia , lookup
X-inactivation wikipedia , lookup
Epigenetics in learning and memory wikipedia , lookup
Cancer epigenetics wikipedia , lookup
History of genetic engineering wikipedia , lookup
Vectors in gene therapy wikipedia , lookup
Epigenetics of neurodegenerative diseases wikipedia , lookup
Gene therapy wikipedia , lookup
Public health genomics wikipedia , lookup
Gene therapy of the human retina wikipedia , lookup
Minimal genome wikipedia , lookup
Long non-coding RNA wikipedia , lookup
Gene desert wikipedia , lookup
Gene nomenclature wikipedia , lookup
Oncogenomics wikipedia , lookup
Polycomb Group Proteins and Cancer wikipedia , lookup
Biology and consumer behaviour wikipedia , lookup
Ridge (biology) wikipedia , lookup
Genome evolution wikipedia , lookup
Genomic imprinting wikipedia , lookup
Epigenetics of diabetes Type 2 wikipedia , lookup
Site-specific recombinase technology wikipedia , lookup
Therapeutic gene modulation wikipedia , lookup
Genome (book) wikipedia , lookup
Epigenetics of human development wikipedia , lookup
Microevolution wikipedia , lookup
Mir-92 microRNA precursor family wikipedia , lookup
Nutriepigenomics wikipedia , lookup
Designer baby wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
Group testing: global tests Ulrich Mansmann Department of Medical Biometrics and Informatics University of Heidelberg Overview • Gene set enrichment Lamb J et al. (2003) A mechanism of Cyclin D1 Action Encoded in the Patterns of Gene Expression in Human Cancer, Cell, 114: 323-334 • Global test: Goeman JJ. Et al. (2003) A global test for groups of genes: Testing association with a clinical outcome, Bioinformatics, 20:93-99 Bioconductor package: globaltest • Example: Differential gene expression between UICC stages II / III colon cancer patients (Groene / Mansmann). Group testing 2 Two questions about group of genes Question 1: Two groups of genes have to be compared with respect to gene expression: Is the gene expression in gene group A different from the expression in gene group B. Genes of group A Genes of group B Question 2: Is there differential gene expression between different biological entities not in terms of single genes but with respect to a defined group of genes. Entity I Entity II Well defined group of genes Group testing 3 Example: Colon Cancer Study: 18 patients with UICC II colon cancer, 18 patients with UICC III colon cancer, HG-U133A, 22.283 probesets representing ~18.000 genes. Snap-frozen material, laser microdisection. Question 1: Is the differential gene expression between UICC II /III patients more distinct for genes in cancer related pathways compared to genes in other pathways? Question 2: Is there differential gene expression in the p53 signalling pathway? Group testing 4 Gene set enrichment Problem: Two groups of genes have to be compared with respect to gene expression: Is the gene expression in gene group A different from the expression in gene group B. Basic idea: nA genes in group A, nB genes in group B Order the genes with respect to the expression value. If there is a difference between both groups, the expression values will be separated. The position of a value in group A will have the tendency to be high or low. In case of no difference, the values will be nicely mixed. Group testing 5 Gene set enrichment Basic idea: • nA genes in group A, nB genes in group B. • Order the genes with respect to expression values. • Create a vector vv of (nA+nB) components with value – nB at each position where a value from group A is sitting and with value nA at each position where a value from group B is sitting. • Calculate yy = cumsum(vv). • Draw a line starting at (0,0) through points (i, yy[i]). The line will end in (nA+nB,0) because (-nB)nA+ nAnB =0. • Look at Mvv = max{|min(yy)|,max(yy)} which will be large in case of a good separation between both groups. • Permute the vector vv to get vv*, calculate yy* and Mvv*. Use permutation to calculate the distribution of Mvv under the Null hypothesis, determine the permutation based p-value: pperm = #{Mvv* Mvv}/ # permutations. Group testing 6 Gene set enrichment Simple Example • Gene expression in group A: {2, 3}, nA = 2 Gene expression in group B: {1, 4, 6}, nB = 3 3 4 • Order the genes with respect to expression values. {1, 2, 3, 4, 6} 0 • yy = {-2, 1, 4, 2, 0} 1 Score 2 • vv = {-2, 3, 3, -2, -2} • Distribution of Mvv under the Null hypothesis 2 ~ 0.1; 3 ~ 0.3, 4 ~ 0.4, 6 ~ 0.2 10000 permutations -2 -1 • Mvv = 4 0 1 2 3 4 5 Rank • pperm = #{Mvv* Mvv}/ # permutations = 0.4+0.2 = 0.6 Group testing 7 Gene set enrichment – Colon cancer 1407 probe sets are studied which belong to 9 cancer specific pathways. androgen_receptor_signalling apoptosis cell_cycle_control notch_delta_signalling p53_signalling ras_signalling tgf_beta_signalling tight_junction_signalling wnt_signalling 122 245 51 50 45 316 100 425 214 Group testing 8 Gene set enrichment – Colon cancer group.A group.B Myy androgen_receptor_signaling 118 1289 6983 0.0568 Apoptosis 238 1169 17801 0.7438 cell_cycle_control 51 1356 10413 0.3616 notch_delta_signalling 50 1357 9010 0.6492 p53_signalling 45 1362 12390 0.0924 ras_signalling 311 1096 15486 0.6252 tgf_beta_signaling 100 1307 22615 0.0128 tight_junction_signaling 406 1001 15456 0.4414 wnt_signaling 214 1193 16318 0.8432 Group testing p.value 9 Goeman’s Global Test • Test if global expression pattern of a group of genes is significantly related to some outcome of interest (groups, continuous phenotype). • If this relationship exists, then the knowledge of gene expression helps to improve the prediction of the phenotype of interest. If the prediction can not improved by knowing the gene expression then there will not be differential gene expression. • Test statistic: Q ~ (Y-µ)’R (Y-µ) ~ [Xi’(Y-µ)]² sum over genes of the pathway ~ Rij(Yi-µ) (Yj-µ) sum over subjects µ: Mean of phenotype, Xmi Expression for gene m in subject i R : X’X: IxI matrix of correlation between gene expression of subjects Group testing 10 Goeman’s Global Test - Example • Test for differential gene expression in p53 signalling pathway 45 probesets • Global Test result: 45 out of 45 genes used; 36 samples p value = 0.0114 based on 10000 permutations Test statistic Q = 11.78 with expectation EQ = 5.466 and standard deviation sdQ = 2.152 under the null hypothesis • Informative plots: Sample plot: how good fits a sample to its phenotype Checkerboard: Correlation between samples Gene plot: Influence of single genes to test statistics Group testing 11 35 Goeman’s Global Test - Example 200 150 influence 20 100 15 50 10 0 5 sorted samplenr 25 250 30 300 pos. coregulated with Y neg. coregulated with Y 5 10 15 20 25 30 35 sorted samplenr 0 10 20 30 40 genenr Group testing 12 Co1.T.IT.30.F.CEL Group testing Co8.T.IT.549.F.CEL Co668.T.IT.1791.F.CEL Co666.T.IT.1777.F.CEL Co665.T.IT.1820.F.CEL Co657.T.IT.1744.F.CEL Co656.T.IT.1740.F.CEL Co653.T.IT.1731.F.CEL Co652.T.IT.1762.F.CEL Co620.T.IT.1570.F.CEL Co611.T.IT.1482.F.CEL Co610.T.IT.1206.F.CEL Co54.T.IT.1480.F.CEL Co5.T.IT.62.F.CEL Co44.T.IT.1162.F.CEL Co23.T.IT.1052.F.CEL Co17.T.IT.563.F.CEL Co13.T.IT.86.F.CEL Co10.T.IT.83.F.CEL Co9.T.IT.1728.F.CEL Co675.T.IT.1834.F.CEL Co672.T.IT.1841.F.CEL Co670.T.IT.1821.F.CEL Co659.T.IT.1742.F.CEL Co619.T.IT.1515.F.CEL Co618.T.IT.1449.F.CEL Co612.T.IT.1423.F.CEL Co581.T.IT.1085.F.CEL Co45.T.IT.688.F.CEL Co41.T.IT.680.F.CEL Co40.T.IT.676.F.CEL Co39.T.IT.673.F.CEL Co37.T.IT.669.F.CEL Co20.T.IT.570.F.CEL Co15.T.IT.558.F.CEL Co14.T.IT.88.F.CEL -50 0 50 influence 100 Goeman’s Global Test - Example 1 samples 0 samples 13