A Statistical Framework for Integrating Different Microarray Data Sets in Differential Expression Analysis

Yinglei Lai, Ph.D.
Department of Statistics & Biostatistics Center
The George Washington University
The 5th Annual Rocky Mountain Bioinformatics Meeting
Supported by NIH/NIDDK DK-75004

Data Sets
* Huntington's disease data sets
  * Borovecki et al. (2005) PNAS
  * 14 healthy and 17 HD
  * Affymetrix U133A GeneChip (22,283 genes)
  * Amersham CodeLink Uniset Human I and II bioarrays (20,289 genes)
  * 8,597 common genes after thresholding and filtering
* Can we integrate two data sets for the same study to achieve an improved detection of differential expression?

A Framework
* Univariate test for detecting differential expression
  * Student's t-test
  * Obtain a pair of (one-sided) p-values for each gene
  * Transform (inverse normal c.d.f.) p-values into z-scores
* Mixture model based tests for concordance/discordance
  * A three-component normal-mixture model (Lai et al., 2007)
  * H0: complete concordance vs. H1: partial concordance/discordance
  * H0: complete discordance vs. H1: partial concordance/discordance
* Mixture model based integration of a pair of z-scores
  * If complete discordance, then data integration discouraged
  * If partial/complete concordance, then data integration considered

Complete/Partial Concordance/Discordance
* CC: 0.8[N(0,1), N(0,1)] + 0.1[N(-2,1), N(-2,1)] + 0.1[N(2,1), N(2,1)]
* PCD: 0.8[N(0,1), N(0,1)] + 0.05[N(-2,1), N(-2,1)] + 0.05[N(2,1), N(2,1)] + 0.05[N(-2,1), N(2,1)] + 0.05[N(2,1), N(-2,1)]
* CD: [0.8N(0,1) + 0.1N(-2,1) + 0.1N(2,1), 0.8N(0,1) + 0.1N(-2,1) + 0.1N(2,1)]

An Integrative Score for Prioritizing Genes
P( concordant differential expression | observed pair of z-scores and fitted model )
  = [ Pm( observed pair of z-scores both up-regulated ) + Pm( observed pair of z-scores both down-regulated ) ] / Pm( observed pair of z-scores )

Results (HD data sets)
* Both complete concordance (CC) and complete discordance (CD) rejected at p-value < 0.001
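
As an illustration of the p-value-to-z-score step and the integrative score described above, here is a minimal Python sketch. It is not the author's implementation. It assumes that each one-sided p-value is an upper-tail p-value (so a small p-value maps to a large positive z-score), that the two z-scores are independent with unit variance within each mixture component, and that the "fitted model" is simply the illustrative PCD mixture from the slides rather than estimates from real data. The names `p_to_z`, `concordance_score`, and `PCD_COMPONENTS` are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)

# (weight, mean in data set 1, mean in data set 2, label), unit variances.
# Assumption: values copied from the illustrative PCD mixture in the slides;
# a real analysis would plug in parameters estimated from the data.
PCD_COMPONENTS = [
    (0.80, 0.0, 0.0, "null"),
    (0.05, -2.0, -2.0, "both_down"),
    (0.05, 2.0, 2.0, "both_up"),
    (0.05, -2.0, 2.0, "discordant"),
    (0.05, 2.0, -2.0, "discordant"),
]


def p_to_z(p_upper):
    """Map an upper-tail one-sided p-value (0 < p < 1) to a z-score.

    Uses the inverse normal c.d.f.; by symmetry, -inv_cdf(p) equals
    inv_cdf(1 - p) but avoids rounding error for very small p.
    """
    return -STD_NORMAL.inv_cdf(p_upper)


def concordance_score(z1, z2, components=PCD_COMPONENTS):
    """Posterior probability that a gene is concordantly differentially
    expressed (both up or both down), given its pair of z-scores."""
    concordant = 0.0
    total = 0.0
    for weight, mean1, mean2, label in components:
        # Joint density of the pair under this component (independent normals).
        density = weight * NormalDist(mean1, 1.0).pdf(z1) * NormalDist(mean2, 1.0).pdf(z2)
        total += density
        if label in ("both_up", "both_down"):
            concordant += density
    return concordant / total


if __name__ == "__main__":
    # Hypothetical p-value pairs for three genes (not taken from the HD data).
    for p1, p2 in [(0.001, 0.002), (0.5, 0.4), (0.001, 0.999)]:
        z1, z2 = p_to_z(p1), p_to_z(p2)
        print(f"z = ({z1:.2f}, {z2:.2f})  score = {concordance_score(z1, z2):.4f}")
```

Genes can then be ranked by this score: pairs of z-scores that are both strongly positive or both strongly negative score near 1, while null-like or discordant pairs score near 0.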