A Statistical Framework for Integrating Different Microarray Data Sets in Differential Expression Analysis
Yinglei Lai, Ph.D.
Department of Statistics & Biostatistics Center
The George Washington University
The 5th Annual Rocky Mountain Bioinformatics Meeting
Supported by NIH/NIDDK DK-75004
Data Sets

Huntington’s disease (HD) data sets
- Borovecki et al. (2005) PNAS
- 14 healthy and 17 HD samples
- Affymetrix U133A GeneChip (22,283 genes)
- Amersham CodeLink Uniset Human I and II bioarrays (20,289 genes)
- 8,597 common genes after thresholding and filtering
Can we integrate two data sets from the same study to improve the detection of differential expression?
A Framework

Univariate test for detecting differential expression
- Student’s t-test
- Obtain a pair of (one-sided) p-values for each gene, one per data set
- Transform the p-values into z-scores (inverse normal c.d.f.)
Mixture-model-based tests for concordance/discordance
- A three-component normal-mixture model (Lai et al., 2007)
- H0: complete concordance vs. H1: partial concordance/discordance
- H0: complete discordance vs. H1: partial concordance/discordance
Mixture-model-based integration of a pair of z-scores
- If complete discordance, then data integration is discouraged
- If partial/complete concordance, then data integration is considered
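The univariate step above (Student's t-test, one-sided p-value, inverse-normal transform) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the author's code: it assumes a pooled-variance two-sample t-test with the one-sided alternative "higher in HD" and the convention z = -Φ^{-1}(p), so up-regulated genes get positive z-scores. Because the Python standard library has no t c.d.f., the t tail is computed from the regularized incomplete beta function (continued-fraction evaluation). The function names and expression values are hypothetical.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist, mean, variance


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last update changed h negligibly
            break
    return h


def _betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_upper_tail(t, df):
    """P(T > t) for Student's t distribution with df degrees of freedom."""
    abs_tail = 0.5 * _betainc(df / 2.0, 0.5, df / (df + t * t))  # P(T > |t|)
    return abs_tail if t >= 0 else 1.0 - abs_tail


def one_sided_z(hd, healthy):
    """Pooled-variance t-test of H1: mean(hd) > mean(healthy), mapped to a z-score."""
    n1, n2 = len(hd), len(healthy)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(hd) + (n2 - 1) * variance(healthy)) / df
    t = (mean(hd) - mean(healthy)) / sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    p = t_upper_tail(t, df)
    p = min(max(p, 1e-300), 1.0 - 1e-15)  # keep inv_cdf inside (0, 1)
    return -NormalDist().inv_cdf(p)  # positive z <=> higher expression in HD


# Hypothetical expression values for one gene on the two platforms.
affy_hd, affy_ctrl = [8.1, 8.4, 7.9, 8.6, 8.3], [7.2, 7.5, 7.1, 7.4]
codelink_hd, codelink_ctrl = [5.0, 5.3, 4.8, 5.4], [4.6, 4.4, 4.9, 4.5, 4.7]
z_pair = (one_sided_z(affy_hd, affy_ctrl), one_sided_z(codelink_hd, codelink_ctrl))
print(z_pair)
```

Under the null hypothesis the one-sided p-value is uniform, so each z-score is N(0,1); this is why a normal mixture is a natural model for the pairs (z1, z2) collected over all common genes.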
Complete/Partial Concordance/Discordance


CC: 0.8[N(0,1), N(0,1)] + 0.1[N(-2,1), N(-2,1)] + 0.1[N(2,1), N(2,1)]
PCD: 0.8[N(0,1), N(0,1)] + 0.05[N(-2,1), N(-2,1)] + 0.05[N(2,1), N(2,1)] +
0.05[N(-2,1), N(2,1)] + 0.05[N(2,1), N(-2,1)]

CD: [0.8N(0,1) + 0.1N(-2,1) + 0.1N(2,1), 0.8N(0,1) + 0.1N(-2,1) + 0.1N(2,1)]
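The three example models can be written down directly as bivariate normal mixtures. The sketch below assumes, as the bracket notation indicates, that the two z-scores are independent within each component, and it expands CD into the nine pairings of its two marginal mixtures with product weights; the variable and function names are hypothetical. Evaluating the densities shows that CC, PCD and CD share the same marginal distribution, 0.8N(0,1) + 0.1N(-2,1) + 0.1N(2,1), and differ only in how the two z-scores are paired: at a discordant pair such as (2, -2), CC gives the lowest density and PCD the highest.

```python
from statistics import NormalDist

# Component distributions used on the slide: null N(0,1), down N(-2,1), up N(2,1).
NULL, DOWN, UP = NormalDist(0, 1), NormalDist(-2, 1), NormalDist(2, 1)

# Each model is a list of (weight, distribution of z1, distribution of z2);
# within a component the two z-scores are assumed independent.
CC = [(0.8, NULL, NULL), (0.1, DOWN, DOWN), (0.1, UP, UP)]
PCD = [(0.8, NULL, NULL), (0.05, DOWN, DOWN), (0.05, UP, UP),
       (0.05, DOWN, UP), (0.05, UP, DOWN)]

# CD: z1 and z2 independent, each with the same three-component marginal,
# which expands to all 3 x 3 pairings with product weights.
MARGINAL = [(0.8, NULL), (0.1, DOWN), (0.1, UP)]
CD = [(w1 * w2, d1, d2) for w1, d1 in MARGINAL for w2, d2 in MARGINAL]


def joint_density(model, z1, z2):
    """Bivariate mixture density of the pair (z1, z2)."""
    return sum(w * d1.pdf(z1) * d2.pdf(z2) for w, d1, d2 in model)


def marginal_density_z1(model, z1):
    """Density of z1 alone: z2 integrates out of each component."""
    return sum(w * d1.pdf(z1) for w, d1, _ in model)


for name, model in (("CC", CC), ("PCD", PCD), ("CD", CD)):
    print(name, "f(2, -2) =", round(joint_density(model, 2.0, -2.0), 5))
```

Because the marginals coincide, the data sets cannot be told apart one at a time; the concordance/discordance tests must look at the joint behavior of the pair.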
An Integrative Score for Prioritizing Genes

P(concordant differential expression | observed pair of z-scores, fitted model)
= [Pm(observed pair of z-scores, both up-regulated)
+ Pm(observed pair of z-scores, both down-regulated)]
/ Pm(observed pair of z-scores)
where Pm denotes probability (density) under the fitted mixture model.
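The integrative score is a posterior probability: the mixture mass of the "both up" and "both down" components at the observed pair, divided by the total mixture density there. A minimal sketch follows. It assumes each fitted component is labelled by the states of the two z-scores (-1 down, 0 null, +1 up) and that the z-scores are independent within a component; the weights are borrowed from the PCD example purely for illustration and are not estimates from the HD data. All names are hypothetical.

```python
from statistics import NormalDist

# Hypothetical "fitted" model. State of each z-score: -1 down, 0 null, +1 up.
STATE_DIST = {-1: NormalDist(-2, 1), 0: NormalDist(0, 1), 1: NormalDist(2, 1)}
# (weight, state of z1, state of z2); weights copied from the PCD example.
FITTED = [(0.8, 0, 0), (0.05, -1, -1), (0.05, 1, 1), (0.05, -1, 1), (0.05, 1, -1)]


def component_mass(w, s1, s2, z1, z2):
    """Weight times within-component density of the pair (z1, z2)."""
    return w * STATE_DIST[s1].pdf(z1) * STATE_DIST[s2].pdf(z2)


def integrative_score(z1, z2, model=FITTED):
    """P(both up or both down | z1, z2, fitted model)."""
    total = sum(component_mass(w, s1, s2, z1, z2) for w, s1, s2 in model)
    concordant = sum(component_mass(w, s1, s2, z1, z2)
                     for w, s1, s2 in model if s1 == s2 != 0)
    return concordant / total


# Hypothetical z-score pairs for three genes.
genes = {"gene_a": (3.0, 3.0), "gene_b": (3.0, -3.0), "gene_c": (0.0, 0.0)}
ranking = sorted(genes, key=lambda g: integrative_score(*genes[g]), reverse=True)
print(ranking)
```

Genes are then prioritized by sorting on this score in decreasing order; a pair that is strongly significant in only one data set, or in opposite directions, receives a low score.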
Results (HD data sets)

Both complete concordance (CC) and complete discordance (CD) were rejected (p-value < 0.001)