Overview of My Research
Jian Huang
• Semiparametric Models and Survival Analysis (Jong-Sung Kim)
• Nonparametric MLE
• Statistical Genetics (Kai Wang, Yanming Jiang, Susan Slager, Elizabeth Ludington, Xinqun Yang) [Veronica Vieland, PPHG & CSGR]
• Microarray Analysis (Deli Wang, Ning Yan, Kwang-youn Kim) [Soares’ Lab, Casavant CBCB, Sheffield’s Lab, Stone’s Lab]
[Cun-Hui Zhang]
Statistical Genetics
Main goal: find chromosomal regions harboring genes that predispose to diseases or affect traits of interest
Genetic Linkage Analysis of a Dichotomous Trait Incorporating a Quantitative Trait
If a quantitative trait is linked to the same
chromosomal regions as the disease, then
joint analysis of disease status and the
quantitative trait should in general increase
the power to detect linkage.
Huang J and Jiang Y (2003): American Journal of Human Genetics, 72: 949-960.
Example
Asthma:
Associated quantitative trait: total serum IgE level
[Sandford et al. 1993, Wjst et al. 1999].
QTL analysis of total IgE level
[Marsh et al. 1994, Meyers et al. 1994, Daniels et al. 1996, Laitinen et al. 1997, Palmer et al. 1998, ...]
Autism:
Possibly associated quantitative score based on:
spoken language, social empathy, compulsions, imitation, milestones, head circumference, etc. [Piven 2001]
Example: Asthma
German asthma genome scan data
[Wjst et al. 1999, Genetic Analysis Workshop 12]
97 families with 415 individuals:
91 families with affected sib-pairs (ASPs)
6 families with affected sib-trios
All affected children: Total serum IgE level
331 markers on 22 autosomal chromosomes (about 10 cM apart) are typed for each individual.
Likelihood
Data:
Pedigree structure
Dichotomous trait: $T$
Quantitative trait: $Y$
Marker: $M$
Likelihood: $P(Y, M, T \mid \text{ascertainment})$
If ascertainment is based on the trait $T$: $P(Y, M \mid T)$
Likelihood
Putative locus: $t$
[Figure: the putative locus t positioned among markers m1, m2, m3, m4, m5]
Identity by Descent (IBD)
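Parental genotypes: 12 and 34. Each child receives one allele from each parent, so the possible sib genotypes are 13, 14, 23, 24.

Number of alleles shared IBD by two sibs (rows: first sib's genotype; columns: second sib's genotype):

         13   14   23   24
13        2    1    1    0
14        1    2    0    1
23        1    0    2    1
24        0    1    1    2

IBD=0: (13, 24), (14, 23), (23, 14), (24, 13)
IBD=1: (13, 14), (13, 23), (14, 13), (14, 24), (23, 13), (23, 24), (24, 14), (24, 23)
IBD=2: (13, 13), (14, 14), (23, 23), (24, 24)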
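The table can be checked by brute force. A minimal Python sketch (the function name is hypothetical) that enumerates all 16 ordered sib genotype pairs for parents 12 and 34 and tallies IBD sharing; the counts 4, 8 and 4 out of 16 give the sib-pair IBD probabilities 1/4, 1/2, 1/4 expected when there is no linkage.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def sib_ibd_distribution():
    """Distribution of IBD sharing between two sibs of parents 12 x 34."""
    # Alleles 1, 2 come from one parent and 3, 4 from the other; all four
    # labels are distinct, so sharing a label means sharing it IBD.
    parent_1, parent_2 = ("1", "2"), ("3", "4")
    # Each child gets one allele from each parent: 13, 14, 23, 24.
    children = [a + b for a in parent_1 for b in parent_2]
    counts = Counter()
    for sib_a, sib_b in product(children, repeat=2):
        # Position 0 holds the allele from parent_1, position 1 from parent_2.
        ibd = (sib_a[0] == sib_b[0]) + (sib_a[1] == sib_b[1])
        counts[ibd] += 1
    total = sum(counts.values())
    return {k: Fraction(v, total) for k, v in sorted(counts.items())}


print(sib_ibd_distribution())
# {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
```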
Likelihood: Formulation
• Families in a linkage study are usually collected based on the
phenotypes of the individuals
• Likelihood should be based on the distribution conditional on
the phenotype on which the ascertainment is based
• Pleiotropy or tight coincident linkage
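$$p(y, m \mid \text{asp}; t) = \sum_{j=0}^{2} p(y, m, s(t) = j \mid \text{asp})$$
$$\qquad = \sum_{j=0}^{2} p(y \mid s(t) = j, \text{asp})\, P(m \mid s(t) = j)\, P(s(t) = j \mid \text{asp})$$
where $s(t) \in \{0, 1, 2\}$ is the number of alleles the affected sib pair shares IBD at the putative locus $t$.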
Likelihood Ratio Statistic
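$$L(\pi, \theta, F_n) = \sum_{j=0}^{2} p(y \mid s(t) = j, \text{asp})\, w_j \pi_j$$
$$\Lambda = 2 \log \left[ \frac{\sup_{\pi, \theta} L(\pi, \theta, F_n)}{\sup_{\theta} L(\pi_0, \theta, F_n)} \right]$$
where $w_j = P(m \mid s(t) = j)$, $\pi_j = P(s(t) = j \mid \text{asp})$, and $\theta$ denotes the parameters of $p(y \mid s(t) = j, \text{asp})$.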
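To see how the IBD-probability part of the statistic behaves, here is a minimal sketch under a strong simplifying assumption: the parameters of the quantitative-trait density (and the baseline distribution) are held fixed, so for each pair i the product a[i][j] = p(y_i | s(t)=j, asp) * P(m_i | s(t)=j) is a known number, and only the IBD probabilities π_j = P(s(t)=j | asp) are estimated, by EM over the whole simplex. No constraint such as π_1 = 0.5 is imposed, and the numbers and function names are hypothetical.

```python
import math


def log_lik(pi, a):
    """Log-likelihood sum_i log sum_j a[i][j] * pi[j]."""
    return sum(math.log(sum(a_ij * p_j for a_ij, p_j in zip(row, pi))) for row in a)


def em_ibd_probs(a, start=(0.25, 0.5, 0.25), iters=1000):
    """EM for the mixing weights pi_j with the terms a[i][j] held fixed."""
    pi = list(start)
    for _ in range(iters):
        post_sum = [0.0, 0.0, 0.0]
        for row in a:
            terms = [a_ij * p_j for a_ij, p_j in zip(row, pi)]
            total = sum(terms)
            # E-step: posterior probability that pair i shares j alleles IBD.
            for j in range(3):
                post_sum[j] += terms[j] / total
        # M-step: average the posterior probabilities over pairs.
        pi = [s / len(a) for s in post_sum]
    return pi


def lr_statistic(a, pi0=(0.25, 0.5, 0.25)):
    """2 * [max over pi of the log-likelihood - log-likelihood at pi0]."""
    pi_hat = em_ibd_probs(a, start=pi0)
    return 2.0 * (log_lik(pi_hat, a) - log_lik(pi0, a)), pi_hat


# Hypothetical per-pair terms a[i][j] for four affected sib pairs.
a = [[0.2, 0.5, 0.9], [0.1, 0.4, 0.8], [0.3, 0.3, 0.3], [0.6, 0.5, 0.4]]
lam, pi_hat = lr_statistic(a)
print(round(lam, 3), [round(p, 3) for p in pi_hat])
```

Starting EM at the null value means the likelihood can only increase, so the statistic is non-negative; a pair whose terms do not depend on j carries no information about π.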
Likelihood: Asymptotic Distribution
• The asymptotic null distribution of the LR statistic is nonstandard: a parameter disappears under $H_0$
• The asymptotic null distribution of the LR statistic is unknown
• Conservative null distribution: set $\pi_1 = 0.5$, giving
$$0.25\chi^2_0 + 0.5\chi^2_1 + 0.25\chi^2_2$$
Simulation: Null Distribution
n = 100 ASPs
# of replications = 100,000
Critical values at level α:

α        $0.25\chi^2_0 + 0.5\chi^2_1 + 0.25\chi^2_2$ ($\pi_1 = 0.5$)    Simulated
0.050    4.28     3.39
0.010    7.33     6.24
0.001    11.74    10.68
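The mixture critical values can be computed directly. For c > 0 the χ²_0 component puts no mass above c, P(χ²_1 > c) = erfc(√(c/2)), and P(χ²_2 > c) = exp(−c/2). A minimal Python sketch (function names hypothetical) that inverts the mixture tail by bisection; its output should land close to the tabulated 4.28, 7.33 and 11.74. The smaller simulated critical values correspond to larger tail probabilities under the mixture, which is why the mixture is the conservative choice.

```python
import math

WEIGHTS = (0.25, 0.5, 0.25)  # mixing weights on chi2_0, chi2_1, chi2_2


def chibar_tail(c, weights=WEIGHTS):
    """P(X > c), c > 0, for X with the mixture distribution w0*chi2_0 + w1*chi2_1 + w2*chi2_2."""
    _, w1, w2 = weights  # chi2_0 is a point mass at 0: no mass above c > 0
    tail_1 = math.erfc(math.sqrt(c / 2.0))  # P(chi2_1 > c) = P(|Z| > sqrt(c))
    tail_2 = math.exp(-c / 2.0)             # P(chi2_2 > c)
    return w1 * tail_1 + w2 * tail_2


def chibar_critical(alpha, weights=WEIGHTS, lo=1e-9, hi=100.0, tol=1e-10):
    """Critical value c with P(X > c) = alpha; the tail is decreasing, so bisect."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chibar_tail(mid, weights) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for alpha in (0.05, 0.01, 0.001):
    print(alpha, round(chibar_critical(alpha), 2))
```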
Microarray Analysis
• Normalization
• Identifying differentially expressed genes
• Finding groups of co-regulated genes
• Finding molecular fingerprints of various types of cancer
• Understanding how genes regulate development
• Inferring gene networks
Microarray Schematic
Duggan et al., Nature Genetics (1999) 21:10-14.
[Figure: layout of printed blocks 1-32 (4.5 mm)]
Printing configuration: 4 x 4 pins (blocks 1-16 and 17-32)
Blocks 1 and 17, 2 and 18, 3 and 19, … are printed by the same pin.
Courtesy of Liliana Menzella of Soares’ Lab
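The pin-sharing rule amounts to a modulo-16 mapping. A small sketch with a hypothetical helper, assuming the stated pattern (1 and 17, 2 and 18, ...) continues through blocks 16 and 32:

```python
def pin_for_block(block, n_pins=16):
    """Pin (1..n_pins) that prints a block numbered 1..2*n_pins."""
    if not 1 <= block <= 2 * n_pins:
        raise ValueError(f"block must be in 1..{2 * n_pins}")
    # Blocks 1-16 and 17-32 are printed by pins 1-16 in the same order,
    # so block b and block b + 16 share a pin.
    return (block - 1) % n_pins + 1


print([pin_for_block(b) for b in (1, 2, 3, 16, 17, 18, 19, 32)])
# [1, 2, 3, 16, 1, 2, 3, 16]
```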
Data File Example (Part of Slide AAE248)
Red (Cy5) channel:

Block  ID                           F635 Median  F635 Mean  F635 SD  B635 Median  B635 Mean  B635 SD  F Pixels  B Pixels  Flags
1      UI-M-BZ1-bfw-g-13-0-UI.s1-D  668          1021       1224     140          163        181      156       1246      0
1      UI-M-BZ1-bfw-f-20-0-UI.s1-D  1927         2351       1562     146          172        175      460       2589      0
1      UI-M-BZ1-bfv-a-21-0-UI.s1-D  1316         2115       1959     156          173        131      316       2259      0
1      UI-M-BZ1-bfu-o-21-0-UI.s1-D  2422         2856       1607     148          163        99       316       2266      0
1      UI-M-BZ1-bfu-m-10-0-UI.s1-D  1074         1409       878      153          190        238      392       2342      0
1      UI-M-BZ1-bfu-l-13-0-UI.s1-D  1204         1608       1226     156          192        250      460       2452      0
1      UI-M-BZ1-bdw-g-02-0-UI.s1-D  7433         7059       2389     154          174        148      392       2325      0
1      UI-M-BZ1-bdw-a-04-0-UI.s1-D  356          380        137      163          171        84       80        634       0
1      UI-M-BZ1-bds-e-04-0-UI.s1-D  2407         2342       716      149          165        110      256       2082      0
1      UI-M-BZ1-bdr-f-06-0-UI.s1-D  9137         9183       1241     154          168        130      316       2376      0
1      UI-M-BZ1-bdr-b-08-0-UI.s1-D  4246         4231       860      153          169        115      316       2312      0
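A minimal sketch of reading rows like these and forming a background-subtracted red intensity. It assumes median foreground minus median background (F635 Median − B635 Median), which is one common choice; the slide does not say which summary is used. The helper names are hypothetical.

```python
# First three rows of the red-channel excerpt above, whitespace-separated.
ROWS = """\
1 UI-M-BZ1-bfw-g-13-0-UI.s1-D 668 1021 1224 140 163 181 156 1246 0
1 UI-M-BZ1-bfw-f-20-0-UI.s1-D 1927 2351 1562 146 172 175 460 2589 0
1 UI-M-BZ1-bfv-a-21-0-UI.s1-D 1316 2115 1959 156 173 131 316 2259 0
"""

COLUMNS = ["Block", "ID", "F635 Median", "F635 Mean", "F635 SD",
           "B635 Median", "B635 Mean", "B635 SD", "F Pixels", "B Pixels", "Flags"]


def parse_rows(text):
    """Turn whitespace-separated rows into dicts keyed by column name."""
    records = []
    for line in text.strip().splitlines():
        block, spot_id, *values = line.split()
        record = {"Block": int(block), "ID": spot_id}
        record.update(zip(COLUMNS[2:], map(int, values)))
        records.append(record)
    return records


def red_background_subtracted(record):
    """Median foreground minus median background (assumed convention)."""
    return record["F635 Median"] - record["B635 Median"]


records = parse_rows(ROWS)
print([red_background_subtracted(r) for r in records])  # [528, 1781, 1160]
```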
Expression Data
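Background-subtracted intensities:
Red channel (Cy5): $R$
Green channel (Cy3): $G$
Log intensity ratio: $\log_2(R/G)$
= 0: constant expression
> 0: up-regulated in the red (Cy5) sample relative to the green (Cy3) sample
< 0: down-regulated in the red (Cy5) sample relative to the green (Cy3) sample
Total intensity: $0.5\log_2(RG) = 0.5[\log_2 R + \log_2 G]$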
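The two summaries can be computed per spot as follows; a minimal sketch with a hypothetical function name. Raising an error on non-positive inputs is a choice made here, since background subtraction can leave values for which the logarithm is undefined.

```python
import math


def ma_values(r, g):
    """Log intensity ratio log2(R/G) and total intensity 0.5*log2(R*G)."""
    if r <= 0 or g <= 0:
        # Background subtraction can leave non-positive values; logs are undefined there.
        raise ValueError("R and G must be positive")
    log_ratio = math.log2(r / g)
    total = 0.5 * (math.log2(r) + math.log2(g))
    return log_ratio, total


print(ma_values(8.0, 2.0))     # (2.0, 2.0)
print(ma_values(5.0, 5.0)[0])  # 0.0  (constant expression)
```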
Expression Data Matrices: I---II
Log intensity ratio

Gene ID  1        2        3        4        5
1        0.374    0.298    -2.85    -0.01    -0.34
2        1.471    -3.24    0.09     -1.34    1.636
3        -0.03    -0.23    -0.34    -0.19    -0.91
4        0.012    -0.48    -0.7     -0.08    -0.62
5        -0.23    -0.13    -0.06    0.475    -0.09

Log intensity product

Gene ID  1        2        3        4        5
1        6.4586   8.3808   9.2009   10.271   9.9864
2        7.927    5.6388   10.769   8.8253   14.156
3        10.679   4.341    10.524   9.8998   11.86
4        4.1868   10.716   4.5285   9.8173   11.048
5        11.379   10.548   8.3241   15.024   10.044
Normalization
Comparison of normalization curves (data from Callow et al. 2000)
[Figure: green = TW-SRM normalization, red = loess normalization]
A Two-way Semiparametric Regression Model (TW-SRM)
Observed intensity = normalization curve (bias) + signal + random error

The TW-SRM:
$$y_{ij} = \phi_i(x_{ij}) + z_i^T \beta_j + \varepsilon_{ij}, \quad i = 1, \ldots, n \text{ (\# of slides)}, \quad j = 1, \ldots, J \text{ (\# of genes)}$$
y: log intensity ratio
x: log total intensity
z: indicator for a slide
φ: normalization curve

The SRM:
$$y_i = \phi(x_i) + z_i^T \beta + \varepsilon_i, \quad i = 1, \ldots, n$$
y_i: outcome variable
z_i: covariate of interest
x_i: confounding covariate
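To make the two indices concrete, a minimal sketch that draws data from the TW-SRM with a scalar covariate. Every numeric choice is hypothetical and not from the slides: a 0/1 treatment indicator z_i, five differentially expressed genes with β_j = 1, quadratic slide-specific bias curves φ_i, a uniform range for x_ij, and Gaussian errors.

```python
import random


def simulate_twsrm(n_slides=4, n_genes=50, noise_sd=0.1, seed=1):
    """Draw y_ij = phi_i(x_ij) + z_i * beta_j + eps_ij with scalar z_i."""
    rng = random.Random(seed)
    # Hypothetical design: z_i = 1 on treatment slides, 0 on control slides.
    z = [1 if i % 2 == 0 else 0 for i in range(n_slides)]
    # Hypothetical gene effects: the first five genes are differentially expressed.
    beta = [1.0 if j < 5 else 0.0 for j in range(n_genes)]
    # Hypothetical slide-specific smooth bias curves phi_i (quadratic in x).
    shift = [rng.uniform(-0.5, 0.5) for _ in range(n_slides)]
    slope = [rng.uniform(-0.1, 0.1) for _ in range(n_slides)]

    def phi(i, x):
        return shift[i] + slope[i] * (x - 10.0) + 0.02 * (x - 10.0) ** 2

    # x_ij: log total intensity of gene j on slide i (hypothetical range).
    x = [[rng.uniform(6.0, 16.0) for _ in range(n_genes)] for _ in range(n_slides)]
    y = [[phi(i, x[i][j]) + z[i] * beta[j] + rng.gauss(0.0, noise_sd)
          for j in range(n_genes)] for i in range(n_slides)]
    return x, y, z, beta, phi


x, y, z, beta, phi = simulate_twsrm()
print(len(y), len(y[0]))  # 4 50
```

Each slide gets its own curve φ_i while each gene keeps a single effect β_j across slides, which is what makes the model "two-way".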
Results
[Figure panels: Loess and t-test; TW-SRM]
Results
Loess and t-test

ID    p-value   t-stat   t numerator  t denominator
2149  0         21.503   3.0806       0.1433
4139  0         13.633   1.0251       0.0752
5356  0         11.605   1.7957       0.1547
540   0         11.891   2.9852       0.2511
1739  0         9.6767   0.8511       0.0879
2537  0         10.01    0.9371       0.0936
1496  0         8.42     0.9195       0.1092
4941  0         7.0476   0.9241       0.1311
947   1.00E-04  5.6995   0.6287       0.1103
5759  2.00E-04  5.0944   0.2196       0.0431
4631  0.0013    -4.17    -0.229       0.055
4160  0.0017    3.9402   0.2488       0.0631
5604  0.0018    3.9521   0.3661       0.0926
2324  0.0019    3.9362   0.3079       0.0782

TW-SRM

ID    p-value   z-score  z numerator  z denominator
540   0         18.232   3.2283       0.1771
2149  0         18.295   3.3294       0.182
5356  0         11.548   2.1336       0.1848
4941  0         6.4465   1.213        0.1882
4139  0         6.353    1.2481       0.1965
1496  0         6.3059   1.1445       0.1815
541   0         5.4741   0.9822       0.1794
2537  0         5.4533   0.9721       0.1783
1739  0         5.3161   0.9599       0.1806
1337  0         4.8553   0.9054       0.1865
563   0         4.595    0.8523       0.1855
3809  0         -4.369   -0.764       0.1749
5986  0         -4.246   -0.814       0.1918
4220  0         -4.118   -0.775       0.1881
Computation
$\phi_i$: B-spline representation
$$\phi_i(x) = \alpha_i + \sum_{k=1}^{K} \alpha_{ik} b_k(x)$$
where $b_1, \ldots, b_K$ are B-spline bases.
Find $(\alpha, \beta)$ to minimize
$$\sum_{i} \sum_{j} w_{ij} \left[ y_{ij} - \phi_i(x_{ij}) - z_i^T \beta_j \right]^2$$
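A minimal sketch of the two building blocks of an alternating fit of this weighted least-squares objective, under assumptions the slide does not state: degree-1 (hat-function) B-splines on equally spaced knots, a scalar z_i, and alternating between refitting each slide's curve φ_i to the partial residuals y_ij − z_i β_j and updating each β_j with the curves fixed (a backfitting-style scheme chosen here for illustration). The hat basis sums to one on the knot range, so a separate intercept term is absorbed into the basis coefficients. All function names are hypothetical.

```python
def hat_basis(x, knots):
    """Linear (degree-1) B-spline basis values at x for equally spaced knots."""
    h = knots[1] - knots[0]
    return [max(0.0, 1.0 - abs(x - t) / h) for t in knots]


def solve(A, b):
    """Solve A c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        coef[r] = (M[r][n] - sum(M[r][k] * coef[k] for k in range(r + 1, n))) / M[r][r]
    return coef


def fit_curve(x, r, w, knots):
    """Weighted LS fit of one slide's curve to partial residuals r_j = y_ij - z_i*beta_j."""
    B = [hat_basis(xj, knots) for xj in x]
    K = len(knots)
    A = [[sum(wj * bj[k] * bj[q] for wj, bj in zip(w, B)) for q in range(K)]
         for k in range(K)]
    rhs = [sum(wj * bj[k] * rj for wj, bj, rj in zip(w, B, r)) for k in range(K)]
    coef = solve(A, rhs)
    return lambda t: sum(ck * bk for ck, bk in zip(coef, hat_basis(t, knots)))


def update_beta(y, phi_vals, z, w):
    """For fixed curves and scalar z_i: beta_j = sum_i w_ij z_i e_ij / sum_i w_ij z_i^2."""
    n, J = len(y), len(y[0])
    beta = []
    for j in range(J):
        num = sum(w[i][j] * z[i] * (y[i][j] - phi_vals[i][j]) for i in range(n))
        den = sum(w[i][j] * z[i] ** 2 for i in range(n))
        beta.append(num / den)
    return beta


# Usage: a straight line lies in the span of the hat basis, so it is recovered.
xs = [0.1 * k for k in range(101)]
curve = fit_curve(xs, [2.0 + 0.5 * v for v in xs], [1.0] * len(xs),
                  [0.0, 2.5, 5.0, 7.5, 10.0])
print(round(curve(3.3), 6))  # 3.65
```

Each update solves the objective exactly in one block of parameters with the other held fixed, so the weighted residual sum of squares cannot increase from one step to the next.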
Problem: An Infinitely Semiparametric Model
Parameters: $(\phi_1, \ldots, \phi_n)$ and $(\beta_1, \ldots, \beta_J)$
Asymptotic analysis? $J \to \infty$, $n/J \to 0$ [e.g., $n = O(J^{1/4})$]
Problem: An Infinitely Semiparametric Model
For $(\phi_1, \ldots, \phi_n)$: n = # of parameters, J = sample size
For $(\beta_1, \ldots, \beta_J)$: J = # of parameters, n = sample size