Patching the Puzzle of
Genetic Network
Grace S. Shieh
Institute of Statistical Science,
Academia Sinica
Outline
What is a genetic network?
Why is this area one of the frontiers?
How do statistical modeling and computational
algorithms simplify the complex puzzle?
Applications
Dogma of biology

DNA -> mRNA -> Protein

Proteins: the elements that function in
organisms, e.g. yeast and human.
Somatic mutations affect key pathways in
lung adenocarcinoma (Nature, Oct. 2008)
Science, Sept. 2008
Complex human disease
Digenic effects may underlie:
Type II diabetes
Schizophrenia
Retinitis pigmentosa
Glaucoma
Tong et al., Science 2004
Complex human disease

These diseases may have similar synthetic
effects in the yeast genetic interaction map
Elements of genetic
networks derived from
model organisms, e.g.
yeast, are likely to be
conserved
The topology of the genetic network in the neighborhood of SGS1 (Tong et al., 2004)
Experimental method to reveal
genetic interactions
Systematic Genetic Analysis with ordered
Arrays of Yeast Deletion Mutants
Tong et al., 2001, Science
 Global mapping of the Yeast Genetic
interaction network
Tong et al., 2004, Science

Genome landscape of a cell
Costanzo et al. 2010, Science

Synthetic sick or lethal (SSL) gene pairs:
when both genes are mutated, the organism
dies or grows poorly, although neither single
mutation is lethal

SSL is important for understanding how an
organism tolerates genetic mutations
Hartman, Garvik and Hartwell, 2001, Science
Scenarios resulting in synthetic interaction
[Figure: four scenarios: partially redundant genes; 2 partially redundant pathways; 3 partially redundant pathways, 2 required; a protein complex tolerating 1 but not 2 destabilizing mutations. Figure annotations: "< 2%" and "SSL < 4% *".]
A Pattern Recognition
Approach to Infer Gene
Networks
Grace S. Shieh
joint work with
C.-L. Chuang, C.-H. Jen and C.-M. Chen
Bioinformatics 2008
Excerpted from
Tong et al. (2001) Science

Transcriptional compensation (TC) and transcriptional reverse
compensation (TRC) interactions (Lesage et al., 2004;
Wong & Roth, 2005, Genetics; Kafri et al., 2005,
Nature Genetics):
among paralogues or SSL gene pairs, when one
gene is mutated, its partner gene's expression
increases (TC) or decreases (TRC)
Goal: to predict TC and TRC interactions among SSL
gene pairs
Four sets of yeast (Saccharomyces cerevisiae)
microarray gene expression data (Spellman
et al., 1998) were used.
The red channel R: intensities of yeast
synchronized by alpha factor arrest,
arrest of a cdc15 or cdc28 mutant, and
elutriation;
The green channel G: average of non-synchronized cells.
Cell cycles of CLN2 gene
qRT-PCR experiments
For a given pair of SSL genes A and B,
Experimental group: gene A's expression
with gene B knocked out
Control group: gene A's expression
with gene B wild-type
If experimental >> control => A & B may be TC
If experimental << control => A & B may be TRC
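The decision rule for a qRT-PCR comparison can be sketched as follows. This is only an illustration, assuming that gene A's expression is compared between the knockout and wild-type backgrounds of gene B; the function name and the 2-fold cutoff are hypothetical and do not come from the slides.

```python
def classify_pair(expr_knockout, expr_wildtype, fold=2.0):
    """Label an SSL pair from gene A's expression in two backgrounds.

    expr_knockout: A's expression when partner gene B is knocked out.
    expr_wildtype: A's expression when partner gene B is wild-type.
    fold: hypothetical fold-change cutoff (an assumption, not from the talk).
    """
    ratio = expr_knockout / expr_wildtype
    if ratio >= fold:
        return "TC"    # partner's expression increases: compensation
    if ratio <= 1.0 / fold:
        return "TRC"   # partner's expression decreases: reverse compensation
    return "undetermined"


print(classify_pair(8.0, 2.0))
print(classify_pair(1.0, 4.0))
```

In practice the cutoff would be replaced by a statistical test on replicate measurements.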
Gene expression of Transcription
Compensation (TC) pairs
Gene expression of Transcription
Reverse Compensation (TRC) pairs
The dependence of patterns and
their associated interactions

Assumption for PARE:
CP (SP) patterns are significantly associated
with TC (TD) interactions. This hypothesis is
tested with Fisher's exact test
The Proportion of Complementary
Pattern (CP) in TC


Screening genes with significant changes over
time by $\max_t G_i(t) - \min_t G_i(t) > 1.5$
resulted in 35 gene pairs
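A minimal sketch of this screening step, assuming each gene's expression is given as a list of values over time and that a gene is kept when its range exceeds 1.5. The gene names and values below are made up for illustration.

```python
def passes_screen(series, threshold=1.5):
    """Keep a gene if its expression range over time exceeds the threshold."""
    return max(series) - min(series) > threshold


# Hypothetical expression time courses (not real data).
expression = {
    "geneA": [0.1, 0.9, 1.8, 0.4],   # range 1.7: kept
    "geneB": [0.2, 0.5, 0.3, 0.6],   # range 0.4: dropped
}
kept = [name for name, series in expression.items() if passes_screen(series)]
print(kept)
```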
         CP    SP    Total
TC       13     9     22
TD        2    11     13
Total    15    20     35

Fisher's exact test: p-value < 0.02,
significant at the 5% level
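The test on this 2x2 table can be reproduced with a short standard-library sketch. It assumes the common two-sided definition (sum the probabilities of all tables that are no more likely than the observed one); the slides do not say which variant was used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability of k counts in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_obs = prob(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    # Sum over tables at least as extreme (no more probable) than observed.
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_obs * (1 + 1e-9))


# Rows: TC, TD; columns: CP, SP (counts from the table above).
p_value = fisher_exact_two_sided(13, 9, 2, 11)
print(p_value)
```

With these counts the p-value comes out below 0.02, in line with the figure quoted on the slide.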
PARE
The gene expression of the regulating gene is treated as the
object contour, and the lag-1 expression of the target
gene as the boundary of interest, as in an image segmentation
algorithm
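[Equations: definitions of the three energy terms for a gene pair $(i, j)$, where $t'$ is the lag-1 time index relative to $t$. $E^{D1}_{i,j}$ is a sum over $t$ combining the first derivatives $\partial G_i(t)/\partial t$ and $\partial G_j(t')/\partial t$; $E^{D2}_{i,j}$ is a sum over $t$ combining the second derivatives $\partial^2 G_i(t)/\partial t^2$ and $\partial^2 G_j(t')/\partial t^2$; $E^{Area}_{i,j}$ is $\frac{1}{2}\sum_t$ of a term in the vectors $g_i(t)$ and $g_j(t)$, with an angle $\theta(t)$ defined from $g_i(t)$, $g_j(t)$ and $90^\circ$.]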
Discrete Signals
Because gene expression is a discrete signal, the first- and second-order partial derivative terms can be approximated as follows:
$\frac{\partial G_i(t)}{\partial t} \approx \frac{G_i(t+1) - G_i(t)}{\Delta t}$
$\frac{\partial^2 G_i(t)}{\partial t^2} \approx \frac{G_i(t+2) - 2G_i(t+1) + G_i(t)}{(\Delta t)^2}$
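The forward differences for a discrete expression series can be sketched as below. The function names are hypothetical, and a uniform sampling interval `dt` (default 1) is assumed.

```python
def first_difference(series, dt=1.0):
    """Forward difference: (G(t+1) - G(t)) / dt for each valid t."""
    return [(series[t + 1] - series[t]) / dt
            for t in range(len(series) - 1)]


def second_difference(series, dt=1.0):
    """Forward second difference: (G(t+2) - 2 G(t+1) + G(t)) / dt^2."""
    return [(series[t + 2] - 2 * series[t + 1] + series[t]) / dt ** 2
            for t in range(len(series) - 2)]


g = [1.0, 3.0, 6.0, 10.0]        # made-up expression values
print(first_difference(g))       # slopes between consecutive time points
print(second_difference(g))      # change in slope
```

Each differencing step shortens the series by one point, so an 18-point time course yields 17 first differences and 16 second differences.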
The interaction $S_{i,j}$ can be determined as a weighted sum of
the internal and external energies:
$S_{i,j} = \alpha \cdot E^{D1}_{i,j} + \beta \cdot E^{D2}_{i,j} + \gamma \cdot E^{Area}_{i,j}$
PARE

In this study, each gene is represented by a node in a
graphical model, denoted by $G_i$, where i = 1, 2, …, N.
The edge $S_{i,j}$ represents the gene-gene interaction between
$G_i$ and $G_j$, where the enhancer gene $G_i$ plays a key role in
activating or repressing the target gene $G_j$.
Training set vs test set

Leave-one-out cross validation:
among n pairs, use n-1 pairs to train PARE, then predict
the remaining pair; repeat for each of the n pairs.

3-fold cross validation:
among all pairs, use 2/3 of the pairs to train, then predict the
remaining 1/3; iterate over all combinations and repeat this N times
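The two splitting schemes can be sketched as generators of (training, test) sets. This is an illustrative sketch only: the function names are hypothetical, and the random 3-fold partition is one plausible reading of the slide rather than the exact procedure used for PARE.

```python
import random


def leave_one_out(pairs):
    """Yield (train, test) splits, holding out one pair at a time."""
    for k in range(len(pairs)):
        yield pairs[:k] + pairs[k + 1:], [pairs[k]]


def three_fold(pairs, seed=0):
    """Yield 3 (train, test) splits from one random partition of the pairs."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[k::3] for k in range(3)]
    for k in range(3):
        train = [p for m, fold in enumerate(folds) if m != k for p in fold]
        yield train, folds[k]


pairs = list(range(35))                 # stand-ins for 35 gene pairs
loo_splits = list(leave_one_out(pairs))
cv_splits = list(three_fold(pairs))
print(len(loo_splits), [len(test) for _, test in cv_splits])
```

Repeating `three_fold` with different seeds and averaging the resulting rates corresponds to the repeated 3-fold runs.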
Experimental Results (TC/TRC)
alpha data set (18 time points) –
Table 1. The prediction results, checked against the qRT-PCR experiments

Method          Training TPR   Training FPR   Test TPR   Test Std   Test FPR
Lagged Corr.                                  46%
EB-GGMs                                       52%
PARE, n-fold    76%            20%            73%                   23%
PARE, 3-fold    78%*           18%*           71%*       3%         23%*

*3-fold CV was repeated 500 times; only the averages are reported.
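For reference, the true-positive and false-positive rates in such a table can be computed from predicted and validated labels as in this sketch. Which class counts as "positive" is an assumption here, and the labels are invented for illustration.

```python
def tpr_fpr(actual, predicted, positive):
    """Return (TPR, FPR), treating `positive` as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


# Made-up labels: validated interaction vs. predicted interaction.
actual = ["TC", "TC", "TC", "TD", "TD"]
predicted = ["TC", "TC", "TD", "TC", "TD"]
rates = tpr_fpr(actual, predicted, positive="TC")
print(rates)
```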
Experimental Results (TC/TRC)

For the alpha dataset, PARE yields

a true-positive rate of 71-73%

a prediction accuracy of 81%

The FPR for predicting TC (TD) interactions was
bounded by 12% (10%) genome-wide.
Experimental Results (TC/TRC)
Checking against published literature

These genetic interactions are consistent with
the following experimental results:

Sgs1 and Srs2 are known redundant
pathways in replication (Ira et al., 1999; Lee et al., 1999)

Ex: Srs2 and Sgs1-Top3 suppress crossovers
during double-strand break repair in yeast.

The Sgs1/Top3/Rmi1 and Mus81/Mms4 complexes
are involved in both double-strand break
repair and homologous recombination (Fabre et
al., 2002).

This indicates that Sgs1/Top3/Rmi1 and
Mus81/Mms4 are alternative pathways to
resolve recombination intermediates.
Inferring transcriptional
interactions

132 pairs of activator-target (AT) and
repressor-target (RT) gene interactions
were collected from the published literature
(MIPS, Mewes et al., 1999, Nucleic Acids Research;
Gancedo, 1998, Microbiology & Molecular Biology;
Draper et al., 1994, Molecular & Cellular Biology, etc.)
Test for CP (SP) associated
with RT (AT) pairs in the data
Chi-squared test
Experimental Results (AT/RT)
Table 2. The prediction results using the Elu data set,
checked against the 132 TIs from the literature.

Method          Training TPR   Training FPR   Test TPR   Test Std   Test FPR
Lagged Corr.                                  51%
EB-GGMs                                       59%
PARE, n-fold    79%            16%            77%                   17%
PARE, 3-fold    81%*           16%*           74%*       3%         19%*

*Averages over 500 repeats.
FPRs for genome-wide TI predictions are bounded by 21%.
Conclusions


The proposed PARE learns gene expression
patterns and can then predict similar genetic
interactions using microarray data.
TPRs of PARE applied to the alpha (Elu)
dataset are about 73% (77%) for inferring
TC/TD interactions (TIs).
Inferring the genesis of obesity in
humans (joint work with Karine & Jean-Daniel)
MGED from

Adipocytes


cells that primarily compose adipose tissue
specialized in storing energy as fat
Time-course MGED
Human adipocyte-derived cell lines
[Figure: C/EBP alpha (time-course), MGED in ratio. x-axis: day (0 to 10); y-axis: expression level (log2).]
PARE to infer the genesis of
obesity in humans
Training stage:
MGED of human adipocyte-derived cell lines

70 known transcriptional interactions (TIs) from
iHOP
Prediction results:

40+ pairs of TIs and some genetic interactions
predicted
Some are consistent with existing experimental
results; some are novel
Inferring TIs
Data preparation:
Select significantly expressed genes:
P-value < 0.01
Significantly expressed in at least 1 time point (5 time
points in total)
-> 36 genes with a function of interest
Interacting with 14 genes of interest (AP2, CCL2, CCL5,
LEP, etc.) -> 504 gene pairs
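The count of candidate pairs follows from pairing each of the 36 selected genes with each of the 14 genes of interest. A small sketch, using placeholder gene names (the real gene lists are not given in the slides):

```python
from itertools import product

# Placeholder identifiers; only the list sizes come from the slides.
selected_genes = [f"selected_{k}" for k in range(1, 37)]        # 36 genes
genes_of_interest = [f"interest_{k}" for k in range(1, 15)]     # 14 genes

gene_pairs = list(product(selected_genes, genes_of_interest))
print(len(gene_pairs))
```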
WebPARE: a web computing service for
PARE (Chuang+, Wu+, Cheng and Shieh*,
2010, Bioinformatics)

To provide a simple web-interface for users
to infer GIs/TIs using time course gene
expression data and existing knowledge, e.g.
pre-stored validated TIs in yeast, mouse,
human, etc (TRANSFAC)
An example:
A list of genes involved in the cell cycle and a data set
(e.g. Elu) were uploaded to WebPARE; the TIs among these
gene pairs were of interest.
Using integrated (pre-stored) pairs of TIs in yeast,
PARE correctly predicted 118 out of 176 TIs,
mTPR = 67%
e.g. the significant
predicted network
from 66 pairs ->
WebPARE html
www.stat.sinica.edu.tw/WebPARE
Demo

WebPARE can be accessed at:
http://www.stat.sinica.edu.tw/WebPARE
Acknowledgement
Dr. Ting-Fang Wang and Da-Yow Huang,
Inst. of Biological Chemistry, Academia Sinica
Drs. Karine Clement and J-D. Zucker, INSERM & IRD,
France
Cheng-Long Chuang, Chin-Yuan Guo, Chia-Chang Wang,
Dr. Shi-Fong Guo, Yu-Bin Wang, Jia-Hung Wu
Inst. of Statistical Science
Thank you for your attention!
Wanted
Part-time PhD students
Research assistants
to work at the Shieh lab (Prof. Shieh's laboratory),
Institute of Statistical Science, Academia Sinica
Parameter estimation
Next, we estimate the parameters
via the particle swarm optimization (PSO)
algorithm (Kennedy and Eberhart, 1995),
a stochastic optimization technique that
simulates the behavior of a flock of birds.
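To make the idea concrete, here is a minimal PSO sketch. It uses a common variant with an inertia weight, and every setting (swarm size, coefficients, bounds) is an assumption chosen for illustration. The toy objective is a simple quadratic, not the PARE training objective, which is not given in the slides.

```python
import random


def pso_minimize(objective, dim, bounds, n_particles=20, n_iters=200,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize `objective` over a box with a basic particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # each particle's best so far
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda k: best_val[k])
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]   # swarm's best

    for _ in range(n_iters):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull toward the particle's own best and the swarm's best.
                vel[k][d] = (inertia * vel[k][d]
                             + c_personal * r1 * (best_pos[k][d] - pos[k][d])
                             + c_global * r2 * (gbest_pos[d] - pos[k][d]))
                pos[k][d] = min(hi, max(lo, pos[k][d] + vel[k][d]))
            val = objective(pos[k])
            if val < best_val[k]:
                best_val[k], best_pos[k] = val, pos[k][:]
                if val < gbest_val:
                    gbest_val, gbest_pos = val, pos[k][:]
    return gbest_pos, gbest_val


def sphere(x):
    """Toy objective with its minimum (0) at the origin."""
    return sum(xi * xi for xi in x)


best_x, best_f = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
print(best_x, best_f)
```

For parameter estimation, the objective would instead score a candidate parameter vector by its prediction error on the training pairs.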
Example (finding largest gradient)
Evolutionary Process of PSO
Gene expression of Activator-Target (AT) gene pairs
Gene expression of Repressor-Target (RT) gene pairs