How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions?
Minimally, we need to use the information that exists.
June 1979: 2 relevant papers
S. Brenner (Genetics 1974)
The genetics of Caenorhabditis elegans
J. Sulston & R. Horvitz (Developmental Biology 1977)
Post-embryonic cell lineages of the nematode, Caenorhabditis elegans
Jan 2008: >200,000 relevant papers
Prioritizing high resolution genetic
interaction tests by knowledge mining
1 Full text information retrieval
Hans-Michael Muller, Arun Rangarajan, Tracy Teal,
Kimberly Van Auken, Juancarlos Chan
2 Predicting Gene Interactions from
information available in public databases
Weiwei Zhong
Textpresso Literature Search Engine
www.textpresso.org
Scientists spend more time skimming for information than reading papers.
Much of the information consists of details hidden in the full text; it is neither in the abstract nor captured in MeSH terms.
We designed Textpresso to do automated skimming for researchers and database curators.
The output can be used for more sophisticated Natural Language Processing.
Can we do better than PubMed and Google Scholar?
                  Full text   Sentence
PubMed            (-)         -
Google Scholar    +           -
Textpresso        +           +
Ontologies: MeSH, Taxonomy, Gene Ontology, Customized, Neuroscience Information Framework
Categories are “bags of words”
  GENE: FOXO, HOXA1, pax2, PKD1
  PATHWAY: precursor, upstream, cascade, descendants
  Reporter genes: GFP, EGFP, YFP, lacZ, CFP, Green Fluorescent Protein, reporter gene, dsRed, mCherry
  Drosophila anatomy: denticle, wing, MP2 neuron
Individual sentences in full text are marked up with categories
ARTICLE TEXT: egl-38 regulates lin-3 transcription in vulF in L3 larvae
TEXTPRESSO CATEGORIES: egl-38 = gene, regulates = regulation, lin-3 = gene, transcription = process, vulF = anatomy, L3 larvae = life stage
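A minimal Python sketch of this sentence markup, assuming each category is a plain set of terms matched as whole words. The lexicon contents, CATEGORIES and mark_up are hypothetical illustrations, not Textpresso's actual category lists or code.

```python
import re

# Hypothetical, tiny category lexicons ("bags of words"); real category
# lists are much larger and curated.
CATEGORIES = {
    "gene": {"egl-38", "lin-3", "let-60"},
    "regulation": {"regulates", "activates", "inhibits"},
    "process": {"transcription", "translation"},
    "anatomy": {"vulF", "VPC"},
    "life stage": {"L3 larvae", "L4 larvae", "adult"},
}

def mark_up(sentence):
    """Return (term, category) pairs for every lexicon term in the sentence, in text order."""
    hits = []
    for category, terms in CATEGORIES.items():
        for term in terms:
            # Whole-term match only, so "lin-3" does not fire inside "lin-30".
            pattern = r"(?<![\w-])" + re.escape(term) + r"(?![\w-])"
            for m in re.finditer(pattern, sentence):
                hits.append((m.start(), term, category))
    # Sort by character offset so the markup follows the sentence.
    return [(term, category) for _, term, category in sorted(hits)]
```

On the example sentence, mark_up tags egl-38 and lin-3 as gene, regulates as regulation, transcription as process, vulF as anatomy and L3 larvae as life stage.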
Automatically mark up the whole corpus of papers with category terms, and index it for rapid searching
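One way to index a marked-up corpus for rapid searching is an inverted index from each category (and each word) to the sentences that contain it; a query is then an intersection of those sets. This is a hedged sketch: build_index, search, the "cat:"/"word:" key scheme and the three sentences are hypothetical, and Textpresso's own index may be organized differently.

```python
import re
from collections import defaultdict

def build_index(marked):
    """marked: {sentence_id: (text, categories)} -> inverted index {key: set of sentence_ids}."""
    index = defaultdict(set)
    for sid, (text, categories) in marked.items():
        for cat in categories:
            index["cat:" + cat].add(sid)
        for word in re.findall(r"[\w-]+", text.lower()):
            index["word:" + word].add(sid)
    return index

def search(index, categories=(), keywords=()):
    """Ids of sentences that carry every requested category and every keyword."""
    keys = ["cat:" + c for c in categories] + ["word:" + k.lower() for k in keywords]
    if not keys:
        return set()
    result = set(index.get(keys[0], set()))
    for key in keys[1:]:
        result &= index.get(key, set())  # .get avoids inserting empty keys
    return result

# Hypothetical pre-marked-up sentences.
marked = {
    1: ("GFP reporter expression was seen in the shoot meristem", {"reporter gene", "anatomy"}),
    2: ("The mutant has an enlarged meristem", {"anatomy"}),
    3: ("lacZ staining marks the root tip", {"reporter gene", "anatomy"}),
}
index = build_index(marked)
```

A query in the spirit of "genes expressed in the meristem based on reporter genes" becomes search(index, categories=["reporter gene"], keywords=["meristem"]), which returns only sentence 1.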
What Arabidopsis genes are expressed in the
meristem based on reporter genes?
www.textpresso.org/arabidopsis
14,930 Arabidopsis thaliana papers
Is a nicotinic receptor associated with
Drugs of Abuse other than nicotine?
www.textpresso.org/neuroscience
15,786 papers
The problem with clever fly names
Gene name      Abbreviation
forager        for
ascute         as
wee            we
Washed eye     We
Use italics from PDF: ~70%
Train system to recognize gene names by context: ~85%
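The slide does not say how the context-trained recognizer works. As one illustration of learning gene names from context, here is a small naive Bayes classifier over neighboring words, swapped in as a generic technique. ContextNB, the window size and the four training sentences are all hypothetical.

```python
import math
from collections import Counter

def context_words(tokens, i, window=2):
    """Lower-cased words within `window` positions of tokens[i], excluding tokens[i]."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return [tokens[j].lower() for j in range(lo, hi) if j != i]

class ContextNB:
    """Naive Bayes over context words: is an ambiguous token a gene name?"""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.class_counts = Counter()

    def train(self, examples):
        # Each example: (tokens, position of the ambiguous token, is_gene label).
        for tokens, i, is_gene in examples:
            self.class_counts[is_gene] += 1
            self.word_counts[is_gene].update(context_words(tokens, i))

    def is_gene(self, tokens, i):
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        total = sum(self.class_counts.values())
        log_post = {}
        for label in (True, False):
            n_words = sum(self.word_counts[label].values())
            score = math.log(self.class_counts[label] / total)
            for w in context_words(tokens, i):
                # Add-one (Laplace) smoothing so an unseen word does not zero out a class.
                score += math.log((self.word_counts[label][w] + 1) / (n_words + len(vocab)))
            log_post[label] = score
        return log_post[True] > log_post[False]

# Hypothetical training data for the ambiguous token "for" (forager vs. preposition).
TRAIN = [
    ("the for gene encodes a kinase".split(), 1, True),
    ("for mutants forage less".split(), 0, True),
    ("we screened for suppressors".split(), 2, False),
    ("required for viability".split(), 1, False),
]
model = ContextNB()
model.train(TRAIN)
```

In practice a context model like this could be combined with the italics signal from the PDF, since the two cues fail in different places.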
Michael Müller, Arun Rangarajan
What reporter genes have been used with
Drosophila genes to study human disease?
www.textpresso.org/fly
20,099 full-text fly papers
Database curation, e.g. gene-gene interactions
Find all sentences that contain ≥2 gene names and ≥1 association or regulation word:
  26,000 sentences out of 4,400 articles
  Simple interface to “check off” sentences
  100 sentences per hour
  Output into database
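A minimal sketch of the sentence filter, assuming "≥2 gene names" means two distinct gene names and that both lexicons are simple word sets. GENE_NAMES, INTERACTION_WORDS, is_candidate and curation_hours are hypothetical. The helper also makes the throughput arithmetic explicit: at 100 sentences per hour, 26,000 sentences take 26,000 / 100 = 260 curator-hours.

```python
import re

# Hypothetical lexicons; a real run would use full gene-name and
# association/regulation category lists.
GENE_NAMES = {"egl-38", "lin-3", "let-23", "let-60", "mpk-1"}
INTERACTION_WORDS = {"regulates", "activates", "inhibits", "binds", "interacts", "suppresses"}

# Tokens such as "lin-3" or "Y48G10A.3", without trailing sentence punctuation.
TOKEN = re.compile(r"[A-Za-z0-9]+(?:[-.][A-Za-z0-9]+)*")

def is_candidate(sentence, min_genes=2, min_words=1):
    """True if the sentence names >= min_genes distinct genes and contains
    >= min_words distinct association/regulation words."""
    tokens = TOKEN.findall(sentence)
    genes = {t for t in tokens if t in GENE_NAMES}
    words = {t.lower() for t in tokens if t.lower() in INTERACTION_WORDS}
    return len(genes) >= min_genes and len(words) >= min_words

def curation_hours(n_sentences, sentences_per_hour=100):
    """Curator time needed to check off n_sentences at a fixed rate."""
    return n_sentences / sentences_per_hour
```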
2 Predicting Gene Interactions from
information available in public databases
Weiwei Zhong
Training set
  4,775 positive interactions
    Genetic, literature curation (1,909)
    Yeast two-hybrid screen (2,933)
  3,296 negative genetic interactions
    cis doubles in genetic mapping
Benchmark
  5,515 positives: KEGG database
  5,000 negatives: randomly selected
Algorithm
Worm gene pair → ortholog mapping → scoring → score integration → total score
  Worm gene pair: GO, expression, phenotype, microarray → worm score
  Fly orthologs: interaction, GO, expression, phenotype, microarray → fly score
  Yeast orthologs: interaction, GO, localization, phenotype, microarray → yeast score
  Worm, fly and yeast scores → total score
Scoring and score integration
Likelihood ratio: $L = \frac{p(v \mid \mathrm{pos})}{p(v \mid \mathrm{neg})}$
  p(v | pos): probabilities of the predictor having value v if two genes interact
  p(v | neg): probabilities of the predictor having value v if two genes do not interact
Sum the logs of the L's: $\mathrm{score} = \sum_{i=1}^{n} \ln L_i$
  n: number of predictors
  L_i: likelihood ratio of each predictor
[Figure: C. elegans expression predictor, likelihood ratio L plotted against term usage (% of annotated genes associated with the term)]
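A minimal sketch of the scoring step, assuming each predictor takes discrete values and that p(v | pos) and p(v | neg) are estimated by counting values among the positive and negative training pairs. The add-one smoothing, treating a predictor with no data as L = 1 (contributing 0 to the score), and the function names are assumptions. Summing ln L_i is the same as multiplying the likelihood ratios, which treats the predictors as conditionally independent (the naive Bayes assumption).

```python
import math
from collections import Counter

def likelihood_ratios(pos_values, neg_values, pseudocount=1.0):
    """Estimate L(v) = p(v | pos) / p(v | neg) for one discrete predictor.

    pos_values / neg_values: the predictor's value for each interacting /
    non-interacting training pair.
    """
    pos, neg = Counter(pos_values), Counter(neg_values)
    values = set(pos) | set(neg)
    k = len(values)
    ratios = {}
    for v in values:
        p_pos = (pos[v] + pseudocount) / (len(pos_values) + pseudocount * k)
        p_neg = (neg[v] + pseudocount) / (len(neg_values) + pseudocount * k)
        ratios[v] = p_pos / p_neg
    return ratios

def total_score(pair_values, tables):
    """score = sum over predictors of ln L_i; predictors without a table or value are skipped."""
    score = 0.0
    for name, v in pair_values.items():
        table = tables.get(name, {})
        if v in table:
            score += math.log(table[v])
    return score

# Hypothetical GO predictor: do the two genes share an annotation?
go_lr = likelihood_ratios(["shared"] * 3 + ["not"], ["shared"] + ["not"] * 3)
tables = {"GO": go_lr}
```

Here a shared GO term gives L = (4/6) / (2/6) = 2, so it adds ln 2 to the pair's score.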
[Figure: let-60 Ras pathway genes lin-3, let-23, sem-5, sos-1, gap-1, let-60, lin-45, ksr-1, mek-2, lip-1, mpk-1; legend: v1.4 & v1.6, v1.6]
Testing let-60 ras interactors
87 genes have a score > 0.9; 17 confirmed from the literature
Inactivate genes in a gain-of-function (gf) let-60 mutant by RNAi
Assay vulval precursor cell (VPC) induction
Strain                     Phenotype           WT%   Muv%   Average VPC induction
N2                         not Multivulva      100   0      3.0
let-60(gf)                 strong Multivulva   0     100    4.3
let-60(gf); tax-6(RNAi)    weak Multivulva     40    60     3.4
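The table's columns can be computed from per-animal scores. This sketch assumes each animal is scored by its VPC induction index (number of VPCs induced, 0 to 6, half-cells allowed; wild type = 3.0, matching N2 above) and that animals above 3.0 are called Multivulva. The exact-3.0 wild-type cutoff, the Vul% column and summarize_vpc are assumptions for illustration.

```python
def summarize_vpc(indices):
    """Summarize per-animal VPC induction indices.

    Assumed convention: exactly 3.0 counts as wild type (WT), above 3.0 as
    Multivulva (Muv), below 3.0 as vulvaless (Vul).
    """
    if not indices:
        raise ValueError("need at least one animal")
    n = len(indices)
    wt = sum(1 for x in indices if x == 3)
    muv = sum(1 for x in indices if x > 3)
    vul = n - wt - muv
    return {
        "WT%": 100 * wt / n,
        "Muv%": 100 * muv / n,
        "Vul%": 100 * vul / n,
        "average": sum(indices) / n,
    }
```

For a hypothetical set of five animals scored [3, 3, 4, 5, 3], this gives 60% WT, 40% Muv and an average index of 3.6.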
[Figure: let-60(gf) VPC induction under various RNAi. VPC induction index for N2, control, and RNAi of candidate genes (tax-6, csn-5, qua-1, and others); legend: Score > 0.9, Score < 0.6, p < 0.01, p < 0.05]
12 hits (p < 0.05) among 49 predicted genes; 1 hit among 26 randomly selected genes
Combined with the literature, 29/66 (44%) predictions confirmed
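The slide reports hit counts but no enrichment statistic. As a hedged illustration, a one-sided Fisher exact test (our choice of test, not necessarily the authors') compares the RNAi hit rate among predicted interactors (12 of 49) with that among randomly selected genes (1 of 26). Reading 29/66 as 12 RNAi hits plus 17 literature confirmations out of 49 + 17 predictions is our interpretation of how the numbers combine; fisher_one_sided is a hypothetical helper.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact test on the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a), where X is the number of hits in row 1 under the
    hypergeometric null with all margins fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1))
    return tail / comb(n, row1)

# Row 1: predicted interactors tested by RNAi (12 hits, 37 non-hits).
# Row 2: randomly selected genes (1 hit, 25 non-hits).
p = fisher_one_sided(12, 37, 1, 25)
hit_rate_predicted = 12 / 49
hit_rate_random = 1 / 26
confirmed_pct = 100 * (12 + 17) / (49 + 17)
```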
let-60 ras interactors (suppressors)
  tax-6: calcineurin
  csn-5: COP-9 signalosome
  qua-1: hedgehog-related protein
  C01G8.9: SWI/SNF-related (eyelid)
  C05D10.3: ABC transporter (white)
  pfn-3: profilin
  nhr-41: transcription factor
C. elegans interactions
Input: 4,726 known interactions among 2,713 genes
Prediction: an additional 18,863, for a total of 23,589 interactions among 4,408 genes
D. melanogaster interactions
Input: 4,180 known interactions among 1,262 genes
Prediction: an additional 13,126, for a total of 17,306 interactions among 6,044 genes
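Both network summaries follow the same bookkeeping: known interactions plus new high-scoring predictions give the total (4,726 + 18,863 = 23,589 for C. elegans; 4,180 + 13,126 = 17,306 for D. melanogaster). A minimal sketch, assuming pairs are unordered and a fixed score cutoff; the cutoff used for these networks is not stated, and merge_network and the toy pairs are hypothetical.

```python
def merge_network(known, scored, cutoff):
    """Add high-scoring pairs to a set of known interactions.

    known:  set of frozenset({gene_a, gene_b}) pairs (unordered)
    scored: dict mapping frozenset pairs to prediction scores
    Returns (new predictions, combined interactions, genes in the combined network).
    """
    predicted = {pair for pair, s in scored.items() if s > cutoff and pair not in known}
    combined = known | predicted
    genes = set().union(*combined)
    return predicted, combined, genes

# Hypothetical toy input.
known = {frozenset({"lin-3", "let-23"}), frozenset({"let-23", "sem-5"})}
scored = {
    frozenset({"sem-5", "sos-1"}): 0.95,
    frozenset({"lin-3", "let-23"}): 0.99,   # already known, not counted as new
    frozenset({"tax-6", "let-60"}): 0.92,
    frozenset({"unc-59", "mom-5"}): 0.30,   # below cutoff
}
predicted, combined, genes = merge_network(known, scored, cutoff=0.9)
```

Using frozensets makes A-B and B-A the same interaction, so a prediction that repeats a known pair is not double-counted in the total.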
Automated, quantitative phenotyping
  Locomotion
  Morphology
  Generative graphics
  Plate demographics (Weiwei Zhong)
  Sexual behavior
Chris Cronin: movement analysis (BMC Genetics 2005)
E. Fontaine, A. Whittaker, Joel Burdick