Computational discovery of gene modules and regulatory networks
Ziv Bar-Joseph et al. (2003)
Presented by: Dan Baluta
Agenda
Introduction
Goal of Paper
Methods & Results
Conclusions
Critique
Discussion
Introduction
Interest in figuring out regulatory networks
Genome-wide EXPRESSION data sets
DNA-binding data (LOCATION analysis)
PROBLEMS!!! Each data type alone is limited: co-expression does not reveal which regulator is responsible, and location data misses true targets under a strict p-value cutoff while admitting false positives under a loose one.
Theory…
Integration of expression and location data ought to result in a more accurate assignment of genes to regulators than either type of data set on its own.
The basic result of combining data is more information.
Primary Goal

Develop an algorithm that combines
expression and location data to discover
gene modules and regulatory networks.
Methods
GRAM Algorithm
  Genetic RegulAtory Modules (as opposed to a ‘GRM’ algorithm?)
GRAM Validation
  Comparing results
  Chromatin-IP (ChIP) experiments
  MIPS category enrichment analysis
  DNA binding motif analysis
  Targeted data analysis using GRAM
    Transcriptional regulation of rapamycin response
GRAM Algorithm
Step 1 (binding data): Search all possible (pairwise) combinations of transcriptional regulators, and pull out sets of genes that are bound by the same regulators. STRINGENT BINDING CRITERIA USED (p < .001).
Step 2 (expression data): Reduce each gene set from Step 1 by filtering out genes whose expression levels are not highly (positively) correlated with the rest of the set. The reduced sets act as ‘seeds’ for gene ‘modules’.
Step 3: Revisit the binding data and add genes bound by the same transcriptional regulators to the gene modules from Step 2, using RELAXED BINDING CRITERIA (p < .01).
Methods

GRAM applied to binding data for 106
transcription factors and 500 expression
experiments from Saccharomyces
cerevisiae.
List of 500 Expression Experiments
Results: GRAM Algorithm
106 gene modules found, containing 655 distinct genes regulated by 68 transcription factors (TFs).
Results: GRAM Algorithm
(~ 35%)
Validation: Comparing Results


Picked up many more (2.5x) regulator-gene interactions than binding data alone would have predicted.
How do we know these are not all false positives?
Validation: ChIP Experiments
ChIP allows you to determine whether a specific TF actually binds the promoter region of a given gene.
Used IP experiments for Stb1 and 36 randomly chosen genes to characterize sensitivity and specificity.
Validation: ChIP Experiments
GRAM pulled out 3 TF-gene relationships that were:
A) validated by the IP results, and
B) NOT pulled out using binding data alone.
GRAM did NOT pull out any TF-gene relationship that the IP results failed to validate.
The IP experiments therefore showed a reduction in false negatives without an increase in false positives.
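The false-negative and false-positive comparison above rests on the standard definitions sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), counted only over the TF-gene pairs actually tested by IP. The sketch below compares two prediction sets against IP-validated targets for a single TF; the function names, gene names, and counts are hypothetical illustrations, not the paper's data.

```python
def confusion(predicted, validated, tested):
    """Count (TP, FP, FN, TN) over the genes that were tested by IP."""
    predicted = predicted & tested
    validated = validated & tested
    tp = len(predicted & validated)          # predicted and confirmed
    fp = len(predicted - validated)          # predicted but not confirmed
    fn = len(validated - predicted)          # confirmed but missed
    tn = len(tested - predicted - validated) # neither predicted nor confirmed
    return tp, fp, fn, tn


def sensitivity_specificity(tp, fp, fn, tn):
    """Standard sensitivity and specificity; NaN when a denominator is 0."""
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Hypothetical example: 10 tested genes, 5 confirmed as bound by IP.
tested = {f"g{i}" for i in range(10)}
validated = {"g0", "g1", "g2", "g3", "g4"}
binding_only = {"g0", "g1"}              # strict cutoff misses g2..g4
gram = {"g0", "g1", "g2", "g3", "g4"}    # recovers them, adds no extras

print(sensitivity_specificity(*confusion(binding_only, validated, tested)))
print(sensitivity_specificity(*confusion(gram, validated, tested)))
```

With these made-up sets, the binding-only predictions give (0.4, 1.0) and the GRAM-style predictions give (1.0, 1.0): sensitivity rises while specificity holds, which is the pattern the slide describes.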
Validation: MIPS Categories
Genes within a module ought to belong to the same MIPS functional categories.
Gene modules derived using GRAM were 3X more likely to be enriched for genes in the same MIPS category than groups of genes derived from binding/location data alone.
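The slides do not say which statistic defines "enriched". A common choice, assumed here, is the hypergeometric tail probability of seeing at least k category members in a module of n genes drawn from a genome of N genes, K of which belong to the MIPS category:
P(X ≥ k) = Σ_{i=k}^{min(n,K)} C(K, i) C(N − K, n − i) / C(N, n).
The function name and arguments below are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes, K in category, n drawn).

    k: module genes in the category, n: module size,
    K: genome genes in the category, N: genome size.
    """
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible draws contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)
```

A small p-value means the module holds more genes from the category than random draws of the same size would; when many modules and categories are tested, a multiple-testing correction would also be needed.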
Validation: DNA Binding Motifs
Genes linked to a specific TF ought to carry that TF's binding motif in their upstream regions.
The TRANSFAC database was used to determine whether genes in GRAM modules were more likely than groups of genes from binding data alone to contain the known motif, i.e., to be independently supported as coregulated.
Validation: DNA Binding Motifs


GRAM modules did indeed display a higher percentage of genes containing the appropriate motif in the upstream DNA region.
This provides further validation of the GRAM algorithm.
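A simplified version of the motif check can be sketched as follows, assuming each motif is given as an IUPAC consensus string and each gene's upstream sequence is available as a plain string (hypothetical inputs; TRANSFAC also provides position weight matrices, which this sketch does not use). It reports the fraction of genes in a group whose upstream region contains the motif on either strand, so a GRAM module and a binding-only group can be compared the same way.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each IUPAC code (e.g. R = A/G pairs with Y = C/T).
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(motif: str) -> str:
    """Reverse complement of an IUPAC motif, for the opposite strand."""
    return motif.upper().translate(_COMPLEMENT)[::-1]


def motif_regex(motif: str) -> re.Pattern:
    """Compile an IUPAC consensus string into a regular expression."""
    return re.compile("".join(IUPAC[c] for c in motif.upper()))


def fraction_with_motif(upstream: dict, genes, motif: str) -> float:
    """Fraction of `genes` whose upstream sequence has `motif` on either strand."""
    genes = list(genes)
    if not genes:
        return 0.0
    fwd = motif_regex(motif)
    rev = motif_regex(reverse_complement(motif))
    hits = sum(1 for g in genes
               if fwd.search(upstream[g].upper())
               or rev.search(upstream[g].upper()))
    return hits / len(genes)
```

Comparing `fraction_with_motif` for a GRAM module against the same TF's binding-only gene set mirrors the comparison on the slide, though a real analysis would also need a background rate to judge significance.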
Validation: Rapamycin Response

Rapamycin inhibits TOR kinase signaling, which mimics nutrient starvation.
Selected 14 TFs and performed genome-wide location analysis on them.
Ran the GRAM algorithm using this location data plus expression data from the literature.
Validation: Rapamycin Response




Found 39 gene modules, 23 of which had significant MIPS category enrichment.
Added 192 TF-gene interactions that location data alone missed.
Generated 4 novel hypotheses.
Software Availability

Provide link to Java Application
Conclusions


GRAM provides a means of discovering putative regulatory networks that neither data set can detect on its own.
Integrating data sets provides more information than is available from either set independently.
Critiques



No solid measure of sensitivity and specificity.
The authors argue that GRAM is more sensitive, but without a specificity measure, how do we know that the additional interactions are not all false positives?
Looked only for positive correlations, as indicative of activation. Did not look at negatively correlated expression, potentially an important loss of information.
Software does not appear to work out of the box (OOB) with the sample data provided.
Discussion Topics
Can this method be applied to higher organisms? Should it be?
How can this model be improved to include more information? e.g., can we look at negatively correlated expression data?
Should society consider other projects, on the scale of the Human Genome Project (HGP), to extract more data on organisms in a standardized and systematic way?
Pairwise data is used in many cases in biology to infer system-level interactions, which in reality are multivariate. Is using pairwise data wise? Is there an alternative?
Could adding data sets from multiple species improve our results? That is, could we use metagenes instead of genes?
The End