Overview
 Introduction
 Biological network data
 Text mining
 Gene Ontology
 Expression data basics
 Expression, text mining, and GO
 Modules and complexes
 Domains and conclusion
Scenario
 Ran a set of expression experiments to
study a given disease state.
 Need to put the results into a
functional context.
Atherosclerosis
 Most common fatal disease in the U.S.,
and not well understood.
Microarray analysis
 Analyzed 51 artery segments from the hearts of
22 heart transplant patients.
 Classified segments by their disease pathology.
 Will assess the differences between Type I (moderate) and
Type V (severe) atherosclerosis.
 Performed microarray analysis of each segment.
 Agilent expression array with probesets for 13,000 human
genes.
SAM microarray statistic
 For each gene i, contrasted expression in Type I and
Type V lesions with SAM (Proc Natl Acad Sci USA 98:
5116-21, 2001).
 High positive SAM score: gene expressed more highly in
Type V lesions.
 Large negative SAM score: gene expressed more highly in
Type I lesions.
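
As a rough illustration of the score described above, the following sketch computes a SAM-style relative difference for a single gene. The function name, the example values, and the fixed `s0` are assumptions made for illustration; the published SAM method chooses `s0` from the data and uses permutations to estimate the false discovery rate, and neither step is attempted here.

```python
from math import sqrt
from statistics import mean


def sam_d_score(type_v, type_i, s0=0.1):
    """Simplified SAM-style relative difference for one gene.

    type_v, type_i: expression values in Type V and Type I segments.
    s0: small positive constant added to the denominator (assumed fixed here).
    """
    n_v, n_i = len(type_v), len(type_i)
    mean_v, mean_i = mean(type_v), mean(type_i)

    # Pooled sum of squared deviations across both groups.
    ss = sum((x - mean_v) ** 2 for x in type_v) + sum(
        (x - mean_i) ** 2 for x in type_i
    )
    a = (1 / n_v + 1 / n_i) / (n_v + n_i - 2)
    s = sqrt(a * ss)  # gene-specific scatter

    # Positive: higher in Type V. Negative: higher in Type I.
    return (mean_v - mean_i) / (s + s0)


# Hypothetical values for one gene.
d = sam_d_score([5.0, 6.0, 7.0], [1.0, 2.0, 3.0])
```

With this sign convention, a positive `d` corresponds to higher expression in Type V lesions and a negative `d` to higher expression in Type I lesions, matching the two bullets above.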
Analysis pipeline
1. Biomarker identification
 For formal studies, use machine learning methods
 For exploratory work, select several genes with extreme SAM scores.
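
For the exploratory route, picking genes with extreme SAM scores can be sketched as below. The gene names, scores, and the choice of `k` are hypothetical; this is only one simple way to take the tails of the score distribution.

```python
def extreme_genes(d_scores, k=5):
    """Return (most negative, most positive) genes by SAM score.

    d_scores: dict mapping gene name to SAM score.
    k: number of genes to take from each tail (an arbitrary choice).
    """
    ranked = sorted(d_scores, key=d_scores.get)  # ascending by score
    k = min(k, len(ranked) // 2)                 # keep the two tails disjoint
    higher_in_type_i = ranked[:k]                # large negative scores
    higher_in_type_v = ranked[::-1][:k]          # high positive scores
    return higher_in_type_i, higher_in_type_v


# Hypothetical scores for six genes.
scores = {"g1": 4.2, "g2": -3.1, "g3": 0.2, "g4": -0.5, "g5": 6.0, "g6": -4.4}
type_i_markers, type_v_markers = extreme_genes(scores, k=2)
```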
Analysis pipeline, continued
2. Biomarker association
 Basic question: for this context, what is
common among the biomarker genes?
 Approaches
   Exhaustive reading
   GO analysis
   Literature searching
Literature searching: pros and cons of this approach
 Pro: associations are specific to this disease context
 Con: might not find associations on all of your biomarkers
 Pro: identifies relevant literature
 Con: might find associations on other genes
Iterative literature searching
 Perform an initial search
 Color the network by SAM d-score
 Identify any new “responsive” genes
 Add to biomarker list
 Repeat
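
The loop above can be sketched in a few lines. Everything named here is an assumption for illustration: `search` stands in for whatever literature search tool is used and is expected to return gene-gene associations as pairs, the threshold of 3.0 for a "responsive" gene is arbitrary, and the toy literature and scores are invented.

```python
def node_color(d, threshold=3.0):
    """Map a SAM d-score to a display color (arbitrary scheme)."""
    if d >= threshold:
        return "red"    # higher in Type V
    if d <= -threshold:
        return "blue"   # higher in Type I
    return "gray"       # not responsive


def iterative_search(seeds, search, d_scores, threshold=3.0, max_rounds=10):
    """Grow a biomarker list by repeated literature searches.

    search: callable taking a set of genes and returning a set of
            (gene_a, gene_b) associations (hypothetical interface).
    """
    biomarkers = set(seeds)
    network = set()
    for _ in range(max_rounds):
        network = search(biomarkers)                   # perform a search
        genes = {g for edge in network for g in edge}
        responsive = {                                 # new responsive genes
            g for g in genes - biomarkers
            if abs(d_scores.get(g, 0.0)) >= threshold
        }
        if not responsive:                             # nothing new: stop
            break
        biomarkers |= responsive                       # add and repeat
    return biomarkers, network


# Toy stand-in for a literature search: gene -> genes co-mentioned with it.
TOY_LITERATURE = {"A": {"B", "C"}, "B": {"D"}, "C": set(), "D": {"E"}, "E": set()}


def toy_search(genes):
    return {(g, n) for g in genes for n in TOY_LITERATURE.get(g, ())}


toy_scores = {"A": 5.0, "B": 4.0, "C": 0.5, "D": -3.5, "E": 1.0}
biomarkers, network = iterative_search({"A"}, toy_search, toy_scores)
colors = {g: node_color(toy_scores[g]) for edge in network for g in edge}
```

In the toy data, the search starting from gene A pulls in B, then D, and stops when the only new neighbor (E) is not responsive.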
Discussion topic
Why not use all genes with extreme SAM
scores as biomarkers? Why iterate?
Once you have a good network:
1. Use BiNGO to identify the enriched GO terms
2. Look at the genes corresponding to selected enriched terms
3. Check the literature search sentences for those genes
4. Choose one or two sentences, look at the abstracts.
5. Iterate if desired (or go to lunch)
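
To make the enrichment step concrete, here is a minimal sketch of a hypergeometric over-representation test, the kind of test commonly used for GO enrichment. It is not BiNGO's implementation: the function names, the toy annotations, and the gene names are assumptions, and a real analysis would also correct for multiple testing.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K carry the term."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enriched_terms(selected, annotations, background):
    """Rank terms by over-representation among the selected genes.

    annotations: dict mapping term -> set of annotated genes (toy data here).
    Returns (term, overlap, p_value) tuples sorted by p-value, uncorrected.
    """
    background = set(background)
    selected = set(selected) & background
    results = []
    for term, genes in annotations.items():
        annotated = set(genes) & background
        k = len(selected & annotated)
        if k == 0:
            continue  # term not represented in the network
        p = hypergeom_pvalue(k, len(selected), len(annotated), len(background))
        results.append((term, k, p))
    return sorted(results, key=lambda r: r[2])


# Hypothetical background of ten genes and two made-up terms.
background = {f"g{i}" for i in range(1, 11)}
annotations = {
    "termA": {"g1", "g2", "g3"},
    "termB": {"g4", "g5", "g6", "g7", "g8", "g9"},
}
ranking = enriched_terms({"g1", "g2", "g4"}, annotations, background)
```

The genes behind the top-ranked terms are then the ones to look up in the literature search sentences, as in steps 2 and 3.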
Final points
 No right or wrong answers, only
plausible or novel hypotheses.
 You can take any approach you wish.
 “If it was easy, everyone would be
doing it”.