Tuesday, May 23, 2017
From high-throughput data to
network biology: gain in statistical
power and biological relevance
Stockholm Bioinformatics Centre
Andrey Alexeyenko
PLoS Med 2005 2(8):e124
Why Most Published Research Findings Are False
Statistical model: no positive facts,
and an allowed rate of Type I error
True negatives
False positives
Biological reality: negative facts are the
vast majority, positive facts are yet to be
discovered
Positive facts
True positives
Negative facts
“Positive facts”: the discoveries we are after, e.g. genomic associations,
differentially expressed genes, relations “phenotype<->disease” etc.
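The contrast between the statistical model and the biological reality can be put in numbers. A minimal sketch, assuming illustrative values for the prior fraction of positive facts, the power, and the Type I error rate (none of these numbers come from the slides):

```python
def positive_predictive_value(prior, power, alpha):
    """Fraction of significant findings that are true positives.

    prior -- assumed fraction of tested hypotheses that are positive facts
    power -- probability of detecting a positive fact
    alpha -- allowed rate of Type I error
    """
    true_pos = prior * power            # positive facts that are detected
    false_pos = (1.0 - prior) * alpha   # negative facts that pass by chance
    return true_pos / (true_pos + false_pos)


# Hypothetical scenario: 1 in 1000 tested hypotheses is a positive fact.
ppv = positive_predictive_value(prior=0.001, power=0.8, alpha=0.05)
print(f"PPV = {ppv:.3f}")
```

When negative facts are the vast majority, most "significant" findings are false positives even with good power.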
Network is just a graph!
The fact that I can draw a
network does not yet make
it a biological reality!..
Conversion “data pieces → confidence”
in a Bayesian framework
D. rerio, 17.3%
D. melanogaster, 9.8%
C. elegans, 9.3%
R. norvegicus, 5.1%
S. cerevisiae, 10.2%
M. musculus, 25.4%
A. thaliana, 6.5%
H. sapiens, 16.5%
Phylogenetic profiling, 18.6%
Protein interactions, 10.6%
Protein expression, 6.1%
TF targeting, 12.3%
miRNA targeting, 2.0%
Sub-cellular localization, 7.3%
mRNA expression, 43.1%
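One common way to convert data pieces into a confidence value is naive Bayesian integration: each evidence type contributes a log likelihood ratio, and the sum updates the prior odds of a functional link. The sketch below assumes independent evidence types; the probabilities and the prior are hypothetical placeholders, not values from the slides.

```python
import math


def log_likelihood_ratio(p_given_linked, p_given_unlinked):
    """Log ratio of observing the evidence in linked vs. unlinked gene pairs."""
    return math.log(p_given_linked / p_given_unlinked)


def posterior_confidence(llrs, prior):
    """Combine log likelihood ratios with the prior odds of a link."""
    log_odds = math.log(prior / (1.0 - prior)) + sum(llrs)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical (P(evidence | linked), P(evidence | unlinked)) per data type.
evidence = {
    "mRNA expression": (0.30, 0.05),
    "Protein interactions": (0.10, 0.01),
    "Sub-cellular localization": (0.40, 0.20),
}
llrs = [log_likelihood_ratio(a, b) for a, b in evidence.values()]
confidence = posterior_confidence(llrs, prior=0.001)
print(f"Posterior confidence of the link: {confidence:.3f}")
```

With no evidence at all, the posterior equals the prior; every additional data type shifts the odds up or down.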
Enrichment of functional groups
Enrichment analysis in networks turns out to be
more powerful than on gene lists
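A toy illustration of why the network view can see more than the list view. The sketch counts network links between a gene list and a functional group and compares the count with the number expected from node degrees alone. The degree-based null model, the function name, and the toy network are assumptions made for this example.

```python
def network_enrichment(edges, gene_list, functional_group):
    """Return (observed, expected) link counts between two gene sets."""
    degree = {}
    observed = 0
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
        # Count edges that connect the gene list to the functional group.
        if (a in gene_list and b in functional_group) or (
            b in gene_list and a in functional_group
        ):
            observed += 1
    deg_list = sum(degree.get(g, 0) for g in gene_list)
    deg_group = sum(degree.get(g, 0) for g in functional_group)
    # Expected links if edges were placed proportionally to node degrees.
    expected = deg_list * deg_group / (2.0 * len(edges))
    return observed, expected


# Hypothetical network: the two sets share no genes but are tightly linked.
edges = [
    ("g1", "f1"), ("g1", "f2"), ("g2", "f1"), ("g2", "x1"),
    ("x1", "x2"), ("x2", "x3"), ("x3", "f2"),
]
observed, expected = network_enrichment(edges, {"g1", "g2"}, {"f1", "f2"})
print(f"observed links: {observed}, expected: {expected:.2f}")
```

Here the gene list and the functional group have no members in common, so an overlap test on the lists alone would report nothing, while the link count exceeds the degree-based expectation.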
Enrichment of functional groups
Partial correlations
rPLC = 0.95
rPLC = 0.88
rPLC = 0.76
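For reference, a first-order partial correlation removes the part of the x-y correlation that is explained by a third variable z. A minimal sketch in plain Python; the three short vectors are made-up data, chosen so that x and y both follow z.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def partial_correlation(x, y, z):
    """Correlation of x and y after controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Made-up data: x and y are both driven by z plus small independent noise.
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.1, 1.9, 3.1, 3.9, 5.1, 5.9]
y = [1.1, 2.1, 2.9, 3.9, 5.1, 6.1]
r_xy = pearson(x, y)
r_xy_given_z = partial_correlation(x, y, z)
print(f"r(x, y) = {r_xy:.3f}, r(x, y | z) = {r_xy_given_z:.3f}")
```

The plain correlation between x and y is very high, yet it nearly vanishes once z is accounted for; this is the kind of indirect link that partial correlations help to expose.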
Benjamini-Hochberg correction
Quantitative modeling of multi-component
system with mutually dependent elements
Why is going “list → network”
an advancement?
• Functional context
• “Anchoring”, i.e. interdependence
• Biological interpretability
• Statistical features
• Data integration
Many of those can be applied to the lists as
well, but mind the flexibility!
Ways to augment confidence
Trivial:
1) increase power
2) decrease false prediction rate
• Data integration
– Evaluation prior to integration!
• Consider biological context
• Remove spurious edges
• Generalize to a higher level of organization
Ways to evaluate confidence
• Supervised learning
• Balance comprehensiveness and
complexity (so-called information criteria)
• Benjamini-Hochberg
• Show it to a biologist
• Go out to the real world and test
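The Benjamini-Hochberg item from the list above, as a minimal sketch in plain Python. The p-values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values, e.g. one per candidate network edge.
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
q_values = benjamini_hochberg(p_values)
for p, q in zip(p_values, q_values):
    print(f"p = {p:.3f} -> adjusted = {q:.5f}")
```

Keeping the items whose adjusted value is below a chosen threshold controls the expected fraction of false discoveries among them.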
Ways to employ confidence
• Initialize network
• Add node and edge attributes to the network
• Filter network elements for higher relevance
• Build more complex models accounting for
confidence
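The first three items can be sketched with plain dictionaries. The edge list, the confidence scores, and the threshold below are hypothetical.

```python
def build_network(scored_edges, min_confidence):
    """Initialize an undirected network, keeping only confident edges.

    Each kept edge stores its confidence as an edge attribute.
    """
    network = {}
    for a, b, confidence in scored_edges:
        if confidence < min_confidence:
            continue  # filter out low-relevance elements
        network.setdefault(a, {})[b] = {"confidence": confidence}
        network.setdefault(b, {})[a] = {"confidence": confidence}
    return network


# Hypothetical scored edges: (gene, gene, confidence).
scored_edges = [("A", "B", 0.95), ("B", "C", 0.40), ("C", "D", 0.75)]
network = build_network(scored_edges, min_confidence=0.7)
for node, neighbours in sorted(network.items()):
    print(node, sorted(neighbours))
```

Raising or lowering the threshold trades network coverage against the share of spurious edges.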