110304 Visit IPK Gatersleben SUBAIII v3

Swan River foreshore, Perth, Western Australia
University of Western Australia
Biomedical, Biomolecular and
Chemical Sciences
ARC Centre Plant Energy Biology
Ian Small
Murray Badger
Steve Smith
David Day
Barry Pogson
Harvey Millar
Jim Whelan
SUBA
SUBcellular location database
for Arabidopsis proteins
Sandra Tanz and Ian Castleden
4th March 2011
Why protein localisation?
• Contributes towards the understanding of protein function and of
biological inter-relationships, i.e. only proteins in the same location can
interact.
• Separate subcellular locations often represent distinct cellular
environments: proteins share similar attributes and play roles in defining
the function of a subcellular compartment.
• To build hypotheses or models: large-scale phenotyping screens,
microarray experiments and protein-protein interaction assays rely on
protein localisation information.
How to localise proteins?
• Prediction
• In vitro uptake (imports)
• Western blot
• Immunogold labeling
• In vivo (GFP)
• Subcellular proteomics (MS)
• Enzyme activity measurements
• Protein-protein interaction
Images modified from Millar et al., 2009
SUBA: SUBcellular location database for Arabidopsis proteins
What does SUBA document?
                                               SUBA II (2007)   SUBA III (2011)
Combined sub-location data                     250’719          1’022’040
Bioinformatic predictions by                   10 predictors    24 predictors
Calls by experiments (GFP, MS)                 8273             19’528
Calls by PPI                                   0                6673
Distinct proteins localised by GFP and/or MS   4531             8533
NEW!
Venn diagram, GFP (2135) vs MS (6398):
GFP only: 1193
MS only: 5456
Both GFP and MS: 942
Data mining
• Search of the NCBI PubMed
(Medline) and Entrez
(GenBank) databases using
keywords
• Alert via Email
Data mining
• Search publication to extract
localisation information = fully
curated data
SUBA III interface
http://suba.plantenergy.uwa.edu.au/
SUBA III flatfile
Analysis of SUBA III data – on the way…
Do data become more or less consistent over time?
Experimental data (MS vs GFP)
• How reliable are experimental localisation data? Has the overlap of
data changed with increasing data sets?
How reliable are GFP localisation data?
Total GFP localisations confirmed by MS
GFP (2554): 1844 GFP only
MS (9016): 8306 MS only
Overlap (confirmed by MS): 710
Total GFP localisations disputed by MS
GFP (1844): 1386 GFP only
MS (74172): 73714 MS only
Overlap (disputed by MS): 458
1386 neither confirmed nor disputed
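The Venn counts above can be checked with a few lines of arithmetic. This is a minimal sketch, not part of SUBA; the function name `venn_counts` is made up for illustration, and it assumes the second diagram starts from the 1844 GFP localisations that MS did not confirm.

```python
def venn_counts(total_a, total_b, overlap):
    """Split two set sizes and their overlap into the three Venn regions."""
    if overlap > min(total_a, total_b):
        raise ValueError("overlap cannot exceed either set size")
    return {
        "a_only": total_a - overlap,
        "b_only": total_b - overlap,
        "both": overlap,
    }

GFP_TOTAL = 2554  # all GFP localisations considered on the slide

# Diagram 1: GFP localisations confirmed by MS
confirmed = venn_counts(GFP_TOTAL, 9016, 710)
# Diagram 2: the remaining GFP localisations, tested for disagreement with MS
disputed = venn_counts(confirmed["a_only"], 74172, 458)

# Whatever is neither confirmed nor disputed is unresolved
unresolved = GFP_TOTAL - confirmed["both"] - disputed["both"]

for label, n in [("confirmed", confirmed["both"]),
                 ("disputed", disputed["both"]),
                 ("unresolved", unresolved)]:
    print(f"{label}: {n} ({100 * n / GFP_TOTAL:.1f}% of GFP localisations)")
```

With the slide's numbers, roughly 28% of GFP localisations are confirmed by MS, 18% are disputed, and 54% are neither.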
Analysis of SUBA III data – on the way…
Do data become more or less consistent over time?
Experimental data (MS vs GFP)
• How reliable are experimental localisation data? Has the overlap of
data changed with increasing data sets?
• Does evidence for multiple locations mean the protein is dual
targeted/dynamic or is it a false positive?
Prediction vs experimental data
• How reliable are predictors today?
PPI data
• What do PPI data tell us about sub-cellular location?
• Organellar proteome: Can we discover novel organellar proteins?
http://library.duke.edu/digitalcollections/gedney.KY0180/pg.1/
SUBA under the hood
• Why a Web interface?
• Genevestigator, MapMan
• AHM chemicals (Apache JPA)
• For the foreseeable future databases are going to be
“Web” based (HTTP, JavaScript, HTML, CSS)
• Need to be maintained by a minimum number of
developers (i.e. one!)
http://www.guistuff.com/
SUBA Tables (predictors)
SUBA Tables (“original” sources)
http://www.ce4csb.org/amigo/
SUBA Tables (publications)
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&id=18453549
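The efetch URL above can be assembled and its XML response parsed with the Python standard library. The following is only a sketch of how publication details might be pulled into a table; the helper names `efetch_url` and `article_title` are hypothetical, and the XML snippet is a made-up stand-in for a real response so the example runs without network access.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Base address taken from the slide
EFETCH = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def efetch_url(pubmed_id, db="pubmed", retmode="xml"):
    """Build the efetch URL for one PubMed record."""
    query = urlencode({"db": db, "retmode": retmode, "id": pubmed_id})
    return EFETCH + "?" + query

def article_title(xml_text):
    """Return the first ArticleTitle in a PubMed XML document, or None."""
    node = ET.fromstring(xml_text).find(".//ArticleTitle")
    return node.text if node is not None else None

# Made-up miniature response, for illustration only
SAMPLE_XML = """<PubmedArticleSet><PubmedArticle><MedlineCitation>
<Article><ArticleTitle>Placeholder title</ArticleTitle></Article>
</MedlineCitation></PubmedArticle></PubmedArticleSet>"""

print(efetch_url(18453549))
print(article_title(SAMPLE_XML))
```

In real use the URL would be fetched (for example with `urllib.request.urlopen`) and the response body passed to `article_title`.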
SUBA Tables (automation)
Julian Tonti-Filippini
Why Bother?
Suba2
Suba3.ppi.locusB IN ('AT3G62420.1')
“denormalisation” src_msms
SELECT suba3.suba3.*, suba3.src_ppi_1.*
FROM suba3.suba3
LEFT OUTER JOIN suba3.src_ppi AS src_ppi_1
  ON suba3.suba3.locus = src_ppi_1.`locusA`
WHERE EXISTS (
  SELECT 1 FROM suba3.src_ppi
  WHERE suba3.suba3.locus = suba3.src_ppi.`locusA`
    AND suba3.src_ppi.`locusB` IN ('AT3G62420.1'))
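To see what this query shape returns, it can be run against toy tables in an in-memory SQLite database. This is a sketch under stated assumptions: the schema prefix is dropped, only the `locus`, `mwt`, `locusA`, `locusB` and `pubmed` columns are modelled, and all rows (loci other than AT3G62420.1, weights, PubMed numbers) are invented for illustration.

```python
import sqlite3

def loci_interacting_with(conn, partner):
    """Return every PPI row of each locus that has `partner` as an interactor."""
    sql = """
        SELECT suba3.locus, suba3.mwt, p.locusB, p.pubmed
        FROM suba3
        LEFT OUTER JOIN src_ppi AS p ON suba3.locus = p.locusA
        WHERE EXISTS (
            SELECT 1 FROM src_ppi
            WHERE suba3.locus = src_ppi.locusA
              AND src_ppi.locusB IN (?))
        ORDER BY suba3.locus, p.locusB
    """
    return conn.execute(sql, (partner,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE suba3 (locus TEXT PRIMARY KEY, mwt REAL);
    CREATE TABLE src_ppi (locusA TEXT, locusB TEXT, pubmed INTEGER);
    -- invented rows, for illustration only
    INSERT INTO suba3 VALUES ('AT1G00001.1', 50000),
                             ('AT1G00002.1', 90000);
    INSERT INTO src_ppi VALUES ('AT1G00001.1', 'AT3G62420.1', 1),
                               ('AT1G00001.1', 'AT1G00002.1', 2),
                               ('AT1G00002.1', 'AT1G00001.1', 3);
""")

rows = loci_interacting_with(conn, "AT3G62420.1")
for row in rows:
    print(row)
```

The EXISTS clause selects the loci that interact with the given partner, while the outer join still returns all of their interaction rows, not only the matching one. That is why the filter is not simply placed on the joined table.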
Suzanne M. Embury and Peter M.D. Gray
http://suba.plantenergy.uwa.edu.au/cgi/suba.py/query?filter=['Suba3.ppi.locusB','in',['AT1G04234.1'],'AND','mwt','gt',80000.0]&offset=0&limit=1000
@suba.json
def query(filter, offset=0, limit=1000):
    # Convert the JSON filter to a SQLAlchemy expression, then page the results
    return Session().query(Suba3).filter(json2sqla(filter))\
        .offset(offset).limit(limit)
{success: True,
 result: [
   {locus: 'AT1G54321.1',
    mwt: 81454,
    ….
    ppi: [{locusA: 'AT1G54321.1', locusB: 'AT1G04234.1', pubmed: 14567845}]
   },
   {locus: 'AT1G63021.1',
    mwt: 91454,
    ….
    ppi: [{locusA: 'AT1G63021.1', locusB: 'AT1G04234.1', pubmed: 34567767}]
   },
   …]}
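The `filter` parameter in the query URL is a flat list of (field, operator, value) triples joined by connectives. The slides do not show how `json2sqla` is written, so the following is only a stand-in sketch that turns such a list into a parameterised SQL WHERE clause. The function name `filter_to_sql`, the operator table and the column whitelist are all assumptions, not SUBA's actual code.

```python
# Hypothetical lookup tables; only names listed here are accepted,
# so user input never reaches the SQL text directly.
OPERATORS = {"gt": ">", "lt": "<", "eq": "="}
COLUMNS = {"mwt": "suba3.mwt", "Suba3.ppi.locusB": "src_ppi.locusB"}
CONNECTIVES = {"AND", "OR"}

def filter_to_sql(flt):
    """Translate [field, op, value, connective, field, op, value, ...]
    into (where_clause, parameters)."""
    parts, params = [], []
    i = 0
    while i < len(flt):
        if i + 3 > len(flt):
            raise ValueError("incomplete (field, operator, value) triple")
        field, op, value = flt[i:i + 3]
        if field not in COLUMNS:
            raise ValueError(f"unknown field: {field!r}")
        if op == "in":
            marks = ", ".join("?" for _ in value)
            parts.append(f"{COLUMNS[field]} IN ({marks})")
            params.extend(value)
        elif op in OPERATORS:
            parts.append(f"{COLUMNS[field]} {OPERATORS[op]} ?")
            params.append(value)
        else:
            raise ValueError(f"unknown operator: {op!r}")
        i += 3
        if i < len(flt):
            if flt[i] not in CONNECTIVES or i + 1 >= len(flt):
                raise ValueError("expected AND/OR followed by another triple")
            parts.append(flt[i])
            i += 1
    return " ".join(parts), params

where, params = filter_to_sql(
    ["Suba3.ppi.locusB", "in", ["AT1G04234.1"], "AND", "mwt", "gt", 80000.0])
print(where)
print(params)
```

Keeping values in a separate parameter list, and fields in a whitelist, lets a single generic endpoint accept arbitrary filters from the browser without building SQL from raw strings.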
Computational
Systems Biology
(Near) Future
• A large number of predictors often give conflicting predictions… what to do?
• Bayesian analysis…
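One simple form such a Bayesian analysis could take is a naive Bayes combination of predictor calls. The slides do not describe the intended method, so this is purely an illustrative sketch: the predictor names, accuracies and the uniform prior are invented, and predictors are assumed to be independent and to spread their errors evenly over the wrong locations.

```python
def combine_predictions(calls, accuracies, locations):
    """Posterior probability of each location given one call per predictor.

    calls:      {predictor: predicted location}
    accuracies: {predictor: probability that the predictor is correct}
    locations:  list of all candidate locations (uniform prior assumed)
    """
    k = len(locations)
    posterior = {loc: 1.0 / k for loc in locations}
    for predictor, call in calls.items():
        acc = accuracies[predictor]
        for loc in locations:
            # Likelihood of this call if `loc` were the true location
            likelihood = acc if call == loc else (1.0 - acc) / (k - 1)
            posterior[loc] *= likelihood
    total = sum(posterior.values())
    return {loc: p / total for loc, p in posterior.items()}

# Invented example: two predictors say mitochondrion, one says plastid
locations = ["mitochondrion", "plastid", "nucleus"]
calls = {"predictor_A": "mitochondrion",
         "predictor_B": "mitochondrion",
         "predictor_C": "plastid"}
accuracies = {"predictor_A": 0.8, "predictor_B": 0.7, "predictor_C": 0.6}

posterior = combine_predictions(calls, accuracies, locations)
best = max(posterior, key=posterior.get)
print(best, posterior)
```

In practice the per-predictor accuracies would have to be estimated against experimental (GFP, MS) localisations, and the independence assumption would need checking, since many predictors are trained on overlapping data.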
Acknowledgements
Thanks for your attention!!
Ian Small
Harvey Millar
Joshua Heazlewood
Julian Tonti-Filippini