Swan River foreshore, Perth, Western Australia
University of Western Australia
Biomedical, Biomolecular and Chemical Sciences
ARC Centre Plant Energy Biology
Ian Small, Murray Badger, Steve Smith, David Day, Barry Pogson, Harvey Millar, Jim Whelan

SUBA
SUBcellular location database for Arabidopsis proteins
Sandra Tanz and Ian Castleden
4th March 2011

Why protein localisation?
• Contributes towards the understanding of protein function and of biological inter-relationships, i.e. only proteins in the same location can interact.
• Separate subcellular locations often represent distinct cellular environments: proteins share similar attributes and play roles in defining the function of a subcellular compartment.
• To build hypotheses or models: large-scale phenotyping screens, microarray experiments and protein-protein interaction assays rely on protein localisation info.

How to localise proteins?
• Prediction
• In vitro uptake (imports)
• Western blot
• Immunogold labeling
• In vivo (GFP)
• Subcellular proteomics (MS)
• Enzyme activity measurements
• Protein-protein interaction
Images modified from Millar et al., 2009

SUBA: SUBcellular location database for Arabidopsis proteins

What does SUBA document?

                                               SUBA II (2007)   SUBA III (2011)
Combined sub-location data                     250’719          1’022’040
Bioinformatic predictions by                   10 predictors    24 predictors
Calls by experiments (GFP, MS)                 8273             19’528
Calls by PPI                                   0                6673
Distinct proteins localised by GFP and/or MS   4531             8533

NEW!
[Venn diagram] GFP (2135): 1193 GFP only. MS (6398): 5456 MS only. Overlap: 942.

Data mining
• Search of the NCBI PubMed (Medline) and Entrez (GenBank) databases using keywords
• Alert via Email

Data mining
• Search publication to extract localisation information = fully curated data

SUBA III interface
http://suba.plantenergy.uwa.edu.au/

SUBA III flatfile

Analysis of SUBA III data – on the way…
Do data become more or less consistent over time?
Experimental data (MS vs GFP)
• How reliable are experimental localisation data? Has the overlap of data changed with increasing data sets?

How reliable are GFP localisation data?
Total GFP localisations confirmed by MS
[Venn diagram] GFP (2554): 1844 GFP only. MS (9016): 8306 MS only. Overlap: 710.
Total GFP localisations disputed by MS
[Venn diagram] GFP (1844): 1386 GFP only. MS (74172): 73714 MS only. Overlap: 458.
1386 neither confirmed nor disputed

Analysis of SUBA III data – on the way…
Do data become more or less consistent over time?
Experimental data (MS vs GFP)
• How reliable are experimental localisation data? Has the overlap of data changed with increasing data sets?
• Does evidence for multiple locations mean the protein is dual targeted/dynamic or is it a false positive?
Prediction vs experimental data
• How reliable are predictors today?
PPI data
• What do PPI data tell us about sub-cellular location?
• Organellar proteome: Can we discover novel organellar proteins?
http://library.duke.edu/digitalcollections/gedney.KY0180/pg.1/

SUBA under the hood
• Why a Web interface?
• Genevestigator, MapMan
• AHM chemicals (Apache JPA)
• For the foreseeable future databases are going to be “Web” based (HTTP, JavaScript, HTML, CSS)
• Need to be maintained by a minimum number of developers (i.e. one!)
http://www.guistuff.com/

SUBA Tables (predictors)

SUBA Tables (“original” sources)
http://www.ce4csb.org/amigo/

SUBA Tables (publications)
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&id=18453549

SUBA Tables (automation)
Julian Tonti-Filippini

Why Bother?
Suba2
Suba3.ppi.locusB IN ('AT3G62420.1')
“denormalisation”
src_msms

    SELECT suba3.suba3.*, suba3.src_ppi_1.*
    FROM suba3.suba3
    LEFT OUTER JOIN suba3.src_ppi AS src_ppi_1
        ON suba3.suba3.locus = src_ppi_1.`locusA`
    WHERE EXISTS (
        SELECT 1 FROM suba3.src_ppi
        WHERE suba3.suba3.locus = suba3.src_ppi.`locusA`
          AND suba3.src_ppi.`locusB` IN ('AT3G62420.1'))

Suzanne M. Embury and Peter M.D. Gray

http://suba.plantenergy.uwa.edu.au/cgi/suba.py/query?filter=['Suba3.ppi.locusB','in',['AT1G04234.1'],'AND','mwt','gt',80000.0]&offset=0&limit=1000

    @suba.json
    def query(filter, offset=0, limit=1000):
        return Session().query(Suba3).filter(json2sqla(filter)) \
            .offset(offset).limit(limit)

    {success: True, result: [
      { locus: 'AT1G54321.1', mwt: 81454, …
        ppi: [{locusA: 'AT1G54321.1', locusB: 'AT1G04234.1', pubmed: 14567845}] },
      { locus: 'AT1G63021.1', mwt: 91454, …
        ppi: [{locusA: 'AT1G63021.1', locusB: 'AT1G04234.1', pubmed: 34567767}] },
      …]}

Computational Systems Biology

(Near) Future
• Large number of predictors often give conflicting predictions… what to do?
• Bayesian analysis…

Acknowledgements
Thanks for your attention!!
Ian Small, Harvey Millar, Joshua Heazlewood, Julian Tonti-Filippini
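
Supplementary sketch: filter-to-SQL translation
The slides show a JSON filter list being passed to json2sqla, but not how that function works. The following is only a rough stand-in for the idea, not the SUBA implementation: it uses the standard-library sqlite3 module and hand-built SQL instead of SQLAlchemy. Table and column names (suba3, src_ppi, locus, mwt, locusA, locusB) are taken from the SQL on the slide. The function names json_filter_to_sql and run_query, and every operator other than 'in' and 'gt', are assumptions made for illustration.

```python
import sqlite3

# Whitelisted plain columns of the main table (names taken from the slide).
COLUMNS = {"locus": "suba3.locus", "mwt": "suba3.mwt"}
# Fields living in the related PPI table; these become EXISTS subqueries.
PPI_FIELDS = {"Suba3.ppi.locusB": "src_ppi.locusB"}
# Hypothetical operator names; only 'in' and 'gt' appear on the slide.
OPERATORS = {"eq": "=", "gt": ">", "lt": "<"}


def _compare(column, op, value):
    """Build one comparison with '?' placeholders and its parameter list."""
    if op == "in":
        marks = ", ".join("?" for _ in value)
        return f"{column} IN ({marks})", list(value)
    if op in OPERATORS:
        return f"{column} {OPERATORS[op]} ?", [value]
    raise ValueError(f"unsupported operator: {op!r}")


def _condition(field, op, value):
    """Translate one (field, op, value) triple into SQL."""
    if field in PPI_FIELDS:
        inner, values = _compare(PPI_FIELDS[field], op, value)
        sql = ("EXISTS (SELECT 1 FROM src_ppi "
               f"WHERE suba3.locus = src_ppi.locusA AND {inner})")
        return sql, values
    if field in COLUMNS:
        return _compare(COLUMNS[field], op, value)
    raise ValueError(f"unknown field: {field!r}")


def json_filter_to_sql(flt):
    """Translate [field, op, value, 'AND', field, op, value, ...] into
    (where_clause, params). Raises ValueError on malformed input.
    Simplification: no grouping, so SQL precedence applies to mixed AND/OR."""
    clauses, params = [], []
    i = 0
    while i < len(flt):
        field, op, value = flt[i:i + 3]
        sql, values = _condition(field, op, value)
        clauses.append(sql)
        params.extend(values)
        i += 3
        if i < len(flt):
            joiner = str(flt[i]).upper()
            if joiner not in ("AND", "OR"):
                raise ValueError(f"expected AND/OR, got {flt[i]!r}")
            clauses.append(joiner)
            i += 1
    return " ".join(clauses), params


def run_query(conn: sqlite3.Connection, flt, offset=0, limit=1000):
    """Mirror the slide's query(filter, offset, limit) against sqlite3."""
    where, params = json_filter_to_sql(flt)
    sql = (f"SELECT locus, mwt FROM suba3 WHERE {where} "
           "ORDER BY locus LIMIT ? OFFSET ?")
    return conn.execute(sql, params + [limit, offset]).fetchall()


# The filter from the example URL on the slide.
SLIDE_FILTER = ["Suba3.ppi.locusB", "in", ["AT1G04234.1"],
                "AND", "mwt", "gt", 80000.0]
where, params = json_filter_to_sql(SLIDE_FILTER)
```

Whitelisting field names and passing all values as bound parameters keeps user-supplied filter content out of the SQL text, which matters for a query endpoint exposed over HTTP.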