XBrain: XQuerying the Brain Mapping Database
Stacy Tang, Yana Kadiyska, Jim Brinkley, Dan Suciu

Human Brain Project
• Problem – Explosion of information due to the proliferation of techniques.
• NIH Goal – Web-based information tools that allow management, integration, and sharing of research data.

Brain Mapping Database
• Study of language function through an invasive neurosurgical method, Cortical Stimulation Mapping (CSM)
• Combined with non-invasive methods such as MRI, fMRI, and PET scans
• 64 patients, 13 of them published

XML
• Document markup language
• Has become the standard for data exchange between inter-enterprise applications
• Platform independent
• "Self-describing" data

XML example
<bib>
  <book year="2000">
    <title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann Publishers</publisher>
    <price>39.95</price>
  </book>
  <book year="1992">…</book>
  ...
</bib>

SilkRoute
• Problem: the data is stored in a relational database; how can it be translated to XML for exchange?
• SilkRoute is a tool for publishing relational data as XML.
• Allows the data to be queried using XQuery.
• Developed by Dan and Yana, along with collaborators from other institutions.

What SilkRoute Does
[Diagram: Input: user query, public query, and relational schema → SilkRoute → SQL → RDBMS → tuples → SilkRoute → Output: XML]

Objective of Project
To demonstrate the usability of SilkRoute and XQuery for data sharing by applying them to a real relational database: the Brain Mapping database.

Project Background
• Started as a CSE544 (Intro to Databases) project (Spring Quarter 2002).
• Original project members: Hao Li and me.
• Demonstrated the feasibility of the project.
• Unfinished:
  – Covered only a small part of the database
  – Depended on manual tweaking of the data
  – Minimal web interface

Tasks of the Project
1. Migrate the database from MySQL to PostgreSQL, automating as much as possible.
2. Complete the XQuery-based public view for the entire database.
3. Work with Yana to smooth out SilkRoute issues: bug fixes, error handling, etc.
4. Web interface: add new features, improve the look and feel and the UI.

1. MySQL to PostgreSQL
• Why is this necessary?
  – Robustness
  – Support for sub-select queries
• Problems: MySQL and PostgreSQL are very different, and the data needs to be cleaned up.
• The previous process involved too much manual tweaking, so I wrote scripts to automate it.

MySQL to PostgreSQL - Step 1
Make a dump of the MySQL database.
- The MySQL database is on tela.biostr.
- A Perl script creates the dump in a specified directory.

MySQL to PostgreSQL - Step 2
Translate the MySQL dump to PostgreSQL. Scripts:
- clean up the syntax
- rename table/column names that are reserved words in PostgreSQL (user, public)
- designate primary keys where they are missing
- remove the WIRM-related tables

MySQL to PostgreSQL - Step 3
Create SQL files to run later (generated by Python scripts). The SQL files:
- correct some of the bad data
- add foreign key constraints (missing from the MySQL dump)

MySQL to PostgreSQL - Step 4
Import the data into PostgreSQL.
- Run the dump and the generated SQL files in a specific order so that the data can be entered.
- Reorder the INSERT statements so as not to violate foreign key constraints.
- Some bad rows still cause errors; those rows are not inserted.

2. The Public View
• Provides a virtual view of the relational database
• Very large (over 1000 lines)
• Data privacy
  – Some fields are deliberately not published.
  – Protect patient privacy, e.g. patient.initials, patient.research_num, etc.
  – Protect unpublished research data.
• How to translate a graph into a tree
  – The DB tables are not necessarily hierarchical, so parent-child relationships have to be imposed for the DTD.
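The public view's two jobs, hiding private fields and forcing flat tables into a parent-child tree, can be illustrated with a minimal Python sketch. It is not how SilkRoute works: SilkRoute keeps the view virtual and translates each user query into SQL, whereas this sketch materializes a tiny tree from in-memory rows. The sample rows, the `PUBLIC_PATIENT_FIELDS` whitelist, and the `build_public_view` function are hypothetical; only the column names follow the Patient and Surgery tables of the Brain Mapping schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample rows; column names follow the Patient and Surgery tables.
PATIENTS = [
    {"oid": "1", "initials": "AB", "location": "Seattle", "sex": "F", "is_public": "1"},
    {"oid": "2", "initials": "CD", "location": "Tacoma", "sex": "M", "is_public": "0"},
]
SURGERIES = [
    {"oid": "10", "patient": "1", "diagnosis": "epilepsy"},
    {"oid": "11", "patient": "2", "diagnosis": "tumor"},
]

# Assumed whitelist: only these patient columns are published.
# Private columns such as initials are simply never copied into the tree.
PUBLIC_PATIENT_FIELDS = ["location", "sex"]


def build_public_view(patients, surgeries):
    """Hypothetical helper: nest each public patient's surgeries under it."""
    root = ET.Element("root")
    for p in patients:
        if p["is_public"] != "1":  # unpublished patients are filtered out
            continue
        p_elem = ET.SubElement(root, "patient", oid=p["oid"])
        for field in PUBLIC_PATIENT_FIELDS:
            ET.SubElement(p_elem, field).text = p[field]
        # The Surgery.patient foreign key becomes a parent-child edge.
        for s in surgeries:
            if s["patient"] == p["oid"]:
                s_elem = ET.SubElement(p_elem, "surgery", oid=s["oid"])
                ET.SubElement(s_elem, "diagnosis").text = s["diagnosis"]
    return root


if __name__ == "__main__":
    print(ET.tostring(build_public_view(PATIENTS, SURGERIES), encoding="unicode"))
```

Running it prints a single `<patient oid="1">` element containing its location, sex, and nested surgery; patient 2 (is_public = "0") and the initials column never appear, which mirrors the `where $patient/is_public/text() = "1"` filter used in the XQuery version of the view.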

Brain Mapping DB – Schema
Patient(*oid,initials,first_name,last_name,location,registered,age,sex,viq,pnum,is_public,handedness,wada,size,copy,pre,description,gao_research_num);
Surgery(*oid,patient,surgery_date,surgeon,diagnosis,side,lobe,grid);
CSMStudy(*oid,surgery,function,trial_data,site_data);
File(*oid,label,domain,locator,source,mime_type,submit_date,submitted_by,version,context,description);
Photo(*oid,preference,image,csmstudy,image_pathname,image_filename);
StimSite(*oid,site_label,zone,lobe,csmstudy,anatomical_name);
Trial(*oid,trial_num,site_label,trial_time,current,slide,eeg_score,miriam_code,confidence,comments,km_score,site_suffix,csmstudy,stimulation_site);
UserPerson(*oid,login,first_name,last_name,email,password,user_group);

Brain Mapping DB – Schema (cont)
SiteToAnatomyMap(*oid,csmstudy,photo,scene,author,map_date,sitetoanatomyfile,rendered_map,sitetoanatomy_pathname,sitetoanatomy_filename,preference,modtime);
SiteToAnatomyMapElement(*oid,sitetoanatomymap,stimsite,site_label,ant_coord,sup_coord,right_coord,x,y,confidence);
Scene(*oid,imaging_study,description,description_file,preference,ismapscene);
ImagingStudy(*oid,patient,image_date,billed,prefix,subject,suffix,computed_image_pathname,computed_image_filename,computed_coords_pathname,computed_coords_filename,lowres_surface_pathname,lowres_surface_filename,aligned_pathname);
MRExam(*oid,imaging_study,exam_num,description,import_date,import_info,location);
Rendering(*oid,rendering_type,preference,image,scene,image_pathname,image_filename);

Brain Mapping DB – Schema (cont)
SceneComponent(*oid,scene,description,surface_model,volume);
SurfaceModel(*oid,volume,model_instance,format,model_file,model_pathname,model_filename,preference);
RadialSliceModelInstance(*oid,volume,model,landmarks_file,instance_file,expansion_factor,instance_pathname,instance_filename,preference,landmarks_pathname,landmarks_filename,derived_from);
RadialSliceModel(*oid,pathname,filename,comment,theta_radials,slices,training_set,model_file,preference);
MRSeries(*oid,mrexam,location,showing,total_images,plane,scan_start,scan_end,psd,type,description,fov_x,fov_y,height,width,bytes_per_pixel,bits_per_pixel,optical_disk,start_img,stop_img,threshold,tissue,first,last,label,thickness,spacing);
MRSlice(*oid,sequence_num,image_file,mrseries);
AlignedVolume(*oid,series,format,volume_file,filename,tissue,patient);

Brain Mapping DB – Schema Diagram
[Schema diagram spanning three slides; visible labels include Patient, Surgery, and ImagingStudy. Not reproduced.]

The Public View – DTD Graph
[DTD graph diagram spanning two slides. Not reproduced.]

The Public View - In XQuery
<root> {
  for $patient in $cv/Patient
  where $patient/is_public/text() = "1"
  return
    <patient oid="{$patient/oid/text()}">
      <first_name> xxx </first_name>
      <last_name> xxx </last_name>
      <location> {$patient/location/text()} </location>
      <sex> {$patient/sex/text()} </sex>
      ...
      { for $surgery in $cv/Surgery
        where data($surgery/patient) = data($patient/oid)
        return
          <surgery oid="{$surgery/oid/text()}">
            <diagnosis> {$surgery/diagnosis/text()} </diagnosis>
            …

User Queries - A Simple Example
Sample Query 1 (written in XQuery): List the last names of all patients who did NOT have surgery.
<results> {
  for $p in $pv/root/patient
  where empty($p/surgery)
  return <last_name>{$p/last_name/text()}</last_name>
} </results>

User Queries - A Simple Example
Alternative (written in XPath): XPath is a subset of XQuery, so XPath expressions are perfectly acceptable as queries. XPath is less expressive than full XQuery, but it is very simple to write.
<results> { $pv/root/patient[empty(surgery)]/last_name } </results>

User Queries - A Simple Example (cont)
Sample Query 1: intermediate SQL query generated by SilkRoute:
SELECT P78.last_name, P78.oid
FROM Patient AS P78
WHERE NOT EXISTS (
  SELECT *
  FROM Surgery AS S99
  WHERE S99.patient = P78.oid);

User Queries - A Simple Example (cont)
Results (in XML):
<results>
  <last_name>Chopra</last_name>
  <last_name>Townes</last_name>
</results>

3. Improve the "plumbing" between SilkRoute and the web application
Worked with Yana to improve error handling.
– If the user enters a malformed query, return the parse error to the user.
– When SilkRoute encounters an error, exit gracefully instead of bringing down the web page.

4. Web Interface
• Located at: http://quad.biostr.washington.edu:8080/xbrain/
• Makes the application available over the web.
• Written in JSP and served by Tomcat.
• Talks to SilkRoute through a Java interface.
• Allows users to enter their own queries and get XML results.
• Added a feature that lets certain "super" users access a version of the public view that contains all the patients (not just the 13 public ones).

Web Interface - System Diagram
[Diagram components: quad.biostr.washington.edu (SilkRoute, Postgres, Tomcat4), MySQL, and a web browser (Internet), with XQuery and XML flowing between the server and the browser.]

Web Interface - System Architecture
[Diagram: Tomcat (application server) runs JSPs/servlets; the XBrain JSP pages call SilkRoute through a Java API (Java classes); SilkRoute queries the DB.]

Web Interface - Screen Shots
[Seven slides of screenshots. Not reproduced.]

Current Status & Future Work
• Currently, the website is up and running at http://quad.biostr.washington.edu:8080/xbrain/
• Immediate future
  – Determine who the super users are by looking them up in the "UserPerson" table.
  – Store user input in temporary files to better handle simultaneous users.
  – Add Secure Sockets Layer (SSL) to ensure secure transfer of XML data when the user is logged in.
  – SilkRoute bug fixes.

Future Work
• Future:
  – Graphical user interface to help users formulate queries.
  – Flexible formats for visualizing results (e.g., comma-separated values instead of XML).
  – Extend this approach to other databases.
  – Eventual goal of allowing multiple applications to cooperate in a peer data management system.

Team/Resources
• SilkRoute support: Yana
• Faculty: Dan Suciu, Jim Brinkley

Questions?
For more information, go to the XBrain webpage: http://quad.biostr.washington.edu:8080/xbrain/