XBrain
XQuerying the Brain Mapping Database
Stacy Tang, Yana Kadiyska
Jim Brinkley, Dan Suciu
1
Human Brain Project
• Problem
– Explosion of information due to proliferation of
techniques.
• NIH Goal
– WWW based information tools that allow
management, integration, and sharing of
research data.
2
Brain Mapping Database
• Study of language function through invasive
neurosurgical method, Cortical Stimulation
Mapping
• Combined with non-invasive methods such
as MRI, fMRI, PET scan
• 64 patients, 13 of them published
3
XML
• Document markup language
• Has become the standard for data exchange
between inter-enterprise applications
• Platform independent
• “Self-describing” data
4
XML example
<bib>
<book year="2000">
<title>Data on the Web</title>
<author><last>Abiteboul</last><first>Serge</first></author>
<author><last>Buneman</last><first>Peter</first></author>
<author><last>Suciu</last><first>Dan</first></author>
<publisher>Morgan Kaufmann Publishers</publisher>
<price>39.95</price>
</book>
<book year="1992">…</book>
...
</bib>
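
The "self-describing" point can be illustrated with a minimal Python sketch that reads the example above using only the standard library. The names BIB_XML and summarize_books are made up for this illustration, and only the first <book> is included because the second one is elided on the slide.

```python
import xml.etree.ElementTree as ET

# A well-formed cut of the slide's example: only the first <book> is kept,
# because the second one is elided on the slide.
BIB_XML = """
<bib>
  <book year="2000">
    <title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann Publishers</publisher>
    <price>39.95</price>
  </book>
</bib>
"""


def summarize_books(xml_text):
    """Return (year, title, author last names, price) for each <book>."""
    root = ET.fromstring(xml_text)
    books = []
    for book in root.findall("book"):
        last_names = [author.findtext("last") for author in book.findall("author")]
        books.append(
            (book.get("year"), book.findtext("title"),
             last_names, float(book.findtext("price")))
        )
    return books


books = summarize_books(BIB_XML)
print(books)
```

The element and attribute names alone are enough to pull out the fields; no separate schema is consulted.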
5
SilkRoute
• Problem: data is stored in a relational
database; how can it be translated to XML
for exchange?
• SilkRoute is a tool for publishing relational
data in XML.
• Allows querying of the data using XQuery.
• Developed by Dan and Yana, along with
collaborators from other institutions.
6
What SilkRoute Does
[Diagram: SilkRoute takes the public query, the user query (user input), and the relational schema; it sends SQL to the RDBMS, receives tuples, and outputs XML.]
7
Objective of Project
To demonstrate the usability of SilkRoute
and XQuery for data sharing by applying it to
a real relational database -- the Brain Mapping
database.
8
Project Background
• Started as a CSE544 (Intro to Database)
project (Spring 2002 Quarter).
• Original project members: Hao Li and
myself.
• Demonstrated feasibility of project.
• Unfinished:
– Covered small part of database
– Depended on manual tweaking of data
– Minimal web interface
9
Tasks of the Project
1. Migrate database from MySQL to
PostgreSQL - automate as much as possible.
2. Complete XQuery-based public view for
the entire database.
3. Work with Yana to smooth out SilkRoute
issues - bug fixes, error handling, etc.
4. Web interface - add new features, improve
look and feel, improve UI.
10
1. MySQL to PostgreSQL
• Why is this necessary?
– Robustness
– Sub-select queries
• Problems: MySQL and PostgreSQL are
very different, and the data needs to be
cleaned up.
• The previous process involved too much
manual tweaking, so scripts were written to
automate it.
11
MySQL to PostgreSQL - Step 1
Make a dump of the MySQL database
- MySQL database is on tela.biostr
- Use a perl script to create a dump in a specified
directory.
12
MySQL to PostgreSQL - Step 2
Translate MySQL dump to PostgreSQL. Use
scripts to:
- clean up syntax
- rename table/column names that are reserved words
(user, public) in PostgreSQL.
- designate primary keys when lacking
- get rid of WIRM related tables
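
The renaming step might look roughly like the following Python sketch. It is not the project's actual script: it assumes the dump quotes identifiers with backticks, and the RENAMES mapping is a guess (the schema slides show a UserPerson table and an is_public column, but the slides do not say what the original names were).

```python
import re

# Hypothetical mapping from reserved-word identifiers to safe names.
# The targets are borrowed from the schema slides; the pairing is a guess.
RENAMES = {"user": "UserPerson", "public": "is_public"}

# Assumption: the MySQL dump quotes identifiers with backticks.
BACKTICKED = re.compile(r"`([^`]+)`")


def translate_line(line):
    """Drop MySQL backtick quoting and rename reserved-word identifiers."""
    def fix(match):
        name = match.group(1)
        return RENAMES.get(name.lower(), name)

    return BACKTICKED.sub(fix, line)


print(translate_line("CREATE TABLE `user` (`oid` int, `public` int);"))
```

A real translation would also need to handle type names and other syntax differences, which this sketch leaves out.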
13
MySQL to PostgreSQL - Step 3
Create SQL files for running later (generated using
python scripts). The SQL files:
- correct some of the bad data
- add foreign key constraints (lacking in the MySQL
dump)
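
As a rough illustration of generating the foreign key SQL, the Python sketch below emits one ALTER TABLE statement per relationship. The Surgery.patient reference matches the join used in the public view; the other two pairs are guesses from column names in the schema, and the constraint naming scheme is made up.

```python
# (child_table, child_column, parent_table) triples.
# Surgery.patient -> Patient.oid follows the join used in the public view;
# the remaining pairs are assumptions read off the column names.
FOREIGN_KEYS = [
    ("Surgery", "patient", "Patient"),
    ("CSMStudy", "surgery", "Surgery"),
    ("StimSite", "csmstudy", "CSMStudy"),
]


def foreign_key_sql(child, column, parent, parent_key="oid"):
    """Build one PostgreSQL statement that adds a foreign key constraint."""
    constraint = f"{child.lower()}_{column}_fk"  # hypothetical naming scheme
    return (
        f"ALTER TABLE {child} ADD CONSTRAINT {constraint} "
        f"FOREIGN KEY ({column}) REFERENCES {parent} ({parent_key});"
    )


statements = [foreign_key_sql(*fk) for fk in FOREIGN_KEYS]
for statement in statements:
    print(statement)
```

Writing the statements to a file that runs after the data load keeps the constraints from rejecting rows that arrive before their parents.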
14
MySQL to PostgreSQL - Step 4
Import the data into PostgreSQL
- run the dump and generated SQL files in a specific
order to allow the data to be entered
- reorder the insert statements so as not to violate
foreign key constraints
- some bad rows still produce errors; those rows are not
inserted
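
One way to pick a load order that respects foreign keys is a topological sort of the table dependency graph. The sketch below uses Python's graphlib (Python 3.9 or later); the DEPENDS_ON map is an assumption inferred from column names in the schema, not the project's actual ordering.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# table -> set of tables it references (assumed from schema column names)
DEPENDS_ON = {
    "Patient": set(),
    "Surgery": {"Patient"},
    "CSMStudy": {"Surgery"},
    "StimSite": {"CSMStudy"},
    "Trial": {"CSMStudy", "StimSite"},
}


def load_order(depends_on):
    """Return tables ordered so every parent comes before its children."""
    return list(TopologicalSorter(depends_on).static_order())


order = load_order(DEPENDS_ON)
print(order)
```

Insert statements grouped by table can then be replayed in this order; a cycle in the graph would raise graphlib.CycleError and need manual handling.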
15
2. The Public View
• Provides a virtual view of the relational
database
• Very large (over 1000 lines)
• Data Privacy
– Choose not to publish some fields.
– Protect patient privacy, e.g. patient.initials,
patient.research_num, etc.
– Protect unpublished research data.
• How to translate graph to tree
– DB tables may not be hierarchical, so have to
force parent-child relationships for the DTD.
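
The two ideas on this slide, hiding private fields and forcing a parent-child nesting, can be sketched in Python as follows. This is only an illustration: the real public view is written in XQuery, and the function name and sample rows here are invented.

```python
import xml.etree.ElementTree as ET


def build_public_view(patients, surgeries):
    """Nest surgeries under their patient and mask private fields."""
    root = ET.Element("root")
    for patient in patients:
        if patient["is_public"] != 1:
            continue  # unpublished patients never reach the view
        p_el = ET.SubElement(root, "patient", oid=str(patient["oid"]))
        ET.SubElement(p_el, "last_name").text = "xxx"  # masked, not copied
        ET.SubElement(p_el, "sex").text = patient["sex"]
        for surgery in surgeries:
            # The foreign key Surgery.patient becomes parent-child nesting.
            if surgery["patient"] == patient["oid"]:
                s_el = ET.SubElement(p_el, "surgery", oid=str(surgery["oid"]))
                ET.SubElement(s_el, "diagnosis").text = surgery["diagnosis"]
    return root


# Invented sample rows, for illustration only.
patients = [
    {"oid": 1, "is_public": 1, "last_name": "Doe", "sex": "F"},
    {"oid": 2, "is_public": 0, "last_name": "Roe", "sex": "M"},
]
surgeries = [
    {"oid": 10, "patient": 1, "diagnosis": "example"},
    {"oid": 11, "patient": 2, "diagnosis": "hidden"},
]
view = build_public_view(patients, surgeries)
print(ET.tostring(view, encoding="unicode"))
```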
16
Brain Mapping DB – Schema
Patient(*oid,initials,first_name,last_name,location,registered,age,sex,viq,pnum,
is_public,handedness,wada,size,copy,pre,description,gao_research_num);
Surgery(*oid,patient,surgery_date,surgeon,diagnosis,side,lobe,grid);
CSMStudy(*oid,surgery,function,trial_data,site_data);
File(*oid,label,domain,locator,source,mime_type,submit_date,submitted_by,
version,context,description);
Photo(*oid,preference,image,csmstudy,image_pathname,image_filename);
StimSite(*oid,site_label,zone,lobe,csmstudy,anatomical_name);
Trial(*oid,trial_num,site_label,trial_time,current,slide,eeg_score,miriam_code,
confidence,comments,km_score,site_suffix,csmstudy,stimulation_site);
UserPerson(*oid,login,first_name,last_name,email,password,user_group);
17
Brain Mapping DB – Schema (cont)
SiteToAnatomyMap(*oid,csmstudy,photo,scene,author,map_date,
sitetoanatomyfile,rendered_map,sitetoanatomy_pathname,
sitetoanatomy_filename, preference,modtime);
SiteToAnatomyMapElement(*oid,sitetoanatomymap,stimsite,site_label,
ant_coord,sup_coord,right_coord,x,y,confidence);
Scene(*oid,imaging_study,description,description_file,preference,
ismapscene);
ImagingStudy(*oid,patient,image_date,billed,prefix,subject,suffix,
computed_image_pathname,computed_image_filename,
computed_coords_pathname,computed_coords_filename,
lowres_surface_pathname,lowres_surface_filename,aligned_pathname);
MRExam(*oid,imaging_study,exam_num,description,import_date,
import_info,location);
Rendering(*oid,rendering_type,preference,image,scene,image_pathname,
image_filename);
18
Brain Mapping DB – Schema (cont)
SceneComponent(*oid,scene,description,surface_model,volume);
SurfaceModel(*oid,volume,model_instance,format,model_file,
model_pathname,model_filename,preference);
RadialSliceModelInstance(*oid,volume,model,landmarks_file,instance_file,
expansion_factor,instance_pathname,instance_filename,preference,
landmarks_pathname,landmarks_filename,derived_from);
RadialSliceModel(*oid,pathname,filename,comment,theta_radials,slices,
training_set,model_file,preference);
MRSeries(*oid,mrexam,location,showing,total_images,plane,scan_start,
scan_end,psd,type,description,fov_x,fov_y,height,width,bytes_per_pixel,
bits_per_pixel,optical_disk,start_img,stop_img,threshold,tissue,first,last,
label,thickness,spacing);
MRSlice(*oid,sequence_num,image_file,mrseries);
AlignedVolume(*oid,series,format,volume_file,filename,tissue,patient);
19
Brain Mapping DB – Schema Diagram
[Schema diagram; visible table labels: Patient, Surgery, ImagingStudy]
20
Brain Mapping DB – Schema Diagram (cont)
21
Brain Mapping DB – Schema Diagram (cont)
22
The Public View – DTD Graph
23
The Public View – DTD Graph (cont)
24
The Public View - In XQuery
<root>
{
  for $patient in $cv/Patient
  where $patient/is_public/text() = "1"
  return
    <patient oid="{$patient/oid/text()}">
      <first_name> xxx </first_name>
      <last_name> xxx </last_name>
      <location> {$patient/location/text()} </location>
      <sex> {$patient/sex/text()} </sex>
      ...
      {
        for $surgery in $cv/Surgery
        where data($surgery/patient) = data($patient/oid)
        return
          <surgery oid="{$surgery/oid/text()}">
            <diagnosis> {$surgery/diagnosis/text()} </diagnosis>
            …
25
User Queries - A Simple Example
Sample Query 1 (written in XQuery):
List the last names of all patients who DID NOT have
surgery.
<results>
{
for $p in $pv/root/patient
where empty($p/surgery)
return
<last_name>{$p/last_name/text()}</last_name>
}
</results>
26
User Queries - A Simple Example
Alternative (written in XPath):
XPath is a subset of the XQuery language, and thus
perfectly acceptable to use for queries. You can’t do as
much with XPath, but it is very simple to write.
<results>
{
$pv/root/patient[empty(surgery)]/last_name
}
</results>
27
User Queries - A Simple Example (cont)
Sample Query 1:
Intermediate SQL query generated by SilkRoute.
SELECT P78.last_name, P78.oid
FROM Patient as P78
WHERE NOT EXISTS
( SELECT *
FROM Surgery as S99
WHERE S99.patient = P78.oid);
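
To see what the generated SQL returns, it can be run against a small in-memory SQLite database from Python. This is only a sketch: the project uses PostgreSQL, the two tables are cut down to the columns the query touches, and the sample rows are invented.

```python
import sqlite3

# The intermediate query from the slide, without the trailing semicolon.
GENERATED_SQL = """
SELECT P78.last_name, P78.oid
FROM Patient AS P78
WHERE NOT EXISTS
  ( SELECT *
    FROM Surgery AS S99
    WHERE S99.patient = P78.oid)
"""


def patients_without_surgery(conn):
    """Run the generated query; sort in Python for a stable order."""
    return sorted(conn.execute(GENERATED_SQL).fetchall())


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Patient (oid INTEGER PRIMARY KEY, last_name TEXT);
CREATE TABLE Surgery (oid INTEGER PRIMARY KEY,
                      patient INTEGER REFERENCES Patient(oid));
-- Invented rows: only patient 2 has a surgery.
INSERT INTO Patient VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
INSERT INTO Surgery VALUES (10, 2);
""")
rows = patients_without_surgery(conn)
print(rows)
```

The NOT EXISTS sub-select is the SQL counterpart of empty($p/surgery) in the XQuery, and it is the kind of sub-select query that motivated the move to PostgreSQL.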
28
User Queries - A Simple Example (cont)
Results (in XML):
<results>
<last_name>Chopra</last_name>
<last_name>Townes</last_name>
</results>
29
3. Improve “plumbing” between
SilkRoute and web application
Worked with Yana to improve error handling.
– If user inputs bad query, then return the parse
error back to the user.
– When SilkRoute encounters an error, gracefully
exit instead of bringing down the web page.
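
The error-handling behaviour described above can be sketched as a thin wrapper around the query call. The actual application is written in JSP/Java; this Python version only shows the control flow, and QueryParseError, run_query_safely, and fake_engine are all hypothetical names.

```python
class QueryParseError(Exception):
    """Hypothetical error type standing in for a query parse failure."""


def run_query_safely(run_query, query_text):
    """Call the engine and turn any failure into a message for the user."""
    try:
        return {"ok": True, "xml": run_query(query_text)}
    except QueryParseError as exc:
        # Bad user input: hand the parse error back to the user.
        return {"ok": False, "error": f"Parse error: {exc}"}
    except Exception as exc:
        # Anything else: fail gracefully instead of crashing the page.
        return {"ok": False, "error": f"Query failed: {exc}"}


def fake_engine(query_text):
    """Stand-in for the real engine, for illustration only."""
    if not query_text.strip().startswith("<"):
        raise QueryParseError("expected an element constructor")
    return "<results/>"


print(run_query_safely(fake_engine, "<results>{ $pv/root/patient }</results>"))
print(run_query_safely(fake_engine, "not a query"))
```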
30
4. Web Interface
• Located at:
http://quad.biostr.washington.edu:8080/xbrain/
• Makes the application available over the web.
• Written in JSP and served by Tomcat.
• Talks to SilkRoute through a Java interface.
• Allows users to input their own queries and
get XML results.
• Added a feature that lets certain “super”
users access a version of the public view
that contains all the patients (not just the 13
public ones).
31
Web Interface - System Diagram
[Diagram: a web browser on the Internet exchanges XQuery and XML with quad.biostr.washington.edu, which runs Tomcat4, SilkRoute, and Postgres; MySQL also appears in the diagram.]
32
Web Interface - System Architecture
[Diagram: Tomcat (application server) runs JSP/servlets. The XBrain pages (JSPs) call a Java API (Java classes), which calls SilkRoute, which accesses the DB.]
33
Web Interface - Screen Shots
[Screenshots of the web interface, slides 34-40]
40
Current Status & Future Work
• Currently, the website is up and running at
http://quad.biostr.washington.edu:8080/xbrain/
• Immediate Future
– Figure out who the super users are by looking in
the “UserPerson” table.
– Store user input in temporary files, to better
handle simultaneous users.
– Add Secure Sockets Layer (SSL) to ensure
secure transfer of XML data when a user is
logged in.
– SilkRoute bug fixes.
41
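
For the temporary-file item in the list above, a per-request file with a unique name keeps simultaneous users from overwriting each other's input. The sketch below shows the idea in Python with the standard tempfile module; the web application itself is JSP/Java, and the function name, prefix, and suffix are made up.

```python
import os
import tempfile


def save_user_query(query_text, directory=None):
    """Write one user's query to its own uniquely named temporary file."""
    # mkstemp creates the file atomically, so two simultaneous requests
    # can never end up with the same path.
    fd, path = tempfile.mkstemp(
        prefix="xbrain_query_", suffix=".xq", dir=directory, text=True
    )
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(query_text)
    return path


first = save_user_query("<results>{ $pv/root/patient }</results>")
second = save_user_query("<results/>")
print(first != second)
os.remove(first)
os.remove(second)
```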
Future Work
• Future:
– Graphical User Interface to help users formulate
user queries.
– Flexible format for visualizing results (e.g.,
comma-separated values instead of XML).
– Extend this to other databases.
– Eventual goal of allowing multiple applications
to cooperate in a peer data management system.
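
As an illustration of the comma-separated output idea, the Python sketch below flattens a simple result document into CSV. It assumes a flat result like the one from Sample Query 1; nested results would need a richer mapping, and the function name and column headers are invented.

```python
import csv
import io
import xml.etree.ElementTree as ET


def results_to_csv(xml_text):
    """Turn the direct children of the root element into CSV rows."""
    root = ET.fromstring(xml_text)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["element", "value"])  # hypothetical header names
    for child in root:
        writer.writerow([child.tag, (child.text or "").strip()])
    return buffer.getvalue()


# The result document shown for Sample Query 1.
RESULTS_XML = (
    "<results>"
    "<last_name>Chopra</last_name>"
    "<last_name>Townes</last_name>"
    "</results>"
)
csv_text = results_to_csv(RESULTS_XML)
print(csv_text)
```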
42
Team/Resources
• SilkRoute support: Yana
• Faculty: Dan Suciu, Jim Brinkley
43
Questions?
For more information, go to the XBrain webpage:
http://quad.biostr.washington.edu:8080/xbrain/
44