
XML Databases in BMI
UCONN Spring 2008, CSE 300: BMI
Taught by: Prof. Steve Demurjian
Presented by: James Lindsay

    <ClinicalDocument xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:mif="urn:hl7-org:v3/mif" xmlns="urn:hl7-org:v3">
      <realmCode code="US"/>
      <typeId root="2.16.840.1.113883.1.3" extension="POCD_HD000040"/>
      <templateId root="2.16.840.1.113883.3.117.1.1.1"/>

    06:00||3.0|2.16.840.1.113883^POLB_IN004410||P|I|ER|ER
    respondTo|RSP|tel:555-555-5555^^WP
    entityRsp|||{FAM^^Hippocrates~GIV^^Harold~GIV^^H~SFX^AC^MD}|tel:555-555-5555^^WP
    sender|SND|nfs:127.127.127.255
    device||2.16.840.1.113883.1122^GHH LAB|{GIV^^An Entity Name}^L|||tel:555-555-2005^^H
    agencyFor
    representedOrganization||\NOTH\
    location|||2.16.840.1.113883.1122^ELAB-3|{^^GHH Lab}^TN
    receiver|RCV|nfs:127.127.127.0
    device|||2.16.840.1.113883.1122^GHH O E|{GIV^^An Entity Name}^L|||tel:555-555-2005^^H
    agencyFor
    representedOrganization|||2.16.840.1.113883.19.3.1001|{^^GHH Outpatient Clinic}^TN
    location|||2.16.840.1.113883.1122^BLDG4|{^^GHH Outpatient Clinic}^TN

  - Awkward, inflexible, unclear meaning of values.

HL7 V3 Specification
  - Built around the Reference Information Model (RIM):
      - Entity, Role, Participation, and Act.
  - Utilizes dedicated vocabularies and data types.
  - Every specification must begin from the RIM.
  - Clinical Document Architecture (CDA):
      - Utilizes XML with tags like "observation", "code", "value", and "id".

    <observation classCode="OBS" moodCode="EVN">
      <id root="10.23.4573.15879"/>
      <code code="313193002" codeSystem="2.16.840.1.113883.6.96"
            codeSystemName="SNOMED CT" displayName="Peak flow"/>
      <effectiveTime value="20000407"/>
      <value xsi:type="RTO_PQ_PQ">
        <numerator value="260" unit="l"/>
        <denominator value="1" unit="min"/>
      </value>
    </observation>

XML in Clinical Trials
  - Example: drug studies.
  - Utilizing XML would eliminate manual transcription when moving data from one system to another.
  - XML is a universal data type because it stores everything as text.
      - It can therefore handle new technology seamlessly.
  - Clinical Data Interchange Standards Consortium (CDISC).
      - Industry standardization.

CDISC: ODM
  - Operational Data Model:
      - XML based.
      - Facilitates moving data from any collection system to the clinical trial sponsor.
  - Addresses real-world issues:
      - Incomplete data.
      - Partial data transfer.
      - Versioning and branching.
  - ODM 1.1 is the current version.

ODM: Layout
    [Figure: ODM layout]

XML in Genomic Data
  - Various groups export their data in XML:
      - NCBI, EBI.
  - They do not follow the same schema, which allows only partial semantic interoperability.
  - The Microarray Gene Expression Group (MAGE) publishes a schema.
      - MAGE files are often several gigabytes.
      - This illustrates the overhead of XML; researchers still use it because of interoperability.

XML Complexity
  - Clinical Genomics Special Interest Group (HL7):
      - Use genomic data in the clinical environment.
      - Utilize several models such as MAGE and BSML (for DNA sequences).
  - Not all information in the raw models is necessary.
      - "Bubbling up" analyzes large raw data sets and extracts useful information.
      - Transfer the useful information to a new schema / model.
  - Bottom line: there exist complex workflows to

XML BMI Issues
  - Clinical information such as a verbal description or advice is unstructured.
      - How do you query this?
  - Schemas and models are extremely complex, with nesting, recursion, and compound data types.
      - Difficult to map to relational databases.
  - XML instances may be gigabytes in size.
      - What database solutions exist to handle such large files?

XML BMI Examples
  - A closer look at the Clinical Document Architecture.
      - Mayo Clinic's implementation of CDA.
  - Case study using a native XML database to facilitate research based upon clinical texts.
      - Tamino XML DB.
      - Querying a native DB.
  - UCONN BMI, CSE 300, Spring 2008.

XML BMI: CDA
www.hl7.de/iamcda2004/finalmat/day1/Calvin%20Beebe%20CDA%20Update.pdf
  - A clinical document has:
      - Persistence: exists for a defined time period.
      - Stewardship: maintained by a designated caretaker.
      - Potential for authentication: may be legally authenticated.
  - It must be human readable in a standard web browser.
  - Utilizes standard XML syntax.

XML BMI: CDA
www.hl7.de/iamcda2004/finalmat/day1/Calvin%20Beebe%20CDA%20Update.pdf
  - Mayo Clinic's use of CDA:
    [Figure: Mayo Clinic's use of CDA]

A Native XML Database Design for Clinical Document Research
Johnson, Campbell, et al.
  - Facilitate research, especially research on clinical text.
  - User needs to be accounted for:
      - Process queries against text.
      - Process queries against annotations.
      - Standard method for querying.
      - Non-hierarchical document selection (by patient, date, ...).
      - Return varying levels of document granularity.
      - A schema which adapts to new information without breaking old query formulations.
      - A schema which adapts to new annotations.

A Native XML Database Design for Clinical Document Research (cont.)
  - Tamino XML DBMS: a commercial product.
      - Supports XQuery and text search, which address many of the querying needs.
  - Utilizes the CDA for structuring meta-information.
  - A schema structures documents on a sentence-by-sentence level.
      - Allows a high level of granularity.
      - Tags link words to a semantic and vocabulary library.

UCONN BMI
  - Utilize a native XML DB to store documents.
  - Documents could be PHR, health data / statistics, or system meta-data (registration).
  - Our goal is to provide secure submission and retrieval of a variety of XML data.
  - For spring 2008, the focus is only on submitting registration data.

UCONN BMI: Overview
  - Current state:
    [Diagram: User -> Browser: HTML form (HTML) -> Java server: create XML document (Java) -> Submit to DB (XML)]
  - Data exists in three different domains:
      - It is in HTML, a text data type, when the user enters it.
      - The server maps the HTML to Java strings to create the XML.
      - The XML is written to a file on the server and submitted to the database via a Java API.

UCONN BMI: Problems
  - There are 2 transformations of data.
      - Each requires a hand-coded mapping.
      - This leads to sloppy code and wasted resources.
  - It only handles XML as input; what about output?
  - The database (Sedna) is obtuse; what other options exist?
  - Do we want to store / transmit application data?

UCONN BMI: Model (potential)
    [Diagram: User or System -> Browser: HTML form (HTML, JS -> XML) -> Java server (Java, XML) -> XQuery -> Submit to DB]
  - Utilize client-side JS to create XML.
  - Use a Java API to manipulate XML.
  - Problems:
      - Document is verified through schema and XQuery.
      - Awkward to cross-reference input with any other data.
  - Advantages:
      - No server-side data type conversion.
      - This model applies to user-driven input and system interactions.

UCONN BMI: Model retrieval
    [Diagram: User / System (HTML, JS) <-> Java server (Java, XSLT) <-> DB; XQuery query in, XML out]
  - Client queries in XQuery, or uses a predefined query on the server.
  - Server uses the API to execute the XQuery against the DB.
  - The Java server is given an XML document; it can:
      - Apply Java-based XSLT and return the result to the requestor (more reliable).
      - Return the raw document; client-side JS applies XSLT (less server load).
      - Both.

UCONN BMI: Retrieval Problems
  - There is still no method of performing business logic outside the scope of XSLT or XQuery.
  - What types of data should be retrieved in XML:
      - Data that does not require complex logic, like login credential validation or registration.
      - Health records and data which follow a defined schema.
      - Education, treatment, and research information which follow a defined schema.

UCONN BMI: XML Future
  - Focus on implementing XML features for the appropriate data.
  - Choose an XML database which offers high reliability and ease of use.
  - Develop XSLT templates for transforming XML data to the appropriate format.

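The hand-coded form-to-XML mapping described under "UCONN BMI: Overview" can be sketched roughly as follows. This is a minimal illustration in Python rather than the project's Java, and it is not the project's actual code: the root element name `registration`, the field names, and both helper functions are hypothetical. It assumes every form field name is a valid XML element name. The path lookup uses the limited XPath subset built into ElementTree as a stand-in for an XQuery call against the database.

```python
import xml.etree.ElementTree as ET


def form_to_xml(form_fields, root_tag="registration"):
    """Map flat form fields (name -> value) to child elements of one root.

    Assumes each field name is a valid XML element name (hypothetical schema).
    """
    root = ET.Element(root_tag)
    for name, value in form_fields.items():
        child = ET.SubElement(root, name)
        child.text = value  # special characters are escaped on serialization
    return root


def find_values(doc, path):
    """Return the text of every element matching an ElementTree path."""
    return [el.text or "" for el in doc.findall(path)]


if __name__ == "__main__":
    # Hypothetical registration form submission.
    form = {"username": "jdoe", "email": "jdoe@example.org", "role": "patient"}
    doc = form_to_xml(form)

    # Serialize to text, as would be done before submitting to the database.
    xml_text = ET.tostring(doc, encoding="unicode")
    print(xml_text)

    # Parse the text back and look up one field.
    reparsed = ET.fromstring(xml_text)
    print(find_values(reparsed, "./email"))
```

Even in this small form, the mapping has to be written and maintained by hand for each document type, which is the cost the "Problems" slide points at.
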
Survey of Native XML DBMS
  - Comprehensive list:
      - http://www.rpbourret.com/xml/XMLDatabaseProds.htm#native
  - Commercial:
      - Tamino XML Server: well developed, supported, many tools available.
  - Open source:
      - Sedna: fully supports ACID and XQuery.
      - eXist: great management, documentation, and indexing.

eXist
http://www.rpbourret.com/xml/ProdsNative.htm#exist
  - Proprietary data store (B+ trees).
  - Supports XQuery/XPath 2.0.
  - Full text searches.
  - XML:DB API.
  - Document-level concurrency.
  - Complete documentation.
  - Incomplete transaction support.

Sedna
http://www.rpbourret.com/xml/ProdsNative.htm#sedna
  - Underlying data storage based on DataGuide.
  - Supports XQuery/XPath 2.0.
  - Full text searches.
  - Custom API for various languages.
  - Command line admin.
  - Transaction support.

Questions?
  - Thank you.
