Storing and Querying Large Interconnected HL7 V3 RIM Object Nets.
SMIRFs to the rescue.
Peter Hendler MD
Kaiser Permanente
HL7 RIMBAA Co-chair
Introduction:
Large transactional instances of RIM-based clinical information, such as an instance of a CDA clinical
summary document, can be stored in relational databases using Object to Relational Mapping (ORM)
tools such as Hibernate. This is the approach taken by various applications that use a RIM-based
persistence layer (e.g. the JAVA SIG reference API). All incoming RIM-based documents or messages
are stored “as is”. In this paper we will use the example of a relational database with table names such
as Act, Participation, Role and Entity. This is how the current JAVA SIG persists RIM data. No
preprocessing is done on the CDA documents; each document can be easily recreated from the database.
In a CDA document, the Subject of Record and various other parts of the contextual information are
established in the root Act (a.k.a. the “Entry Point”, the class that’s used as the starting point for
serialization of a model) of “ClinicalDocument”. If many CDA documents related to a given patient
are accumulated and persisted in a database, it becomes difficult to query for all of the Observations of
a given kind for a given patient. This is because the contextual information related to the Observation
may be distant from the Observation itself, and it may have been overridden somewhere between the
“Entry Point” of the instance and the Observation we are focusing on.
[Fig 1: CDA document tree. Labels: entry point with original context; original context may be
overridden here; Clinical Statements, where context may have been overridden.]
The Problem:
The problem is that it’s hard to establish the contextual information for an Act that is contained
somewhere in a CDA document.
We can illustrate the problem using the example of a relational database system that will store multiple
CDA documents. In this example, the entry point is a child of Act called “ClinicalDocument”.
Attached to this entry point are multiple relationships that set the “context” of the entire structure (see
Figure 1, left oval). These include Record Target (i.e. Patient), Author, (Legal) Authenticator and
Custodian. In most cases this context is valid as you navigate distally from the entry point
through many Act to ActRelationship connections. The context may apply to the Structured
Body, to Sections, to Entries, as well as to the Clinical Statement Acts that carry the
software-processable data.
The problem is that as we traverse this tree distally, we can change or override the context at various
points. For example, a section might be called Family History and the Subject might change from the
Patient of Record to a family member (see Figure 1, middle oval). Or perhaps, a section or entry might
be authored by a different provider. Certainly the Performer of various procedures (see Figure 1, right
oval) will not always be the same person who authored the document. The Act Relationships will also
change as the tree is walked: the components of organizers (such as the individual tests in a lab
battery like a CBC), the reference ranges, the authorizations, the related documents, and the
inFulfillmentOf relationships will all differ from one Act to the next.
Assume that we have stored an entire CDA into a RIM database with table names such as Act,
Participation, Role and Entity. Now assume that we want to find all of the Glucose measurements for a
given patient within a given date range.
The Observation of the Glucose measurement (Figure 1, right oval) is an entry in the Clinical
Statement, and it is separated from the Patient ID (in most cases this would be a Medical Record
Number or Patient Id) by at least seven table joins. There are seven associations between RIM classes
to navigate from the Observation in the CDA Entry up to the Patient role associated with the
ClinicalDocument act. It's even harder than this, because we can’t just navigate to the root
ClinicalDocument. How do we know whether and where the Subject of Record, Author, Performer or
other important context has been overridden in a Section (Figure 1, middle oval)? We would have to
check each Act as we navigated towards the root to see whether context had been overridden. This
check, along with the large number of joins, makes the query both slow and difficult to write.
It would be beneficial if there were a reliable “nearby” structure to inform us of the correct context.
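To make the cost concrete, the following is a minimal Python sketch of the leaf-to-root walk described above. It is an illustration only: the `Act` class, the `participations` dictionary, the single `"subject"` key and the identifiers used as values are all hypothetical simplifications, not part of the RIM or of the JAVA SIG API. In a relational store, every step of the loop would correspond to one or more joins.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Act:
    """Hypothetical in-memory stand-in for a row in the Act table."""
    name: str
    parent: Optional["Act"] = None
    # Participations set directly on this Act, keyed by a simplified type name.
    # Empty when this Act neither sets nor overrides any context.
    participations: Dict[str, str] = field(default_factory=dict)


def resolve_context(act: Act, kind: str) -> Optional[str]:
    """Walk from the Act towards the root; the nearest Act that sets `kind` wins."""
    node: Optional[Act] = act
    while node is not None:
        if kind in node.participations:
            return node.participations[kind]
        node = node.parent
    return None


# A tiny document: the header sets the subject, one section overrides it.
document = Act("ClinicalDocument", participations={"subject": "patient-1234567"})
body = Act("StructuredBody", parent=document)
lab_section = Act("LabSection", parent=body)
glucose = Act("GlucoseObservation", parent=lab_section)
family_section = Act("FamilyHistorySection", parent=body,
                     participations={"subject": "family-member"})
family_obs = Act("FamilyDiabetesObservation", parent=family_section)

print(resolve_context(glucose, "subject"))
print(resolve_context(family_obs, "subject"))
```

Every query that starts from a leaf has to repeat this walk, which is the work the techniques below move to parse time.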
The Solution:
Option 1: SMIRFs
In order to solve this issue we’ll use the concept of SMIRFs. A SMIRF is a “Small Isolated RIM
Fragment”. Examples include: a Clinical Statement, an Observation, an Encounter or a Substance
Administration. A “Context SMIRF” is a special kind of SMIRF in that its sole purpose is to
encapsulate the context of a SMIRF. For example: a ControlAct with an associated Author and
RecordTarget. The entire structure of a Context SMIRF is self-contained and does not rely on any
connections to other structures.
Each SMIRF (core clinical data; red rectangles on the left-hand side of Figure 2) is associated with
exactly one Context SMIRF (context of that data; red rectangles and associations shown on the
right-hand side of Figure 2).
[Fig 2: Stack of Context SMIRFs. Labels: top of stack is always the current context; bottom of
stack is the context of the entry point; link from the Clinical Statement (SMIRF) currently being
processed to its Context SMIRF; previously parsed Clinical Statement (SMIRF) linked to a previous
Context SMIRF.]
The use of SMIRFs and Context SMIRFs requires that we perform the following processing steps upon
receipt of a CDA document:
1. We can use the classic computer science idea of a stack (Last In, First Out) of Context SMIRFs.
As we enter the CDA we create a Context SMIRF with all the ActRelationships and
Participations as found attached to the entry point (the ClinicalDocument class). We attach these
to a stub Act (probably a ControlAct) whose sole purpose is to have an identifier and be the
“stand in” for the Act we are parsing at the moment. We put this Context SMIRF on top of a
“context stack”. Every Act we come to will be stored in the database along with a link to the
Context SMIRF that is currently on the top of the “context stack”. In Figure 2 the original
Context SMIRF is the one at the bottom right of the figure.
2. Every time we parse a new Act (and I am referring to all children of Acts here), we check to see
if any context is overridden. (The methodology described here will work regardless of the style of
Context Conduction that’s being used in the model.) If it is, we make a copy of the Context SMIRF
currently on the top of the stack and modify it according to the new relationships
(ActRelationships and Participations). We then put this new modified Context SMIRF on the top of
the context stack. Any Acts (or children of Acts) that are stored in the database distal to this
branch are linked in the database to this new Context SMIRF on the top of the context stack. Once
the branch has been fully processed, its Context SMIRF is popped off the stack so that the previous
context applies again. This is applied every step of the way down to the leaves of the CDA tree.
The leaves are often the individual clinical statement entries.
3. In the case where the Query first “selects” observations of a particular type, for example,
glucose measurements, we will be selecting “leaves” of the CDA tree. Without Context SMIRFs, these
leaves are not connected directly to their context. In other words, the Subject of Record may be
attached to the root Act of clone type “ClinicalDocument”, or the Subject of Record may have been
changed somewhere between the root and the leaf, for example in a section Act as illustrated by the
middle oval of Fig 1. In these cases the tree has to be navigated towards the root while
checking for context. This is both time consuming and difficult. If instead Context SMIRFs are
linked to the leaves (observations, patient encounters, substance administrations etc.), then this
entire process can be avoided. The context is found in the Context SMIRF that was linked at the
time of parsing and persisting, rather than leaving a difficult problem to the time of querying.
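The processing steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the JAVA SIG implementation: the `Act` and `ContextSmirf` classes, the `overrides` dictionary and the `persist` function are hypothetical names, associations are reduced to plain key/value pairs, and the "database" is two in-memory collections. Popping the stack when a branch is finished is also an assumption of this sketch, made so that sibling branches fall back to the earlier context.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Act:
    """Hypothetical parsed Act; `overrides` holds context set directly on it."""
    name: str
    overrides: Dict[str, str] = field(default_factory=dict)
    children: List["Act"] = field(default_factory=list)


@dataclass
class ContextSmirf:
    """Self-contained context fragment, identified by `id`."""
    id: int
    associations: Dict[str, str]


def persist(root: Act) -> Tuple[List[ContextSmirf], Dict[str, int]]:
    """Return the stored Context SMIRFs and a map of Act name -> Context SMIRF id."""
    contexts: List[ContextSmirf] = []
    links: Dict[str, int] = {}
    stack: List[ContextSmirf] = []

    def visit(act: Act) -> None:
        pushed = False
        # Step 1 (entry point) and step 2 (override): create a new Context SMIRF
        # by copying the one on top of the stack and applying the changes.
        if not stack or act.overrides:
            merged = dict(stack[-1].associations) if stack else {}
            merged.update(act.overrides)
            ctx = ContextSmirf(id=len(contexts) + 1, associations=merged)
            contexts.append(ctx)
            stack.append(ctx)
            pushed = True
        # Every Act is linked to whatever is on top of the stack right now.
        links[act.name] = stack[-1].id
        for child in act.children:
            visit(child)
        if pushed:
            stack.pop()  # leaving the branch restores the previous context

    visit(root)
    return contexts, links


document = Act(
    "ClinicalDocument",
    {"subject": "patient-1234567", "author": "dr-a"},
    [
        Act("LabSection", children=[Act("GlucoseObservation")]),
        Act("FamilyHistorySection", {"subject": "family-member"},
            [Act("FamilyDiabetesObservation")]),
        Act("VitalsSection", children=[Act("PulseObservation")]),
    ],
)
contexts, links = persist(document)
```

In this example only two Context SMIRFs are created: one for the entry point and one for the family history section. Every leaf carries a direct link to the context that was in effect when it was parsed.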
The final result of this technique is that the entire CDA can be stored in the relational database just as
before, but with one very important improvement. Now when we want to query for all of the glucose
measurements for a given Patient ID (e.g. MRN), we know that the correct Patient Id can be found by
following the link to the Context SMIRF that was in effect when the entry was parsed.
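As a rough illustration of the resulting query, the sketch below uses Python's standard `sqlite3` module with a deliberately simplified schema. The table names `act` and `context_smirf`, their columns, and the sample rows are assumptions made for this example; a real RIM database would still hold the full Act, Participation, Role and Entity tables alongside the link.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE context_smirf (
    id         INTEGER PRIMARY KEY,
    subject_id TEXT NOT NULL          -- effective Subject of Record
);
CREATE TABLE act (
    id         INTEGER PRIMARY KEY,
    code       TEXT NOT NULL,         -- simplified observation code
    value      REAL,
    context_id INTEGER NOT NULL REFERENCES context_smirf(id)
);
INSERT INTO context_smirf VALUES (1, '1234567');
INSERT INTO context_smirf VALUES (2, 'family-member-of-1234567');
INSERT INTO act VALUES (10, 'glucose', 5.4, 1);
INSERT INTO act VALUES (11, 'glucose', 9.1, 2);
INSERT INTO act VALUES (12, 'pulse', 72.0, 1);
""")


def glucose_for(connection: sqlite3.Connection, mrn: str) -> list:
    """One join from the leaf to its Context SMIRF replaces the walk to the root."""
    rows = connection.execute(
        "SELECT a.value FROM act AS a "
        "JOIN context_smirf AS c ON a.context_id = c.id "
        "WHERE a.code = ? AND c.subject_id = ? "
        "ORDER BY a.id",
        ("glucose", mrn),
    )
    return [row[0] for row in rows]


results = glucose_for(conn, "1234567")
```

The glucose value recorded under the family history context is not returned for the patient, because its Context SMIRF names a different subject.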
Option 2: Decompose conducted context
There is a known alternative way of solving the issue, which is to decompose the conducted context
prior to persisting the document contents. Let’s again consider the example where the Author and
Subject participations, as well as the Act Relationship with the encompassing encounter are present in
the CDA header. They may be overridden at any point in the distal object structure.
Decomposing the conducted context requires that we perform the following processing steps upon
receipt of a CDA document:
1. Starting at the root Act (ClinicalDocument) we determine what the associations are that are
conducted to the children of that Act. Let’s call those the ‘contextual associations’.
2. Every time we parse a new Act (and I am referring to all children of Acts here), we check to see
if any contextual association is overridden. If the association is not overridden, the contextual
association will be copied (duplicated) onto the new Act.
3. In the case where the Query first “selects” observations of a particular type, for example,
glucose measurements, we will be selecting “leaves” of the CDA tree. These leaves will be
connected directly to their context.
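The decomposition steps above can be sketched in Python as follows. Again this is only an illustration: the `Act` class and the `decompose` function are hypothetical, associations are reduced to key/value pairs, and the sketch assumes that every association on a parent is conducted to its children, which a real model would decide per association.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Act:
    """Hypothetical parsed Act; `associations` starts with what is set directly on it."""
    name: str
    associations: Dict[str, str] = field(default_factory=dict)
    children: List["Act"] = field(default_factory=list)


def decompose(act: Act, inherited: Optional[Dict[str, str]] = None) -> None:
    """Copy each conducted association onto the Act unless the Act overrides it."""
    for kind, value in (inherited or {}).items():
        # setdefault keeps the Act's own (overriding) value when one is present.
        act.associations.setdefault(kind, value)
    for child in act.children:
        decompose(child, act.associations)


glucose = Act("GlucoseObservation")
family_obs = Act("FamilyDiabetesObservation")
document = Act(
    "ClinicalDocument",
    {"subject": "patient-1234567", "author": "dr-a"},
    [
        Act("LabSection", children=[glucose]),
        Act("FamilyHistorySection", {"subject": "family-member"}, [family_obs]),
    ],
)
decompose(document)
```

After the call, each leaf holds its own full copy of the effective context, so no navigation is needed at query time.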
Option 1 vs Option 2: By reference vs by value
Option 1 and option 2 are quite similar. In both cases you conduct the context as you are parsing and
prior to persisting the data. The difference becomes important with large structures where there are
many clinical statements (leaves) that all share similar context. In option 1, there is only one copy of
the Context SMIRF in the database, and it is pointed to (referred to) by many individual leaves. In
option 2, each individual clinical statement (leaf) is directly attached to its own copy of the context
(by value). The result in option 2 is a larger and more denormalized database: for each leaf (clinical
statement) that has the same context as at least one other leaf, there is an “extra copy” of that context.
In option 1 this redundancy is eliminated; for any given context, there is only one copy in the database,
pointed to by multiple leaves. Option 1 is not any more “safe” than option 2; it is just cleaner to
implement and has less redundancy.
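The difference in stored copies can be shown with a small counting sketch in Python. The contexts, the leaf counts and the function names are invented for illustration; each context is represented as a tuple of key/value pairs so that identical contexts compare equal.

```python
from typing import List, Tuple

Context = Tuple[Tuple[str, str], ...]


def copies_by_reference(leaf_contexts: List[Context]) -> int:
    """Option 1: one stored Context SMIRF per distinct context, shared via links."""
    return len(set(leaf_contexts))


def copies_by_value(leaf_contexts: List[Context]) -> int:
    """Option 2: every leaf carries its own copy of the context."""
    return len(leaf_contexts)


patient_ctx: Context = (("author", "dr-a"), ("subject", "patient-1234567"))
family_ctx: Context = (("author", "dr-a"), ("subject", "family-member"))

# Hypothetical document: 100 leaves under the patient context, 5 under family history.
leaves = [patient_ctx] * 100 + [family_ctx] * 5

by_reference = copies_by_reference(leaves)
by_value = copies_by_value(leaves)
```

With these invented numbers, option 1 stores 2 contexts plus 105 links, while option 2 stores 105 copies of a context.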
Conclusions
Dr Bob Dolin has expressed the concern that some implementations are not “safe” in that they do
not preserve correct context in some of the cases where it is overridden. Although neither of the
above techniques can guarantee complete safety, both can decrease the errors created by
misinterpreting the Subject of Record and other key contextual pieces of information associated with
any clinical statement.
The SMIRF option is preferable to the decomposition option because of its simplicity and decreased
redundancy in the database. Option 1 and option 2 are equivalent as far as context interpretation is
concerned. The actual context elements that are chosen to be included in a Context SMIRF are a design
decision. Since the most common type of query clinically is something along the lines of “SELECT
<<Some Observation or Procedure>> WHERE MRN = 1234567”, we can address the most common
safety issues simply. By ensuring that the Context SMIRF has the correct “Subject of Record”, and
perhaps making sure it records negation indicators and other significant context, such as limiting the
Acts to EVN mood, we can avoid the most likely and most dangerous errors in interpreting the clinical
statements. Mistaking a “personal history of”, a “family history of”, a “negated” statement or an
alternate “subject of record” is probably the most significant type of context error one can make
while performing this common type of clinical query. (Note: “Personal history of” and “family history
of” may also be addressed at the vocabulary level, but this is another subject.)
By making sure the context SMIRFs at a minimum take these context items into account we can
eliminate or very significantly reduce these dangerous types of context errors that might otherwise lead
to clinically relevant mistakes in interpreting clinical statements.
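A minimal Python sketch of such a "safe" query filter is shown below. It assumes a Context SMIRF that records three items (the subject, a negation indicator and the mood code); the class names, field names and sample data are hypothetical, and which items a real Context SMIRF carries remains a design decision, as noted above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ContextSmirf:
    """Hypothetical minimal context: subject, negation indicator and mood."""
    subject_id: str
    negated: bool
    mood_code: str  # e.g. "EVN" for an event that actually occurred


@dataclass(frozen=True)
class ClinicalStatement:
    code: str
    context: ContextSmirf


def safe_select(statements: List[ClinicalStatement], code: str,
                mrn: str) -> List[ClinicalStatement]:
    """Keep only non-negated EVN-mood statements whose subject is the given patient."""
    return [
        s for s in statements
        if s.code == code
        and s.context.subject_id == mrn
        and s.context.mood_code == "EVN"
        and not s.context.negated
    ]


statements = [
    ClinicalStatement("diabetes", ContextSmirf("1234567", False, "EVN")),
    ClinicalStatement("diabetes", ContextSmirf("1234567", True, "EVN")),         # negated
    ClinicalStatement("diabetes", ContextSmirf("family-member", False, "EVN")),  # other subject
    ClinicalStatement("diabetes", ContextSmirf("1234567", False, "RQO")),        # not an event
]
matches = safe_select(statements, "diabetes", "1234567")
```

Only the first statement is returned; the negated, family-member and non-event statements are excluded by the context check.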
The SMIRF technique needs to be implemented and tested but would likely reduce the problem of
unsafe storage and querying of large nested RIM structures such as CDA documents.