Discussion of Architectural Issues Related to the ‘primary’ Exchange and Storage Format of HL7 v3 MIF-based Artifacts

Len Gallagher
Draft distributed to Tooling Committee June 28, 2004

Background

During the past several Tooling Committee meetings, we have derived a list of “Requirements” and a list of “Nice-to-Have” features related to the exchange and storage format of HL7 v3 MIF-based artifacts. Given agreement on format and storage mechanism, the Tooling Committee then intends to choose a ‘primary’ format and a ‘primary’ exchange mechanism that best satisfies the requirements and desired features. During the discussion we did not try to limit the desired features strictly to just formats; instead, we considered the combination of the pure formats along with basic tools that might create, modify, combine and extract artifact representations from the ‘primary’ exchange and storage format chosen. The most recent version of this Architectural Decisions document was distributed by Lloyd McKenzie to the Tooling Committee email list on June 19, 2004.

An Issue

I think we are having a number of communication problems because each of us is thinking of requirements and features in terms of exchange and storage formats that we’ve already had some experience with. My own experience is with a centralized repository with access to objects in the repository as a web service; Geoffrey is most familiar with the CVS tool and storage he has set up for HL7 use and is probably viewing requirements and features in terms of what that tool, and its client-side Eclipse tool, can do; Woody is certainly familiar with the Access database storage and exchange mechanism that we’ve been using for definition and exchange of static models up to this point; Lloyd is most familiar with the XML structures of the MIF schema and the tools he has available to him for managing those types of structures. Others are certainly viewing requirements and features through their own experienced eyes.
Unfortunately, I think that our interpretations of requirements and features are quite different, because of our different experiences, so that even when we agree on a requirement, or a desirable feature, we’re not really agreeing on exactly what is meant. I know that I’m evaluating requirements and features against what I know we can do, or not do, in the NIST sponsored HL7 Artifact Registry, and I’m continually surprised when someone else is interpreting that same requirement or feature quite differently than I am.

Abstract

This paper addresses generic properties of MIF-based artifact definitions such as updatability constraints, merging and packaging rules, deconstruction into smaller MIF-based artifacts, etc., as well as transformations into representations other than the ‘primary’ storage format. It attempts to address each of the requirements and features we’ve already accepted against some of the options being considered as the ‘primary’ exchange and storage format. In many cases I consider interpretations of the requirements and features against the storage and exchange mechanism that I’m most familiar with, i.e. a centralized database repository of MIF-based artifacts that maintains a clear distinction between the document that is registered (i.e. the MIF representation of the artifact) and the various metadata that helps to identify, describe, categorize and associate a single artifact or associations among artifacts. I consider both the advantages and disadvantages of universal web services for all access to the MIF-based registry/repository.

Introduction

Begin by assuming the existence of a primary source MIF-based representation of each registered HL7 artifact. The primary source MIF is the unique TC-approved current definition of the artifact and must be kept distinguished from all other MIF-based representations of that artifact.
Other MIFs may be copies of the primary source MIF, with or without some added restrictions, or may have one or more primary source MIFs embedded within them, but there is only one primary source MIF for each artifact, and both tool builders and tool users must know when they are working on a primary source MIF and when they are working on MIFs that are copies or other derivations of the primary source MIF. The MIF schema is very flexible, so an artifact may be represented in an equivalent manner by several different non-identical MIFs. For example, the MIF schema allows one to reference another artifact or to import its definition into the current MIF definition. Importation of other MIFs is very helpful for readability, but makes it difficult to update a MIF if it’s not clear which parts are source material and which parts are imported from other primary sources. For this reason a primary source MIF should follow a few very simple rules:

1. A primary source MIF may include other primary source MIFs only by reference. It may not copy the content of another primary source MIF into itself and retain its status as the primary source MIF representation of the underlying artifact. For example, a primary source MIF may include MIF definitions of data types, classes, associations, triggers, interactions and vocabulary only if there do not exist other HL7-identified primary source MIFs for those artifacts.

2. A primary source MIF that is derived from another primary source MIF may not copy annotations or history items from that artifact into itself unless they are clearly marked as copied from that other MIF and not editable in this MIF. If definitions or constraints are copied verbatim, e.g. from an HMD primary source to a message type primary source, then in practice they must be treated as two separate definitions or constraints, one for the HMD and one for the MessageType. An edit of either definition or constraint will not automatically flow to the other. A constraint on the parent model could invalidate the specification of a descendant model unless one rewrites the descendant to honor the tighter restrictions of the parent.

3. If a primary source MIF references a primary source vocabulary value set, then it may not re-define those vocabulary items within itself. Instead, it could copy the codes into an annotation or description that is clearly marked as derived from another source and not editable. If instead the MIF re-defines the value set, it must be treated as a new and distinct value set subject to all constraint consistency rules between a MIF and the parent MIF it is derived from.

At some point HL7 should define an appropriate granularity for primary source MIF representations. For example, does it make sense for all value sets to have separate primary source MIF representations, or should there be just one primary source MIF-defined package of value sets for all of the structural attributes in the RIM?

Existing HL7 Primary Source Artifacts

Given the notion of primary source defined above, it makes sense to apply that notion to the current paradigm for model development. Suppose Woody is holding the Access DB Composite Repository that was compiled from all artifact information leading up to Ballot 6. At this point the Composite Repository is the primary source artifact for the entire HL7 v3 specification. It holds the definitive RIM artifacts (RIM v2.02), the definitive Vocabulary (Voc v209), the definitive Naming conventions (Naming v17), and all DMIMs, RMIMs, HMDs and MessageTypes collected from HL7 technical committees as of December 6, 2003. It also contains information merged in from the PubDB as to the structure of Ballot 6 documentation. This database was then used to publish the official HL7 v3 specification for balloting in Ballot 6.
At this point one could consider the HTML ballot to be the primary source for the HL7 v3 specification or one could consider the Ballot 6 Composite Repository to be the primary source. Let’s assume that the Composite 6 Repository is the primary source for the entire HL7 v3 specification, and that the HTML pages are just a transformation of it into human readable format. Any errors found in the HTML ballot documents would have to be found and corrected in the Composite 6 Repository. In the time period after Ballot 6 and leading up to Ballot 7, the primary source Composite 6 Repository gets modified in several ways. First, the existing Composite 6 Repository is archived for posterity – it remains as the primary source specification for Ballot 6. Then a copy of it is stripped of domain specific artifacts and renamed to something like rim0202d-emptyRepository20031231.mdb. At this point we have two primary source Repositories. The archived Composite 6 Repository is the primary source for all domain specific artifacts, i.e. DMIMs, RMIMs, HMDs and MessageTypes, while the new emptyRepository v2.02 becomes the primary source for all ongoing RIM, Naming and Vocabulary modifications. Apparently there were no changes to the RIM during this period because the Ballot 7 Composite Repository (dated 21 March 2004) still references RIM v2.02. However, there were several vocabulary and naming updates, since the Ballot 7 Composite Repository now includes Vocab v211 and Naming v18. Each of these vocabulary and naming modifications was made on the single primary source emptyRepository v2.02 artifact, so there was no possibility of ambiguity. The actions are even documented in the database itself! The primary source copy was held very closely by several people and was only updated by editors one person at a time as HL7 decisions were made and approved by appropriate HL7 technical committees. At this point Woody (I’m making some assumptions here!) 
archived the emptyRepository v2.02 as the persistent primary source for RIM and made a copy that was published in the Tools section of the HL7 website. Then the existing DMIMs, RMIMs, HMDs and MessageTypes specific to a given Domain were copied from the primary source Composite 6 Repository into copies of the emptyRepository and distributed to every Domain committee responsible for domain artifacts under ballot. At this point, the Domain-specific contents of each repository sent to a technical committee become the primary source artifact for the next version (i.e. Ballot 7) of all DMIMs, RMIMs, HMDs and MessageTypes in that repository. The previous versions remain fixed with their existing definitions, labeled as Ballot 6 specifications. We now have a large number of primary source artifacts, many with duplicate information that carries a high risk of ambiguous modification. The primary source for RIM, Formal Naming and Vocabulary is the emptyRepository that has been archived, but since every Domain committee has a non-primary source copy of it, with the tools to make changes to it, there is some risk that we’ll have different RIM constraints, different vocabularies, and different naming conventions floating around in the various copies. Another issue is that there is no way to be certain during this period whether changes have been made to a specific DMIM, RMIM, HMD or MessageType. Thus we must assume that some changes may have been made to all such models and all must be re-balloted as if they have been modified. (NOTE: I’m sure there are some exceptions to this where it is known that no changes have been made to some models!) But in general, voters must assume that they’re going to see revised versions of every model sent back out to the technical committees. At the end of HL7 Working Group and Interim meetings, technical committees send their revised repository databases back to HL7.
The Publication committee, with assistance from HL7 staff, merges all the information back into a new Repository called the Ballot 7 Composite Repository. Any discrepancies that may have crept into the RIM, Vocabulary, and Naming artifacts are removed at this point. Also any static models that do not satisfy appropriate naming and/or vocabulary restrictions may get sent back to the domain committee for confirmation on what should be balloted. This revised Composite 7 Repository now becomes the primary source for all RIM v3 artifacts and is used to produce the Ballot 7 HTML specification. Finally this new Composite 7 Repository is added to the existing Composite 6 Repository in the archive and a new round of development and balloting begins. The most recent emptyRepository was published by Woody on March 31, 2004, under the title rim0203d-emptyRepository-20040331.mdb. It identifies its content as RIM v2.03, Naming v18, and Vocabulary v221 (http://www.hl7.org/library/data-model/V3Tooling/RimRepos.zip). It has probably already been populated with copies of Domain-specific models from the source artifacts in the Composite 7 Repository and sent out to the domain technical committees for the next set of revisions.

Proposed Future HL7 Primary Source Artifacts

The Tooling Committee is proceeding under the assumption that HL7 desires a change from its existing primary source Artifact management to a new system based on MIF-derived artifacts. The current Model Interchange Format (MIF) is a collection of XML Schemas with embedded Schematron rules to enforce HL7 policy. The most recent MIF schemas are available to HL7 Members in a CVS repository at the NIAT research labs of UNLV (see Geoffrey Roberts for access codes). They can be reached at node cvs.hl7.nscee.edu:/home/hl7/cvs/HEAD/VSROOT/ v3Schemas/uml/schemas. I downloaded a collection of schemas on June 17, 2004, and the most recent date on any of the individual schemas was January 13, 2004.
We know that more recent schema pieces are still under development, especially vocabulary and dynamic model, but my comments below are based on this June 17 extraction. Every important HL7 artifact is or should be representable in one of the MIF schemas. The schemas form a hierarchy with lower level schemas being imported into higher level ones as follows:

    mifModelInterfaces
            \
             \
              \
       mifStaticModelFlat   mifStaticModelSerialized
                   \           /
                    \         /
                     \       /
                mifStaticModelBase   mifDatatype
                          \            /
                           \          /
                            \        /
                             \      /
                          mifStaticBase   mifVocabulary
                                  \          /
                                   \        /
                                    \      /
                                     \    /
                                     mifBase
                                        |
                                        |
                                mifExtendedMarkup
                                        |
                                        |
                                pubDisplayMarkup
                                        |
                                        |
                               mifReferencedCodes
                                        |
                                        |
                                 mifPatternTypes

The Tooling Committee is proceeding under the assumption that every HL7 artifact can be represented as some XML element defined by the above schema hierarchy and thus would validate to one of these schemas. The architectural issues now facing the Tooling Committee are how to store and exchange these MIF-based artifacts during various stages of their development.

At one extreme, one could think of the primary source HL7 v3 specification being a single very large XML file that contains elements representing each and every HL7 artifact. One could extract elements from that file for each artifact, process that element in some HL7 tool, and then add a new element or replace the existing element to effect a new constrained artifact or a revision of the existing artifact. Under this scenario there is only one primary source v3 specification. All extractions would be non-primary source copies. The single primary source specification would get updated at some point in time and the new XML document would become the new primary source. The frequency of such updates is an architectural issue.

At another extreme, one could think of each artifact as having its own primary source MIF representation. The collection of all such MIF representations would be the overall definitive HL7 v3 specification.
Under this scenario, one could remove an artifact representation from the collection and have that MIF document retain its status as the primary source MIF representation of that artifact. Each new MIF-representation added to the collection would carry an effective date. Under this scenario one could extract a sub-collection of artifacts each with a different effective date. The effective date of the sub-collection would then be the maximum of the individual effective dates.

In practice, there may not be that much difference between these two extremes. In the collection example, the collection itself could be represented as a MIF package consisting of each of the individual MIF elements. The MIF package could then be considered as the large XML file that represents the entire HL7 v3 specification. For example, if each HL7 artifact is represented as a single staticModel in the MIF, then the collection of staticModels would represent the entire HL7 v3 specification for static models. The only significant differences between these extremes might be how the primary source representations are identified and maintained.

The management issues one faces in the proposed MIF-based representations of HL7 artifacts are similar to those that currently exist with the Composite Repository approach. When individual pieces, or large subsets, of the specification are extracted from the primary source, then one has to know if the extractions are primary source objects or derived objects. Similarly, when one extracts a primary source model, it may become a derived copy of the artifact it models, but it may also become the primary source of a new constrained model that could eventually become part of the whole HL7 v3 specification.

A Web-based Artifact Repository

Many of the comments I make below will be based on my experience in working with NIST colleagues to define and populate the NIST sponsored HL7 Artifact Registry.
The primary intent of this registry is to allow HL7 member organizations to register conformance profiles and/or templates of HL7 standards. However, in order to register profiles or templates it is necessary to be able to reference and have easy access to the definitions of many HL7 artifacts, especially the static models, data types, and vocabulary. For this reason we have populated the Registry (http://www.nist.gov/hl7xreg) with a number of artifacts, most not yet formally adopted as HL7 standards. We believe that the Registry is essentially complete with static DMIM, RMIM, HMD, and MessageType models from the Composite 7 Repository and from Naming v18 and Vocabulary v221. Woody has kindly provided MIF representations for all HMDs and MessageTypes in Ballots 6 and 7, so those registered items in the Registry have an associated MIF-based repository item. The other artifacts are described, with links to graphics copied from the HL7 Ballot, but have no MIF-based repository item.

One of the weaknesses of the Registry is the lack of automated version control, so I am not proposing that the existing Artifact Registry be used for the day-to-day development of HL7 models, where intense version control is necessary during the development process. Instead, I simply use this registry as a basis of comparison for further understanding the requirements and features desired in any tool or tools we choose for MIF-based artifact development. For comparison purposes, I’m characterizing the NIST Artifact Registry as a centralized repository that holds all primary source MIF artifacts. In addition it may hold non-primary source MIF artifacts that are derived from primary source artifacts either as packages or in some other well-defined manner. I’m assuming universal ebXML Registry Services or simpler web services access to the Registry for all submissions, queries and retrievals.
I’m also assuming that metadata describing Registry content is separate from the content of the MIF documents that are registered. Even though names, descriptions, classifications and associations may be derived from MIF content as Registry metadata, one is not able to use registry services directly to update MIF content. Instead, all MIF updating will be considered as retrieval of a MIF document, followed by editing of the MIF document via a non-registry tool, followed by re-submission of a revised MIF document for that artifact. Of course, the MIF editing tool could be so completely integrated with the Registry that the user sees it as a single edit in place, but we’ll be very careful to distinguish edits on metadata from edits on the MIF document.

The Architectural Decisions Document

As mentioned in the background section above, my comments are based on the June 19, 2004, version of the Architectural Decisions document under discussion by the Tooling Committee. Currently this document has 4 sections, with each section being a question posed by the author that needs a response from the Tooling Committee. At this point in time, only the first question has been discussed by committee members in teleconference sessions. The question is: What is the ‘primary’ exchange and storage format of HL7 v3 MIF-based artifacts? During these teleconferences we have agreed on a list of Requirements for the answer to this question, a list of Nice-to-Have features, and a list of Possibilities. I discuss each requirement and feature below.

Req #1: Must be MIF-based (use the MIF-defined data structure, not necessarily XML)

This is a non-controversial requirement because everyone agrees that all artifacts will have a MIF representation. To say that the storage structure and exchange mechanism must be “MIF-based” could mean different things to different people.
In my mind it simply means that the MIF-representation of each artifact, and each MIF-implied package of artifacts, must be derivable from the storage or exchange mechanism in a straightforward manner. For example, if a user knows the ArtifactId of a given artifact, then knowledge of that ID should be sufficient to enable extraction of the artifact’s corresponding MIF representations from the storage mechanism or from the exchange mechanism. Under this interpretation a storage mechanism might be a file system, a database, or a master XML document. Similarly, an exchange mechanism might be a single file, a collection of files, a compressed (i.e. zipped) collection of files, a database, etc. The ArtifactId must always be easily transformable either to a file name or to a query that allows extraction of the MIF representations. Multiple representations are sometimes possible, especially if the storage mechanism holds multiple versions of an artifact under development.

Under a more rigid interpretation of this requirement, the storage and exchange mechanisms themselves must be MIF-based. Under the existing MIF schema, the only way this could be accomplished literally would be for the storage and exchange mechanisms to be representable as XML documents or transformable to XML documents that validate to one of the MIF schemas. As seen in the discussion of the two extremes above, even this more rigid interpretation of the requirement could be satisfied quite easily because a carefully constructed collection of MIF documents can always be represented as a single MIF document that is a MIF “package” of the individual items.

The NIST Artifact Registry does not yet satisfy this requirement. At present one is able to extract a MIF-representation only one artifact at a time. Even now, we only have MIF representations of HMDs and of MessageTypes, so it would be necessary to populate the Registry with agreed MIF representations for all of the other HL7 artifacts.
Secondly, it is not yet possible to extract a collection of artifacts; instead, one would write a query that returns a collection of artifact identifiers, and then would write individual queries to extract each MIF-representation separately. It is possible to extend Registry features to return a collection of MIFs as the result of a query, probably as a MIF package, but that is not yet accomplished.

An important consideration under the Registry model for storage is that the Registry always holds the primary source MIF representation. I believe this would also be the case under the CVS model of storage. Any MIF document extracted from the Registry would be a non-primary source representation. However, invocation of the Registry “Replace” service would allow replacement of a given MIF-representation with a new MIF-representation. The replacement would be marked with a new effective date and would become the new primary source of the given artifact. The replaced version would no longer be available. Alternatively, invocation of the Registry “Supersedes” service would allow a new version of a MIF representation to be added to the Registry and the original version would point to the new version. Both versions would remain available to users. Superseded versions of the representation would have separate internal Registry identifiers, but all could hold the same ArtifactId as an external identifier. The Registry could add a constraint to ensure that each ArtifactId has at most one most recent version and could always return the most recent version unless earlier versions are specifically requested as part of a query. The NIST Artifact Registry enforces this superseded notion by labeling the items with a Ballot number and by maintaining a SupersededBy association from the older version to the newer version.
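
The difference between the “Replace” and “Supersedes” behaviors described above can be made concrete with a small in-memory sketch. Everything in it (the Registry class, its method names, the field names) is hypothetical and chosen only for illustration; it is not the NIST Registry or the ebXML Registry Services API, and the MIF document is treated as opaque text.

```python
import uuid
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class RegisteredItem:
    internal_id: str                      # registry-internal identifier (a uuid here)
    artifact_id: str                      # external ArtifactId, shared by all versions
    mif: str                              # the MIF document, treated as opaque text
    effective: date                       # effective date of this version
    superseded_by: Optional[str] = None   # internal_id of the newer version, if any


class Registry:
    """Hypothetical registry: illustrates the bookkeeping only."""

    def __init__(self) -> None:
        self.items: Dict[str, RegisteredItem] = {}

    def latest(self, artifact_id: str) -> Optional[RegisteredItem]:
        # Constraint from the text: at most one most-recent version per ArtifactId.
        current = [i for i in self.items.values()
                   if i.artifact_id == artifact_id and i.superseded_by is None]
        if len(current) > 1:
            raise RuntimeError("more than one current version of " + artifact_id)
        return current[0] if current else None

    def _require(self, artifact_id: str) -> RegisteredItem:
        item = self.latest(artifact_id)
        if item is None:
            raise KeyError(artifact_id)
        return item

    def submit(self, artifact_id: str, mif: str, effective: date) -> RegisteredItem:
        if self.latest(artifact_id) is not None:
            raise ValueError("already registered; use replace or supersede")
        item = RegisteredItem(str(uuid.uuid4()), artifact_id, mif, effective)
        self.items[item.internal_id] = item
        return item

    def replace(self, artifact_id: str, mif: str, effective: date) -> RegisteredItem:
        # "Replace": the old content is overwritten and is no longer available.
        item = self._require(artifact_id)
        item.mif, item.effective = mif, effective
        return item

    def supersede(self, artifact_id: str, mif: str, effective: date) -> RegisteredItem:
        # "Supersedes": a new item is added and the old one points to it.
        old = self._require(artifact_id)
        new = RegisteredItem(str(uuid.uuid4()), artifact_id, mif, effective)
        old.superseded_by = new.internal_id
        self.items[new.internal_id] = new
        return new

    def history(self, artifact_id: str) -> List[RegisteredItem]:
        # All versions that share the ArtifactId, oldest first.
        return sorted((i for i in self.items.values() if i.artifact_id == artifact_id),
                      key=lambda i: i.effective)
```

In this sketch a superseded version keeps its own internal identifier and stays retrievable through history(), while latest() returns only the single item that has no SupersededBy link.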
Req #2: Must support the MIF package structure This requirement was also non-controversial, likely because the MIF schemas allow representation of collections of items as a single MIF-package. Thus a collection of artifacts can always be represented as a single MIF package, and a MIF package can always (if it’s a package of artifact representations!) be split apart into a collection of individual MIF-representations of the artifacts. Some very simple transforms would accomplish the necessary representations. Issues may arise as to when and how the transforms are invoked, but their existence is taken for granted. However, a more strict reading of this requirement may imply that the storage mechanism must be able to support the same flexible construction of packages as does the MIF. For example, a use case for a MIF package might be an identified artifact, plus all of the other artifacts that it references in its definition. An RMIM may reference a collection of CMETs and a natural package is the RMIM together with the RMIMs of all of the referenced CMETs. Another natural package is an HMD together with all MessageTypes that are derived from it. Another natural package is a static information model together with all of the parent models it is derived from. Many different types of users may have well-established requirements for different packages of artifacts. It would be difficult in any hierarchical file system storage mechanism to anticipate all of these different kinds of packages. At best one would have to choose a finite number of static package definitions that could be supported naturally by the file system. More dynamic or more flexible package specification would have to be based on queries or some other flexible way to identify the components of the package. If general purpose query support is available, then the file structure for storage or exchange is almost irrelevant. 
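
One of the natural packages mentioned above, an artifact together with everything it references, can be computed from reference links alone, independent of how files are laid out. The following is a minimal sketch under the assumption that the references between artifacts are already available as a simple mapping; the function name and the artifact identifiers in the usage note are hypothetical.

```python
from typing import Dict, List, Set


def package_closure(root: str, references: Dict[str, List[str]]) -> Set[str]:
    """Return the root artifact plus every artifact reachable through references."""
    package: Set[str] = set()
    pending = [root]
    while pending:
        artifact_id = pending.pop()
        if artifact_id in package:
            continue                      # already collected; also guards against cycles
        package.add(artifact_id)
        pending.extend(references.get(artifact_id, []))
    return package
```

For example, with references = {"RMIM_A": ["CMET_1", "CMET_2"], "CMET_1": ["CMET_3"]}, the call package_closure("RMIM_A", references) collects the RMIM and all three CMETs. The same traversal over "derived from" links instead of "references" links would give the static model together with its parent models.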
If the primary source HL7 v3 specification is considered to be a very large XML document, then an easily written W3C standard XPath or XQuery can usually be constructed to return an appropriate subset of elements to represent the desired package of artifacts. Similarly, if the primary source HL7 v3 specification is considered to be a database of individual artifacts, then an ISO standard SQL query can usually be written to achieve the same effect. The Tooling Committee does not want to choose a storage mechanism solely on the basis of features in different query languages. The best storage mechanism would support both XQuery and SQL types of queries. In the best possible use case, the SQL query could be used to identify a small collection of artifacts that satisfy certain administrative metadata constraints and then the XQuery could be used to do further filtering on the content of each MIF. It should be straightforward to specify a virtual SQL schema and an equivalent virtual XML structure that would support both types of queries. It is important to note that the next version of the ebXML Registry standard intends (never guaranteed!) to specify support for both SQL and XQuery retrieval.

The NIST Artifact Registry has two different mechanisms for supporting packages. The first is an explicit ebXML “RIM Package” (not identical to MIF package) that maintains the elements of a package as HasMember associations. Each package is considered to be a collection of other items in the Registry, including other packages. A registered item can be a member of an arbitrary number of packages. Items can be added to or removed from packages simply by creating or deleting HasMember associations. The second mechanism is SQL query. The Registry has an SQL schema that allows submission of robust SQL queries to identify collections of artifacts.
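
The two-stage use case described above, SQL over administrative metadata followed by a content filter on each MIF, can be sketched with the Python standard library. This is an illustration under stated assumptions, not the Registry's schema: the artifact table, its columns, the sample rows and the element names are all hypothetical, the toy documents ignore XML namespaces, and ElementTree's limited XPath subset stands in for a real XQuery engine.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import List


def build_demo() -> sqlite3.Connection:
    """Create a toy in-memory store; table layout and rows are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE artifact (artifact_id TEXT, ballot INTEGER, mif TEXT)")
    conn.executemany(
        "INSERT INTO artifact VALUES (?, ?, ?)",
        [
            ("HMD_1", 7, "<staticModel><class><attribute/></class></staticModel>"),
            ("HMD_2", 7, "<staticModel><class/></staticModel>"),
            ("HMD_3", 6, "<staticModel><class><attribute/></class></staticModel>"),
            ("HMD_4", 7, "<staticModel><class>"),      # incomplete draft, not well-formed
        ],
    )
    return conn


def find_artifacts(conn: sqlite3.Connection, ballot: int, element: str) -> List[str]:
    """Stage 1: SQL on metadata. Stage 2: path expression on each MIF's content."""
    rows = conn.execute(
        "SELECT artifact_id, mif FROM artifact WHERE ballot = ? ORDER BY artifact_id",
        (ballot,),
    )
    hits = []
    for artifact_id, mif in rows:
        try:
            root = ET.fromstring(mif)
        except ET.ParseError:
            continue                      # content cannot be queried; metadata still can
        if root.find(".//" + element) is not None:
            hits.append(artifact_id)
    return hits
```

With the demo data, find_artifacts(build_demo(), 7, "attribute") selects the Ballot 7 rows by SQL and then keeps only those whose content contains an attribute element. Note that the malformed draft is reachable by the metadata query but drops out of the content filter, which is the trade-off discussed under Req #3.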
If the result of a query is a collection of artifacts, then that collection could be returned to the user as a single XML document that is a MIF package consisting of that same collection of artifacts. As stated above, the NIST Registry does not yet support the combination of individual artifacts into a single MIF document to represent a collection, but this could be a future activity.

Req #3: Must allow the storage of non-complete, non-valid MIF artifacts

Again, this requirement is non-controversial, likely because HL7 artifact developers will insist upon it. An HL7 technical committee may develop a MIF-based artifact specification that it knows is incomplete. It may be waiting for a single issue to be resolved before the specification can be completed. A requirement of this committee is that it must have a mechanism for storage and exchange of this incompletely defined artifact while the issue is being resolved. The unresolved issue may even be referred to a ballot, so even the final ballot storage structures must allow for the storage of non-complete, non-valid MIF artifacts.

This requirement is almost a requirement that the metadata describing a MIF document be separate from the MIF document itself. That way the document can be described in a manner that is structurally valid and stable even if the document itself is not. In the CVS storage model, the metadata describing a document in the repository consists essentially of just three items: a file name for the document, an automatic method for version maintenance, and a path of names for the location of the file in the storage hierarchy. So long as the file and path names are stable, this type of storage satisfies the requirement that MIF documents can be in any state of completion.

In the single large XML MIF document model, the Schematron structure rules can be a substitute for structure rules embedded in the XML descriptions.
Thus it may be possible for a MIF to validate as an XML document while still failing to comply with all of the Schematron rules to be enforced for artifact specification. However, it may be very difficult to find tools that will manage MIF documents that do not validate to some minimal XML structure. If we choose a MIF document as the primary source HL7 v3 specification, then we will also have to describe the storage structure in some way as an XML document that validates to some minimal structural definition, even if many of the elements in the structure that represent artifacts do not validate to a MIF element for that artifact.

In a database storage model, the MIF may be part of the database structure or the MIF may be treated as if it were a separate file. Storing the MIF in the database as a Blob is essentially equivalent to storing it as a separate file. If the MIF itself is broken apart and stored in database structures, with appropriate database integrity constraints to enforce required artifact structure, then this model is essentially equivalent to the XML model with Schematron rules. It will be impossible to store incomplete or invalid MIFs in the database unless the MIF structure rules can be turned on and off and kept separate from the other database structure rules. This may be difficult to achieve in some relational database systems.

The advantage of combining artifact storage structures with the required artifact structure is that query can be extended to cover both metadata about the registered item and content of the registered item. In the large MIF XML model, an XPath or XQuery constraint or query specification can integrate metadata with artifact content. The same advantage is available in the database model where the MIF is separated and represented by database structures. However, in both cases this advantage is lost if the MIF representations of the artifacts do not validate to the underlying XML or database schema.
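
The separation discussed above, between a minimal structural check and policy rules that can be reported without blocking storage, can be sketched as follows. This is only an illustration: the two rules are hypothetical stand-ins written as plain Python predicates rather than Schematron assertions, the element and attribute names are invented, and the repository is just a dictionary.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

Rule = Callable[[ET.Element], bool]

# Hypothetical policy rules standing in for Schematron assertions.
RULES: Dict[str, Rule] = {
    "model has a name": lambda root: bool(root.get("name")),
    "model has at least one class": lambda root: root.find("class") is not None,
}


def check(mif_text: str) -> List[str]:
    """Return the list of problems found; an empty list means no rule failed."""
    try:
        root = ET.fromstring(mif_text)
    except ET.ParseError as exc:
        # Minimal structural level: the text is not even well-formed XML.
        return ["not well-formed: " + str(exc)]
    # Policy level: evaluated only when the document can be parsed.
    return [name for name, rule in RULES.items() if not rule(root)]


def store(repository: Dict[str, dict], key: str, mif_text: str) -> List[str]:
    """Store the document whatever its state, recording the problems beside it."""
    problems = check(mif_text)
    repository[key] = {"content": mif_text, "problems": problems}
    return problems
```

Here an incomplete draft such as "<staticModel name='Draft'/>" is stored and reported as failing one rule rather than being rejected. Turning the rules "off" amounts to ignoring the recorded problems, which is easy in this model and, as noted above, may be hard when the rules are expressed as database integrity constraints.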
The NIST Artifact Registry treats a registered item as something completely separate from the metadata that describes it. The MIF representation may even be missing, as we’ve seen for most of the artifacts currently registered in this Registry. As in the CVS model, a MIF representation may be in any state of construction. The only requirement is that it have some sort of identifier to be the collection point for metadata. In this Registry, a uuid identifier may be supplied when metadata describing the item is submitted or, by default, a uuid identifier is automatically generated by the Registration process. In most cases, even an incomplete or invalid artifact will have an HL7-specific ArtifactId that is part of the metadata. An artifact will likely retain the same ArtifactId as it passes through the various stages of the ballot or development process. If it is desirable to retain copies of the specification at each step of that process, then an ArtifactId may necessarily identify multiple registered items, each with a different uuid internal identifier and each with a user-visible ballotStatus or developmentStatus property that allows a human (or a human-written query) to distinguish among registered items having the same ArtifactId.

Req #4: Must allow content to be stored in source control, or easily translatable to a format that can be stored in source control.

When discussing HL7 artifacts, we’ve already agreed that the “source” is an XML document that validates, or is close to validating, to one of the MIF XML schemas. Thus to be stored in “source control” must mean stored as an XML document or stored in a format that is easily translatable to an XML document. I think this requirement is intended to rule out very sophisticated database or other data structure representations of an artifact that then require extensive processing to produce the MIF XML representation.
At first this seems like a non-controversial requirement, with the only issue being what “easily translatable” means. However, consider the HL7 vocabulary model as defined in the HL7 Common Terminology Services (CTS) standard. The model itself is defined in UML with the primitive elements being Concept, CodeSystem, CodedConcept, ConceptRelationship, ValueSet, ValueSetConstructor, and VocabularyDomain. The RoseTree tool for building RMIMs and HMDs uses a relational database model of the vocabulary concepts above and is able to produce hierarchical representations of each VocabularyDomain. RoseTree will produce an XML representation of the hierarchy if requested, and it is this XML representation that is used to produce the HL7 ballot materials for structural attributes in the RIM. Thus one could conclude that the hierarchical XML representation is the primary source for VocabularyDomain artifacts. I have not yet studied the MIF-based representation of vocabulary information, but for now let’s assume that it is analogous to the existing XML representations.

My point here is that I think it would be a mistake to consider the XML hierarchy as the primary source for vocabulary. The XML hierarchical representation, while of immense value to the end user, does not satisfy the requirement that a primary source be editable, because the Concept codes and definitions used in the hierarchy are not owned by the hierarchy. Instead, they are owned by the CodeSystem that defines them. A Concept name, code, and definition may appear at many different points in many different hierarchies, so it will be impossible to edit them in the source hierarchy document. Instead, it is necessary to edit these items in the place where they are defined, namely the relational representation of the UML model.

One might conclude that the MIF representation of ValueSet should be the primary source for vocabulary information.
Then the attributes in each class of a static model could reference the appropriate ValueSet. The primary source of the static model would then be separable from the primary source of the ValueSet, making it easier to edit the static model while only referencing a ValueSet. But how does one edit a ValueSet? The value set may contain Concepts from multiple CodeSystems and may have relationships defined by both ConceptRelationships and ValueSetConstructors, each owned by different owners. I think we cannot consider value sets as the primary source for vocabulary information because the Concepts and relationships among the Concepts will not normally be editable by a single editor in any complete XML representation of the entire hierarchy of the ValueSet.

My purpose in bringing up this example is to show that it may be necessary to conclude that the primary sources for vocabulary information be reduced to the primitive UML concepts listed above. We may have a MIF-based primary source for CodeSystems, a separate MIF-based primary source for ConceptRelationships, a separate MIF-based primary source for ValueSetConstructor, etc. None of these primary sources will be very useful, by themselves, to the end user. Thus it may be necessary to consider the MIF representations of ValueSets and VocabularyDomains to always be derived representations and never the primary source, and thus never editable! If a ValueSetConstructor is a binary relationship between two ValueSets owned by different organizations, how does one decide which ValueSet owns the association and is able to update, delete, or replace it?

The NIST Registry handles vocabulary information in much the same way as does the existing Composite Repository, i.e. each of the primitive UML concepts listed above is represented by some structure in the Registry. Concepts and CodeSystems are separately registered, with the requirement that a Concept be derived from a CodeSystem.
A CodeSystem will then be represented as a set of Concepts. ConceptRelationships are represented as HasSubtype associations from a source Concept to a target Concept from the same CodeSystem. ValueSets are separately registered with ValueSetConstructors represented as DerivedFrom associations from a source ValueSet to a target ValueSet. A ValueSet may also have HasMember associations with Concepts. A VocabularyDomain has a DerivedFrom association with a ValueSet and a UsesVocabulary association with a structural attribute of a RIM class. A VocabularyDomain is represented in the Registry as a ClassificationScheme, so it has the same hierarchical structure as that presented by RoseTree. Like RoseTree, the hierarchical structure of a VocabularyDomain is derived from other more primitive concepts and is never considered to be the primary source for vocabulary information.

The NIST Registry does not yet produce a MIF representation for any of these vocabulary concepts, but it could do that if desired. However, the MIF representations of VocabularyDomains and ValueSets would probably never be considered as “source control” for these artifacts. Instead, the “source control” is embedded in the DerivedFrom, HasMember, HasSubtype, UsesVocabulary, etc. associations with other vocabulary artifacts. An update of a ValueSet or a VocabularyDomain may have to be considered as a collection of updates on other vocabulary primitives that would then produce a new representation of the ValueSet or VocabularyDomain.

Taking ValueSet as an example, it’s possible that a primary source MIF representation of the ValueSet could be defined, but it would have to be clear that the Concept codes, Concept names, Concept descriptions, and other ValueSets involved in the construction of the given ValueSet are derived from those other sources and cannot be updated for “source control”.
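To make the distinction between stored primitives and a derived ValueSet concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a DerivedFrom association contributes the union of the source ValueSet's content and that a HasMember Concept brings its HasSubtype descendants with it; neither assumption is taken from the CTS standard or from the Registry implementation, and all codes and names are made up.

```python
# Hypothetical primitives keyed by association type; all codes are invented.
HAS_SUBTYPE = {"C1": ["C1.1", "C1.2"], "C1.1": ["C1.1.1"]}  # Concept -> subtypes
HAS_MEMBER = {"vsBase": ["C1.1"], "vsOther": ["C2"]}         # ValueSet -> Concepts
DERIVED_FROM = {"vsCombined": ["vsBase", "vsOther"]}         # ValueSet -> ValueSets


def expand_concept(code):
    """Return the Concept together with all of its HasSubtype descendants."""
    result = {code}
    for child in HAS_SUBTYPE.get(code, []):
        result |= expand_concept(child)
    return result


def expand_value_set(name, seen=None):
    """Derive the full content of a ValueSet from the stored primitives."""
    seen = set() if seen is None else seen
    if name in seen:  # guard against cyclic DerivedFrom associations
        return set()
    seen.add(name)
    codes = set()
    for code in HAS_MEMBER.get(name, []):
        codes |= expand_concept(code)
    for source in DERIVED_FROM.get(name, []):
        codes |= expand_value_set(source, seen)
    return codes


print(sorted(expand_value_set("vsCombined")))
```

In this sketch the only things an editor of `vsCombined` could change are its own entries in `DERIVED_FROM` and `HAS_MEMBER`; everything else in the expanded result is inherited from primitives owned elsewhere.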
The only updatable parts might be the direct HasMember and DerivedFrom associations with externally defined Concepts and other ValueSets. The indirect relationships derived from these other artifacts would not be updatable. In some sense a “useful” and “complete” MIF representation of a ValueSet will likely not be editable as a primary source because it would be inheriting HasSubtype relationships from externally defined Concepts and HasMember and DerivedFrom relationships from other ValueSets. If a “source control” document is not editable, then what good is this requirement? When an HL7 user desires to see a VocabularyDomain or a ValueSet, it certainly makes sense to require that they be given a MIF-based representation of that artifact with fully expanded definitions of the underlying Concepts and the underlying relationships. But to call that document “source control” implies that it can be edited, and this I think is a mistake. In summary, I think this is a relatively easy requirement to satisfy if we can agree that “source control” representations of HL7 artifacts may not necessarily be “useful” or “complete”. Instead, they may only represent the parts of the artifact that are updatable. For example, a primary source MIF representation of the ActClass VocabularyDomain may consist simply of a name, a text definition, a link to an attribute of the Act class, and a link to a single ValueSet. Even the link to Act.classCode may be excluded because that link could be considered to be owned by the attribute rather than by the VocabularyDomain. It seems to me that this “source control” requirement is really a requirement on the tooling architecture rather than a requirement on the storage mechanism for HL7 artifacts. 
If the tooling architecture defines two MIF representations for each HL7 artifact, 1) the required content of a primary source MIF, and 2) the required content of a complete derived MIF representation, then it makes sense to require that storage mechanisms be able to accept primary source MIFs as new artifacts, or as updates to existing artifacts, and then be able to produce primary source MIFs and/or complete derived MIFs as requested by a user. A later “nice-to-have” feature may be a tool that can process edits to a complete derived MIF and produce the necessary replacements for the underlying primary source MIFs.

Req #5: Must be easy for users to pass around

The term “easy” is further explained as having the following capabilities: encapsulatable; not overly large (emailable < 1MB zipped for commonly exchanged content); not corruptible when being exchanged; and having a ‘similar’ number of units to pass around regardless of the number of artifacts being communicated, i.e. always 1 or 2 files, not 10 sometimes, 20 sometimes, etc.

This requirement seems to be making the assumption that copies of the storage mechanism will be passed around, rather than just a single artifact or collections of artifacts. For example, the storage mechanism for the current HL7 v3 specification is a single Access database file that encapsulates all of the artifact definitions. During development the database gets partitioned, with various pieces worked in parallel for the various domain committees. It can get fairly large, especially when some larger CodeSystems are included in the file, but in general the pieces communicated to the domain committees meet the desired size constraints. The database may have integrity constraints to keep it self-consistent, but there’s always the possibility of corruption when being handled by many users.
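The "one file, under 1MB zipped, not corrupted in transit" reading of Req #5 can be sketched with Python's standard `zipfile` module. The file names and the interpretation of 1MB as 1,000,000 bytes are assumptions made for the example only.

```python
import io
import zipfile

MAX_BYTES = 1_000_000  # assumed reading of "emailable < 1MB zipped"


def build_package(artifacts):
    """Pack a dict of file name -> XML text into a single zip, returned as bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, text in sorted(artifacts.items()):
            archive.writestr(name, text)
    return buffer.getvalue()


def is_emailable(package_bytes):
    """Check the package against the assumed size limit."""
    return len(package_bytes) < MAX_BYTES


def is_intact(package_bytes):
    """Re-read the package and verify the CRC of every member."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        return archive.testzip() is None


# Hypothetical artifact files; the number of artifacts does not change
# the number of units that are passed around (always one zip).
package = build_package({"artifact-1.xml": "<a/>", "artifact-2.xml": "<b/>"})
print(is_emailable(package), is_intact(package))
```

Note that the CRC check only detects damage to the file in transit; it says nothing about the logical corruption, discussed next, that arises when separately distributed pieces are edited inconsistently.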
If only part of the database is passed around, then integrity constraints that span the parts will be ineffective and corruption can occur, or copies of the common parts can get modified inadvertently by the parallel actions.

In the CVS model of storage this requirement seems to be satisfied because there is only one primary source for the specification. Nodes or branches of the CVS structure can be extracted and passed around with confidence that they shouldn’t get corrupted. However, whole nodes or whole branches are unlikely to satisfy the desired support for “MIF packages” as discussed in Req #2. If a package is defined as a collection of files from different branches of the CVS model, then corruption may occur when trying to re-construct the path hierarchy.

The large XML file model of storage also assumes a single persistent primary source, with subsets broken off for different users to work on. As seen above, the pieces that get broken off are likely collections of artifacts, so they can always be represented as a MIF package and transported as a single MIF file. But as with the existing Access database, it is likely that these various pieces that get shipped around will have many common parts, and in the absence of protection against modification these common parts can be modified differently by the domain committees, thereby making it difficult to re-combine all of the parts in a consistent manner.

The centralized repository model of storage assumes that the primary source MIF-defined artifacts are all stored at a central location, thus there is no danger of corruption as pieces get passed around. The downside is that applications must communicate with the central repository and they are not able to control the primary source MIF for anything other than completely new artifacts. Whenever a subset of the repository is extracted it is no longer the primary source for anything still in the repository.
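The recombination problem described above for distributed pieces with common parts can be sketched as follows. This is a hypothetical illustration in Python, with artifacts reduced to an id and a content string; it is not a description of how any of the storage models under discussion actually merges changes.

```python
def recombine(base, *copies):
    """Merge edited copies of artifact subsets back onto a base snapshot.

    base and each copy map artifact id -> content. Returns the merged
    mapping plus the ids that were changed differently in several copies.
    """
    merged = dict(base)
    changed = set()
    conflicts = set()
    for copy in copies:
        for artifact_id, content in copy.items():
            if content == base.get(artifact_id):
                continue  # this copy left the artifact untouched
            if artifact_id in changed and merged[artifact_id] != content:
                conflicts.add(artifact_id)  # two committees disagree
            else:
                merged[artifact_id] = content
                changed.add(artifact_id)
    return merged, sorted(conflicts)


# Two committees each receive the common part plus their own artifact.
base = {"common": "v1", "domainA": "v1", "domainB": "v1"}
copy_a = {"common": "v1 edited by A", "domainA": "v2"}
copy_b = {"common": "v1 edited by B", "domainB": "v2"}
merged, conflicts = recombine(base, copy_a, copy_b)
print(conflicts)
```

The private artifacts merge cleanly, while the shared one is reported as a conflict that someone must resolve by hand. A centralized repository avoids this case by never handing out more than one updatable copy.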
Whatever tool is used to make modifications to the extracted pieces must then be able to update the repository to effect those modifications on the primary source MIF artifacts.

As with CVS and XML file, the NIST Registry finesses this requirement by assuming that there is only one persistent primary source for artifact specification. It has the disadvantages of a centralized repository in that external applications can update existing artifacts only by extracting their primary source MIF representation, updating it locally, and then replacing or superseding it with the modified version. In this manner the primary source storage format is easy to pass around because it doesn’t get passed around. To be most effective this model assumes that all applications will have real-time access to the repository. Primary source MIFs would get downloaded one at a time and all references to other artifacts from within that MIF would be downloaded only as needed during real-time processing. Thus the effectiveness of this model depends critically on the ability to access the repository and retrieve artifacts as needed in real time. Our real-world communication links may not yet be up to this task, but I think we’re getting close.

Req #6: Must permit multiple versions to exist on the same machine.

During artifact development different versions of an artifact will have the same name and the same ArtifactId. The only way to distinguish among versions is to have a separate unique identifier for each version or to have a special attribute of the artifact that distinguishes among versions. It will often be the case that a user will want to access the existing version and the potential new version of the artifact simultaneously, either to copy certain portions or to compare differences. A pure file system would fail this requirement because the only way to distinguish versions is by file name.
Thus the file names would be constantly changing and one could never identify all versions of the file strictly by name; instead, one would have to invent some sort of name extension to identify versions or carry along additional metadata with each name. A storage model like CVS hides this versioning from the user. Database storage mechanisms can avoid this problem by defining unique identifiers for each version and providing metadata attributes to help distinguish among versions. It is interesting to note that the existing Access database Composite Repository model uses uuid’s to distinguish among artifacts even while relying upon the ArtifactId as a unique identifier when versioning is not an issue. At present, domain committees only work on one version of the model at a time, but that could be relaxed even in the existing process by relying more fully on the existing uuid’s. If a CVS model of storage is used, then there are client-side applications (e.g. Eclipse) that can communicate directly with the repository, hide all version identifiers from the user, yet ensure that versioning is not corrupted when files are copied back into CVS storage after modification. Strict enforcement of this requirement would likely mean hiding version management from the user. This is the strong point of CVS repositories and probably the weakest point of all of the other approaches we are considering. The large XML file storage model would probably have to add element identifiers in order to distinguish among artifacts that are identical except for a minor change of the wording in a description. All of the database solutions, XML or relational, would have to add version identifiers for each artifact or provide a special attribute for version management. In all cases this is straight-forward to do, but it is not always easy to shelter version management from the end user. 
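The approach of giving each version its own unique identifier while the ArtifactId stays fixed can be sketched in Python as below. The class and field names are hypothetical and only loosely modelled on the uuid, ArtifactId and status properties mentioned in this paper; the sketch also lets the content be absent, since metadata is kept separate from the artifact itself.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RegisteredItem:
    artifact_id: str                 # stable across versions
    status: str                      # e.g. a ballot or development status
    content: Optional[str] = None    # may be missing or incomplete
    item_uuid: str = field(default_factory=lambda: str(uuid.uuid4()))


class Registry:
    """Hypothetical in-memory store keyed by the per-version uuid."""

    def __init__(self):
        self._items = {}

    def register(self, artifact_id, status, content=None, item_uuid=None):
        item = RegisteredItem(artifact_id, status, content)
        if item_uuid is not None:    # the submitter may supply the uuid
            item.item_uuid = item_uuid
        self._items[item.item_uuid] = item
        return item

    def versions(self, artifact_id):
        """All registered items that share one ArtifactId."""
        return [i for i in self._items.values() if i.artifact_id == artifact_id]


registry = Registry()
registry.register("EXAMPLE_ARTIFACT", "draft")                   # no content yet
registry.register("EXAMPLE_ARTIFACT", "ballot-1", "<artifact/>")
print([item.status for item in registry.versions("EXAMPLE_ARTIFACT")])
```

Both versions coexist under one ArtifactId, but the caller still has to choose a sensible status value for each submission, which is where the human error discussed below can enter.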
The user may have to manage uuid’s that are not very human readable and would have to properly set any version attributes if they exist. This may be very difficult to do correctly when multiple users are working on the same artifact in parallel.

The NIST Registry is based on the ebXML Registry model, which has automatic version management as one of its goals. However, version 2.1 of the ebXML specification that NIST is using does not enforce automatic version management. Instead, the ebXML model provides a special attribute for userVersion and NIST is using this metadata attribute to help distinguish among versions, e.g. a different value for each ballot cycle. NIST has also added Slots to the model (Slot is the ebXML term for user-defined attributes) to help distinguish the ballotStatus of HL7 artifacts. New metadata attributes could be added to help manage versions during artifact development. As with all of the other database storage solutions, management of these version attributes is the responsibility of the artifact submitter or modifier and thereby subject to human error and confusion.

Nice-to-Have Features

The Architecture Decisions document lists a number of nice-to-have features for the storage and exchange format of HL7 v3 MIF-based artifacts. Each feature carries a tag of 1, 2, or 3 to indicate its relative priority. We re-state each feature and discuss it below.

Feature #1 – Priority Tag 1

Should be operating-system independent. That is, it should be possible for applications running on any operating system used by HL7 members to create and read the format. Expected operating systems include Windows, Macintosh and Unix.

The existing Composite Repository is an Access database file, portions of which are passed around among users. As such it runs only on Windows operating systems. However, even though proprietary, the file format is well-known and some other tools are able to open and manipulate the contents.
Individual tables can be extracted in a variety of formats and reloaded into any relational database. Full-blown query and update capabilities are probably only available on Windows operating systems.

A CVS repository resides in only one place with access through Internet protocols. As such, access is operating system independent. The existing HL7 CVS repository at UNLV is running on a Unix server. However, client-side tools associated with CVS (e.g. Eclipse) can execute on multiple operating systems, thereby removing the operating system of the server as an issue.

File systems can usually be compressed and exchanged as zipped files. Zipped formats are somewhat interchangeable so it is reasonable to assume that file systems of artifacts could be exchanged, and that most applications would be able to create and read the format.

The NIST Registry storage format is a relational database with access through HTML browsers or ebXML Registry Services. The query language is ISO standard SQL embedded in an ebXML envelope. As such, access is operating system independent. Currently the implementation runs only on the Linux operating system, but that is not important because all access is via standardized Internet protocols. Client-side tools that access the database would have to be able to send and receive Internet protocols and parse the XML-based ebXML Registry Services.

XML databases may operate only on specific operating systems and access may be application dependent. Although W3C standards, the XPath and XQuery access languages are read-only, so updates would still be application specific. In addition, protocols for external access across the Internet would still need to be established.

Feature #2 – Priority Tag 1

Should not allow corruption of the data stored by simple tasks such as copy, paste or move. For example, if information is associated with the position of a file in a hierarchy, then moving the file could change the information, thereby corrupting it.
Corruption of data can be ameliorated by enforcement of integrity constraints. So long as the entire HL7 v3 MIF-based specification is kept together it is possible to enforce all integrity constraints. However, as soon as portions of the specification are split off and distributed to various committees or individuals for processing, constraints across pieces cannot be enforced so corruption can creep in. Any storage or exchange format that allows users to pass around primary source MIF-based artifacts is subject to data corruption. Corruption can be reduced if primary source MIFs are stored in a protected place and if modifications are checked for consistency as they come in.

The NIST Registry is centralized so it is possible to enforce constraints among all the pieces and thereby reduce possibilities for data corruption. When a MIF-based artifact definition is extracted from the Registry it is no longer considered to be a primary source. If it is later re-submitted to the Registry, either as a new artifact or as a replacement for an existing artifact, integrity constraints can be re-checked and re-verified as appropriate.

Feature #3 – Priority Tag 1

Should be natively amenable to source control (text is better than binary)

We’ve already discussed some of the implications of “source control” in Req #4 above. If “source” is always considered to be a primary source XML MIF-based document, then “source control” via simple text editors is possible. However, if the primary source is an XML or relational database, then source control may be dependent upon the update and manipulation facilities of the database management system. Reconsider the Vocabulary examples discussed in Req #4 above. User friendly and complete MIF-based representations of vocabulary artifacts may not be updatable, so the primary source MIF-based artifact definitions may involve a large number of references to other primary source MIF-based documents.
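Because primary source documents refer to one another, one consistency check that could be run when a modified document is re-submitted is that every reference it makes still resolves. The following Python sketch assumes a hypothetical, much simplified model in which each artifact is just an id with a list of referenced ids; it is offered only to illustrate the kind of check meant by "integrity constraints" here.

```python
def unresolved_references(submission, stored_ids):
    """Return {artifact id: [missing reference ids]} for a submission.

    submission maps each submitted artifact id to the ids it references;
    stored_ids is the set of ids already held in the protected store.
    A reference is satisfied by a stored artifact or by another artifact
    in the same submission.
    """
    known = set(stored_ids) | set(submission)
    report = {}
    for artifact_id, references in submission.items():
        missing = [ref for ref in references if ref not in known]
        if missing:
            report[artifact_id] = missing
    return report


# Hypothetical ids: the domain refers to a value set that exists and to an
# attribute that was renamed or deleted while the piece was checked out.
stored = {"valueSet-1", "codeSystem-1"}
incoming = {"domain-1": ["valueSet-1", "attribute-9"], "valueSet-2": ["codeSystem-1"]}
print(unresolved_references(incoming, stored))
```

A check of this kind is only possible where all of the pieces can be seen at once, which is the argument made above for keeping primary source MIFs in a protected place.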
Although each of these documents is subject to “source control” via a simple text editor, keeping track of all of the primary source MIF documents may require features of a database management system.

The best use of the NIST Registry assumes that all primary source MIF-based artifact definitions will be kept in the Registry and that updates of the HL7 v3 specification will be through tools that access the primary source MIF-based definitions in real-time interactions with the database using ebXML standard Registry Services.

Feature #4 – Priority Tag 1

The format should be usable and exchangeable by all HL7 members. This means that neither the format itself, nor any libraries or software needed to use or interpret the format, should carry fees or other barriers to its use.

The storage format for MIF-based artifacts is important only if that storage format is passed around as primary source pieces of the HL7 v3 specification. This is not an issue in centralized storage formats. In centralized storage formats, the main issues are the access and manipulation protocols used to submit and modify the MIF-based artifacts.

The existing format is an Access database file, subsets of which are passed around. Thus the storage format is an issue and it should be usable and exchangeable by all HL7 members. This does not rule out the use of proprietary storage mechanisms provided that the MIF-based artifacts can be extracted from that mechanism using tools that are obtainable and usable by all HL7 members.

Feature #5 – Priority Tag 2

Should permit multiple versions to be open within tool instances simultaneously. Examples include the need to display an old version while working on changes to a new version, and performing maintenance on a published standard while also working on developing a future release.

This feature is very similar to Req #6 above regarding the existence of multiple versions on the same machine.
It differs only in that it requires that multiple versions be accessible within tool instances simultaneously. As discussed above in Req #6, it is fairly straightforward to adopt identifiers or metadata attributes to distinguish among versions even if the artifact name and ArtifactId remain fixed. The real problem is providing software support to hide the complexities of version management from the end user.

Feature #6 – Priority Tag ??

Should allow splitting, distribution and recombination of component artifacts. An artifact is any element that has traditionally been assigned a publication identifier. Essentially this requirement means that the format should allow distribution of ‘subsets’ of artifacts, and allow support tooling to move artifacts from one storage location to another for easy exchange between HL7 members.

This feature is closely related to Req #2 above saying that exchange and storage formats must support the “MIF package” structure. This feature extends that notion to features that allow the storage mechanism itself to provide flexible splitting, distribution and recombination of packages. We concluded above that this kind of flexibility probably requires support for a general purpose query and update language over the storage structures themselves. Since the only standardized query languages are XQuery, XPath, SQL, and OCL, full support for this feature probably requires full implementation of at least one of those languages. And since three of these languages are read-only, the storage mechanism may still require that applications use a proprietary mechanism for submission of new artifacts and maintenance of existing ones.

The NIST Registry allows the use of a streamlined version of SQL within an ebXML Request to return collections of artifacts that satisfy the request.
However, it requires use of other standardized ebXML Registry Services for submitting new items to be registered, for updating any metadata that describes a registered item, and for maintaining classification of artifacts and associations among artifacts.

Returning to our notion of a primary source MIF representation of an artifact versus a complete derived MIF representation, this feature requires support for the ability to break apart a complete derived MIF into a self-contained and consistent collection of primary source MIF primitives. It’s not clear if this should be a property of the storage and exchange format mechanisms or a desired feature of other HL7 tools, e.g. a later version of RoseTree, that access the MIF-based artifacts.

Feature #7 – Priority Tag 2

Should be amenable to search & replace and manual editing. (Primarily needed for power users)

This feature applies more to the storage mechanism than it does to the exchange mechanism. It is independent of whether the storage mechanism is centralized or local to the user’s system. It requires that it be straight-forward to find the artifacts of interest, extract them, modify them with some tool, and then re-submit them to the storage mechanism. The search mechanism could be a browsing mechanism that makes it easy to navigate through the storage structure to find the artifacts of interest or artifacts that are linked to the artifacts of interest. The search mechanism could also involve support for query, but that is addressed by the next feature below.

The desired manual editing feature is similar to Feature #3 above, but discussion of that feature reveals that editing of a MIF representation requires that the representation be updatable. We’ve seen that updatable MIFs may not always be complete. However, if our MIF architecture requires that every artifact have a primary source MIF representation, then that artifact will always be manually editable by text edits on the XML MIF representation.
The downside is that many complete MIF representations, like the full hierarchy implied by the recursive definition of a ValueSet, will not be directly editable, so that modification of a ValueSet may result in the modification of a larger number of vocabulary primitives.

The NIST Registry seems to satisfy this requirement even in its present form. It allows identification of a collection of artifacts by SQL query, and then allows browsing between artifacts that are linked to one another by different types of associations. It assumes that a primary source MIF representation is directly available in the Registry and that such artifacts can be copied from the Registry, modified by an external tool, or manually edited by a simple text editor, and then submitted back to the Registry either as a replacement for the copied artifact or as a new artifact. It also assumes that a compound MIF representation could be extracted from the Registry, worked on locally for maintenance, then split apart into primary source MIF representations and returned to the Registry either as replacements or new versions.

Feature #8 – Priority Tag 3

Should allow query and search capabilities intrinsic to the file format. (Note: query and search capability is critical, it’s just a question of how amenable the raw format is)

This feature implies support for robust query in a language that is available to and understood by a large number of tool developers. It probably requires that the query language be standardized and supported by multiple vendors or multiple open source implementations. In my mind this limits query support to languages such as XQuery, XPath, SQL and OCL.

Each of these languages is amenable to the raw format of the storage structure; in fact, they essentially require that the raw format be viewable, at least virtually, as satisfying a static schema definition.
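The idea that a raw format need only be viewable "virtually" as a static schema can be illustrated with a SQL view. The sketch below uses Python's standard `sqlite3` module with made-up table, view and column names; it is not the ebXML Registry schema and not the NIST implementation. Queries are written against the view, so the physical storage underneath could be organized differently without affecting them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical physical storage, plus a read-only view that exposes only
# the metadata columns as the published query schema.
conn.executescript("""
CREATE TABLE stored_item (
    uuid        TEXT PRIMARY KEY,
    artifact_id TEXT,
    status      TEXT,
    body        TEXT   -- the registered document, stored as a whole
);
CREATE VIEW registry_item AS
    SELECT uuid, artifact_id, status FROM stored_item;
""")

conn.executemany(
    "INSERT INTO stored_item VALUES (?, ?, ?, ?)",
    [
        ("u1", "ARTIFACT_A", "draft", None),
        ("u2", "ARTIFACT_A", "ballot", "<artifact/>"),
        ("u3", "ARTIFACT_B", "ballot", "<artifact/>"),
    ],
)


def find_by_status(status):
    """Query the virtual schema only; the base table is never named."""
    rows = conn.execute(
        "SELECT artifact_id FROM registry_item WHERE status = ? ORDER BY artifact_id",
        (status,),
    )
    return [row[0] for row in rows]


print(find_by_status("ballot"))
```

Since the queries are read-only, it does not matter to the caller whether `registry_item` is a base table or, as here, a view over some other structure.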
XQuery requires existence of an implied XML schema, SQL requires existence of an implied SQL schema, and OCL requires existence of an implied object model.

The ebXML Registry standard allows embedding of a query language within a Registry Request. The open source implementation that NIST uses only supports embedding of a reduced (but still robust) subset of SQL. If SQL is supported, then the ebXML Registry standard requires the existence of an implied SQL schema consisting of about 20 defined tables. Since SQL is used only for query, not for update, it doesn’t matter if these tables are updatable base tables or nonupdatable view representations, thereby making it much easier for various kinds of database products to claim conformance. The Registry Services for submission or update are XML elements that correspond to new insertions into these virtual tables; the difference is that an entire Registry object is considered as a whole rather than as individual insertions into SQL tables. For the current version of the standard, update is viewed as the replacement of a registry object (i.e. rows in one or more tables) by a new registry object. Keep in mind that we are talking only about metadata structures here; the item being registered, i.e. the MIF representation of an artifact, is self contained and stored as a whole.

Conclusions

This paper was written to help the author understand the real meaning behind some of the stated requirements and nice-to-have features of the ‘primary’ exchange and storage format of HL7 v3 MIF-based artifacts. It helps the author to understand the requirements by comparing them against the NIST Registry, which does claim to hold relevant HL7 artifacts. The primary purpose of the NIST Registry is to support accessibility to HL7 conformance profiles and thus it must be able to reference all ‘final HL7 standard’ artifacts.
The NIST Registry was not designed to support artifact development and intensive, pre-final version management, but it does make sense to compare its capabilities against the stated requirements and features to see how it measures up.

The first conclusion is that the Tooling Committee needs to make a distinction in its discussions between primary source and non-primary source MIF-based representations. Primary source MIFs will be updatable, but they may not be complete and exhaustive representations of an HL7 artifact, because the artifact may be recursively defined from many other primary source pieces. I suggest that we define the contents of two types of MIF-based artifact representations: 1) a primary source MIF representation, which will only contain the parts of an artifact that are updatable, and 2) a complete derived MIF representation, which will import the relevant parts from other primary source MIF representations to present a complete and user-friendly description of the entire artifact structure, including those imported parts that cannot be updated. A primary source MIF will only reference other artifacts, while a complete derived MIF may import relevant parts of other artifacts that it uses or is dependent upon. For example, a complete derived MIF of an HMD may import relevant parts of the RMIM or other information models it depends upon, the CMETs it uses, and the datatype descriptions and value sets that are necessary to construct and understand a valid MessageType.

A second conclusion is that the Tooling Committee should make an up-front decision about the pros and cons of a centralized artifact storage structure versus a storage structure that will be passed among multiple users and tool developers. Many of the requirements and features seem to assume that the artifact storage structure itself will be passed around rather than just MIF representations of the artifacts or collections of artifacts.
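To make the distinction drawn in the first conclusion more concrete, the following is a minimal sketch in Python of how a complete derived representation might be assembled from primary source pieces. The class PrimaryMif, the function derive_complete, the dictionary used as a registry, and all identifiers are hypothetical and are not part of the MIF schemas or the NIST Registry; the sketch only illustrates the idea that a primary source holds references while a derived view imports the referenced parts.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PrimaryMif:
    """Hypothetical primary source MIF: its own updatable content plus references only."""
    artifact_id: str
    content: str
    references: List[str] = field(default_factory=list)


def derive_complete(artifact_id: str,
                    registry: Dict[str, PrimaryMif],
                    path: Tuple[str, ...] = ()) -> dict:
    """Build a complete derived view by importing each referenced artifact.

    The result is a plain dictionary intended for reading; any update would
    still be made against the primary source artifacts themselves.
    """
    if artifact_id in path:
        # A recursive definition loops back to an artifact already being
        # expanded, so record the reference instead of importing it again.
        return {"id": artifact_id, "cycle": True}
    primary = registry[artifact_id]
    return {
        "id": primary.artifact_id,
        "content": primary.content,
        "imports": [
            derive_complete(ref, registry, path + (artifact_id,))
            for ref in primary.references
        ],
    }


# Illustrative data only: an HMD that references an RMIM and a CMET.
example_registry = {
    "HMD-1": PrimaryMif("HMD-1", "message structure", ["RMIM-1", "CMET-1"]),
    "RMIM-1": PrimaryMif("RMIM-1", "information model", ["DT-1"]),
    "CMET-1": PrimaryMif("CMET-1", "common element"),
    "DT-1": PrimaryMif("DT-1", "datatype description"),
}
complete_hmd = derive_complete("HMD-1", example_registry)
```

In this sketch only the PrimaryMif objects would ever be edited and resubmitted; the derived dictionary can be regenerated at any time from the primary sources.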
Once we know whether the primary source storage structures are centralized or passed around, discussion of the requirements and features becomes much more focused. Depending on the answer, tool developers can concentrate on MIF management versus MIF storage management.

A third conclusion is that the answer to the second conclusion may depend upon the complexity of the storage model. If the ‘primary’ storage model is simply a collection of MIF-based artifact definitions, then it is easy to think of MIF packages being passed around as a single zipped file that, when opened, gives the whole collection of MIF artifact definitions. However, if the storage structures assume the complexities of generalized query and other features of a database management system, then it is much more difficult to think of passing around such systems, and a centralized storage repository makes much more sense. Note that a virtual centralized storage mechanism doesn’t prohibit distribution; the centralized repository could be a distributed database with replications in as many locations as are necessary to support efficient access. But management of the replication would become a repository problem and not an HL7 user or tool developer problem.

A fourth conclusion is that the Tooling Committee has not yet spent enough time talking about ‘ownership’ of MIF-based artifact definitions. It makes sense to assume that every HL7 artifact is ‘owned’ by some HL7 technical committee and that only a designated representative of that committee, i.e. an editor, can make updates to it. If this is the case, then the artifact storage structure may necessarily become a bit more complicated in order to group those items that are owned by the same technical committee. We may need a rule to say that an updatable package of MIF artifacts may only contain primary source MIFs that are owned by the same user, thereby making that user responsible for the entire contents while holding possession.
The CVS storage model makes this assumption, since each node of the storage structure can be owned by a different user and only the owner of a node may make modifications to the artifacts in that node. In all of the database solutions discussed above, there is an assumption of different users or roles, each with potentially different access and update privileges, so no changes to the models are necessary to accommodate ‘ownership’ control of the artifact or of the metadata describing the artifact.

Possibilities

The first section of the Tooling Committee’s Architectural Decisions document lists a number of possibilities for the ‘primary’ storage and exchange format of HL7 v3 MIF-based artifacts, including:

* Directory structure of MIF files
* Zip-file containing directory structure of MIF files
* One big XML file
* Relational database
* XML database
* Other ???

The Tooling Committee has not yet discussed these possibilities, and may end up changing them significantly. However, I think we can eliminate a lot of debate if we agree to focus on the word ‘primary’ and regard multiple possibilities as derivations of ‘primary’.

Suppose we think of the entire HL7 v3 specification as being a collection of MIF-based artifacts. To avoid confusion, let’s also assume that each of these artifacts is a primary source MIF, or easily decomposable into a collection of primary source MIFs, all owned by the same technical committee. Thus each MIF is updatable or replaceable by its owner. Suppose the MIFs are groupable by owner into a directory structure of two levels, so that the first level is the entire specification and each node at the next level is owned by a technical committee and consists only of MIFs that can be updated by an editor of that committee. If necessary, we could allow additional levels for sub-committees, or groupings into different kinds of models, but that seems to be a fairly straightforward extension.

I’m suggesting that we adopt the first bullet above, i.e.
directory structure of MIF files, as the ‘primary’ storage format for MIF representations of HL7 artifacts. This would have to be a static directory structure, defined by the Tooling Committee, and well understood by all users and all tool developers. Users or tools would be able to ask for copies (i.e. non-primary sources) of all MIFs stored at any node in the directory structure. Only the owners of that node would be able to re-submit new versions of MIFs under that node. Adoption of such a ‘primary’ storage format does not preclude other storage formats so long as they can virtually support the ‘primary’ format.

Suppose we also adopt the second bullet above, i.e. zip-file containing directory structure of MIF files, as a required feature of any HL7-condoned primary source MIF storage format. A user should be able to request a zip file of any node in the directory structure and receive a single file containing all of the necessary information to reconstruct the directory structure locally. The Tooling Committee may have to choose one or more zip formats that satisfy this requirement. I don’t know whether GNU zip, Windows zip, tar files, etc. are mutually translatable. Owners of a node should be able to resubmit a package of MIFs in the same zipped format and have them expanded and properly acted upon by the storage mechanism.

The third bullet above, i.e. one big XML file, could also be adopted as a required feature of any HL7-condoned primary source MIF storage format. In this case it would be necessary for the Tooling Committee to define a new XML format that would validate to the MIF schemas, but that would also contain sufficient information to allow reconstruction of the directory structure specified as ‘primary’. I think the flexibility of the MIF schemas would allow this to be done with each node of the ‘primary’ directory structure being a specific package of MIF artifacts.
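To illustrate the zip-file requirement discussed above, here is a minimal sketch in Python of packing one node of the directory structure into a single file and expanding it again elsewhere. The function names pack_node and unpack_node are hypothetical, and the choice of the common ZIP format via Python's standard zipfile module is an assumption made only for illustration; the Tooling Committee has not selected a zip format.

```python
import zipfile
from pathlib import Path


def pack_node(node_dir: Path, zip_path: Path) -> None:
    """Write every file under node_dir into zip_path, keeping relative paths.

    Storing paths relative to the node is what allows the directory
    structure to be reconstructed locally from the single file.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for file_path in sorted(node_dir.rglob("*")):
            if file_path.is_file():
                relative_name = file_path.relative_to(node_dir).as_posix()
                archive.write(file_path, relative_name)


def unpack_node(zip_path: Path, target_dir: Path) -> None:
    """Expand a packed node into target_dir, recreating its subdirectories."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
```

A repository could call something like pack_node when a user requests a copy of a node, and something like unpack_node when the owner of that node resubmits a package. Ownership checks and version handling are deliberately left out of this sketch.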
As with the zip requirement, a user should be able to request this specialized MIF package for any node in the ‘primary’ directory structure, and the owner of that node should be able to re-submit new versions using the same MIF package structure.

The other bullets above, e.g. relational database, XML database, etc., could be format options offered by any HL7-condoned primary source MIF storage mechanism. They would not be required or precluded. It’s possible that different repositories may cooperate with one another to hold identical collections of MIF-based artifacts (using repository replication services) while offering different database or other format options to different sets of users.

The Other 3 Sections of the Architecture Decisions document

I have still not thought about the requirements and features discussed in these three sections. However, if others find my deliberations useful, I could expand this paper to cover those topics.