Adapting an Existing Data Service to be caBIG™ Silver-level Compliant

Peter Hussey
LabKey Software, Inc, Seattle, WA USA
Contact: [email protected]

The National Cancer Institute’s caBIG™ initiative aims for interoperability of bioinformatics applications. caBIG™ envisions that this will be achieved by encouraging all applications to implement a standard programming interface and to register their terms and data objects with a centralized service. The required programming interface is essentially defined in terms of the behavior of applications built using the caCORE Software Development Kit (SDK). The caCORE SDK is designed and documented for building a new application from scratch. Little is documented on how one might achieve caBIG™ silver-level compliance in an application not built with the caCORE SDK. This poster describes the caCORE SDK development and build process and how the LabKey team changed it to work with their existing proteomics platform software. The LabKey/CPAS solution creates a parallel web application that supports the caBIG™ programming interface and accesses LabKey/CPAS data through a SQL View layer.

Introduction

In 2007, the LabKey/CPAS development team set out to achieve caBIG™ “silver level” compliance for the MS2 proteomics data managed by CPAS, our application used by several large cancer center clients. Achieving compliance proved difficult because caBIG™ compliance for a data service is defined in terms of the behavior of applications built with the caCORE SDK. LabKey/CPAS was not designed or built with any reference to the caCORE SDK. The caBIG™ compliance guidelines suggest that building an application with the caCORE SDK was just one possible implementation of silver compliance. We found, however, no precise definition or test for what caBIG silver compliance meant, in particular what queries a silver-compliant service needed to support.
Our challenge became finding a way to incorporate the caCORE runtime architecture into our existing application with minimal impact on existing code.

The caCORE Application Paradigm

The caCORE SDK is based on a software development paradigm that starts with an abstract model of the entities represented in a particular application. Real-world examples of such entities include identified peptides in an MS2 run or microarray test results. Entities are usually related to other entities in known ways. For example, a single MS2 “run” entity must have one or more “FASTA databases” and may have zero or more “identified peptides”. Generally, the interesting entities in an application are those stored in the database. There is often a close correspondence between a row (record) in a SQL table in the database used by an application and an instance (single entity) of a class of similar entities to be exposed by the application. The caCORE SDK architecture is based largely on the 1-to-1 correspondence between an application class and a SQL table.

Figure 1. The caCORE development paradigm starts with the creation of Class and Data models in UML. Objects in these models have relationships and attributes that must be specified exactly for a successful build. At right is a combined Class and Data model diagram showing a small subset of the models developed to achieve caBIG silver compliance for LabKey/CPAS.

One of the design goals of the caCORE architecture is to create an interoperability standard that is not tied to a single programming language. So in the caCORE development paradigm, the developer describes objects and their relationships in Unified Modeling Language (UML). UML is a high-level, primarily graphical approach to defining a programming project. UML is implemented by a number of tools including Enterprise Architect and ArgoUML, the two tools supported by the current caCORE SDK (version 4.0). UML modeling, however, is only partly standardized.
It is difficult, for example, to transfer a model between tools without losing information in the transfer.

caCORE SDK Development Process

There are three phases in the caCORE development paradigm:

1. Create the UML model elements using the UML modeling tool. This is a painstaking task for any moderately complex real-world application. The application object model is essentially specified twice: as a UML Class model and as a UML Data model. The Class model corresponds to the objects in the application that a developer will ultimately use to access the data service. The Data model describes the implementation of those classes in a relational database. In most cases there is a single SQL table that corresponds to a single Class object. The data objects are linked together through a set of specific relationships and attribute values that must all match exactly, but are each specified and visible on separate property dialogs within Enterprise Architect. (Note: the 4.0 SDK has added a very useful validation step to the build process that should make it much easier to track down and fix inconsistencies and omissions in the UML models than what the LabKey/CPAS team experienced.) Figure 1 shows a small subset of the LabKey/CPAS UML model in a diagram that combines some of the class elements and the data elements.

2. Register the classes and attributes of the UML model objects with NCI’s Enterprise Vocabulary Services (EVS) and the Cancer Data Standards Repository (caDSR). The common data element identifiers resulting from this step are incorporated into the class model objects as additional tagged values.

3. Run the SDK build process, creating three runtime entities from the model (Figure 2).

Challenges in Adapting an Existing Application to caCORE

Most large-scale, team-built applications are not designed using an application generator approach. LabKey/CPAS is one such application.
Yet LabKey/CPAS still needs to participate in the interoperability of caBIG. For these situations, the caCORE SDK can be used to generate a web application that runs in parallel to an existing application and exposes a caBIG™ silver-compliant programming interface over the data managed by the non-caCORE application. The main prerequisite for this architecture is that the data to expose is held in a relational database. We also made the significant simplification that the caCORE-generated web application would expose read-only interfaces, which is allowed and appropriate for caBIG™ compliance. Within this simplified target, we still encountered difficulties around the following:

• SQL schema implementation differences from caCORE. The caCORE SDK makes several assumptions regarding the database schema that may not be true for an existing application:
  • A class in the object model to be exposed corresponds 1-to-1 with a table in the SQL schema.
  • The object identifier maps to a single integer primary key in the corresponding relational table.
  • A relationship between Class objects corresponds to a foreign key in the SQL tables.
• Security integration. An existing application will likely have some security implementation that logically should extend to the caBIG™ interface. The caCORE SDK documentation, however, discusses only the implementation of security in a new application, not integration with an existing security model.

[Figure 2 diagram labels: UML Class Model, UML Data Model, SDK Build Process, API libraries, Web Application, SQL Schema (Tables)]

The three runtime entities created by the SDK build process are:

• Database definition scripts, in the form of SQL CREATE TABLE commands.
• A web application that implements the UML Class model and can translate requests for objects into SQL commands.
• A set of programming interface libraries that enable applications to query, insert, update and delete application objects over several different communication channels, including local Java applications and web service calls.

Figure 2.
The caCORE SDK build process

caCORE Runtime Architecture

In a software application based on the caCORE design, developers write web pages and program-to-program applications using the API generated by the SDK build process. The web application handles both read and write access to the underlying SQL database in order to support the creation and management of application objects. At the core of the generated caCORE web application is Hibernate, an open source middleware layer for mapping Java programming objects into SQL table objects and vice versa (Figure 3). The caCORE SDK build process translates the UML model into configuration files that allow Hibernate to construct complex queries by translating relationships between objects into SQL JOIN constructs. Hibernate allows programmers to issue database queries in a simple “Query By Example” (QBE) format. The use of Hibernate in the caCORE runtime yields several benefits:

• It avoids mixing SQL commands into application code, a common source of bugs in web database applications.
• It is highly configurable, allowing the developer to tune the way Hibernate translates object access into SQL.
• It supports a standardized “Hibernate Query Language” (HQL) that looks like SQL but works unchanged across all supported relational databases, allowing the developer to issue more complex queries than can be expressed via the standard QBE mechanism.

The caCORE SDK allows a developer or analyst to leverage application model knowledge into a working web database application that would otherwise be very difficult and expensive to build from scratch.

[Figure 3 diagram labels: Local JSP, Remote WS]

The LabKey/CPAS Solution

LabKey/CPAS resolved these challenges through the creation of a SQL View layer. In our solution, the Data model defines a virtual schema in a database schema named “cabig”. We then created a set of SQL views with the same names and same columns as the UML Data model.
The caCORE-generated web application interacts with these views as if they were tables. The web application cannot tell the difference. Under the covers, the view layer passes the queries through to the original base tables (managed by the non-caCORE application) and fixes up the differences along the way. We wrapped the cabig view definition scripts into a new module of LabKey/CPAS and included a small set of UI changes that configure and test caBIG™ access for a given folder. The view layer solves the issues described above:

• Security Integration: Since data access in LabKey/CPAS is granted on a folder-by-folder basis, we wanted to enable or disable caBIG access by folder. We added a single true/false “caBIGPublished” column to our existing core.Containers table. This bit is turned on and off by the “Publish” button accessible on a project’s Permissions page. The corresponding Containers view in the cabig schema includes the restriction “WHERE caBIGPublished=true”. All of the other view definitions in the cabig schema include an inner join to the cabig.Containers view. As a result, the caBIG interface sees only data in those containers that have been published.
• Data Model Compliance: Most of the underlying CPAS tables have a single integer primary key, but a few had two-column integer keys. To meet caCORE’s requirement for a single-column key, the SQL view definition combines the two key columns into one value:

  SELECT ((4294967296 * op.propertyid)+op.objectid) AS id, ..

[Diagram labels interleaved with this text in the source, apparently from Figure 4: Search, Script, Application, apps, Data-Driven pages, Client API, caCORE API, caCORE, LabKey/CPAS web application]

[Figure 3 diagram labels: Local Java lib, Remote WS lib, caCORE web application, Hibernate, Domain model, SQL database]

Figure 3. The caCORE runtime architecture

As a second example, the PeptidesData table in CPAS is used to store score values from different search engines in generically named “ScoreX” columns. For caBIG, we chose to represent the scores for different engines as different objects (preserving the 1-to-1 paradigm).
We handled this difference in the view layer by creating a view per search engine, with the appropriate filter.

[Figure 4 diagram labels: cabig Views, Abstract, SQL database]

Figure 4. The caCORE implementation for CPAS

Conclusion

Our efforts to adapt our existing comprehensive proteomics application to achieve caBIG™ silver compliance proved successful once we decided on the basic approach of running the caCORE SDK-generated web application in parallel to LabKey/CPAS. In our design, the SDK-generated application accesses the relational data through a set of views that handle some of the tricky mapping and security problems. The views also act as a buffer between the underlying base tables and the web application, allowing names to change in one place without affecting the other. In the future, it will be relatively easy for LabKey to expand the scope of our caBIG™ interface to incorporate any data managed by LabKey Server. In fact, putting the data into LabKey may well be the fastest way for a developer to achieve silver compliance for a data service, while at the same time gaining many of the data analysis and management features that are built into the LabKey platform.
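
As an illustration of the view layer described under The LabKey/CPAS Solution, the following is a minimal sketch and not the actual LabKey/CPAS view definitions. It assumes an in-memory SQLite database in place of the production database, hypothetical simplified tables named containers and object_property, and a "cabig_" name prefix standing in for the “cabig” schema. Only the caBIGPublished column, the inner join to the Containers view, and the key expression are taken from the poster text.

```python
import sqlite3

# In-memory SQLite stands in for the production database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical, simplified base tables owned by the existing application.
    CREATE TABLE containers (
        entityid       INTEGER PRIMARY KEY,
        name           TEXT NOT NULL,
        caBIGPublished INTEGER NOT NULL DEFAULT 0  -- true/false publish flag
    );
    CREATE TABLE object_property (
        objectid   INTEGER NOT NULL,
        propertyid INTEGER NOT NULL,
        container  INTEGER NOT NULL REFERENCES containers (entityid),
        value      TEXT,
        PRIMARY KEY (objectid, propertyid)  -- two-column integer key
    );

    -- View layer. The "cabig_" prefix stands in for the "cabig" schema.
    -- Only published containers are visible through this view.
    CREATE VIEW cabig_containers AS
        SELECT entityid, name
        FROM containers
        WHERE caBIGPublished = 1;

    -- Every other view joins to cabig_containers, so unpublished data is
    -- filtered out, and the two-column key is folded into a single id.
    CREATE VIEW cabig_object_property AS
        SELECT (4294967296 * op.propertyid) + op.objectid AS id,
               op.value
        FROM object_property AS op
        INNER JOIN cabig_containers AS c ON c.entityid = op.container;
    """
)

conn.executemany(
    "INSERT INTO containers VALUES (?, ?, ?)",
    [(1, "published folder", 1), (2, "private folder", 0)],
)
conn.executemany(
    "INSERT INTO object_property VALUES (?, ?, ?, ?)",
    [(7, 3, 1, "visible"), (8, 3, 2, "hidden")],
)

# The generated web application would query the view as if it were a table.
rows = conn.execute(
    "SELECT id, value FROM cabig_object_property ORDER BY id"
).fetchall()
print(rows)

# The combined id can be split back into its two original key columns.
property_id, object_id = divmod(rows[0][0], 4294967296)
print(property_id, object_id)
```

Because 4294967296 is 2^32, the combined id is unique only when objectid is non-negative and smaller than 2^32. The sketch relies on that assumption, which the poster does not state explicitly.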
