Adapting an Existing Data Service to be caBIG™ Silver-level Compliant
Peter Hussey LabKey Software, Inc, Seattle, WA USA
Contact: [email protected]
The National Cancer Institute’s caBIG™ initiative aims for interoperability of bioinformatics
applications. caBIG™ envisions that this will be achieved by encouraging all applications to
implement a standard programming interface and to register their terms and data objects with a
centralized service. The required programming interface is essentially defined in terms of the
behavior of applications built using the caCORE Software Development Kit (SDK). The caCORE
SDK is designed and documented for building a new application from scratch. Little is
documented on how one might achieve caBIG™ silver-level compliance in an application not
built with the caCORE SDK. This poster describes the caCORE SDK development and build
process and how the LabKey team changed it to work with their existing proteomics platform
software. The LabKey/CPAS solution creates a parallel web application that supports the
caBIG™ programming interface and accesses LabKey/CPAS data through a SQL View layer.
Introduction
In 2007, the LabKey/CPAS development team set out to achieve caBIG™ “silver level”
compliance for the MS2 proteomics data managed by CPAS, our application used by several large
cancer center clients. Achieving compliance proved difficult because caBIG™ compliance for a
data service is defined in terms of the behavior of applications built with the caCORE SDK.
LabKey/CPAS was not designed or built with any reference to the caCORE SDK. The caBIG™
compliance guidelines suggest that building an application with the caCORE SDK is just one
possible implementation of silver compliance. We found, however, no precise definition of, or test
for, what caBIG™ silver compliance means, in particular which queries a silver-compliant service
needs to support. Our challenge became finding a way to incorporate the caCORE runtime architecture
into our existing application with minimal impact on existing code.
The caCORE Application Paradigm
The caCORE SDK is based on a software development paradigm that starts with an abstract
model of the entities represented in a particular application. Real-world examples of such entities
include identified peptides in an MS2 run or microarray test results. Entities are usually related to
other entities in known ways. For example, a single MS2 “run” entity must have one or more “FASTA
databases” and may have zero or more “identified peptides”. Generally the interesting entities
in an application are those stored in the database. There is often a close correspondence
between a row (record) in a SQL table in the database used by an application and an instance
(single entity) of a class of similar entities to be exposed by the application. The caCORE SDK
architecture is based largely on the 1-to-1 correspondence between an application class and a
SQL table.
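The class-to-table correspondence described above can be sketched in a few lines. This is only an illustration in Python with an in-memory SQLite database, not caCORE or LabKey code; the `Run` and `Peptide` classes, the table names, and the columns are all hypothetical.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Run:
    # One instance per row of the hypothetical "run" table.
    id: int
    description: str


@dataclass
class Peptide:
    # One instance per row of the hypothetical "peptide" table.
    id: int
    run_id: int  # foreign key: the relationship to Run
    sequence: str


def load_all(conn, cls, table):
    """Build one instance of cls per row; column names must match field names."""
    cur = conn.execute(f"SELECT * FROM {table} ORDER BY id")
    names = [d[0] for d in cur.description]
    return [cls(**dict(zip(names, row))) for row in cur]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE run (id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE peptide (
    id INTEGER PRIMARY KEY,
    run_id INTEGER NOT NULL REFERENCES run(id),
    sequence TEXT
);
INSERT INTO run VALUES (1, 'example run');
INSERT INTO peptide VALUES (10, 1, 'PEPTIDEK'), (11, 1, 'SAMPLER');
""")

runs = load_all(conn, Run, "run")
peptides = load_all(conn, Peptide, "peptide")
# A run "may have zero or more identified peptides": follow the foreign key.
peptides_of_first_run = [p for p in peptides if p.run_id == runs[0].id]
```

Each class has exactly one backing table, its identifier is a single integer key, and the relationship between the two classes is a foreign key column.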
Figure 1. The caCORE development paradigm starts with the creation of Class and Data models in
UML. Objects in these models have relationships and attributes that must be specified exactly for a
successful build. At right is a combined Class and Data model diagram showing a small subset of the
models developed to achieve caBIG silver compliance for LabKey/CPAS.
One of the design goals of the caCORE architecture is to create an interoperability standard that
is not tied to a single programming language. So in the caCORE development paradigm, the
developer describes objects and their relationships in Unified Modeling Language (UML). UML
is a high-level, primarily graphical approach to defining a programming project. UML is
implemented by a number of tools including Enterprise Architect and ArgoUML, the two tools
supported by the current caCORE SDK (version 4.0). UML modeling, however, is only partly
standardized. It is difficult, for example, to transfer a model between tools without losing
information in the transfer.
caCORE SDK Development Process
There are three phases in the caCORE development paradigm:
1. Create the UML model elements using the UML modeling tool. This is a painstaking task for any
moderately complex real-world application. The application object model is essentially specified
twice: as a UML Class model and as a UML Data model. The Class model corresponds to the
objects in the application that a developer will ultimately use to access the data service. The Data
model describes the implementation of those classes in a relational database. In most cases
there is a single SQL table that corresponds to a single Class object. The data objects are linked
together through a set of specific relationships and attribute values that must all match exactly,
but are each specified and visible on separate property dialogs within Enterprise Architect. (Note:
the 4.0 SDK has added a very useful validation step to the build process that should make it
much easier to track down and fix inconsistencies and omissions in the UML models than it was
for the LabKey/CPAS team.) Figure 1 shows a small subset of the LabKey/CPAS UML
model in a diagram that combines some of the class elements and the data elements.
2. Register the classes and attributes of the UML model objects with NCI’s Enterprise Vocabulary
Services (EVS) and the Cancer Data Standards Repository (caDSR). The common data element
identifiers resulting from this step are incorporated into the class model objects as additional
tagged values.
3. Run the SDK build process, creating three runtime entities from the model (Figure 2).
Challenges in Adapting an Existing Application to caCORE
Most large-scale, team-built applications are not designed using an application generator approach.
LabKey/CPAS is one such application. Yet LabKey/CPAS still needs to participate in the interoperability
of caBIG. For these situations, the caCORE SDK can be used to generate a web application that runs in
parallel to an existing application and exposes a caBIG™ silver-compliant programming interface over
the data managed by the non-caCORE application. The main prerequisite for this architecture is that the
data to expose is held in a relational database. We also made a major simplification: the
caCORE-generated web application exposes only read-only interfaces, which is allowed and appropriate for
caBIG™ compliance. Within this simplified target, we still encountered difficulties around the following:
• SQL schema implementation differences from caCORE. The caCORE SDK makes several
assumptions regarding the database schema that may not be true for an existing application:
    • A class in the object model to be exposed corresponds 1-to-1 with a table in the SQL schema.
    • The object identifier maps to a single integer primary key in the corresponding relational table.
    • A relationship between Class objects corresponds to a foreign key in the SQL tables.
• Security integration. An existing application will likely have some security implementation that
logically should extend to the caBIG™ interface. The caCORE SDK, however, discusses only the
implementation of security in a new application, not integration with an existing security model.
[Figure 2 diagram labels: UML Class Model, UML Data Model, SDK Build Process, SQL Schema (Tables), Web Application, API libraries]
• Database definition scripts, in the form of SQL CREATE TABLE commands.
• A web application that implements the UML Class model and can translate requests for objects
into SQL commands.
• A set of programming interface libraries that enable applications to query, insert, update and
delete application objects over several different communication channels, including local Java
applications and web service calls.
Figure 2. The caCORE SDK Build process
caCORE Runtime Architecture
In a software application based on the caCORE design, developers write web pages and
program-to-program applications using the API generated by the SDK build process. The web
application handles both read and write access to the underlying SQL database in order to
support the creation and management of application objects.
At the core of the generated caCORE web application is
Hibernate, an open source middleware layer for mapping
Java programming objects into SQL table objects and vice
versa (Figure 3). The caCORE SDK build process
translates the UML model into configuration files that allow
Hibernate to construct complex queries by translating
relationships between objects into SQL JOIN constructs.
Hibernate allows programmers to issue database queries in
a simple “Query By Example” (QBE) format. The use of
Hibernate in the caCORE runtime yields several benefits:
• It avoids mixing SQL commands into application code, a
common source of bugs in web database applications.
• It is highly configurable, allowing the developer to tune
the way Hibernate translates object access into SQL.
• It supports a standardized “Hibernate Query Language”
(HQL) that looks like SQL but works unchanged across
all supported relational databases, allowing the
developer to issue more complex queries than can be
expressed via the standard QBE mechanism.
The caCORE SDK allows a developer or analyst to leverage
application model knowledge into a working web database
application that would otherwise be very difficult and
expensive to build from scratch.
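To make the “Query By Example” idea concrete, here is a minimal sketch in Python with SQLite. It does not use Hibernate or the caCORE API. The `Peptide` class, the `peptide` table, and the `query_by_example` helper are hypothetical names invented for this illustration. The caller fills in only the attributes to match, and the helper turns them into a parameterized WHERE clause.

```python
import sqlite3
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Peptide:
    # Hypothetical domain class; None means "no constraint on this attribute".
    id: Optional[int] = None
    run_id: Optional[int] = None
    sequence: Optional[str] = None


def query_by_example(conn, example, table):
    """Return instances whose non-None attributes equal those of `example`."""
    names = [f.name for f in fields(example)]
    criteria = [(n, getattr(example, n)) for n in names
                if getattr(example, n) is not None]
    # Column names come from the class definition; values are bound parameters.
    sql = f"SELECT {', '.join(names)} FROM {table}"
    if criteria:
        sql += " WHERE " + " AND ".join(f"{n} = ?" for n, _ in criteria)
    sql += f" ORDER BY {names[0]}"
    rows = conn.execute(sql, [v for _, v in criteria])
    return [type(example)(*row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE peptide (id INTEGER PRIMARY KEY, run_id INTEGER, sequence TEXT);
INSERT INTO peptide VALUES (10, 1, 'PEPTIDEK'), (11, 1, 'SAMPLER'), (12, 2, 'OTHERK');
""")

# "Find every peptide that belongs to run 1": only run_id is filled in.
matches = query_by_example(conn, Peptide(run_id=1), "peptide")
```

The application code never assembles SQL from user values by hand, which is the benefit the first bullet above describes.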
The LabKey/CPAS Solution
LabKey/CPAS resolved these challenges through the creation of a SQL View layer. In our solution, the
Data model defines a virtual schema definition in a database schema named “cabig”. We then created
a set of SQL views with the same names and same columns as the UML Data model. The caCORE-generated web application interacts with these views as if they were tables. The web application cannot
tell the difference. Under the covers, the view layer passes through the queries to the original base
tables (managed by the non-caCORE application), and fixes up the differences along the way. We
wrapped the cabig view definition scripts into a new module of LabKey/CPAS and included a small set
of UI changes that configures and tests caBIG™ access for a given folder.
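The view-layer idea can be sketched as follows, using Python with an in-memory SQLite database. All table, view, and column names here are hypothetical. Because a SQLite view cannot reference a table in another attached database, the sketch uses a `cabig_` name prefix in place of the separate `cabig` schema described above. The `fetch_runs` function stands in for a query issued by the generated web application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Base table owned by the existing application (hypothetical names).
CREATE TABLE ms2_runs_base (
    run INTEGER PRIMARY KEY,
    descr TEXT,
    deleted INTEGER NOT NULL DEFAULT 0
);
INSERT INTO ms2_runs_base VALUES
    (1, 'first run', 0), (2, 'old run', 1), (3, 'second run', 0);

-- View exposing the names and columns the Data model declares.
CREATE VIEW cabig_run AS
    SELECT run AS id, descr AS description
    FROM ms2_runs_base
    WHERE deleted = 0;
""")


def fetch_runs(conn):
    # The caller queries the view exactly as if it were a table.
    return conn.execute(
        "SELECT id, description FROM cabig_run ORDER BY id"
    ).fetchall()


runs = fetch_runs(conn)

# A plain view is read-only in SQLite, which suits a read-only interface.
try:
    conn.execute("INSERT INTO cabig_run VALUES (4, 'new run')")
    write_rejected = False
except sqlite3.OperationalError:
    write_rejected = True
```

Renaming a base column later only requires changing the view definition. The query in `fetch_runs` stays the same.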
The view layer solves the issues described above:
• Security Integration: Since data access in LabKey/CPAS is granted on a folder-by-folder basis, we
wanted to enable or disable caBIG access by folder. We added a single true/false “caBIGPublished”
column to our existing core.Containers table. This bit is turned on and off by the “Publish” button
accessible on a project’s Permissions page. The corresponding Containers view in the cabig schema
includes the restriction “WHERE caBIGPublished=true”. All of the other view definitions in the
cabig schema include an inner join to the cabig.Containers view. As a result, the caBIG interface
sees only data in those containers that have been published.
• Data Model Compliance: Most of the underlying CPAS tables have a single integer primary key, but
a few had two-column integer keys. To meet caCORE’s requirement for a single-column key, the SQL
View definition combines the two key columns arithmetically into one value:
SELECT ((4294967296 * op.propertyid)+op.objectid) AS id, ..
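The arithmetic in this expression can be checked with a short sketch. The multiplier 4294967296 is 2^32, so the expression places `propertyid` in the high bits and `objectid` in the low 32 bits. The function names below are hypothetical. The sketch also assumes that `objectid` is non-negative and smaller than 2^32, which the source does not state but which is needed for the combined value to be unique.

```python
SHIFT = 4294967296  # 2**32, the multiplier used in the view definition


def combined_id(propertyid: int, objectid: int) -> int:
    """Pack a two-column integer key into one integer, as the view does."""
    if not 0 <= objectid < SHIFT:
        # Assumption of this sketch: without it, two different key pairs
        # could produce the same combined value.
        raise ValueError("objectid must fit in 32 unsigned bits")
    return SHIFT * propertyid + objectid


def split_id(combined: int) -> tuple:
    """Recover (propertyid, objectid) from a combined key."""
    return divmod(combined, SHIFT)


key = combined_id(3, 7)
pair = split_id(key)
```

For the result to fit in a signed 64-bit column, `propertyid` would also need to stay below 2^31. That is likewise an assumption of this sketch and not something the source states.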
Figure 3. The caCORE runtime architecture (diagram labels: Local Java lib, Remote WS lib, caCORE web application, Hibernate, Domain model, SQL database)
As a second example, the PeptidesData table in CPAS is used to store score values from different
search engines in generically named “ScoreX” columns. For caBIG, we chose to represent the scores
for different engines as different objects (preserving the 1-to-1 paradigm). We handled this
difference in the view layer by creating a view per search engine, with the appropriate filter.
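The two techniques above, the published-container filter and one view per search engine, can be sketched together in Python with SQLite. This is an illustration under assumptions, not the actual cabig view scripts. The table layouts, the `engine` column, the `cabig_` prefix (standing in for the `cabig` schema), and the score column names such as `xcorr` and `hyperscore` are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE containers (
    entityid TEXT PRIMARY KEY,
    name TEXT,
    cabigpublished INTEGER NOT NULL DEFAULT 0  -- the true/false publish bit
);
CREATE TABLE peptidesdata (
    id INTEGER PRIMARY KEY,
    container TEXT NOT NULL REFERENCES containers(entityid),
    engine TEXT NOT NULL,
    peptide TEXT,
    score1 REAL,  -- generic score columns; meaning depends on the engine
    score2 REAL
);

-- Only published containers are visible through the view layer.
CREATE VIEW cabig_containers AS
    SELECT entityid, name FROM containers WHERE cabigpublished = 1;

-- One view per search engine, each joined to the containers view.
CREATE VIEW cabig_sequest_peptide AS
    SELECT p.id, p.peptide, p.score1 AS xcorr, p.score2 AS deltacn
    FROM peptidesdata p
    INNER JOIN cabig_containers c ON c.entityid = p.container
    WHERE p.engine = 'sequest';
CREATE VIEW cabig_xtandem_peptide AS
    SELECT p.id, p.peptide, p.score1 AS hyperscore, p.score2 AS nextscore
    FROM peptidesdata p
    INNER JOIN cabig_containers c ON c.entityid = p.container
    WHERE p.engine = 'xtandem';

INSERT INTO containers VALUES ('A', 'Published folder', 1), ('B', 'Private folder', 0);
INSERT INTO peptidesdata VALUES
    (1, 'A', 'sequest', 'PEPTIDEK', 2.5, 0.1),
    (2, 'A', 'xtandem', 'SAMPLER', 30.0, 12.0),
    (3, 'B', 'sequest', 'HIDDENK', 3.1, 0.2);
""")


def visible_ids(view):
    return [row[0] for row in conn.execute(f"SELECT id FROM {view} ORDER BY id")]


sequest_before = visible_ids("cabig_sequest_peptide")
xtandem_before = visible_ids("cabig_xtandem_peptide")

# Publishing folder B flips one bit; every joined view picks it up.
conn.execute("UPDATE containers SET cabigpublished = 1 WHERE entityid = 'B'")
sequest_after = visible_ids("cabig_sequest_peptide")
```

Because every view joins to the containers view, the publish decision is enforced in one place and not repeated in each query.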
Figure 4. The caCORE implementation for CPAS (diagram labels include: cabig Views, SQL database)
Conclusion
Our efforts to adapt our existing comprehensive proteomics application to achieve caBIG™ silver
compliance proved successful once we decided on the basic approach of running the caCORE
SDK-generated web application in parallel to LabKey/CPAS. In our design, the SDK-generated application
accesses the relational data through a set of views that handle some of the tricky mapping and
security problems. The views also act as a buffer between the underlying base tables and the web
application, allowing names to change in one place without affecting the other. In the future, it will be
relatively easy for LabKey to expand the scope of our caBIG™ interface to incorporate any data
managed by LabKey Server. In fact, putting the data into LabKey may well be the fastest way for a
developer to achieve silver compliance for a data service, while at the same time gaining many of the
data analysis and management features that are built into the LabKey platform.