Adapting an Existing Data Service to be caBIG™ Silver-level Compliant
Peter Hussey LabKey Software, Inc, Seattle, WA USA
Contact: [email protected]
The National Cancer Institute’s caBIG™ initiative aims for interoperability of bioinformatics
applications. caBIG™ envisions that this will be achieved by encouraging all applications to
implement a standard programming interface and to register their terms and data objects with a
centralized service. The required programming interface is essentially defined in terms of the
behavior of applications built using the caCORE Software Development Kit (SDK). The caCORE
SDK is designed and documented for building a new application from scratch. Little is
documented on how one might achieve caBIG™ silver-level compliance in an application not
built with the caCORE SDK. This poster describes the caCORE SDK development and build
process and how the LabKey team changed it to work with their existing proteomics platform
software. The LabKey/CPAS solution creates a parallel web application that supports the
caBIG™ programming interface and accesses CPAS data through a SQL View layer.
Introduction
The caCORE SDK is the technology foundation for caBIG™ compliant applications. The SDK is
based on a software development paradigm that starts with an abstract model of the entities
represented in a particular application. Real-world examples of such entities include identified
peptides in an MS2 run or microarray test results. Entities are usually related to other entities in
known ways. For example, a single MS2 “run” entity must have one or more “FASTA databases” and
may have zero or more “identified peptides”. Generally the interesting entities in an application
are those stored in the database. There is often a close correspondence between a row (record)
in a SQL table in the database used by an application and an instance (single entity) of a class of
similar entities to be exposed by the application. The caCORE SDK architecture is based largely
on the 1-to-1 correspondence between an application class and a SQL table.
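The row-to-instance correspondence described above can be illustrated with a minimal sketch. It is written in Python with the standard sqlite3 module rather than with the caCORE SDK, and the Run class, the run table and its columns are hypothetical names chosen for illustration, not part of the CPAS model.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical application class; one instance per row of the run table."""
    id: int
    description: str


# In-memory database standing in for the application's SQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE run (id INTEGER PRIMARY KEY, description TEXT)")
conn.executemany(
    "INSERT INTO run (id, description) VALUES (?, ?)",
    [(1, "MS2 run A"), (2, "MS2 run B")],
)


def load_runs(connection):
    """Map each row of the run table to exactly one Run instance."""
    cursor = connection.execute("SELECT id, description FROM run ORDER BY id")
    return [Run(*row) for row in cursor]


runs = load_runs(conn)
print(runs)
```

Each table column becomes one attribute of the class, which is the 1-to-1 shape the SDK expects.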
Figure 1. The caCORE development paradigm starts with the creation of Class and Data models in
UML. Objects in these models have relationships and attributes that must be specified exactly
for a successful build. The figure is a combined Class and Data model diagram showing a small
subset of the models developed to achieve caBIG silver compliance for LabKey/CPAS.
The caCORE Application Paradigm
One of the design goals of the caCORE architecture is to create an interoperability standard that
is not tied to a single programming language. So in the caCORE development paradigm, the
developer describes objects and their relationships in the Unified Modeling Language (UML). UML
is a high-level, primarily graphical approach to defining a programming project. UML is
implemented by a number of tools including Enterprise Architect and ArgoUML, the two tools
supported by the current caCORE SDK (version 4.0). UML modeling, however, is only partly
standardized. It is very difficult to transfer a model between tools without losing information in the
transfer.
caCORE SDK Development Process
There are three phases in the caCORE development paradigm:
1. Create the UML model elements using the UML modeling tool. This is a painstaking task for any
moderately complex real-world application. The application object model is essentially specified
twice: as a UML Class model and as a UML Data model. The Class model corresponds to the
objects in the application that a developer will ultimately use to access the data service. The
Data model describes the implementation of those classes in a relational database. In most cases
there is a single SQL table that corresponds to a single Class object. The data objects are linked
together through a set of specific relationships and attribute values that must all match exactly,
but are each specified and visible on separate property dialogs within Enterprise Architect. (Note:
the 4.0 SDK has added a very useful validation step to the build process that should make it
much easier to track down and fix inconsistencies and omissions in the UML models than it was
for the LabKey/CPAS team.) Figure 1 shows a small subset of the LabKey/CPAS UML
model in a diagram that combines some of the class elements and the data elements.
2. Register the classes and attributes of the UML model objects with NCI’s Enterprise Vocabulary
Services (EVS) and the Cancer Data Standards Repository (caDSR). The common data element
identifiers resulting from this step are incorporated into the class model objects as additional
tagged values.
3. Run the SDK build process, creating three runtime entities from the model (Figure 2).
Challenges in Adapting an Existing Application to caCORE
Most large-scale, team-built applications are not built using an application generator approach.
LabKey/CPAS is one such application. Yet LabKey/CPAS still needs to participate in the interoperability
of caBIG. For these situations, the caCORE SDK can be used to generate a web application that runs in
parallel to an existing application and exposes a caBIG™ silver-compliant programming interface over
the data managed by the non-caCORE application. The main prerequisite for this architecture is that
the data to be exposed is held in a relational database. We also made the major simplification that the
caCORE-generated web application would expose read-only interfaces, which is allowed and
appropriate for caBIG™ compliance.
Within this simplified target, we still encountered difficulties around the following:
•SQL schema implementation differences from caCORE. The caCORE SDK makes several
assumptions regarding the database schema that may not be true for an existing application:
  •A class in the object model to be exposed corresponds 1-to-1 with a table in the SQL schema.
  •The object identifier maps to a single integer primary key in the corresponding relational table.
  •A relationship between Class objects corresponds to a foreign key in the SQL tables.
•Security integration. An existing application will likely have some security implementation that
logically should extend to the caBIG™ interface. The caCORE SDK, however, discusses only the
implementation of security in a new application, not integration with an existing security model.
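One way to find out where an existing schema departs from the key and foreign-key assumptions listed above is to inspect the database catalog. The following is a minimal sketch in Python using SQLite's PRAGMA statements, not a caCORE SDK feature; the tables are hypothetical, and a production database such as the one behind CPAS would be inspected through its own catalog views instead.

```python
import sqlite3


def single_integer_key(conn, table):
    """Return True if the table's primary key is exactly one INTEGER column."""
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
    pk_cols = [c for c in cols if c[5] > 0]  # c[5] is the position within the key
    return len(pk_cols) == 1 and pk_cols[0][2].upper().startswith("INT")


def foreign_key_targets(conn, table):
    """Return the set of tables referenced by foreign keys declared on `table`."""
    # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
    return {row[2] for row in conn.execute(f"PRAGMA foreign_key_list({table})")}


conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: two fit the assumptions, one has a two-column key.
    CREATE TABLE run (id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE peptide (
        id INTEGER PRIMARY KEY,
        run_id INTEGER NOT NULL REFERENCES run(id),
        sequence TEXT
    );
    CREATE TABLE object_property (
        objectid INTEGER NOT NULL,
        propertyid INTEGER NOT NULL,
        value TEXT,
        PRIMARY KEY (objectid, propertyid)
    );
""")

report = {t: single_integer_key(conn, t) for t in ("run", "peptide", "object_property")}
print(report)
print(foreign_key_targets(conn, "peptide"))
```

Tables that fail the check are candidates for the kind of fix-up described later in this poster.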
Figure 2. The caCORE SDK build process. (Diagram labels: UML Class Model, UML Data Model, SDK Build
Process, SQL Schema (Tables), Web Application, API libraries.) Runtime entities shown in the figure:
•Database definition scripts, in the form of SQL CREATE TABLE commands.
•A web application that implements the UML Class model and can translate requests for objects
into SQL commands.
•A set of programming interface libraries that enable applications to query, insert, update and
delete application objects over several different communication channels, including local Java
applications and web service calls.
caCORE Runtime Architecture
In a software application based on the caCORE design, developers write web pages and
program-to-program applications using the API generated by the SDK build process. The web
application handles both read and write access to the underlying SQL database in order to
support the creation and management of application objects.
At the core of the generated caCORE web application is Hibernate, an open source middleware
layer for mapping Java programming objects into SQL table objects and vice versa (Figure 3).
The caCORE SDK build process translates the UML model into configuration files that allow
Hibernate to construct complex queries by translating relationships between objects into SQL
JOIN constructs. Hibernate allows programmers to issue database queries in a simple “Query By
Example” (QBE) format. The use of Hibernate in the caCORE runtime yields several benefits:
•It avoids mixing SQL commands into application code, a
common source of bugs in web database applications.
•It is highly configurable, allowing the developer to tune
the way Hibernate translates object access into SQL.
•It supports a standardized “Hibernate Query Language”
(HQL) that looks like SQL but works unchanged across
all supported relational databases, allowing the
developer to issue more complex queries than can be
expressed via the standard QBE mechanism.
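The idea behind Query By Example can be sketched independently of Hibernate: the caller fills in only the attributes to match on, and the library derives the SQL. The sketch below is plain Python over sqlite3 and does not use or imitate Hibernate's actual API; the PeptideExample class, the peptide table and the query_by_example helper are hypothetical names introduced for illustration.

```python
import sqlite3
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class PeptideExample:
    """Example object; an attribute left as None is not used as a filter."""
    id: Optional[int] = None
    run_id: Optional[int] = None
    sequence: Optional[str] = None


def query_by_example(conn, table, example):
    """Build a parameterized SELECT from the non-None attributes of `example`.

    The table name is interpolated into the SQL text, so it must come from
    trusted code, never from user input.
    """
    names = [f.name for f in fields(example)]
    criteria = [(n, getattr(example, n)) for n in names if getattr(example, n) is not None]
    sql = f"SELECT {', '.join(names)} FROM {table}"
    if criteria:
        sql += " WHERE " + " AND ".join(f"{name} = ?" for name, _ in criteria)
    sql += " ORDER BY id"
    return conn.execute(sql, [value for _, value in criteria]).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE peptide (id INTEGER PRIMARY KEY, run_id INTEGER, sequence TEXT)")
conn.executemany(
    "INSERT INTO peptide VALUES (?, ?, ?)",
    [(1, 1, "AAK"), (2, 1, "LLR"), (3, 2, "AAK")],
)

matches = query_by_example(conn, "peptide", PeptideExample(run_id=1))
print(matches)
```

The calling code never contains SQL text, which is the separation the first benefit in the list refers to.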
The caCORE SDK allows a developer or analyst to leverage
application model knowledge into a working web database
application that would otherwise be very difficult and
expensive to build from scratch.
The LabKey/CPAS Solution
LabKey/CPAS resolved these challenges through the creation of a SQL View layer. In our solution, the
Data model defines a virtual schema in a database schema named “cabig”. We then
created a set of SQL views with the same names and same columns as the UML Data model. The
caCORE-generated web application interacts with these views as if they were tables. The web
application cannot tell the difference. Under the covers, the view layer passes through the queries to
the original base tables (managed by the non-caCORE application), and fixes up the differences along
the way. We wrapped the cabig view definition scripts into a new module of LabKey/CPAS and
included a small set of UI changes that configures and tests caBIG™ access for a given folder.
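A minimal sketch of such a pass-through view follows, using Python's sqlite3 module. The base table ms2_runs, its columns and the view name are hypothetical, and because SQLite has no named schemas within one database, a cabig_ prefix stands in for the “cabig” schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Base table owned by the existing application (hypothetical columns).
    CREATE TABLE ms2_runs (run_pk INTEGER PRIMARY KEY, descr TEXT, internal_flag INTEGER);
    INSERT INTO ms2_runs VALUES (1, 'first run', 0), (2, 'second run', 1);

    -- View whose name and columns follow the (hypothetical) UML Data model.
    CREATE VIEW cabig_run AS
        SELECT run_pk AS id, descr AS description
        FROM ms2_runs;
""")

# The consumer queries the view exactly as it would query a table.
cursor = conn.execute("SELECT * FROM cabig_run ORDER BY id")
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(columns, rows)
```

The view renames columns and hides internal ones, so the consumer sees only the shape defined by the Data model.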
The view layer solves the issues described above:
•Security Integration: Since data access in LabKey/CPAS is granted on a folder-by-folder basis,
we wanted to enable or disable caBIG access by folder. We added a single true/false
“caBIGPublished” column to our existing core.Containers table. This bit is turned on and off by the
“Publish” button accessible on a project’s Permissions page. The corresponding Containers
view in the cabig schema includes the restriction “WHERE
caBIGPublished=true”. All of the other view definitions in the
cabig schema include an inner join to the cabig.Containers view.
As a result, the caBIG interface sees only data in those
containers that have been published.
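The publication filter can be sketched as follows in Python with sqlite3. Only the caBIGPublished column and the Containers view come from the text above; the runs table, the remaining columns and the cabig_ view prefix (standing in for the cabig schema) are assumptions made for illustration, and the flag is stored as 0/1 because SQLite has no separate boolean type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE containers (
        id INTEGER PRIMARY KEY,
        name TEXT,
        cabigpublished INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE runs (
        id INTEGER PRIMARY KEY,
        container_id INTEGER NOT NULL REFERENCES containers(id),
        description TEXT
    );
    INSERT INTO containers VALUES (1, 'folderA', 1), (2, 'folderB', 0);
    INSERT INTO runs VALUES (10, 1, 'published run'), (20, 2, 'private run');

    -- Only published containers are visible through the view layer.
    CREATE VIEW cabig_containers AS
        SELECT id, name FROM containers WHERE cabigpublished = 1;

    -- Every other view joins to cabig_containers and inherits the filter.
    CREATE VIEW cabig_runs AS
        SELECT r.id, r.container_id, r.description
        FROM runs r
        INNER JOIN cabig_containers c ON c.id = r.container_id;
""")


def visible_run_ids(connection):
    """Return the run ids that the view layer exposes."""
    return [row[0] for row in connection.execute("SELECT id FROM cabig_runs ORDER BY id")]


before = visible_run_ids(conn)
# Simulate pressing "Publish" on the second folder.
conn.execute("UPDATE containers SET cabigpublished = 1 WHERE id = 2")
after = visible_run_ids(conn)
print(before, after)
```

Because the filter lives in one view, publishing or unpublishing a folder changes what every dependent view returns without touching their definitions.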
•Data Model Compliance: Most of the underlying CPAS tables have a single integer primary key,
but a few had two-column integer keys. To meet caCORE’s requirement for a single-column key,
the SQL view definition combines the two key columns into one value:
SELECT ((4294967296 * op.propertyid)+op.objectid) AS id, ..
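The arithmetic in this expression can be checked with a short sketch. The multiplier 4294967296 is 2^32, so propertyid occupies the high bits of the combined id and objectid the low 32 bits. The function names below are hypothetical, and the uniqueness argument assumes that objectid is non-negative and smaller than 2^32, which the source does not state explicitly.

```python
SHIFT = 4294967296  # 2**32, the multiplier used in the view definition


def combined_id(propertyid, objectid):
    """Mirror the view expression (4294967296 * propertyid) + objectid."""
    return SHIFT * propertyid + objectid


def split_id(combined):
    """Recover (propertyid, objectid), assuming 0 <= objectid < 2**32."""
    return divmod(combined, SHIFT)


example = combined_id(3, 7)
print(example)            # 12884901895
print(split_id(example))  # (3, 7)

# Neighbouring key pairs do not collide under the stated assumption.
assert combined_id(1, 0) != combined_id(0, SHIFT - 1)
```

For the result to fit in a signed 64-bit integer column, propertyid must also stay below 2^31.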
Figure 3. The caCORE runtime architecture. (Diagram labels: Local Java lib, Remote WS lib, caCORE
web application, Hibernate, Domain model, SQL database.)
As a second example, the PeptidesData table in CPAS is used to store score values from different
search engines in generically named “ScoreX” columns. For caBIG, we chose to represent the
scores for different engines as different objects (preserving the 1-to-1 paradigm). We handled
this difference in the view layer by
creating a view per search engine, with the appropriate filter.
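A view-per-engine layout might look like the following sketch in Python with sqlite3. The engine column, the engine names and the score column names are all hypothetical; the source does not say how CPAS identifies the search engine for a row or what each view calls its scores.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One shared table with generically named score columns (hypothetical layout).
    CREATE TABLE peptides_data (
        id INTEGER PRIMARY KEY, engine TEXT, peptide TEXT, score1 REAL, score2 REAL
    );
    INSERT INTO peptides_data VALUES
        (1, 'engine_a', 'AAK', 31.5, 2.0),
        (2, 'engine_b', 'LLR', 0.87, 0.12),
        (3, 'engine_a', 'GGR', 12.0, 1.5);

    -- One view per engine: filter the rows and give the scores meaningful names.
    CREATE VIEW cabig_engine_a_peptide AS
        SELECT id, peptide, score1 AS alpha_score, score2 AS alpha_delta
        FROM peptides_data WHERE engine = 'engine_a';
    CREATE VIEW cabig_engine_b_peptide AS
        SELECT id, peptide, score1 AS beta_score, score2 AS beta_probability
        FROM peptides_data WHERE engine = 'engine_b';
""")


def read_view(connection, view):
    """Return (column names, rows) for a view; `view` must be a trusted name."""
    cursor = connection.execute(f"SELECT * FROM {view} ORDER BY id")
    return [d[0] for d in cursor.description], cursor.fetchall()


a_columns, a_rows = read_view(conn, "cabig_engine_a_peptide")
b_columns, b_rows = read_view(conn, "cabig_engine_b_peptide")
print(a_columns, a_rows)
print(b_columns, b_rows)
```

Each view then maps to its own class in the object model, even though all rows share one physical table.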
Figure 4. The caCORE implementation for CPAS. (Diagram labels: cabig views, SQL database.)
Conclusion
Adapting an existing application to enable caBIG™ access proved to be relatively
straightforward once we decided on the basic approach of running the caCORE SDK-generated
web application in parallel to LabKey/CPAS. In our design, the SDK-generated application
accesses the relational data through a set of views that handle some of the tricky mapping and
security problems. The views also act as a buffer between the underlying base tables and the
web application, allowing names to change in one place without affecting the other.
The caCORE-like web application supports the read-only API libraries as if they were part of a
standard, “pure” caCORE-like application. This is the basis of CPAS’ successful caBIG™ silver-level
compliance validation.