The Open Access Publisher
Agenda
• Oracle interMedia Overview
• Open Access for the Life Science Community
• BioMed Central Business Model
• Oracle Technologies used by BioMed Central
Oracle interMedia
Multimedia Databases
Multi-Terabyte
Performance
Agenda
• The Media-enabled Oracle Platform
• Benefits
• Customer Experience
• Oracle Database 10g New Features
• Proposed Enhancements
The Media-enabled Oracle Platform
• Oracle Database 10g
  – Storage, management, and retrieval of image, audio, and video data
  – Native format understanding, metadata extraction, methods for image processing
  – Support for leading streaming media servers
• Oracle Application Server 10g
  – JSP, Servlet, and PL/SQL application development support
  – Media Adaptation Services for Wireless
  – JDeveloper (BC4J) and Portal integration
• Oracle Collaboration Suite
  – Metadata extraction for OCS Files
Benefits: Save labor, time and money
BioMed Central:
• Automated media processing, serving & integration
New Mexico Department of Transportation:
• A single DBA designed, created, deployed, and maintains a 5 TB image management system
Palazzo Braschi Museum, Rome:
• Cut the time to bulk load and process images by 90% compared with client-side tools
A US central bank:
• Online processing and rapid resolution of 26,000 bad checks per day reduces handling and float costs
Fast & Scalable
• 1 TB image repository renders images in a Web browser in less than 0.4 seconds
• Loads at device speeds
• Multi-terabyte multimedia databases
  – 5 TB database
  – 140 million images
• Scalable bulk load and process
  – Parallel processes load 300,000 images/hour
  – Bulk process: TIFF to GIF conversion, scale to thumbnail
* UBS PaineWebber, Caixa Economica Federal, NM DOT
Secure and Manageable
• Use all Oracle Database security features
  – Authentication, auditing, encryption, access control, etc.
• Banks and commercial Web sites use it
• One management environment for all data
  – Single DBA for 5 TB database
  – 3 TB financial database
* A US Central Bank, BioMed Central, Cre8tiv - UK, Spa Microsystems - UK, NM DOT, Caixa Economica Federal
Oracle Simplifies Code
Image Insert using Multimedia JSP Tag Library: An Example
With JSP Tag Library: (14 point font)
<%
java.util.Vector otherValuesVector = new java.util.Vector();
otherValuesVector.add(fd.getParameter("desc"));
otherValuesVector.add(fd.getParameter("loc"));
%>
<ord:embedImage connCache = "…"
mediaParameters = "photo"
otherColumns = "description, location"
otherValues = "<%=otherValuesVector%>"
/>
Without: (in 10 point font)
<FORM ACTION="PhotoAlbumInsert.jsp" METHOD="POST"
ENCTYPE="MULTIPART/FORM-DATA">
Description: <INPUT TYPE="text" NAME="desc"><BR>
Location: <INPUT TYPE="text" NAME="loc"><BR>
Photo: <INPUT TYPE="file" NAME="photo"><BR>
<INPUT TYPE="submit" VALUE="submit">
</FORM>
<%
try
{
  // Parse multipart/form-data
  formData.setServletRequest( request );
  formData.parseFormData();
  // Insert new row into database
  stmt = (OraclePreparedStatement)conn.prepareStatement(
    "insert into spec_photos ( description, location, photo ) " +
    " values ( ?, ?, ORDSYS.ORDImage.init() )" );
  stmt.setString( 1, formData.getParameter( "description" ) );
  stmt.setString( 2, formData.getParameter( "location" ) );
  stmt.executeUpdate();
  stmt.close();
  // Fetch OrdImage object from database
  stmt = (OraclePreparedStatement)conn.prepareStatement(
    "select photo from spec_photos where description = ? for update" );
  stmt.setString( 1, formData.getParameter( "description" ) );
  rset = (OracleResultSet)stmt.executeQuery();
  rset.next();
  OrdImage photo = (OrdImage)rset.getCustomDatum( 1, OrdImage.getFactory() );
  rset.close();
  stmt.close();
  // Load the photo into the database and set the properties.
  formData.getFileParameter( "photo" ).loadImage( photo );
  // Update object in database
  stmt = (OraclePreparedStatement)conn.prepareStatement(
    "update spec_photos set photo = ? where description = ?" );
  stmt.setCustomDatum( 1, photo );
  stmt.setString( 2, formData.getParameter( "description" ) );
  stmt.execute();
  stmt.close();
  // Commit changes
  conn.commit();
}
finally {
  // Ensure JDBC connection is released and any temp files are deleted.
  album.release();
  formData.release();
}
%>
New Oracle 10g Multimedia Features
• Standards Support: SQL/MM Still Image
• New version of Java Advanced Imaging and additional image processing operators
• Support for additional media formats
  – Microsoft ASF, MPEG2 & MPEG4
• Microsoft Windows Media Server Plugin
• Real Server Plugin for Helix Server
• XML DB integration
Proposed Enhancements
• Parse TIFF headers for user-specified attributes
• Metadata management, e.g. for microarrays, gels, mass spec.
• Characterize a region of interest for an image
• Plug in 3rd party algorithms & utilities
• Manage media metadata in XML DB
• Describe user-defined file formats
• Keep a history of changes to images
• Handle 3-D images (time/volume)
• DICOM support
Session id: 40363
Multimedia Database
Improves the Bottom Line
Matthew Cockerill
Technical Director
BioMed Central
BioMed Central and Oracle
• BioMed Central is an Open Access publisher of biomedical research
• Oracle database technology used to deliver a cost-effective online publishing solution
• Goals
  – Make the publishing process more efficient through online tools and automation
  – Increase accessibility of research by removing subscription barriers
Oracle technology used by BioMed Central
• BioMed Central's database
  – 70 gigabytes of data (and growing rapidly)
  – Lots of traditional relational data (e.g. 250,000 registered users)
  – Also serves as a repository for images, movies, PDFs and other rich media
• Key technologies used
  – Real Application Clusters
  – Data Guard
  – Oracle Text
  – XML DB
  – Oracle interMedia
What is wrong with traditional science publishing?
• Subscription-only access to scientific research is a legacy of the economics of print
• Scientists do all the hard work
  – Performing the research
  – Writing up the article
  – Acting as peer reviewers
  – Acting as journal editors
• Traditional publishers take ownership of the copyright and sell limited access back to the scientific community
• In the age of the web that makes no sense for science
• Open Access publishers make research freely accessible and redistributable by scientists
Benefits of Open Access
• Research instantly accessible to the entire scientific community
• Digital permanence (many copies)
• A route off the subscriptions treadmill
  – Subscription prices for traditional journals have increased at 10-15% per annum
• Data mining
• Grid computing
“[The] national e-science grid … intends to make access to computing power, scientific data repositories and experimental facilities as easy as the web makes access to information.”
- Tony Blair, May 2002
The Open Access movement
• Public Library of Science
  – New not-for-profit publisher formed by a group of scientists
  – Has received $9m from the Gordon and Betty Moore Foundation to start new Open Access journals
• Soros Foundation
  – Has provided $3m to support Open Access publishing in developing and transitional countries
• Sabo bill
  – Congressman Martin Sabo recently introduced the Public Access to Science Act in Congress
  – If passed, it would ensure that all US federally funded research is published with Open Access
BioMed Central architecture
• Oracle9i Database
  – Stores relational data (e.g. user registration info)
  – Also acts as repository for files associated with
    • submitted manuscripts
    • published articles
• Web server farm
  – Runs many different journal websites, all driven by the same Oracle database
  – Extensive use of Java and XSLT
  – Media content streamed from the database using servlets
Key Oracle Technologies used by BioMed Central
• Real Application Clusters
• Data Guard
• Oracle Text
• XML DB
• Oracle interMedia
Importance of high availability
• Science is a global enterprise, so BioMed Central's websites are busy 24 hours a day
• Scientists entrust their research and reputation to us; they must have confidence that their research will be available
• Major institutional customers demand high reliability
• BioMed Central delivers high availability using a combination of RAC and Data Guard
Real Application Clusters
• BioMed Central was one of the first organizations in the UK to deploy 9i RAC
• Main database runs on a pair of dual-CPU Sun Fire V480 servers
• Delivers high availability in the event of a single node failure
• However, Oracle upgrades/patches do currently require downtime (for now!)
Data Guard
• BioMed Central uses Data Guard to maintain a standby database
• Standby database is kept up to date by automated application of log files
• Standby database can be used for reporting (in read-only mode)
• If a prolonged outage of the live db occurs (planned or unplanned), the standby database can be activated
• Data Guard makes it easy to roll back to the live configuration after planned outages
RAC/Data Guard configuration
[Diagram: at the main hosting location, the web server farm connects to the RAC cluster; log files are shipped to the standby DB (Data Guard) at the standby location, which is also used for reporting.]
Use of Oracle Text
• High performance full text article search
• Key benefits
  – Ease of maintenance (incremental online indexing)
  – Structured searching of XML
  – XPath support
  – Unicode aware (smart base-character indexing)
  – Filter procedures can be used to transform XML to be indexed
Structured search
XPath search
• Prior to Oracle9i Database Release 2, only relatively basic field restrictions based on XML tags were possible
• Complex nesting of tags, or specific attribute values, were difficult or impossible to search for
• Oracle9i Database Release 2 support for XPath field restrictions takes XML searching to another level
• Now possible to search for all XML articles that contain a certain path (HASPATH), or that match a certain text expression at that path (INPATH)
XPath example
Article metadata identifying a series of related articles:
<meta>
  <classifications>
    <classification type="BMC" subtype="review_series_title" id="ar-cell-cell">Cell-cell interactions in synovitis</classification>
  </classifications>
</meta>
SQL syntax to retrieve all articles in that review series:
SELECT ARX_ID FROM ARX
WHERE CONTAINS(ARX_FULL, 'HASPATH(//classification[@type="BMC" AND @subtype="review_series_title" AND @id="ar-cell-cell"])') > 0;
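The HASPATH expression in the query above can be assembled programmatically, so that one piece of code serves every review series. The following is a minimal sketch, not BioMed Central's actual code: the class and method names are hypothetical, and it assumes the application passes the resulting string to CONTAINS as a JDBC bind variable.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds the text query used in the slide's CONTAINS call.
final class HasPathQuery {

    // Assumed usage: bind the result of hasPath(...) to the single '?' placeholder.
    static final String SQL = "SELECT ARX_ID FROM ARX WHERE CONTAINS(ARX_FULL, ?) > 0";

    /** Returns HASPATH(//element[@a="x" AND @b="y" ...]) for the given attribute tests. */
    static String hasPath(String element, Map<String, String> attributes) {
        StringBuilder predicate = new StringBuilder();
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            // Reject values that would break out of the quoted attribute test.
            if (e.getValue().indexOf('"') >= 0) {
                throw new IllegalArgumentException("attribute value must not contain a double quote");
            }
            if (predicate.length() > 0) {
                predicate.append(" AND ");
            }
            predicate.append('@').append(e.getKey())
                     .append("=\"").append(e.getValue()).append('"');
        }
        return "HASPATH(//" + element + "[" + predicate + "])";
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("type", "BMC");
        attrs.put("subtype", "review_series_title");
        attrs.put("id", "ar-cell-cell");
        System.out.println(SQL);
        System.out.println(hasPath("classification", attrs));
    }
}
```

A LinkedHashMap keeps the attribute tests in insertion order, so the generated expression matches the order used on the slide.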
Smart handling of Unicode
XML DB
• Oracle support for XML standards in the database allows BioMed Central to manage article XML data within the database
• Examples of use
  – Re-validate article XML against DTD after any update
  – Application of XSLT transformations within the database (e.g. as a pre-indexing filter)
Article XML (pre-transform)
<bibl>
<title> Genetic variability in MCF-7 sublines</title>
<aug>
<au id="A1">
<snm>Nugoli</snm>
<fnm>Melanie</fnm>
<mi>JK</mi>
<email>[email protected]</email>
</au>
<au id="A2">
<snm>Chuchana</snm>
<fnm>Paul</fnm>
<email>[email protected]</email>
</au>
</aug>
<source>BMC Medical Research Methodology</source>
…
</bibl>
Article XML (post-transform)
<bibl>
<title> Genetic variability in MCF-7 sublines</title>
<aug>
<au id="A1">
<snm>Nugoli</snm>
<fnm>Melanie</fnm>
<mi>JK</mi>
<bnm>Nugoli_MJK</bnm>
<email>[email protected]</email>
</au>
<au id="A2">
<snm>Chuchana</snm>
<fnm>Paul</fnm>
<bnm>Chuchana_P</bnm>
<email>[email protected]</email>
</au>
</aug>
<source>
<sourcefull>BMC Medical Research Methodology</sourcefull>
<sourceabbr>BMC Med Res Methodol</sourceabbr>
</source>
…
</bibl>
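The difference between the pre- and post-transform XML above can be reproduced with a small XSLT 1.0 stylesheet. The sketch below runs it with the JDK's built-in javax.xml.transform API rather than inside the database, and covers only the added <bnm> element. The rule used (surname, underscore, first initial, middle initials) is an assumption inferred from the two authors shown, and the class name and sample e-mail address are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical pre-indexing filter: adds <bnm> to every <au> element.
final class PreIndexFilter {

    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        // Identity template: copy everything that has no more specific rule.
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        // Author template: insert <bnm> after the name parts and before <email>.
        + "<xsl:template match='au'>"
        + "<xsl:copy>"
        + "<xsl:apply-templates select='@*|node()[not(self::email)]'/>"
        + "<bnm><xsl:value-of select='concat(snm, \"_\", substring(fnm, 1, 1), mi)'/></bnm>"
        + "<xsl:apply-templates select='email'/>"
        + "</xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to an XML string and returns the result without an XML declaration. */
    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String aug = "<aug>"
            + "<au id=\"A1\"><snm>Nugoli</snm><fnm>Melanie</fnm><mi>JK</mi>"
            + "<email>a1@example.org</email></au>"
            + "<au id=\"A2\"><snm>Chuchana</snm><fnm>Paul</fnm></au>"
            + "</aug>";
        System.out.println(transform(aug));
    }
}
```

Splitting <source> into <sourcefull> and <sourceabbr> would additionally need a lookup of journal abbreviations, which is left out here.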
interMedia: Oracle as a media repository
• Manuscript submission and workflow involves a complex interplay of files and metadata
• Storing files directly in the database as BLOBs makes their management and manipulation much simpler
• interMedia provides a powerful set of tools to work with images in the database
  – Extracting image metadata
  – Scaling/cropping/format conversion
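To make the scaling and format conversion step concrete, here is a minimal client-side sketch using only the JDK's Java 2D and ImageIO classes. It is not the interMedia API, which performs the equivalent work inside the database; the class and method names are hypothetical.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Hypothetical stand-in for "scale to thumbnail" and "convert to GIF".
final class ThumbnailSketch {

    /** Scales the image down so that its longer side is at most maxSide pixels; never scales up. */
    static BufferedImage scaleToFit(BufferedImage source, int maxSide) {
        int longer = Math.max(source.getWidth(), source.getHeight());
        double factor = Math.min(1.0, (double) maxSide / longer);
        int width = Math.max(1, (int) Math.round(source.getWidth() * factor));
        int height = Math.max(1, (int) Math.round(source.getHeight() * factor));

        BufferedImage thumb = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = thumb.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                               RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(source, 0, 0, width, height, null);
        } finally {
            g.dispose();
        }
        return thumb;
    }

    /** Encodes the image as GIF bytes, ready to be stored or streamed. */
    static byte[] toGif(BufferedImage image) {
        try {
            ImageIO.setUseCache(false); // keep the encoder's buffer in memory, not in a temp file
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            if (!ImageIO.write(image, "gif", out)) {
                throw new IllegalStateException("no GIF writer available");
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BufferedImage figure = new BufferedImage(400, 200, BufferedImage.TYPE_INT_RGB);
        BufferedImage thumb = scaleToFit(figure, 100);
        byte[] gif = toGif(thumb);
        System.out.println(thumb.getWidth() + "x" + thumb.getHeight() + ", " + gif.length + " bytes");
    }
}
```

Doing this in the database instead avoids moving each image to a client and back, which is the saving the earlier bulk-processing slides describe.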
Full text article
Figure streamed from db
PDF streamed from database
Processing submitted files
Using interMedia to manipulate images
Q&A
QUESTIONS
ANSWERS