The Open Access Publisher
Session id: 40363
Manage a Variety of Life
Sciences Data
Matthew Cockerill
Technical Director
BioMed Central
BioMed Central and Oracle
 BioMed Central is an Open Access publisher
of biomedical research
 Oracle database technology used to deliver a
cost-effective online publishing solution
 Goals
– Make the publishing process more efficient through online tools and automation
– Increase accessibility of research by removing subscription barriers
Oracle technology used by
BioMed Central
 BioMed Central’s database
– 70 gigabytes of data (and growing rapidly)
– Lots of traditional relational data (e.g. 250,000 registered users)
– Also serves as a repository for images, movies, PDFs and other rich media
 Key technologies used
– Real Application Clusters
– Data Guard
– Oracle Text
– XML DB
– Oracle interMedia
– Java
Why Open Access?
 Subscription-only access to scientific research is a
legacy of the economics of print
 Scientists do all the hard work
– performing the research
– writing up the article
– acting as peer reviewers
– acting as journal editors
 Traditional publishers take ownership of the copyright
and sell limited access back to the scientific community
 In the age of the web that makes no sense for science
 Open Access publishers make research freely
accessible and redistributable by scientists
Benefits of Open Access
 Research instantly accessible to the
entire scientific community
 Digital permanence (many copies)
 A route off the subscriptions treadmill
– Subscriptions to traditional journals have increased at 10-15% per annum
 Data mining
 Grid computing
Tony Blair
“[The] national e-science grid …
intends to make access to
computing power, scientific data
repositories and experimental
facilities as easy as the web makes
access to information.”
- Tony Blair, May 2002
The Open Access movement
 Public Library of Science
– New not-for-profit publisher formed by a group of scientists
– Has received $9m from Gordon and Betty Moore Foundation to start new Open Access journals
 Soros Foundation
– Has provided $3m to support Open Access publishing in developing and transitional countries
 Sabo bill
– Congressman Martin Sabo recently introduced the Public Access to Science Act in Congress
– If passed it would ensure that all US federally funded research would be published with Open Access
Business model
 Minimize costs via smart use of technology
 Cover costs of publication via Article
Processing Charge ($500)
 Free submission for authors from institutions
that become members
 More than 300 institutional members already
– e.g. US National Institutes of Health, the National
Health Service of England, all campuses of the
University of California, Harvard, Yale, Princeton,
and every academic institution in the UK
Economic benefits
 Cost per article access for Elsevier journals estimated
at approx US $11
(Washington State University estimate)
 A typical BioMed Central open-access research article
receives 2000+ full text accesses in the 2 years
following publication, and a similar number of
accesses again via mirrors (e.g. PubMed Central)
 Cost to the scientific community = $500.
So cost per access: $500/(2000*2) = $0.125
 Cost per access reduced almost 100-fold!
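The arithmetic on this slide can be checked with a few lines of Java. This is only a sketch: the class and method names are made up for illustration, and the inputs are the slide's own figures ($500 charge, 2000 accesses, doubled for mirrors, against the $11 estimate).

```java
public class CostPerAccess {
    /** Publication charge divided by total accesses (direct accesses times the mirror multiplier). */
    public static double costPerAccess(double charge, int directAccesses, int mirrorMultiplier) {
        return charge / ((double) directAccesses * mirrorMultiplier);
    }

    /** How many times cheaper the open-access figure is than the subscription figure. */
    public static double reductionFactor(double subscriptionCost, double openAccessCost) {
        return subscriptionCost / openAccessCost;
    }

    public static void main(String[] args) {
        double openAccess = costPerAccess(500.0, 2000, 2);   // slide figures
        double factor = reductionFactor(11.0, openAccess);   // $11 is the quoted estimate
        System.out.println("Cost per access (USD): " + openAccess);
        System.out.println("Reduction factor: " + factor);
    }
}
```

With the slide's figures the ratio is 11 / 0.125 = 88, which the slide rounds up to "almost 100-fold".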
BioMed Central architecture
 Oracle9i Database
– Stores relational data (e.g. user registration info)
– Also acts as repository for media files associated with
    submitted manuscripts
    published articles
 Web server farm
– Runs many different journal websites, all driven by the same Oracle database
– Extensive use of Java and XSLT
– Media content streamed from the database using servlets
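Streaming media out of the database might look roughly like the sketch below. The table and column names (MEDIA_FILES, FILE_ID, CONTENT) are hypothetical, since the slides do not show the schema, and in a servlet the output stream would be the one returned by the response's getOutputStream(). Only standard java.io and java.sql calls are used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class MediaStreamer {
    /** Copies in fixed-size chunks so a large file is never held in memory all at once. */
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    /** Streams one BLOB to 'out'; returns bytes written, or -1 if there is no such row or the BLOB is NULL. */
    public static long streamMedia(Connection conn, long fileId, OutputStream out)
            throws SQLException, IOException {
        // Hypothetical schema: MEDIA_FILES(FILE_ID NUMBER, CONTENT BLOB)
        String sql = "SELECT CONTENT FROM MEDIA_FILES WHERE FILE_ID = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, fileId);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    return -1;
                }
                InputStream blob = rs.getBinaryStream(1);
                if (blob == null) {
                    return -1;
                }
                try (InputStream in = blob) {
                    return copy(in, out);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // No database here: exercise the copy loop with in-memory stand-in data.
        byte[] fake = new byte[20000];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        System.out.println(copy(new ByteArrayInputStream(fake), out) + " bytes copied");
    }
}
```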
Key Oracle Technologies used
by BioMed Central
 Real Application Clusters
 Data Guard
 Oracle Text
 XML DB
 Oracle interMedia
 Java
Importance of high availability
 Science is a global enterprise, so BioMed
Central’s websites are busy 24 hours a day
 Scientists entrust their research and
reputation to us - they must have confidence
that their research will be available
 Major institutional customers demand high
reliability
 BioMed Central delivers high availability using
a combination of RAC and Data Guard
Real Application Clusters
 BioMed Central was one of the first
organizations in the UK to deploy 9i RAC
 Main database runs on a pair of dual CPU
Sun Fire V480 servers
 Delivers high availability in the event of single
node failure
 Oracle upgrades/patches do currently require
downtime however (for now!)
Data Guard
 BioMed Central uses Data Guard to maintain a
standby database
 Standby database kept up to date by automated
application of log files
 Standby database can be used for reporting (in read-only mode)
 If a prolonged outage of live db occurs (planned or
unplanned), standby database can be activated
 Data Guard makes it easy to roll back to the live
configuration after planned outages
RAC/Data Guard configuration
[Diagram. Main hosting location: web server farm and RAC cluster. Standby location: standby DB (Data Guard), also used for reporting. Log files are shipped from the RAC cluster to the standby DB.]
Use of Oracle Text
 High performance full text article search
 Key benefits
– Ease of maintenance (incremental online indexing)
– Structured searching of XML
– XPath support
– Unicode aware (smart base-character indexing)
– Filter procedures can be used to transform XML to be indexed
Structured search
XPath search
 Prior to Oracle9i Database Release 2, relatively basic
field restrictions based on XML tags were possible
 Complex nesting of tags, or specific attribute values
were difficult or impossible to search for
 Oracle9i Database Release 2 support for XPath field
restrictions takes XML searching to another level
 Now possible to search for all XML articles that
contain a certain path (HASPATH), or that match a
certain text expression at that path (INPATH)
XPath example
Article metadata identifying a series of related articles
<meta>
  <classifications>
    <classification type="BMC"
                    subtype="review_series_title"
                    id="ar-cell-cell">Cell-cell interactions in synovitis</classification>
  </classifications>
</meta>
SQL syntax to retrieve all articles in that review series:
SELECT ARX_ID FROM ARX
WHERE CONTAINS (ARX_FULL, 'HASPATH
(//classification[@type="BMC" AND
@subtype="review_series_title" AND @id="ar-cell-cell"])') > 0;
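To see what such a path restriction selects, the same test can be run outside the database with the JDK's standard XPath API. This is an illustration only, not Oracle Text. It uses standard XPath syntax (lowercase `and`, single-quoted values), the class and method names are invented for the example, and `inPath` is a crude stand-in for INPATH that checks only the first matching node.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class ReviewSeriesPath {
    /** The slide's metadata fragment, collapsed onto one line. */
    public static final String SAMPLE =
        "<meta><classifications>"
      + "<classification type=\"BMC\" subtype=\"review_series_title\" id=\"ar-cell-cell\">"
      + "Cell-cell interactions in synovitis</classification>"
      + "</classifications></meta>";

    /** Standard-XPath equivalent of the path used in the HASPATH query. */
    public static final String PATH =
        "//classification[@type='BMC' and @subtype='review_series_title' and @id='ar-cell-cell']";

    private static Object evaluate(String xml, String expression, QName type) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)), type);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath or XML: " + expression, e);
        }
    }

    /** True if at least one node matches the path (the idea behind HASPATH). */
    public static boolean hasPath(String xml, String path) {
        return (Boolean) evaluate(xml, "boolean(" + path + ")", XPathConstants.BOOLEAN);
    }

    /** True if the first node at the path contains the word (a rough analogue of INPATH). */
    public static boolean inPath(String xml, String path, String word) {
        String text = (String) evaluate(xml, "string(" + path + ")", XPathConstants.STRING);
        return text.toLowerCase().contains(word.toLowerCase());
    }

    public static void main(String[] args) {
        System.out.println("has path: " + hasPath(SAMPLE, PATH));
        System.out.println("'synovitis' in path: " + inPath(SAMPLE, PATH, "synovitis"));
    }
}
```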
Smart handling of Unicode
XML DB
 Oracle’s support for XML standards in the
database allows BioMed Central to manage
article XML data within the database
 Examples of use
– Re-validate article XML against DTD after any update
– Application of XSLT transformations within database (e.g. as a pre-indexing filter)
Article XML (pre-transform)
<bibl>
  <title> Genetic variability in MCF-7 sublines</title>
  <aug>
    <au id="A1">
      <snm>Nugoli</snm>
      <fnm>Melanie</fnm>
      <mi>JK</mi>
      <email>[email protected]</email>
    </au>
    <au id="A2">
      <snm>Chuchana</snm>
      <fnm>Paul</fnm>
      <email>[email protected]</email>
    </au>
  </aug>
  <source>BMC Medical Research Methodology</source>
  …
</bibl>
Article XML (post-transform)
<bibl>
  <title> Genetic variability in MCF-7 sublines</title>
  <aug>
    <au id="A1">
      <snm>Nugoli</snm>
      <fnm>Melanie</fnm>
      <mi>JK</mi>
      <bnm>Nugoli_MJK</bnm>
      <email>[email protected]</email>
    </au>
    <au id="A2">
      <snm>Chuchana</snm>
      <fnm>Paul</fnm>
      <bnm>Chuchana_P</bnm>
      <email>[email protected]</email>
    </au>
  </aug>
  <source>
    <sourcefull>BMC Medical Research Methodology</sourcefull>
    <sourceabbr>BMC Med Res Methodol</sourceabbr>
  </source>
  …
</bibl>
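A pre-indexing filter of this kind could be written as a small XSLT 1.0 stylesheet. The sketch below covers only the `<bnm>` part of the transform and runs it through the JDK's standard JAXP API rather than inside Oracle. The stylesheet is a guess at the rule implied by the before/after slides (surname, underscore, first initial, middle initials), not BioMed Central's actual filter. The email addresses are placeholders, and the journal abbreviation step is left out because it would need a lookup table the slides do not show.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class AuthorIndexFilter {
    /** Identity transform plus one rule that inserts <bnm> before <email> in each <au>. */
    public static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:template match=\"@*|node()\">"
      + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match=\"au\">"
      + "<xsl:copy>"
      + "<xsl:apply-templates select=\"@*|node()[not(self::email)]\"/>"
      + "<bnm><xsl:value-of select=\"concat(snm, '_', substring(fnm, 1, 1), mi)\"/></bnm>"
      + "<xsl:apply-templates select=\"email\"/>"
      + "</xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Author group from the slide; email addresses are placeholders. */
    public static final String SAMPLE =
        "<aug>"
      + "<au id=\"A1\"><snm>Nugoli</snm><fnm>Melanie</fnm><mi>JK</mi>"
      + "<email>a1@example.org</email></au>"
      + "<au id=\"A2\"><snm>Chuchana</snm><fnm>Paul</fnm>"
      + "<email>a2@example.org</email></au>"
      + "</aug>";

    public static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Transform failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(SAMPLE));
    }
}
```

A missing `<mi>` contributes an empty string to `concat`, which is why the second author comes out as `Chuchana_P`.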
XML in action: Faculty of 1000
 Literature awareness service for scientists
 More than 1000 experts submit evaluations of
the best new scientific research (via the web)
 Evaluations rank articles by level of interest,
and classify them by type and by subject
 The Faculty of 1000 website digests this info into
a listing of ‘hot articles’
 XML use is critical to performance
Faculty of 1000 - typical article
XML improves performance of
Faculty of 1000
 Navigating deeply relational data can be slow
 Web application data is searched and accessed
frequently, but changes relatively rarely
 Solution:
Use Oracle triggers to regenerate an XML summary column for
each live record, whenever data affecting that record changes
 This kills two birds with one stone
– Structure of XML tuned to allow any required search/browse to be done efficiently as a pure Oracle Text query
– XML summary can easily be converted to HTML using XSLT for display on the web
interMedia:
Oracle as a media repository
 Manuscript submission and workflow involves
a complex interplay of files and metadata
 Storing files directly in the database as BLOBs
makes their management and manipulation
much simpler
 interMedia provides a powerful set of tools to work with images in the database
– Extracting image metadata
– Scaling/cropping/format conversion
Full text article
Figure streamed from db
PDF streamed from database
Processing submitted files
Using interMedia to
manipulate images
Java in the database
 Java stored procedures offer flexibility
 Facilitate transport of code between tiers
 Allow use of standard Java libraries
 One example:
– Oracle’s original XSLT implementation performed poorly with BioMed Central’s large article XML files and this was rate limiting for our article XML indexing/filtering process
– Thanks to the JVM built into the database, we were able to make use of the open source XSLTC implementation
– End result was that we cut re-index time in half
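XSLTC works by compiling a stylesheet once and reusing the compiled form for many documents. In standard JAXP the same idea is expressed with the Templates interface, sketched below. The class name and the tiny title-extracting stylesheet are invented for the example. How XSLTC was actually loaded into the database JVM is not described in the slides, so this runs against whichever JAXP processor is on the classpath.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CompiledStylesheet {
    /** Illustrative stylesheet: emits the whitespace-normalised article title as plain text. */
    public static final String TITLE_XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/\"><xsl:value-of select=\"normalize-space(//title)\"/></xsl:template>"
      + "</xsl:stylesheet>";

    private final Templates templates;

    /** Parses and compiles the stylesheet once; this is the expensive step. */
    public CompiledStylesheet(String xslt) {
        try {
            templates = TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(xslt)));
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Stylesheet did not compile", e);
        }
    }

    /** Applies the compiled stylesheet to one document; a fresh Transformer is created per call. */
    public String apply(String xml) {
        try {
            Transformer t = templates.newTransformer();
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Transform failed", e);
        }
    }

    public static void main(String[] args) {
        CompiledStylesheet titles = new CompiledStylesheet(TITLE_XSLT);
        // The same compiled stylesheet is reused for every article in the batch.
        String[] articles = {
            "<bibl><title> Genetic variability in MCF-7 sublines</title></bibl>",
            "<bibl><title>Cell-cell interactions in synovitis</title></bibl>"
        };
        for (String article : articles) {
            System.out.println(titles.apply(article));
        }
    }
}
```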
Find out more…
 See a demonstration
– BioMed Central’s use of Oracle interMedia technology is being demonstrated on the conference floor
 Or take a look for yourself
– http://www.biomedcentral.com/
– http://www.facultyof1000.com/
Q&A