Guide to Distributed Digital Preservation
Healey
CHAPTER 8:
THE CONSPECTUS AND TITLE DATABASE
Johnny P. Healey
SESSION LEARNING OBJECTIVES

Installing the Conspectus Database.

Maintaining the Title Database with the Conspectus.

Modifying the Conspectus Database.
OVERVIEW OF THE CONSPECTUS DATABASE
The conspectus database stores metadata that describes the content stored in the preservation
network. It stores two intersecting classes of data: the functional data, which is used to populate
the LOCKSS title database, and the non-functional data, which can be useful to the members of
the preservation network. The conspectus database also provides versioning support for the
metadata, allowing changes to be tracked.
The functional data will be the same for every instance of the conspectus database that is
deployed. It comprises the name of the plugin that is used for the harvest, as well as the
location of the data and any parameters that are used by each Archival Unit. The most obvious
advantage of automatically generating the title database is that it avoids the tedious and
error-prone process of maintaining XML by hand. A more subtle advantage is that it
more easily supports multiple maintainers for the title database.
The non-functional data serves several purposes for the people maintaining the network. It
provides a mechanism for identifying the source of an archival unit and
any intellectual property constraints that may apply to it. It also stores other potentially useful
information, such as summaries of the data types in the collections and size estimates. This
metadata is useful for tracking collections, providing summaries of overall network content, and
similar administrative tasks.
INSTALLING THE CONSPECTUS DATABASE
Software Requirements
The software requirements and installation procedure for the conspectus database should be
relatively straightforward for UNIX system administrators. The conspectus requires an Apache
web server with PHP (version 4 or later), a MySQL database, and the PHP module required to
access MySQL.
Installation of the Conspectus Database
The first step towards installing the conspectus is to untar the file with the command
'tar -xzf conspectus.tgz'. This should create a directory containing the PHP files that drive the
conspectus. To install the conspectus, make an appropriate directory somewhere in the web root
of the web server and copy the PHP files to that location. These files should provide all of the
code that is needed.
Configuring the Conspectus
The next step is to create and populate the database. The tarball contains two SQL scripts,
'create.sql' and 'destroy.sql'. The easiest way to create the database is to log in to the MySQL
server as an administrative user, create the database, and then populate it with the 'create.sql'
script. Creating and populating the database should look like this:
# CREATE DATABASE conspectus;
# GRANT SELECT, UPDATE, DELETE ON conspectus.* TO conspectus@localhost IDENTIFIED BY '{password}';
# USE conspectus;
# \. create.sql
To point the conspectus scripts at the proper database, edit the 'mysql_includes.php' file in the
directory under the web root where the code was installed. The variables 'dbname', 'dbuser',
'dbpass', and 'dbhost' correspond to the database name, user name, password, and host,
respectively.
MAINTAINING THE TITLE DATABASE
Maintaining a Title Database with the Conspectus
The default output of the conspectus database is an XML dialect of RDF. The title database is
also XML, so the conspectus database can generate the title database as well. Each title that is
generated for the title database must have the following values:

Journal Name

Plugin Name

The base URL of the Journal

Any additional plugin parameters (these are optional, depending on the plugin)
When a collection has all of the metadata required for harvesting, it can be made available to the
title database. Any user who is logged in to the system can select a title to be entered. From the
main page of the conspectus, follow the link to “Select Collections for Harvesting.” This brings
up a page where the valid collections are listed with checkboxes. Make sure any desired
collections are checked off and press the “Store” button to select them.
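The readiness check described above can be sketched in a few lines of Python. This is only an illustration, not the conspectus code itself (which is written in PHP): the field names 'journal_name', 'plugin_name', and 'base_url' are hypothetical stand-ins for the required values listed earlier, and the real schema may name them differently.

```python
# Hypothetical field names standing in for the required title database values.
REQUIRED_FIELDS = ("journal_name", "plugin_name", "base_url")


def missing_fields(collection):
    """Return the required fields that are absent or blank in a collection record."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(collection.get(field) or "").strip()
    ]


def ready_for_harvest(collection):
    """A collection can be offered to the title database only when nothing is missing."""
    return not missing_fields(collection)


record = {
    "journal_name": "Example Journal",
    "plugin_name": "",  # not filled in yet
    "base_url": "http://example.org/base/",
}
print(missing_fields(record))
print(ready_for_harvest(record))
```

Additional plugin parameters are left out of the check because they are optional and depend on the plugin.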
Configuring Plugin Parameters for Archival Units
Configuring plugin parameters for the collections can be one of the more challenging tasks. The
parameters are entered in the “Extra Parameters” field in the “Harvesting Information” section.
Each entry in this field corresponds to an Archival Unit in the title database. The parameters for
each archival unit are entered as a comma-separated list where each parameter takes the form of
“name=value”.
One of the parameters that LOCKSS requires is base_url, which specifies the URL where the
data is located. By default, this parameter is taken automatically from the “Collection URI” field
for the item. To override this value, specify base_url as the first parameter in the entry.
Examples of Plugin Parameters:
Description of Parameters | Conspectus Entry
This AU takes two parameters, a journal id and issue number. | journal_id=my journal,issue=50
This AU overrides the base_url and has a year. | base_url=http://example.org/base/,year=1993
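To make the entry format concrete, the following Python sketch parses one “Extra Parameters” entry under the rules described above. It is an assumption-laden illustration rather than the conspectus implementation: the function name 'parse_au_parameters' is hypothetical, and rejecting a base_url that is not in first position is one possible reading of the rule that an override must come first. Values that themselves contain commas are not handled.

```python
def parse_au_parameters(entry, collection_uri):
    """Parse one entry into an ordered list of (name, value) pairs.

    base_url defaults to the Collection URI unless the entry overrides it
    as its first parameter.
    """
    params = []
    for part in entry.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty segments such as a trailing comma
        name, sep, value = part.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"expected name=value, got {part!r}")
        params.append((name.strip(), value.strip()))

    names = [name for name, _ in params]
    # Assumption: an override anywhere but first position is treated as an error.
    if "base_url" in names[1:]:
        raise ValueError("a base_url override must be the first parameter")
    if not names or names[0] != "base_url":
        params.insert(0, ("base_url", collection_uri))
    return params


print(parse_au_parameters("journal_id=my journal,issue=50", "http://example.org/journal/"))
print(parse_au_parameters("base_url=http://example.org/base/,year=1993", "http://example.org/journal/"))
```

The first call fills in base_url from the Collection URI; the second keeps the override supplied in the entry.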
MODIFYING THE CONSPECTUS
The conspectus database is slightly different from the average PHP application. Some of the
code used to generate the interface and store data in the MySQL database is generated from an
XML file and an XSLT template. For convenience, this process is driven by a Makefile. The
important files for editing the conspectus are “formgen.xsl,” “metaform.xml,” and “classes.php.”
As their filenames suggest, “metaform.xml” is the file that describes the form and “formgen.xsl”
is the XSLT template that drives the generation of the form. Changing, adding, and removing
items in “metaform.xml” are the primary methods of altering the conspectus. Two elements
are most useful for editing the form: “formitem” and “complexitem.” Each of
these elements should have an attribute called “name”. This not only acts as a name for the item,
but also describes its location in the XML output.
Form items are the most basic elements of the form. Each one has a “class” attribute that
corresponds to a class in “classes.php”. Every class provides a widget as well as a mapping
between the POST data, the MySQL database, and the RDF output. Some of the most
commonly used classes are “TextBox,” “TextArea,” and “DropDown,” but there are also some
more complicated classes such as “DateRange.” Aside from the “name” and “class” attributes,
form items should also have a “title” attribute, which acts as a human-readable name for the
item. Two optional attributes are also available: “required” and “repeatable.” If “required” is
set to “true,” then the element is considered essential, and the data will not be allowed into
the title database until something has been entered. The “repeatable” attribute also accepts a
boolean value; when it is set to “true,” multiple copies of the element can be entered for a single
collection record.
Complex items use the same attributes as form items, with the exception of the “class”
attribute. Instead of having a class, complex items are composed of the classes described within
them. This allows complex widgets to be created within the XML form document.
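As an illustration of the attributes described above, the Python sketch below parses a small form description and reports any item that lacks an expected attribute. The fragment is hypothetical: the element and attribute names (“formitem,” “complexitem,” “name,” “class,” “title,” “required,” “repeatable”) come from the text, but the root element “metaform,” the item names, and the exact nesting are assumptions, and the real “metaform.xml” may be laid out differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; the root element and item names are assumptions.
SAMPLE = """
<metaform>
  <formitem name="title" class="TextBox" title="Collection Title" required="true"/>
  <formitem name="subject" class="TextBox" title="Subject" repeatable="true"/>
  <complexitem name="coverage" title="Coverage">
    <formitem name="dates" class="DateRange" title="Date Range"/>
    <formitem name="notes" class="TextArea" title="Notes"/>
  </complexitem>
</metaform>
"""


def check_items(root):
    """List items that lack an attribute the text says they should carry."""
    problems = []
    for elem in root.iter():
        if elem.tag == "formitem":
            needed = ("name", "class", "title")
        elif elem.tag == "complexitem":
            needed = ("name", "title")  # complex items have no class of their own
        else:
            continue
        for attr in needed:
            if attr not in elem.attrib:
                problems.append(f"{elem.tag} {elem.get('name', '?')}: missing {attr}")
    return problems


root = ET.fromstring(SAMPLE)
print(check_items(root))
```

A check of this kind could be run before regenerating the form with the Makefile, so that a missing attribute is caught before it reaches the generated PHP.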