Guide to Distributed Digital Preservation Healey

CHAPTER 8: THE CONSPECTUS AND TITLE DATABASE
Johnny P. Healey

SESSION LEARNING OBJECTIVES

Installing the Conspectus Database.
Maintaining the Title Database with the Conspectus.
Modifying the Conspectus Database.

OVERVIEW OF THE CONSPECTUS DATABASE

The conspectus database stores metadata that describes the content stored in the preservation network. It stores two intersecting classes of data: the functional data that is used to populate the LOCKSS title database, and the non-functional data, which can be useful to the members of the preservation network. The conspectus database also provides versioning support for the metadata, allowing changes to be tracked.

The functional data will be the same for every instance of the conspectus database that is deployed. It consists of the name of the plugin that is used for the harvest, as well as the location of the data and any parameters that are used by each Archival Unit.

The most obvious advantage of automatically generating the title database is that it avoids the tedious and error-prone process of maintaining XML by hand. A more subtle advantage is that it more easily supports multiple maintainers for the title database.

The non-functional data serves several purposes for the people maintaining the network. One is that it provides a mechanism for identifying the source of an archival unit and any intellectual property constraints that may apply to it. It also stores other potentially useful information, such as summaries of the data types in the collections and size estimates. This metadata is useful for tracking collections, providing summaries of overall network content, and similar administrative tasks.

INSTALLING THE CONSPECTUS DATABASE

Software Requirements

The software requirements and installation procedure for the conspectus database should be relatively straightforward for UNIX system administrators.
It requires an instance of Apache with PHP (>=4), as well as a MySQL database and the PHP module required to access it.

Installation of the Conspectus Database

The first step in installing the conspectus is to untar the file with the command:

    tar -xzf conspectus.tgz

This should create a directory containing the PHP files that drive the conspectus. To install the conspectus, make an appropriate directory somewhere in the web root of the web server and copy the PHP files to that location. These should provide all of the code that is needed.

Configuring the Conspectus

The next step is to create and populate the database. The tarball contains two SQL scripts, 'create.sql' and 'destroy.sql'. The easiest way to create the database is to log in to the MySQL server as an administrative user, create the database, and then populate it with the 'create.sql' script. Creating and populating the database should look like this:

    # CREATE DATABASE conspectus;
    # GRANT SELECT, UPDATE, DELETE ON conspectus.* TO conspectus@localhost IDENTIFIED BY '{password}';
    # USE conspectus;
    # \. create_tables.sql

The last command names 'create_tables.sql', whereas the tarball description above names 'create.sql'; use the name of the creation script that your tarball actually contains.

To point the conspectus scripts at the proper database, edit the 'mysql_includes.php' file in the directory under the web root where the code was installed. The variables 'dbname', 'dbuser', 'dbpass', and 'dbhost' correspond to the database name, user name, password, and host.

MAINTAINING THE TITLE DATABASE

Maintaining a Title Database with the Conspectus

The default output of the conspectus database is an XML dialect of RDF. The title database also takes the form of XML, so it is not a large step for the conspectus database to generate the title database as well.
For each title that is to be generated for the title database, the following values must be present:

    - Journal Name
    - Plugin Name
    - The base URL of the Journal
    - Any additional plugin parameters (these are optional, depending on the plugin)

When a collection has all of the metadata required for harvesting, it can be made available to the title database. Any user who is logged in to the system can select a title to be entered. From the main page of the conspectus, follow the link to “Select Collections for Harvesting.” This brings up a page where the valid collections are listed with checkboxes. Make sure any desired collections are checked and press the “Store” button to select them.

Configuring Plugin Parameters for Archival Units

Configuring plugin parameters for the collections can be one of the more challenging tasks. The parameters are entered in the “Extra Parameters” field in the “Harvesting Information” section. Each entry in this field corresponds to an Archival Unit in the title database. The parameters for each Archival Unit are entered as a comma-separated list in which each parameter takes the form “name=value”.

One of the parameters that LOCKSS requires is the base_url, which specifies the URL where the data is located. This parameter is automatically taken from the “Collection URI” field for the item. If you wish to override this value, base_url must be the first parameter specified.

Examples of Plugin Parameters:

    Description of Parameters                                       Conspectus Entry
    This AU takes two parameters, a journal id and issue number.    journal_id=my journal,issue=50
    This AU overrides the base_url and has a year.                  base_url=http://example.org/base/,year=1993

MODIFYING THE CONSPECTUS

The conspectus database is slightly different from the average PHP application. Some of the code used to generate the interface and store data in the MySQL database is generated from an XML file and an XSLT template.
For convenience, this process is driven by a Makefile. The important files for editing the conspectus are “formgen.xsl,” “metaform.xml,” and “classes.php.” As their filenames suggest, “metaform.xml” is the file that describes the form and “formgen.xsl” is the XSLT template that drives the generation of the form. Changing, adding, and removing items in “metaform.xml” are the primary methods of altering the conspectus.

There are two elements that are most useful for editing the form: “formitem” and “complexitem.” Each of these elements should have an attribute called “name”. This not only acts as a name for the item, but also describes its location in the XML output.

Form items are the most basic elements of the form. Each one has a “class” attribute that corresponds to a class in “classes.php”. Every class provides a widget as well as a mapping between the POST data, the MySQL database, and the RDF output. Some of the most commonly used classes are “TextBox,” “TextArea,” and “DropDown,” but there are also some more complicated classes such as “DateRange.”

Aside from the “name” and “class” attributes, form items should also have a “title” attribute, which acts as a human-readable name for the item. There are two optional attributes, “required” and “repeatable.” If “required” is set to “true,” the element is considered essential, and the data will not be allowed into the title database until something has been entered. The “repeatable” attribute also accepts a boolean value, allowing multiple copies of an element to be entered for a single collection record.

Complex items use the same attributes as form items, with the exception of the “class” attribute. Instead of having a class, complex items are composed of the classes described within them. This allows complex widgets to be created within the XML form document.
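
The attribute rules described above for “formitem” and “complexitem” can be sketched as a small check, written here in Python rather than the PHP and XSLT that the conspectus itself uses. This is only an illustration under stated assumptions: the <metaform> root element, the item names, and the nesting of form items directly inside a complex item are hypothetical, since the chapter does not show the actual layout of “metaform.xml”. Only the element names, the attribute names, and the class names come from the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical metaform fragment. The <metaform> root and the item names are
# assumptions for illustration; only "formitem", "complexitem", the attribute
# names, and the class names are taken from the chapter.
SAMPLE = """
<metaform>
  <formitem name="title" class="TextBox" title="Collection Title" required="true"/>
  <formitem name="subject" class="TextBox" title="Subject" repeatable="true"/>
  <complexitem name="coverage" title="Coverage">
    <formitem name="dates" class="DateRange" title="Date Range"/>
    <formitem name="notes" class="TextArea" title="Notes"/>
  </complexitem>
</metaform>
"""

ITEM_TAGS = ("formitem", "complexitem")


def check_item(item, path=""):
    """Return a list of problems for one item, recursing into complex items."""
    where = f"{path}/{item.get('name') or '?'}"
    # Both kinds of item should carry a "name" and a "title".
    problems = [f"{where}: missing '{attr}'"
                for attr in ("name", "title") if not item.get(attr)]
    if item.tag == "formitem":
        # A form item maps to a class in classes.php.
        if not item.get("class"):
            problems.append(f"{where}: formitem needs a 'class'")
    else:
        # A complex item has no class of its own; it is built from nested items.
        if item.get("class"):
            problems.append(f"{where}: complexitem should not have a 'class'")
        children = [child for child in item if child.tag in ITEM_TAGS]
        if not children:
            problems.append(f"{where}: complexitem contains no items")
        for child in children:
            problems.extend(check_item(child, where))
    return problems


def check_metaform(xml_text):
    """Check every top-level item in the (assumed) metaform document."""
    root = ET.fromstring(xml_text)
    problems = []
    for item in root:
        if item.tag in ITEM_TAGS:
            problems.extend(check_item(item))
    return problems


def required_items(xml_text):
    """List the names of form items whose "required" attribute is "true"."""
    root = ET.fromstring(xml_text)
    return [el.get("name") for el in root.iter("formitem")
            if (el.get("required") or "").lower() == "true"]


if __name__ == "__main__":
    print(check_metaform(SAMPLE))
    print(required_items(SAMPLE))
```

A check of this kind could be run before invoking the Makefile, so that a form item without a class is caught before the form is regenerated. The real “formgen.xsl” template may enforce different or additional rules.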