PREMIS Data Dictionary and the Future of Preservation Metadata
Brian Lavoie, Research Scientist, OCLC Research
Society of American Archivists, Washington, D.C., August 5, 2006

Preservation Metadata
"Information that supports and documents the digital preservation process"
  • Provenance: Who has had custody/ownership of the digital object?
  • Authenticity: Is the digital object what it purports to be?
  • Preservation Activity: What has been done to preserve the digital object?
  • Technical Environment: What is needed to render and use the digital object?
  • Rights Management: What intellectual property rights (IPR) must be observed?

Why is preservation metadata important?
Digital objects are technology-dependent …
  • Complex technical environment between content and user
  • Means to access and use the archived object must be documented
  • Technical metadata especially important
Digital objects are mutable …
  • Can be easily altered, impacting look, feel, functionality
  • Changes to the object must be documented/validated
  • Provenance/authenticity metadata especially important
Digital objects are bound by intellectual property rights …
  • Preservation often proceeds while copyright is still in effect
  • May constrain preservation activities and access policies
  • Rights management metadata especially important
Preservation metadata makes digital objects self-documenting across time.

PREMIS Working Group
"Early days" … various preservation metadata element sets released
  • Different scopes, purposes, underlying models/assumptions
  • No international standard; little consolidation of expertise/best practice
June 2003: OCLC and RLG sponsored an international working group
  • PREMIS: Preservation Metadata: Implementation Strategies
Objective:
  • Define implementable, core preservation metadata, with guidelines/recommendations for management and use
Membership:
  • More than 30 experts from 5 countries, representing libraries, museums, archives, government agencies, and the private sector

PREMIS Data Dictionary
May 2005: Data Dictionary for Preservation Metadata: Final Report of the PREMIS Working Group
237-page report includes:
  • PREMIS Data Dictionary 1.0
  • Context/assumptions, data model, usage examples
  • Set of XML schemas to support implementation
Data Dictionary: comprehensive view of the information needed to support digital preservation
  • Based on a deep pool of institutional experience in setting up and managing operational capacity for digital preservation
  • http://www.oclc.org/research/projects/pmwg/premis-final.pdf
Awards:
  • 2005 British Conservation Awards: Digital Preservation Award
  • 2006 Society of American Archivists Preservation Publication Award

Some guiding principles …
"Implementable, core, preservation metadata":
  • "Preservation metadata": maintain viability, renderability, understandability, authenticity, and identity in a preservation context
  • "Core": what most preservation repositories need to know to preserve digital materials over the long term
  • "Implementable": rigorously defined; supported by usage guidelines/recommendations; emphasis on automated workflows
"Technical neutrality":
  • Digital archiving system: no assumptions about specific archiving technology, system/DB architectures, or preservation strategy
  • Metadata management: no assumptions about whether metadata is stored locally or in an external registry; recorded explicitly or known implicitly; instantiated in one metadata element or multiple elements
  • Promotes flexibility and applicability in a wide range of contexts

Sample Data Dictionary entry
Semantic unit: size
Semantic components: None
Definition: The size in bytes of the file or bitstream stored in the repository.
Rationale: Size is useful for ensuring the correct number of bytes from storage have been retrieved and that an application has enough room to move or process files. It might also be used when billing for storage.
Data constraint: Integer
Applicability: Representation: Not applicable; File: Applicable; Bitstream: Applicable
Examples: 2038927
Repeatability: File: Not repeatable; Bitstream: Not repeatable
Obligation: File: Optional; Bitstream: Optional
Creation/Maintenance notes: Automatically obtained by the repository.
Usage notes: Defining this semantic unit as size in bytes makes it unnecessary to record a unit of measurement. However, for the purpose of data exchange the unit of measurement should be stated or understood by both partners.

PREMIS Maintenance Activity
Web site: http://www.loc.gov/standards/premis/
  • Permanent Web presence, hosted by the Library of Congress
  • Central destination for PREMIS-related information, announcements, resources
  • Home of the PREMIS Implementers' Group (PIG) discussion list
PREMIS Editorial Committee:
  • Set directions/priorities for PREMIS development
  • Coordinate future revisions of the Data Dictionary and XML schema
  • Membership: Library of Congress, OCLC, FCLA, National Archives of Scotland, British Library, National Library of Australia, U. of Goettingen, LANL (two more seats still TBD)
  • Will convene late August/early September

Current activities
Documenting errata and proposed revisions to the Data Dictionary (feedback through the PIG list)
  • http://www.loc.gov/standards/premis/changes.html
PREMIS Implementers' Registry
  • http://www.loc.gov/standards/premis/premis-registry.html
Consultancies:
  • Rights issues for digital preservation (Karen Coyle)
  • PREMIS implementation guidelines and recommendations (Deborah Woodyard-Robinson)
PREMIS Workshops: 2-day tutorial on the Data Dictionary and implementation issues
  • Digital Curation Centre PREMIS workshop (July 17-18, Glasgow)
  • DLF Forum (Boston, early November)

Looking to the Future …
Basic questions ("what type", "how much") still unsettled …
  • Digital preservation processes still not fully tested/understood
  • Hard to judge effectiveness a priori
  • Important to document and share practical experience
Workflows for preservation metadata …
  • Tools to support automatic generation of preservation metadata (JHOVE, NLNZ tools)
  • Tools should support formal metadata schemas (like PREMIS)
  • Registries (PRONOM, GDFR)
Harmonization with other initiatives …
  • Integrate PREMIS with other standards, technologies, best practices
  • E.g., Z39.87, METS
  • Not just standards, but integrated solutions
Division of labor …
  • Efficient strategies for collecting preservation metadata: i.e., WHO and WHEN (Automatic Exposure project)
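To make the sample "size" entry concrete, here is a minimal sketch of its Rationale and Creation/Maintenance notes in code: the repository obtains the size automatically, then later compares it with the number of bytes actually retrieved from storage and with the space available for processing. The helper names record_size, retrieval_is_complete, and has_room are hypothetical illustrations, not part of PREMIS or any PREMIS tool.

```python
import tempfile
from pathlib import Path


def record_size(path: Path) -> int:
    # Hypothetical helper: obtain the size automatically at ingest.
    # Recording it in bytes means no separate unit needs to be stored.
    return path.stat().st_size


def retrieval_is_complete(retrieved: bytes, recorded_size: int) -> bool:
    # Hypothetical helper: a mismatch means storage returned
    # too few (or too many) bytes for this file.
    return len(retrieved) == recorded_size


def has_room(recorded_size: int, free_bytes: int) -> bool:
    # Hypothetical helper: does an application have enough room
    # to move or process the file?
    return recorded_size <= free_bytes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        f = Path(tmp) / "object.bin"
        f.write_bytes(b"\x00" * 2038927)  # same value as the entry's example
        size = record_size(f)
        print(size)  # 2038927
        print(retrieval_is_complete(f.read_bytes(), size))  # True
        print(retrieval_is_complete(f.read_bytes()[:-1], size))  # False
```

Because the value is an integer count of bytes, the comparison is a plain equality test; per the Usage notes, a partner receiving this value in an exchange would still need to know it is in bytes.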
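The "Workflows for preservation metadata" point calls for tools that generate preservation metadata automatically and emit it in a formal schema. The following is a minimal sketch under stated assumptions, not a PREMIS-conformant implementation. It builds an unqualified XML record whose element names follow PREMIS semantic units (objectIdentifier, objectCategory, objectCharacteristics, compositionLevel, fixity, size). It omits the PREMIS namespace and has not been validated against the official schema. The identifier scheme "local", the identifier "demo-0001", and the choice of SHA-256 are assumptions. A message digest is one common way to validate that an object has not changed, although the slides do not prescribe a method.

```python
import hashlib
import xml.etree.ElementTree as ET


def object_record(identifier: str, data: bytes) -> ET.Element:
    # Build a minimal PREMIS-style <object> element for one file.
    # No namespace, no schema validation: illustration only.
    obj = ET.Element("object")
    oid = ET.SubElement(obj, "objectIdentifier")
    ET.SubElement(oid, "objectIdentifierType").text = "local"  # hypothetical scheme
    ET.SubElement(oid, "objectIdentifierValue").text = identifier
    ET.SubElement(obj, "objectCategory").text = "file"
    chars = ET.SubElement(obj, "objectCharacteristics")
    ET.SubElement(chars, "compositionLevel").text = "0"
    fixity = ET.SubElement(chars, "fixity")
    ET.SubElement(fixity, "messageDigestAlgorithm").text = "SHA-256"  # assumed algorithm
    ET.SubElement(fixity, "messageDigest").text = hashlib.sha256(data).hexdigest()
    ET.SubElement(chars, "size").text = str(len(data))  # size in bytes
    return obj


def verify(record: ET.Element, data: bytes) -> bool:
    # Re-check the recorded size and digest against retrieved bytes.
    size = int(record.findtext("objectCharacteristics/size"))
    digest = record.findtext("objectCharacteristics/fixity/messageDigest")
    return len(data) == size and hashlib.sha256(data).hexdigest() == digest


if __name__ == "__main__":
    content = b"Hello, preservation!"
    rec = object_record("demo-0001", content)
    print(ET.tostring(rec, encoding="unicode"))
    print(verify(rec, content))         # True
    print(verify(rec, content + b"!"))  # False
```

Generating both values at ingest, with no manual input, reflects the emphasis on automated workflows. Checking size first is cheap, and the digest then catches same-length alterations.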