
A Weekend with Nanite
Large-scale characterisation of web archives
Per Møldrup-Dalum
State and University Library
SCAPE Information Day
State and University Library, Denmark, 2014-06-25
Agenda
• A short introduction to the experiment
• A live demonstration
• A look at the data for characterisation
• A look at the input for the job
• Run the job
• Analysis of the output and of the run itself
This work was partially supported by the SCAPE Project.
The SCAPE project is co‐funded by the European Union under FP7 ICT‐2009.4.1 (Grant Agreement number 270137).
Task at Hand
• Performance-testing the tools
• SCAPE User Story: As a Web Archive I need a Digital Preservation System that can process both ARC and WARC files and identify the file formats of, and characterise, the items they contain, so that I can assess preservation risks and plan which tools will be required for access to those formats.
Tools at Hand
• Apache Tika
• DROID from The National Archives (UK)
• (libmagic)
• Not a word on FITS...
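The tools listed above all rely, at least in part, on matching known byte signatures ("magic numbers") at the start of a file. As a rough illustration of that idea only, here is a minimal Python sketch. The signature table and the function name `sniff_format` are assumptions made up for this example; they are not taken from PRONOM, Tika, or libmagic, and the real tools use far larger signature sets and additional heuristics.

```python
# Hypothetical, hand-written signature table for illustration only.
# Each entry maps a leading byte sequence to a MIME type.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
]


def sniff_format(payload: bytes) -> str:
    """Guess a MIME type from the leading bytes of a payload."""
    for magic, mime in SIGNATURES:
        if payload.startswith(magic):
            return mime
    # Fallback when no signature matches.
    return "application/octet-stream"
```

In a web archive setting, `payload` would be the body of a single archived item; comparing the guess with the MIME type the server reported is one simple way to spot mislabelled content.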
Nanite
• Created and maintained by the British Library
• Improved by SCAPE and sustained by Open Planets Foundation
• Tika and libmagic support added
• Advanced Tika support through a “persistent” Tika server
• ARC header extraction added
• More to come…
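To give a feel for what ARC header extraction involves, the following Python sketch parses the one-line header of an ARC version 1 URL record, which has the form `URL IP-address Archive-date Content-type Archive-length`. This is not Nanite's code; the names `ArcHeader` and `parse_arc_header` are hypothetical, and the sketch assumes a well-formed header with exactly five space-separated fields, which real-world ARC files do not always guarantee.

```python
from typing import NamedTuple


class ArcHeader(NamedTuple):
    """Fields of an ARC v1 URL-record header line (hypothetical helper type)."""
    url: str
    ip_address: str
    archive_date: str  # timestamp string, e.g. YYYYMMDDhhmmss
    content_type: str  # MIME type as reported at harvest time
    length: int        # number of payload bytes following the header line


def parse_arc_header(line: bytes) -> ArcHeader:
    """Parse a single ARC v1 URL-record header line into its five fields."""
    fields = line.decode("latin-1").rstrip("\r\n").split(" ")
    if len(fields) != 5:
        # Assumption: strict parsing; malformed headers are rejected, not repaired.
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    url, ip_address, archive_date, content_type, length = fields
    return ArcHeader(url, ip_address, archive_date, content_type, int(length))
```

The reported content type from the header can then be stored next to the formats identified by Tika, DROID, and libmagic for the same item, so that the results can be compared during analysis.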
References
• SCAPE User Story for web archive data: http://wiki.opflabs.org/display/SP/File+Format+Identification+and+Characterisation+of+Web+Archives
• Nanite: https://github.com/openplanets/nanite
• A Weekend With Nanite blog post: http://openplanetsfoundation.org/blogs/2014-05-28weekend-nanite
• Open Planets Blogs: http://openplanetsfoundation.org/blog