Image Cataloging Tool Recommendations
Digital Library Program
Indiana University
Annette Richmond
August 2006

Through the examination of a variety of commercially available image management software products (complete list in Appendix A), this report details findings and recommendations for a generic image database to be constructed by the Indiana University Digital Library Program (DLP).

Image Manipulation

Nearly all products provide tools to modify images within the software. Tools include red-eye removal, sharpening, cropping, color adjustment, etc. These functions are provided in nearly all products intended for personal home use but are not as widely available in products intended for commercial application. Such tools seem beyond the scope of the image database to be created by the DLP.

The majority of the products also allow for the conversion of images between file formats. For instance, many programs will automatically create web-appropriate JPEG files from the originals. Similarly, many automatically create thumbnail-sized versions of all images and/or allow the creation and storage of multiple sizes of each image. Each piece of software promotes the large number of formats that it supports. However, most do not allow modification of all formats, but merely allow the myriad formats to be imported. The software then allows the user to modify the images and save them in a more common file format.

Some amount of this functionality seems applicable to the DLP database. It would make sense to allow a user to keep multiple versions (sizes) of an image, possibly in multiple file formats. It might also be useful to allow users to convert a few common image formats into JPEG files. It does not seem necessary to support a wide range of file formats, as only a few are likely to be widely used.

Metadata Fields

The various pieces of software studied included a variety of metadata fields.
Some of the most common include the title, caption/description/comments, file name, keywords or categories, author or creator, date the photo was taken, date the photo was imported into the system, and rating. Additionally, the systems intended for business use also included a field or fields to track the date and purpose of each use, as well as copyright status or other legal information.

The image title usually defaults to the file name until a new value is supplied by the user. Some systems track only the title and let the database manage which title goes with which file without burdening the user with that information. While such a practice may seem user-friendly, the database constructed by the DLP should probably include both pieces of information and let the user see both as well. Also, the system may need to track multiple file names if users are allowed to store multiple versions (sizes, file formats, etc.) of an image.

The combination of caption, description, and comments varies from database to database. Some systems have both a caption and a description, with the caption being short and the description being longer. The comments field might then be used to store comments submitted by other users of the system. The caption and/or description would thus be added by the cataloger, and comments would be added by other users as they viewed the images and wished to offer feedback to the creator and/or cataloger of the image. For the purposes of the DLP tool, the comments field seems superfluous while the caption and/or description fields seem practical. If images have meaningful names and adequate subject descriptors, it may be sufficient to have either a caption or a description, but not both.

Most systems seem to keep only keywords or categories, as they serve similar organizational purposes.
Some, which employ a hierarchical structure, use both, because an image can belong to only one category (be placed in only one folder) but the user may wish to assign multiple keywords. The simple solution is to assign keywords (or subject descriptors or categories) to images without regard for how the images are physically organized on the storage medium. Thus, images could be assigned multiple keywords. In this way the keywords would function much like subject descriptors in an OPAC, though the commercially available systems do not appear to differentiate between types of keywords (geographic, personal name, etc.).

Most systems appear to hyperlink keywords, allowing the user to quickly jump to a list of all images that have a certain keyword (as is frequently done with subject descriptors in an OPAC). However, there do not appear to be any tools to encourage the use of a controlled vocabulary. A user might choose to use one, but it would be purely voluntary, and the system would not assist the user by providing a list of previously used or suggested terms. Perhaps it is unrealistic to think that all users of the cataloging tool to be created by the DLP would wish to use a controlled vocabulary to create keywords for their images. However, it would be helpful if the cataloging tool automatically suggested previously used keywords as a cataloger begins to type, to facilitate the consistent use of keywords. Either way, it would be extremely convenient for keywords to be hyperlinked from within an image record to take the user to a list of all images with that keyword.

Only a few systems included a field for the author or creator. It might be useful to track the photographer for each image. However, this information would really only be helpful if the photographer still held the copyright of the image.
However, if departments are going to use these images in publications, perhaps they should keep track of the photographer so that proper credit can be given in the byline of the photo.

The date the photograph was taken is often obtained from the information that is automatically stored in the image by a digital camera. This data also includes the exact time at which the photograph was taken. It may be unnecessary to keep information that detailed, but departments will wish to maintain dates for their images. It is useful when the system can automatically read that information from a digital camera image, but users must also be able to enter the information manually if they are scanning an image, or to edit it in case the date on the camera was set incorrectly.

The date a photograph was imported into the database is probably less useful to academic departments. Unless the information is kept to help the department track when the uploading work was done, there seems to be little use for this data. Generally, when searching for images, users will wish to search by the date the photograph was taken.

Many of the systems intended for personal use allow the user to rate photographs, generally using stars. Picasa allows a user to star their favorite photos, while some other systems allow photographs to be rated from one to five stars. This type of data does not seem important for the software to be created by the DLP.

At least one of the systems intended for business use includes fields for tracking the date and purpose of each use of the photograph. This information could be extremely useful to academic departments. Users may wish to ensure that the same photograph is not used too frequently, or is not used in certain different types of publications. Keeping this information could also allow a user to easily produce a list of all images that appeared in a particular publication.
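As a rough illustration of the use-tracking idea above, the following Python sketch stores each use of an image as a repeatable date-and-purpose pair and lists the images that appeared in a given publication. All class names, field names, and sample values here are hypothetical and chosen only for illustration; none of the surveyed products is known to work this way.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class UseRecord:
    """One use of an image: when it was used and for what purpose."""
    used_on: date
    purpose: str  # e.g. the name of a publication (sample values are invented)


@dataclass
class ImageRecord:
    """Hypothetical catalog record; field names are illustrative only."""
    file_name: str
    title: str
    uses: list[UseRecord] = field(default_factory=list)  # repeatable group


def images_used_in(records, purpose):
    """Return the records with at least one use whose purpose matches exactly."""
    return [r for r in records if any(u.purpose == purpose for u in r.uses)]


def last_used(record):
    """Return the most recent use date, or None if the image was never used."""
    return max((u.used_on for u in record.uses), default=None)


# Invented sample data for demonstration.
catalog = [
    ImageRecord("campus_001.jpg", "Campus gate", [
        UseRecord(date(2006, 3, 1), "Spring newsletter"),
        UseRecord(date(2006, 6, 15), "Annual report"),
    ]),
    ImageRecord("lab_014.jpg", "Chemistry lab", [
        UseRecord(date(2006, 6, 15), "Annual report"),
    ]),
    ImageRecord("lib_007.jpg", "Reading room"),
]

for rec in images_used_in(catalog, "Annual report"):
    print(rec.file_name, last_used(rec))
```

Keeping each use as its own record, rather than as a single free-text field, is what would make it possible to sort by the date of last use or to list every image in one publication.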
The use-tracking data would likely need at least two repeatable fields (presumably one for the date and one for the purpose of each use), or there may be another, better design.

A field for copyright information appeared in all systems intended for business use and in some intended for personal use. This data would be extremely important for academic departments that might wish to publish their images. The database should keep both the date of the copyright and the name of the copyright holder.

Metadata Standards

The following three metadata standards were mentioned in one or more of the systems studied:

EXIF: Exchangeable Image File Format; stores data about the camera and its settings when the photograph was taken. See http://en.wikipedia.org/wiki/Exchangeable_image_file_format
IPTC: International Press Telecommunications Council; IPTC headers embed metadata in the image file itself (JPEG and TIFF formats only). See http://en.wikipedia.org/wiki/IPTC
XMP: Extensible Metadata Platform; created and controlled by Adobe, allows the embedding of metadata in image files. See http://en.wikipedia.org/wiki/Extensible_Metadata_Platform

EXIF is the most commonly supported metadata standard; however, this standard stores only data about the image as it was created by a digital camera. This data does include the date the photograph was taken, but does not allow for the inclusion of keywords or other descriptive metadata.

IPTC is supported in about half of the systems studied. Many speak of the ability to import IPTC metadata and place it into the system’s database. Only one or two actually mentioned that they could also write IPTC metadata back to the image file so that the data would travel with the image when it was exported from the system. It was unclear whether other systems also had this capability and simply did not mention it. Through some experimentation with Picasa and PicaJet, it became clear that not all systems that claim to support IPTC metadata support it equally.
It also seems that the systems that import IPTC metadata may do so only when the image is imported for the first time, and may not update their database if the image is later modified using another piece of software. Photoshop provides tools to view and embed some IPTC metadata in image files.

XMP is supported by only one or two systems. It is a proprietary standard, under the control of Adobe, so it is probably not a good choice for use in a digital library.

I found no references to any of the metadata standards common in the digital library world.

Exporting Metadata

The majority of the commercially available systems studied appear to have some sort of proprietary database and provide no discussion of how one might export the data into a different database. The systems seem to focus on how the images might be exported, but never mention how the data might be exported. Discussions of metadata are primarily concerned with all the different ways a user might get data into the database.

PicaJet does allow metadata to be exported in XML for one or multiple images at a time. The resulting XML document is difficult to read (it has no line breaks or indentation to facilitate human reading) but does appear to include all of the metadata in tags whose names are somewhat meaningful.

As mentioned above, several of the systems claim that they support IPTC metadata. Some even claim that they can write IPTC metadata in addition to reading it. ACDSee, Cumulus, ImageFolio, PicaJet, Picasa, and ThumbsPlus all claim some ability to write IPTC metadata. Through experimentation, I found that Picasa writes captions into an IPTC metadata field, but does not appear to allow access to other fields. PicaJet appears to allow the user to modify all fields, though it attempts to pre-populate some of the fields with unhelpful data which the user cannot delete. I was not able to experiment with ACDSee, Cumulus, ImageFolio, and ThumbsPlus.
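To make the readability point about XML export concrete, here is a minimal Python sketch of how a cataloging tool might write metadata to XML with line breaks and indentation. The element names and sample record are assumptions made for illustration; they do not reproduce PicaJet's actual export format, which was not documented in the material reviewed. The sketch relies on `xml.etree.ElementTree.indent`, which requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def export_metadata(records):
    """Build an indented XML string from a list of metadata dicts."""
    root = ET.Element("images")
    for rec in records:
        # One <image> element per record, with the file name as an attribute.
        image = ET.SubElement(root, "image", file=rec["file_name"])
        ET.SubElement(image, "title").text = rec["title"]
        ET.SubElement(image, "caption").text = rec["caption"]
        # Keywords are repeatable, so each one gets its own element.
        keywords = ET.SubElement(image, "keywords")
        for kw in rec["keywords"]:
            ET.SubElement(keywords, "keyword").text = kw
    ET.indent(root, space="  ")  # inserts line breaks and two-space indentation
    return ET.tostring(root, encoding="unicode")


# Invented sample record for demonstration.
records = [
    {
        "file_name": "campus_001.jpg",
        "title": "Campus gate",
        "caption": "Main entrance in spring",
        "keywords": ["campus", "spring"],
    }
]

xml_text = export_metadata(records)
print(xml_text)
```

An export in this style stays machine-readable while also being easy for a person to inspect, which would matter if departments ever need to move their data into another system.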
In addition to supporting IPTC metadata, ThumbsPlus is based on either an Access database or some other SQL database. Therefore, there should be some way of extracting the data from that database and placing it in another.

PAX-it uses a non-proprietary relational database structure that is ODBC (Open Database Connectivity) compliant. I know nothing about this standard, but I assume that data could be exported in some sort of standardized format. Additionally, PAX-it can be used with Oracle, SQL Server, and Microsoft Access databases, each of which should provide a way of exporting the data.

The flickr database appears to exist entirely online. The promotional materials mentioned no method for downloading the data for one's images to a local computer.

iPhoto makes no mention of exporting the data in any format or to any application.

Batch Processing

A large number of systems claim to allow for some sort of batch processing of metadata, though none provide details. For many, it is as simple as allowing a user to assign a single keyword to multiple images at once. At the other extreme, Cumulus (one of the business-oriented systems) allows for the creation of templates to pre-populate certain fields for new images based on a variety of parameters.

Some products advertised that image manipulation (sharpening, cropping, etc.) could be done in batches. Others would convert between file formats in batches, or create web-deliverable JPEG images in batches. Some products simply touted their ability to do batch processing without any indication of what sort of tasks might be accomplished.

Some batch processing of metadata and/or batch creation of certain sizes and formats of files could be useful in the DLP image database. However, these features push the scope of an image management database toward that of a system that allows the modification of images.

Search and Display

Each piece of software studied provides some tools for searching.
Many promote the ease and efficiency of their search tools. All systems provide a basic keyword search. This search usually checks a default set of fields for the term(s) entered. Some systems with a larger variety of metadata fields allow the user to choose which fields are included in this default search.

Additionally, most systems provide some sort of advanced searching features. Some utilize the power of the database, allowing the user to search for specific terms in specific fields and construct Boolean searches (term A in field B and term C in field D but not term E in field F). These searches will also find images with a certain date or that fall into a certain date range. Generally, the most advanced searching options are provided by the tools intended for business use. The tools intended for personal use often have fewer searching features because there are fewer metadata fields.

Within the advanced searching features, most tools also allow the user to sort the results. Some use relevance ranking, but most search results are sorted into date order by default. For systems with a hierarchical organization structure of images, search results might also be sorted into the categories of the storage structure.

For the image database to be created by the DLP, it seems reasonable to provide both basic and advanced searching. Users will want to search and sort by date, as well as by copyright status, date and purpose of last use, and subject.

In many image databases, users first view a screen of thumbnail images with little or no metadata. Upon clicking on an image, they are taken to a page with a larger image and additional metadata. Such a page might also provide links to additional sizes of the image. It is also convenient to hyperlink some metadata fields from this page, enabling the user to quickly jump to other images with the same metadata values in certain fields. Some systems also provide a link on this page allowing the user to download the image.
This thumbnail-then-detail display seems an appropriate model to follow for the system to be designed by the DLP.

Security

Each of the image databases designed for business use provides some security features. Generally, these involve placing users into groups (with logins) and limiting which features of the database they can and cannot use. Many departments may have only one or two users of the image database. It certainly makes sense that departments might wish to password-protect their database so that only certain staff can make edits or access the images. It is possible that having multiple logins is beyond the scope of the proposed system.

Additionally, at least one system was intended for displaying the catalog of images on the web without allowing users to view or download large images unless they pay. This system also suppresses right-clicking and the browser’s toolbar to prevent unauthorized downloading of images. Images might also include a watermark until the user has paid the appropriate fee for use. These features would be extremely useful if the DLP system were intended for displaying images on the web for sale. However, adding these features would be nearly as much trouble as adding e-commerce features to the database. Therefore, these features seem to be beyond the scope of the proposed database.

Image Acquisition

The pieces of software studied largely promote the wide variety of digital cameras with which they are compatible. Very few mention that a user might wish to import images from a scanner, though all allow importing images from anywhere on the hard drive of the computer. The system to be designed by the DLP may wish to provide features to import images from various sources, or it might be enough to simply import images from the local computer or a network location. In that case, the user would first need to use Photoshop or some other software to place the image on the computer or server, and then import the image into the database.
This method would be less complicated to design for the database, since images would be imported from only one type of source (the computer) instead of multiple types (scanner, camera, etc.).

All of the systems studied promote the wide range of sources from which metadata can be imported. It certainly would be useful to create a database that automatically imports any EXIF or IPTC metadata contained in the image file. Additionally, it would be useful to import metadata in XML or other common formats.

Primarily, however, users will need to enter metadata for each new image, so an interface needs to be provided to assist them. Even though the systems promoted their effective and efficient searching, the two systems that I downloaded and tried did not make it intuitive to add metadata beyond what was imported from the image file or folder structure on my computer. Great searching features will only work if there is good metadata. So, even though I found this feature in only a couple of the systems intended for business use, I recommend that the DLP system have an interface specifically designed for cataloging new images. Such an interface would provide boxes to enter metadata for all the fields. This interface should also be accessible later to add or change information as necessary.

Many of the personal use systems had separate buttons to add different types of metadata (e.g. click here to add label, click here to add caption, etc.). This is confusing, especially if it is not clear what the different types of data are intended to do. If all metadata fields are displayed in one interface, perhaps with additional explanatory notes or links to help, it will be easier for the user to catalog the image correctly and efficiently.

The pieces of software studied for this report varied widely in their capabilities and intended uses. Their websites also varied widely in the amount of information that was provided.
Some were extremely detailed, providing many ideas for the proposed system, while others were so vague that it was nearly impossible to tell what one would be getting by purchasing the product. None seemed to be quite intended for the purpose of the proposed DLP system, so not all features included in the commercially available systems are appropriate for the new system. This report is intended to summarize what currently exists and provide some recommendations on which features are appropriate for the new tool to be created by the DLP.

Appendix A

The following websites for commercially available products were consulted in the compilation of this report.

ACDSee: http://www.acdsee.com/products/acdsee/
Cumulus: http://www.canto.com/
flickr: http://www.flickr.com/
Image Portal: http://www.netx.net/image_management_software.jsp
ImageFolio: http://www.imagefolio.com/
iPhoto: http://www.apple.com/ilife/iphoto/
PAX-it: http://www.paxit.com/image_management_software/image_management_software.asp
PicaJet: http://www.picajet.com/en/index.php
Picasa: http://picasa.google.com/
Picolo: http://www.ekonnect.com/
ThumbsPlus: http://www.cerious.com/thumbnails.shtml