Image Cataloging Tool
Recommendations
Digital Library Program
Indiana University
Annette Richmond
August 2006
Based on an examination of a variety of commercially available pieces of
image management software (complete list in Appendix A), this report
details findings and recommendations for a generic image database to be
constructed by the Indiana University Digital Library Program (DLP).
Image Manipulation
Nearly all products provide tools to modify images within the software. Tools
include red eye removal, sharpening, cropping, color adjustment, etc. These
functions are provided in nearly all products intended for personal home use but are
not as widely available in products intended for commercial application. Such tools
seem beyond the scope of the image database to be created by the DLP.
The majority of the products also allow for the conversion of images between
file formats. For instance, many programs will automatically create web-appropriate
JPEG files from the originals. Similarly, many automatically create
thumbnail-sized versions of all images and/or allow the creation and storage of
multiple sizes of each image. Each piece of software promotes the large number of
formats that it supports. However, most do not allow modification of all formats,
but merely allow the myriad of formats to be imported. The software then allows
the user to modify the images and save them in a more common file format. Some
amount of this functionality seems applicable to the DLP database. It would make
sense to allow a user to keep multiple versions (sizes) of an image, possibly in
multiple file formats. It might also be useful to allow users to convert a few
common image formats into JPEG files. It does not seem necessary to support a
wide range of file formats as only a few are likely to be widely used.
Metadata Fields
The various pieces of software studied included a variety of metadata fields.
Some of the most common include the title, caption/description/comments, file
name, keywords or categories, author or creator, date the photo was taken, date the
photo was imported into the system, and rating. Additionally, the systems intended
for business use also included a field or fields to track the date and purpose of each
use in addition to copyright status or other legal information.
The image title usually defaults to the file name until a new value is supplied
by the user. Some systems only track the title and let the database manage which
title goes with which file without burdening the user with that information. While
such a practice may seem user-friendly, it seems like the database constructed by
the DLP should include both pieces of information and let the user see both pieces
as well. Also, the system may need to track multiple file names if users are allowed
to store multiple versions (sizes, file formats, etc.) of an image.
The combination of caption, description, and comments varies from database
to database. Some systems have both a caption and description, with the caption
being short and the description being longer. The comments fields might then be
used to store comments submitted by other users of the system. So the caption
and/or description would be added by the cataloger and the comment(s) would be
added by other users as they viewed the images and desired to offer feedback to the
creator and/or cataloger of the image. For the purposes of the DLP tool, the
comments field seems superfluous while the caption and/or description fields seem
practical. If images have meaningful names and adequate subject descriptors, it
may be sufficient to have either a caption or a description, but not both.
Most systems seem to keep only keywords or categories, as they serve similar
organizational purposes. Some, which employ a hierarchical structure, use both, as
an image can only belong to one category (only be placed in one folder) but the user
may desire to assign multiple keywords. The simple solution is to assign keywords
(or subject descriptors or categories) to images without regard for how the images
are physically organized on the storage medium. Thus, images could be assigned
multiple keywords. In this way the keywords would function much like subject
descriptors in an OPAC, though the commercially available systems do not appear to
differentiate between types of keywords (geographic, personal name, etc.). Most
systems appear to hyperlink keywords, allowing the user to quickly jump to a list of
all images that have a certain keyword (as is frequently done with subject
descriptors in an OPAC). However, there do not appear to be any tools to encourage
the use of a controlled vocabulary. A user might choose to do so, but it would be
purely voluntary and the system would not assist the user by providing a list of
previously used terms or suggested terms. Perhaps it is unrealistic to think that all
users of the cataloging tool to be created by the DLP would desire to use a controlled
vocabulary to create keywords for their images. However, it would be nice if the
cataloging tool would automatically suggest previously-used keywords when a
cataloger begins to type to facilitate the consistent use of keywords. Either way, it
would be extremely convenient for keywords to be hyperlinked from within an
image to take the user to a list of all images with that keyword.
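As a rough illustration of the two keyword features recommended above
(suggesting previously used keywords as the cataloger types, and jumping from
a keyword to every image that carries it), a minimal Python sketch might look
like the following. The class name KeywordIndex, its method names, and the
sample file names are hypothetical; none of them come from the products
studied, and a real tool would keep this index in its database.

```python
from collections import defaultdict


class KeywordIndex:
    """Hypothetical in-memory index mapping keywords to image file names."""

    def __init__(self):
        # keyword (lowercased) -> set of image file names
        self._images_by_keyword = defaultdict(set)

    def assign(self, image, keyword):
        """Attach a keyword to an image; an image may have many keywords."""
        self._images_by_keyword[keyword.strip().lower()].add(image)

    def suggest(self, prefix):
        """Return previously used keywords that start with the typed prefix."""
        prefix = prefix.strip().lower()
        return sorted(k for k in self._images_by_keyword if k.startswith(prefix))

    def images_for(self, keyword):
        """Return all images with the keyword (the target of a hyperlink)."""
        return sorted(self._images_by_keyword.get(keyword.strip().lower(), set()))


index = KeywordIndex()
index.assign("commencement_01.jpg", "Graduation")
index.assign("commencement_02.jpg", "graduation")
index.assign("commencement_02.jpg", "Grounds")
index.assign("library_front.jpg", "Buildings")
```

Keywords are folded to lower case here so that "Graduation" and "graduation"
count as the same term, which is one simple way to encourage consistent use
without requiring a full controlled vocabulary.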
Only a few systems included a field for the author or creator. It might be
useful to track the photographer for each image, although this information would
really only be helpful if the photographer still held the copyright of the image.
However, if departments are going to use these images in publications, perhaps
they should keep track of the photographer so that proper credit can be given in the
byline of the photo.
The date the photograph was taken is often obtained from the information
that is automatically stored in the image by a digital camera. This data will also
include the exact time at which the photograph was taken. It may be unnecessary
to keep information that detailed, but departments will wish to maintain dates for
their images. It is useful when the system can automatically upload that
information from a digital camera, but users must also be able to enter the
information manually if they are scanning an image, or edit the information from a
digital camera in case the date information on the camera was incorrectly set.
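The date handling described above could be sketched as follows, assuming the
EXIF date has already been read from the file as text. EXIF records date and
time values in the form "YYYY:MM:DD HH:MM:SS"; the function name date_taken
and its parameters are hypothetical, chosen only for this illustration.

```python
from datetime import date, datetime
from typing import Optional

# EXIF stores date/time values as text in the form "YYYY:MM:DD HH:MM:SS".
EXIF_DATETIME_FORMAT = "%Y:%m:%d %H:%M:%S"


def date_taken(exif_value: Optional[str], manual: Optional[date] = None) -> Optional[date]:
    """Pick the date to catalog: a manual entry wins over the camera's value."""
    if manual is not None:
        return manual  # scanned image, or correction of a mis-set camera clock
    if not exif_value:
        return None  # no embedded date; the cataloger must supply one
    try:
        # Keep only the calendar date; the time of day is discarded.
        return datetime.strptime(exif_value.strip(), EXIF_DATETIME_FORMAT).date()
    except ValueError:
        return None  # unreadable value; treat as missing
```

For example, date_taken("2006:08:15 14:32:07") yields the date 15 August 2006,
while passing a manual date overrides whatever the camera recorded.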
The date a photograph was imported into the database is probably less useful
to academic departments. Unless the information is kept to help the department
track when the uploading work was done, there seems to be little use for this data.
Generally, when searching for images, users will wish to search by the date the
photograph was taken.
Many of the systems intended for personal use allow the user to rate
photographs, generally using stars. Picasa allows a user to star their favorite
photos while some other systems allow photographs to be rated from one to five
stars. This type of data does not seem important for the software to be created by
the DLP.
At least one of the systems intended for business use includes fields for
tracking the date and purpose of each use of the photograph. This information
could be extremely useful to academic departments. Users may wish to ensure that
the same photograph is not used frequently, or in certain different types of
publications. Keeping this information could also potentially allow a user to easily
produce a list of all images that were in a particular publication. This data would
likely need at least two repeatable fields, or there may be another, better, design.
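One possible design for the usage data, offered only as a sketch, is to keep
the date and the purpose together as a single repeatable "use" entry rather
than as two independent repeatable fields, so that each date stays paired with
its purpose. The names UsageEvent, ImageRecord, images_in_publication, and
use_count, along with the sample records, are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class UsageEvent:
    """One use of an image: the date and purpose, kept as a pair."""
    used_on: date
    purpose: str  # e.g. the name of the publication


@dataclass
class ImageRecord:
    file_name: str
    uses: List[UsageEvent] = field(default_factory=list)


def images_in_publication(records, publication):
    """List file names of all images used in the named publication."""
    return [r.file_name for r in records
            if any(u.purpose == publication for u in r.uses)]


def use_count(record):
    """How many times an image has been used, to spot overused photographs."""
    return len(record.uses)


records = [
    ImageRecord("commencement_01.jpg", [
        UsageEvent(date(2006, 5, 10), "Spring Newsletter"),
        UsageEvent(date(2006, 7, 1), "Annual Report"),
    ]),
    ImageRecord("library_front.jpg", [UsageEvent(date(2006, 7, 1), "Annual Report")]),
    ImageRecord("lab_tour.jpg"),
]
```

With the sample data, images_in_publication(records, "Annual Report") lists
the two images used in that publication, and use_count shows how often a
given photograph has already appeared.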
A field for copyright information appeared in all systems intended for
business use and some that are intended for personal use. This data would be
extremely important for academic departments that might wish to publish their
images. The database should keep both the date of the copyright and the name of
the copyright holder.
Metadata Standards
The following three metadata standards were mentioned in one or more of
the systems studied:
EXIF (Exchangeable Image File Format): stores data about the camera and its
settings when the photograph was taken; see
http://en.wikipedia.org/wiki/Exchangeable_image_file_format
IPTC (International Press Telecommunications Council): IPTC headers embed
metadata in the image file itself (JPEG and TIFF formats only); see
http://en.wikipedia.org/wiki/IPTC
XMP (eXtensible Metadata Platform): created and controlled by Adobe, allows
the embedding of metadata in image files; see
http://en.wikipedia.org/wiki/Extensible_Metadata_Platform
EXIF is the most commonly supported metadata standard; however, this
standard stores only data about the image as it was created by a digital camera.
This data does include the date the photograph was taken, but does not allow for
the inclusion of keywords or other descriptive metadata.
IPTC is also supported in about half of the systems studied. Many speak of
the ability to import IPTC metadata and place it into the system’s database. Only
one or two actually mentioned that they could also write IPTC metadata back to the
image file so that the data would travel with the image when the image was
exported from the system. It was unclear if other systems also had this capability
and simply did not mention it. Through some experimentation with Picasa and
PicaJet, it became clear that not all systems that claim to support IPTC metadata
support it equally. It also seems that the systems that import IPTC metadata may
only do so when the image is imported for the first time and may not update their
database if the image is modified using another piece of software. Photoshop
provides tools to view and embed some IPTC metadata in image files.
XMP is only supported by one or two systems. It is a proprietary standard,
under the control of Adobe, so probably not a good choice for use in a digital library.
I found no references to any of the metadata standards common in the digital
library world.
Exporting Metadata
The majority of the commercially available systems studied appear to have
some sort of proprietary database and provide no discussion of how one might
export the data into a different database. The systems seem to focus on how the
images might be exported, but never mention how the data might be exported.
Discussions of metadata are primarily concerned with all the different ways a user
might get data into the database.
PicaJet does allow metadata to be exported in XML for one or multiple
images at a time. The resulting XML document is difficult to read (no carriage
returns or indentation to facilitate human reading) but does appear to include all of
the metadata in tags whose names are somewhat meaningful.
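For comparison, an export that is easier for a person to read might be
produced along the following lines. This is a sketch using Python's standard
library, not a reconstruction of PicaJet's output; the element names (images,
image, keywords, keyword) and the function name export_metadata_xml are
assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom


def export_metadata_xml(images):
    """Serialize a list of metadata dicts as indented, human-readable XML."""
    root = ET.Element("images")
    for meta in images:
        image_el = ET.SubElement(root, "image")
        for name, value in meta.items():
            if name == "keywords":
                # Repeatable field: one <keyword> element per value.
                keywords_el = ET.SubElement(image_el, "keywords")
                for kw in value:
                    ET.SubElement(keywords_el, "keyword").text = kw
            else:
                ET.SubElement(image_el, name).text = str(value)
    compact = ET.tostring(root, encoding="unicode")
    # Re-parse with minidom only to add line breaks and indentation.
    return minidom.parseString(compact).toprettyxml(indent="  ")


xml_text = export_metadata_xml([
    {"file_name": "library_front.jpg", "title": "Main Library",
     "keywords": ["Buildings", "Campus"]},
])
```

Printing xml_text shows one element per line with nested indentation, which
makes the export far easier to inspect before loading it into another system.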
As mentioned above, several of the systems claim that they support IPTC
metadata. Some even claim that they could write IPTC metadata in addition to
reading it. ACDSee, Cumulus, ImageFolio, PicaJet, Picasa, and ThumbsPlus all
claim some ability to write IPTC metadata. Through experimentation, I found that
Picasa writes captions into an IPTC metadata field, but does not appear to allow
access to other fields. PicaJet appears to allow the user to modify all fields, though
it attempts to pre-populate some of the fields with unhelpful data which the user
cannot delete. I was not able to experiment with ACDSee, Cumulus, ImageFolio,
and ThumbsPlus.
In addition to supporting IPTC metadata, ThumbsPlus is based on either an
Access database or some other SQL database. Therefore, there should be some way
of extracting the data from that database and placing it in another.
PAX-it is a non-proprietary relational database structure that is ODBC
(Open Database Connectivity) compliant. I know nothing about this standard but I
assume that data could be exported in some sort of standardized format.
Additionally, PAX-it can be used with Oracle, SQL Server, and Microsoft Access
databases, each of which should provide a way of exporting the data.
The flickr database appears to exist entirely online. The promotional
materials mentioned no method for downloading the data for one's images to a local
computer.
iPhoto makes no mention of exporting the data in any format or to any
application.
Batch Processing
A large number of systems claim to allow for some sort of batch processing of
metadata, though none provide details. For many, it is as simple as allowing a user
to assign a single keyword to multiple images at once. At the other extreme,
Cumulus (one of the business-oriented systems) allows for the creation of templates
to pre-populate certain fields for new images based on a variety of parameters.
Some products promoted that image manipulation could be done in batches
(sharpening, cropping, etc.). Others would convert between file formats in batches,
or create web-deliverable JPEG images in batches. Some products simply touted
their abilities to do batch processing without any indication of what sort of tasks
might be accomplished. Some batch processing of metadata and/or batch creation of
certain sizes and formats of files could be useful in the DLP image database.
However, these features are pushing the scope of an image management database
and moving into the realm of a system that allows the modification of images.
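The two ends of the metadata batch-processing range described above (assigning
one keyword to many images, and pre-populating new records from a template)
could be sketched as shown here. Records are plain dictionaries for brevity,
and the function names, field names, and sample values are hypothetical; this
is not how Cumulus or any other product implements its templates.

```python
def assign_keyword(records, file_names, keyword):
    """Add one keyword to every selected record in a single operation."""
    selected = set(file_names)
    for record in records:
        if record["file_name"] in selected and keyword not in record["keywords"]:
            record["keywords"].append(keyword)


def new_record(file_name, template=None):
    """Create a record for a new image, pre-populated from an optional template."""
    # The title defaults to the file name until the cataloger supplies one.
    record = {"file_name": file_name, "title": file_name, "keywords": []}
    for name, value in (template or {}).items():
        # Copy lists so records created from one template do not share them.
        record[name] = list(value) if isinstance(value, list) else value
    return record


template = {"photographer": "Staff Photographer", "keywords": ["Commencement 2006"]}
batch = [new_record(name, template) for name in ("img_001.jpg", "img_002.jpg", "img_003.jpg")]
assign_keyword(batch, ["img_001.jpg", "img_003.jpg"], "Speakers")
```

After this runs, all three records carry the template's photographer and
keyword, and only the two selected images also carry "Speakers".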
Search and Display
Each piece of software studied provides some tools for searching. Many
promote the ease and efficiency of their search tools.
All systems provide a basic keyword search. This search usually searches a
default set of fields for the term(s) entered. Some systems with a larger variety of
metadata fields allow the user to choose which fields are searched in this default
search.
Additionally, most systems provide some sort of advanced searching features.
Some utilize the power of the database, allowing the user to search for specific
terms in specific fields and construct Boolean searches (term A in field B and term
C in field D but not term E in field F). These searches will also find images with a
certain date or that fall into a certain date range. Generally, the most advanced
searching options are provided by the tools intended for business use. The tools
intended for personal use often have fewer searching features because there are
fewer metadata fields.
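A fielded Boolean search of the kind described above (term A in field B and
term C in field D but not term E in field F, optionally limited to a date
range) can be sketched with small composable tests. The helper names, field
names, and sample records below are all invented for illustration; a
production system would more likely translate such a query into SQL.

```python
from datetime import date


def field_contains(field, term):
    """Predicate: the term occurs (case-insensitively) in the given field."""
    def test(record):
        value = record.get(field, "")
        if isinstance(value, (list, tuple)):
            value = " ".join(value)
        return term.lower() in str(value).lower()
    return test


def date_between(field, start, end):
    """Predicate: the field holds a date within the inclusive range."""
    return lambda record: record.get(field) is not None and start <= record[field] <= end


def all_of(*tests):
    """Boolean AND of several predicates."""
    return lambda record: all(t(record) for t in tests)


def negate(test):
    """Boolean NOT of a predicate."""
    return lambda record: not test(record)


def search(records, test):
    """Return matching records sorted by date taken; undated records sort first."""
    return sorted((r for r in records if test(r)),
                  key=lambda r: r.get("date_taken") or date.min)


records = [
    {"file_name": "a.jpg", "title": "Library at dusk", "keywords": ["Buildings"],
     "copyright": "Indiana University", "date_taken": date(2006, 3, 2)},
    {"file_name": "b.jpg", "title": "Library entrance", "keywords": ["Buildings", "Students"],
     "copyright": "J. Smith", "date_taken": date(2005, 9, 14)},
    {"file_name": "c.jpg", "title": "Commencement", "keywords": ["Students"],
     "copyright": "Indiana University", "date_taken": date(2006, 5, 6)},
]

# "library" in title AND "buildings" in keywords BUT NOT "smith" in copyright
query = all_of(field_contains("title", "library"),
               field_contains("keywords", "buildings"),
               negate(field_contains("copyright", "smith")))
results = search(records, query)
```

Because each test is an ordinary function, a basic keyword search can reuse
the same pieces by checking one term against a default set of fields.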
Within the advanced searching features, most tools also allow the user to sort
the results. Some use relevance ranking, but most searches are sorted into date
order by default. For systems with a hierarchical organization structure of images,
search results might also be sorted into the categories of the storage structure.
For the image database to be created by the DLP, it seems reasonable to
provide both basic and advanced searching. Users will want to search and sort by
date, as well as by copyright status, date and purpose of last use, and subject.
In many image databases, users first view a screen of thumbnail images
with little to no metadata. Upon clicking on an image, they are taken to a page
with a larger image and additional metadata. Such a page might also provide links
to additional sizes of an image. It is also convenient to hyperlink some metadata
fields from this page, enabling the user to quickly jump to other images with the
same metadata values in certain fields. Some systems also provide a link on this
page allowing the user to download the image. This seems an appropriate model to
follow for the system to be designed by the DLP.
Security
Each of the image databases that were designed for business use provides
some security features. Generally, these involve placing users into groups (with
logins) and limiting what features of the database they can and cannot use. Many
departments may have only one or two users of the image database. It certainly
makes sense that departments might wish to password-protect their database so
that only certain staff can make edits or access the images. It is possible that
having multiple logins is beyond the scope of the proposed system.
Additionally, at least one system was intended for displaying the catalog of
images on the web, but not allowing users to view or download large images without
paying. This system also suppresses right-clicking and the browser’s toolbar to
prevent unauthorized downloading of images. Images might also include a
watermark until the user had paid the appropriate fee for use. These features
would be extremely useful if the DLP system is intended for displaying images to
the web for sale. However, adding these features would be nearly as much trouble
as adding e-commerce features to the database. Therefore, these features seem to
be beyond the scope of the proposed database.
Image Acquisition
The pieces of software studied largely promote the large variety of digital
cameras that they are compatible with. Very few mention that a user might wish to
import images from a scanner, though all allow importing images from anywhere on
the hard drive on the computer. The system to be designed by the DLP may wish to
provide features to import images from various sources, or it might be enough to
simply import images from the local computer or network destination. Then, the
user would first need to use Photoshop or some other software to place the image on
the computer or server, and then import the image into the database. This method
would be less complicated to design for the database, since images would only be
imported from one type of source (computer), instead of multiple types (scanner,
camera, etc.).
All of the systems studied promote the wide range of sources from which
metadata can be imported. It certainly would be useful to create a database that
automatically imports any EXIF or IPTC metadata contained in the image file.
Additionally, it would be useful to import metadata in XML or other common
formats. Primarily, however, users will need to enter metadata for each new image.
So, an interface needs to be provided to assist users. Even though systems
promoted their effective and efficient searching, the two systems that I downloaded
and tried did not make it intuitive to add additional metadata beyond what was
imported from the image file or folder structure on my computer. Great searching
features will only work if there is good metadata. So, even though I only found this
feature in a couple of the systems intended for business use, I recommend that the
DLP system have an interface specifically designed for cataloging new images.
Such an interface would provide boxes to enter metadata for all the fields. This
interface should also be accessible later to add or change information as necessary.
Many of the personal use systems had separate buttons to add different types of
metadata (e.g. click here to add label, click here to add caption, etc). This is
confusing, especially if it is not clear what the different types of data are intended to
do. If all metadata fields are displayed in one interface, perhaps with additional
explanatory notes or links to help, it will be easier for the user to correctly catalog
the image in an efficient manner.
The pieces of software studied for this report varied widely in their
capabilities and intended uses. Their websites also varied widely in the amount of
information that was provided. Some were extremely detailed, providing many
ideas for the proposed system while others were so vague that it was nearly
impossible to tell what one would be getting if they chose to purchase the product.
None seemed to be quite intended for the purpose suggested by the proposed DLP
system, so not all features included in the commercially-available systems are
appropriate for the new system. This report is intended to summarize what
currently exists and provide some recommendations for which features are
appropriate for the new, proposed tool to be created by the DLP.
Appendix A
The following websites for commercially available products were consulted in the
compilation of this report.
ACDSee: http://www.acdsee.com/products/acdsee/
Cumulus: http://www.canto.com/
flickr: http://www.flickr.com/
Image Portal: http://www.netx.net/image_management_software.jsp
ImageFolio: http://www.imagefolio.com/
iPhoto: http://www.apple.com/ilife/iphoto/
PAX-it: http://www.paxit.com/image_management_software/image_management_software.asp
PicaJet: http://www.picajet.com/en/index.php
Picasa: http://picasa.google.com/
Picolo: http://www.ekonnect.com/
ThumbsPlus: http://www.cerious.com/thumbnails.shtml