MODS capture by Zotero as observed in
American Social History Online
http://www.dlfaquifer.org/home
Preliminary notes for Aquifer Metadata Working Group
Laura Akerman, 2008-10-29, revised by Laura 2009-03-25
[email protected]
Issues arranged by Zotero field name:
creator: #4, #5
creatorTypes: #6
identifier: #2
itemType: #1, #9, #14, #15, #16
language: #3
place: #7
publisher: #8
title: #13
Tags tab: #10
no Zotero field mapping; don't we need one? #11, #12
Analysis arranged by MODS element; issues are numbered sequentially
abstract
Mapping appears OK. MODS abstract maps to Zotero abstractNote. (Note that this
kind of note appears as a field in the Info tab of Zotero, whereas other types of notes, such
as those in the MODS "note" field, are "pushed" to the Notes tab.)
accessCondition
Mapping appears OK. MODS accessCondition, of any type, maps to the Zotero rights
element.
classification
Mapping appears OK. MODS classification element is mapped to a Zotero callNumber
element.
extension
Mapping appears OK; no Zotero mappings for extension were found, or expected.
genre
Issue #1. MODS genre is only mapped if it matches one of the Zotero itemType types.
There are many more types of form/genre terms used in that element, more detailed and
from different angles (literary or film genres such as "cartoons" "mystery stories" etc. or
physical genres such as "stereographs"). If Zotero could add this field to all itemType
field sets, that would be lovely, if all genre terms could be captured there. If it can't,
could we consider mapping these terms to the Tags tab, along with subjects? (Can Zotero
differentiate kinds of Tags - for subject and genre?)
Examples of Aquifer MODS records containing a <genre> element:
title:
Journal of a voyage across the Atlantic: with notes on Canada & the United States, and
return to Great Britain in 1844 (genre - Biography)
title:
Every girl pulling for victory : Victory Girls : United War Work Campaign (genre Posters)
identifier
Issue #2. Cannot fully assess processing of the MODS identifier element to a Zotero
identifier field (?) from the translator code; it apparently calls other code not present in
the translator (processIdentifiers). It would be helpful to know how that code operates,
then we can determine if there are any issues.
Example Aquifer MODS records with identifier elements:
title:
Every girl pulling for victory : Victory Girls : United War Work Campaign (genre Posters) -- <identifier>msp00003</identifier>
title:
Wild wild women
<identifier type="local" displayLabel="Call number">091074</identifier>
Note: in Zotero, or when exported to either Zotero RDF or MODS, no identifier field
appears.
language
Issue #3. No mapping of MODS language/languageTerm (with type attribute of either
term or code) could be found. This is puzzling because Zotero has a "language" item
field.
Example Aquifer MODS records with language elements:
title: Cortés Nos Chingó In A Big Way The Hüey (has two elements for English and
Spanish)
title: A travers la somme devastee : le cimetiere Allemand de Nontecourt, dans le fond, o
droite, vue des ruines du village de Nontecourt, o gauche, des ruines du bourg de
Monchy-la-Gache (French)
location
Mapping appears OK.
MODS location/physicalLocation is mapped to Zotero archiveLocation element.
MODS location/url is mapped to Zotero url element.
name
Issue #4. Some names are not coming through in the Zotero metadata record; only the
date at the end of the name string appears. NACO Authority File names are often
qualified by date.
Example: captured MODS for "Famous actresses of the day in America" from Aquifer,
shows
Author: 1869-1935 ,
(first)
The MODS record has
<name type="personal">
<namePart>Strang, Lewis Clinton</namePart>
<namePart type="date">1869-1935</namePart>
<role>
<roleTerm authority="marcrelator" type="text">creator</roleTerm>
</role>
</name>
It appears that the code is dealing appropriately with MODS name/namePart elements
that have a type attribute of "family" or "given" (mapping them to Zotero creator.lastName
and creator.firstName elements), but then assumes that any other namePart subelement
should be stored in the variable "backup name" and run through the Zotero "cleanAuthor" utility.
This is probably designed for namePart elements with no type attribute, which (if in AACR2
form) have the form Lastname, Firstname M. I.
However, there are two other defined MODS namePart type attributes that are not dealt
with: @type=date and @type=termsOfAddress. These need to be either specifically
ignored, or appended to the end of the name somehow (if this is possible in Zotero?).
The results make it appear that the Zotero translator is processing them through "clean
author" as if they were a name.
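The handling suggested above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the translator's actual code: the function name creator_from_name is hypothetical, the XML is parsed without a namespace (as in the fragments quoted in this document), and untyped nameParts are assumed to be in inverted "Lastname, Firstname" form.

```python
import xml.etree.ElementTree as ET

def creator_from_name(name_el):
    """Build lastName/firstName from a MODS <name> element (hypothetical helper)."""
    family, given, untyped = [], [], []
    for part in name_el.findall("namePart"):
        text = (part.text or "").strip()
        part_type = part.get("type")
        if part_type == "family":
            family.append(text)
        elif part_type == "given":
            given.append(text)
        elif part_type in ("date", "termsOfAddress"):
            continue  # deliberately skipped, never parsed as a name
        else:
            untyped.append(text)
    if family or given:
        return {"lastName": " ".join(family), "firstName": " ".join(given)}
    # Assumption: untyped parts are in inverted "Lastname, Firstname" form.
    last, _, first = " ".join(untyped).partition(",")
    return {"lastName": last.strip(), "firstName": first.strip()}

record = ET.fromstring("""
<name type="personal">
  <namePart>Strang, Lewis Clinton</namePart>
  <namePart type="date">1869-1935</namePart>
</name>
""")
print(creator_from_name(record))
```

With the date part skipped, the name from the example record comes through as Strang / Lewis Clinton rather than as the date string.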
Issue #5. There is a comment in the code area where Zotero "creator" field is mapped: "//
TODO: institutional authors". Please follow through on this. Right now, MODS name
elements with type "corporate" or "conference" are showing up in Zotero looking like
this:
Author:
United States,
(first)
Where the MODS record has:
<name type="corporate">
<namePart>United States</namePart>
<role>
<roleTerm authority="marcrelator" type="text">creator</roleTerm>
</role>
</name>
Corporate bodies don't have a first name, so the (first) need not display.
Example Aquifer records containing corporate body names include:
Title: Olympic Boulevard, State Route 173, looking east from point 200 feet west of
Irolo Street, Los Angeles County, 1940
Title: Organization and historical sketch of the Women's Anthropological Society of
America
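A minimal Python sketch of one way to treat institutional names follows. It assumes the creator record can carry a single-field flag; the key "singleField" and the function name are invented for illustration and are not Zotero's actual field names. Joining multiple nameParts with a period and space is also an assumption.

```python
import xml.etree.ElementTree as ET

def creator_from_name(name_el):
    """Return a creator dict; corporate and conference names stay in one field."""
    parts = [(p.text or "").strip()
             for p in name_el.findall("namePart")
             if p.get("type") is None]
    if name_el.get("type") in ("corporate", "conference"):
        # "singleField" is a made-up flag standing in for whatever mechanism
        # would suppress the (first) name box in the Zotero display.
        return {"lastName": ". ".join(parts), "firstName": "", "singleField": True}
    last, _, first = " ".join(parts).partition(",")
    return {"lastName": last.strip(), "firstName": first.strip(), "singleField": False}

corporate = ET.fromstring(
    '<name type="corporate"><namePart>United States</namePart></name>')
print(creator_from_name(corporate))
```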
Issue #6. Only 3 terms, when found in MODS name/role/roleTerm, are mapped to
Zotero's "creatorTypes", and these are only mapped if the "code" form of term is used.
Many more mappings are possible. The current mapping handles code "edt" mapped to
"editor", "ctb" mapped to "contributor", and "trl" mapped to "translator".
This MODS roleTerm element can contain either a code or a term (governed by the
"type" attribute), and a standard vocabulary used is the MARC Relators code list
(http://www.loc.gov/marc/relators/relators.html) referred to in the MODS documentation.
Below are examples of more mappings that could be made from this list. Zotero team
may wish to review the definitions in the MARC documentation to see if they are in
harmony with Zotero definitions or if more mappings could be made.
MARC code   MARC term                      Map to Zotero
            Editor                         creatorTypes.editor
            Contributor                    creatorTypes.contributor
            Translator                     creatorTypes.translator
ive         Interviewee                    creatorTypes.interviewee
ivr         Interviewer                    creatorTypes.interviewer
drt         Director                       creatorTypes.director
aus         Author of screenplay, etc.     creatorTypes.scriptwriter
pro         Producer                       creatorTypes.producer
act         Actor                          creatorTypes.castMember
spn         Sponsor                        creatorTypes.sponsor
inv         Inventor                       creatorTypes.inventor
rcp         Recipient                      creatorTypes.recipient
prf         Performer                      creatorTypes.performer
cmp         Composer                       creatorTypes.composer
lbt         Librettist                     creatorTypes.wordsBy
ctg         Cartographer                   creatorTypes.cartographer
prg         Programmer                     creatorTypes.programmer
art         Artist                         creatorTypes.artist
cmm         Commentator                    creatorTypes.commenter
cwt         Commentator for written text   creatorTypes.commenter
Example Aquifer records containing some of these terms:
Title: Map of city and county of San Francisco (Cartographer)
Title: Performance by Tito Vasconcelos (prf)
Title: White Eagle and Pura Fé sing Rudy Martin's songs (cmp, prf, as well as additional
codes not mentioned above: mus, lyr, voc)
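The mappings proposed under Issue #6 amount to a lookup that accepts either the code or the term form of a roleTerm. The Python sketch below illustrates that idea only; the table and function names are hypothetical, the XML is parsed without a namespace, and falling back to "author" when nothing matches is an assumption rather than documented translator behavior.

```python
import xml.etree.ElementTree as ET

# (MARC code, MARC term, Zotero creator type), taken from the list above.
ROLES = [
    ("edt", "Editor", "editor"), ("ctb", "Contributor", "contributor"),
    ("trl", "Translator", "translator"), ("ive", "Interviewee", "interviewee"),
    ("ivr", "Interviewer", "interviewer"), ("drt", "Director", "director"),
    ("aus", "Author of screenplay, etc.", "scriptwriter"),
    ("pro", "Producer", "producer"), ("act", "Actor", "castMember"),
    ("spn", "Sponsor", "sponsor"), ("inv", "Inventor", "inventor"),
    ("rcp", "Recipient", "recipient"), ("prf", "Performer", "performer"),
    ("cmp", "Composer", "composer"), ("lbt", "Librettist", "wordsBy"),
    ("ctg", "Cartographer", "cartographer"), ("prg", "Programmer", "programmer"),
    ("art", "Artist", "artist"), ("cmm", "Commentator", "commenter"),
    ("cwt", "Commentator for written text", "commenter"),
]
BY_CODE = {code: zotero for code, _, zotero in ROLES}
BY_TERM = {term.lower(): zotero for _, term, zotero in ROLES}

def creator_type(name_el, default="author"):
    """Return the first mappable role on a MODS <name>, else the default."""
    for role_term in name_el.findall("role/roleTerm"):
        value = (role_term.text or "").strip().lower()
        table = BY_CODE if role_term.get("type") == "code" else BY_TERM
        if value in table:
            return table[value]
    return default  # assumption: unmapped roles fall back to "author"

by_code = ET.fromstring(
    '<name><role><roleTerm type="code" authority="marcrelator">prf</roleTerm></role></name>')
by_term = ET.fromstring(
    '<name><role><roleTerm type="text" authority="marcrelator">Cartographer</roleTerm></role></name>')
print(creator_type(by_code), creator_type(by_term))
```

Keeping a single table and deriving both lookups from it means a new role only has to be added in one place.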
note
Mapping appears OK. MODS note element is assigned to a variable and "pushed" to the
Zotero notes tab for this item.
originInfo
Mappings that appear OK:
MODS originInfo/edition subelement is mapped to Zotero edition field.
There are mappings from MODS originInfo subelements, either copyrightDate,
dateIssued, or dateCreated (in that order) to one Zotero date field.
MODS originInfo/dateModified is mapped to a Zotero lastModified element.
MODS originInfo/dateCaptured is mapped to a Zotero accessDate element.
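The date precedence described above (copyrightDate, then dateIssued, then dateCreated) can be illustrated with a short Python sketch. The function name is hypothetical, the sample dates are invented, and the XML is parsed without a namespace for brevity.

```python
import xml.etree.ElementTree as ET

DATE_PRECEDENCE = ("copyrightDate", "dateIssued", "dateCreated")

def pick_date(origin_info):
    """Return the first non-empty date in the stated order, or None."""
    for tag in DATE_PRECEDENCE:
        text = (origin_info.findtext(tag) or "").strip()
        if text:
            return text
    return None

# Invented sample: element order in the record does not matter.
sample = ET.fromstring(
    "<originInfo><dateCreated>1900</dateCreated><dateIssued>1901</dateIssued></originInfo>")
print(pick_date(sample))
```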
Issue #7. MODS originInfo/place/placeTerm is mapped to the Zotero place field only if
type="text". It would be possible to use a table with the MARC Code List for Countries
(http://www.loc.gov/marc/countries/) to look up the text form when only type="code" is
present here. For most MODS records, especially those mapped from MARC, a
type="text" form is likely to be present, so this may not be worth the effort.
Aquifer records where only a type="code" form of originInfo/place/placeTerm is present
in the MODS record:
Title: White Eagle and Pura Fé sing Rudy Martin's songs
Title: Biographical dictionary and portrait gallery of representative men of Chicago and
the World's Columbian Exposition
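If the lookup were thought worth the effort, it might look like the Python sketch below. The function name and the five-entry table are illustrative only; a real table would carry the full MARC Code List for Countries. Requiring authority="marccountry" before treating a code as a MARC country code is an assumption.

```python
import xml.etree.ElementTree as ET

# Small excerpt of the MARC Code List for Countries, for illustration only.
MARC_COUNTRIES = {
    "xxu": "United States",
    "cau": "California",
    "ilu": "Illinois",
    "nyu": "New York (State)",
    "mx": "Mexico",
}

def place_from_origin_info(origin_info):
    """Prefer a text placeTerm; fall back to a coded one via the lookup table."""
    terms = origin_info.findall("place/placeTerm")
    for term in terms:
        text = (term.text or "").strip()
        if term.get("type") == "text" and text:
            return text
    for term in terms:
        if term.get("type") == "code" and term.get("authority") == "marccountry":
            name = MARC_COUNTRIES.get((term.text or "").strip().lower())
            if name:
                return name
    return None

coded_only = ET.fromstring(
    '<originInfo><place><placeTerm type="code" authority="marccountry">ilu</placeTerm>'
    '</place></originInfo>')
print(place_from_origin_info(coded_only))
```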
Issue #8. MODS originInfo/publisher. Not sure about this one. It appears that MODS
publisher maps to a Zotero "publisher" element, except when the Zotero itemType is
"website" or "webpage", in which case it is mapped to Zotero "publicationTitle".
Is this because the "publisher" field is not defined for the "webpage" set of elements? If
so, this is problematic on two fronts:
a. Items published on the web as webpages or websites can have publishers (entities
responsible for the webpages or sites).
b. See under "physicalDescription" the note about setting itemType to Zotero webpage
based on the value "electronic" in the MODS physicalDescription/form element. This means
that, for example, digitized books could end up with an itemType of "webpage" and their
publisher would not be captured.
This appears to have been partially improved since the first draft of this analysis. For
records getting type "Website", the publisher element still does not appear as publisher,
but it no longer shows up as "Website title" either. MODS <relatedItem type="host">
appears to be mapped to "Website title", which makes more sense.
Aquifer records whose Zotero itemType comes through as "website" or "webpage",
which have an <originInfo><publisher> element:
Title: Studies on Inbreeding. Publisher is The Wistar Institute of Anatomy and Biology.
Title: The woman who wouldn't. Publisher is G.P. Putnam's Sons.
Note that publisher (in these cases, of the original item) does not show up anywhere in
Zotero record. Both of these items are actually digitized books.
part
Mapping to Zotero "volume", "issue", or "section" field: may be OK, but it is not a complete
mapping. The code maps part/detail elements that have type "volume", "issue", or
"section" to Zotero fields with the same name. It first looks for part as a
subelement of the relatedItem element, then for part as a top-level element.
The code maps part/detail/number if it is present; otherwise it maps the content of detail
as text (is this possible?). It seems to ignore the part/detail/caption and part/detail/title
subelements, as well as the part/detail "level" attribute.
The Aquifer group may have more comments on this later, if and when we have time and
example MODS records using the "part" element to test with.
Page(s): Seems OK. Maps start and ending pages to a Zotero "pages" element, separated
by a dash if start and end are different pages.
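The page-range rule just described is simple enough to show in Python. This is a sketch under assumptions: the function name is hypothetical, the sample values are invented, the XML has no namespace, and both "page" and "pages" are accepted as the extent unit.

```python
import xml.etree.ElementTree as ET

def pages_from_part(part_el):
    """Return "start-end", or a single page when start and end are the same."""
    for extent in part_el.findall("extent"):
        if extent.get("unit") not in ("page", "pages"):
            continue
        start = (extent.findtext("start") or "").strip()
        end = (extent.findtext("end") or "").strip()
        if start and end and start != end:
            return f"{start}-{end}"
        return start or end or None
    return None

# Invented sample values.
ranged = ET.fromstring(
    '<part><extent unit="page"><start>12</start><end>27</end></extent></part>')
single = ET.fromstring(
    '<part><extent unit="page"><start>5</start><end>5</end></extent></part>')
print(pages_from_part(ranged), pages_from_part(single))
```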
I have done a lot of hunting but have been unable to find Aquifer records using the
top-level MODS Part element. This is a newer MODS element and has apparently gotten
limited application in the Aquifer metadata collections.
physicalDescription
Issue #9. Zotero is using the MODS physicalDescription/form element with
@authority="marcform", where content is "electronic", to set the Zotero itemType
element to be "webpage".
This may have unintended effects, because almost anything from the web captured in
Zotero could get the designation "electronic" (particularly if it is a MODS record
converted from MARC, where this is mapped from a "fixed field code" that's widely used
for web resources of all types).
It would be better to omit this mapping.
Aquifer records containing <physicalDescription><form
authority="marcform">electronic</form></physicalDescription> which are getting
inappropriately mapped to "Web Page" instead of "Book" itemType:
Title: Studies on Inbreeding. Publisher is The Wistar Institute of Anatomy and Biology.
Title: The woman who wouldn't. Publisher is G.P. Putnam's Sons.
recordInfo
Mapping is OK. There is a mapping of content of recordInfo/recordContentSource to
Zotero's source field, and a mapping of recordInfo/recordIdentifier to Zotero's
accessionNumber field.
relatedItem
Mapping appears OK.
MODS relatedItem type="host" subelement titleInfo/title, where titleInfo has
type="abbreviated", is mapped to the Zotero journalAbbreviation element, and to the
publicationTitle element if that has not been mapped from other content yet. Since serials
are generally the types of "hosts" for which abbreviated titles are supplied, this seems safe.
MODS relatedItem type="series" subelement titleInfo/title is mapped to Zotero's
series element; titleInfo/partTitle for a series is mapped to Zotero's seriesTitle element;
titleInfo/subtitle is mapped to Zotero's seriesText element; titleInfo/partNumber is
mapped to Zotero seriesNumber.
subject
Issue #10. Mapping of MODS subject subelements is missing a lot! Subject is "pushed"
to the Tags tab in Zotero. Only the MODS subject/topic subelements are mapped; this
leaves out many other types of subject or parts of subjects, which could be useful to
Zotero users (who wouldn't likely care about separation of the "subject facets").
Subelements for name, titleInfo, geographic, temporal, and occupation could be mapped
directly; geographicCode, hierarchicalGeographic, and cartographics might present more
difficulty to map and are less critical to use (usually records containing these elements
have other types of subject terms used for the same entities, that are more easily mapped).
The genre subelement under subject is a special case.
#10a. Would it be possible, based on attribute authority="lcsh" in the subject element, to
map all the subelements of such a MODS subject into one string, sequentially, with a
space, two dashes, and a space as a delimiter? (This is how LC subject headings are
intended to be viewed but may not fit with Zotero's functionality.)
#10b. Right now there seems to be no way to "group" or differentiate types of tags... is
anything in the offing? If we mix different kinds of subjects there (or subject plus other
kinds of descriptors such as genre), it makes it difficult to "map back out" the tags field to
MODS or other metadata formats that make these differences. But, that being said,
mapping subject/genre or MODS genre to the tags tab is an option.
If neither 10a nor 10b is possible, it would be better to leave the subject/genre subelement
unmapped.
An example MODS subject, from Aquifer record "Washington and his comrades in
arms"
<subject authority="lcsh">
<geographic>United States</geographic>
<topic>History</topic>
<temporal>Revolution, 1775-1783</temporal>
</subject>
In "LCSH display form": United States -- History -- Revolution, 1775-1783.
The translator will only pick up "History" from this subject element. That's missing a lot.
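The string form requested in #10a can be produced with a short Python sketch like the one below. It is an illustration only: the function name is hypothetical, only the simple text-valued subelements are handled (name and titleInfo have nested structure and would need their own handling), subject/genre is left out as the special case noted above, and the XML is parsed without a namespace.

```python
import xml.etree.ElementTree as ET

SIMPLE_FACETS = {"topic", "geographic", "temporal", "occupation"}

def subject_tags(subject_el):
    """Return tag strings for one MODS <subject> element."""
    parts = [(child.text or "").strip()
             for child in subject_el
             if child.tag in SIMPLE_FACETS and (child.text or "").strip()]
    if subject_el.get("authority") == "lcsh":
        # One tag per heading, facets joined in record order.
        return [" -- ".join(parts)] if parts else []
    return parts  # otherwise, one tag per facet

heading = ET.fromstring("""
<subject authority="lcsh">
  <geographic>United States</geographic>
  <topic>History</topic>
  <temporal>Revolution, 1775-1783</temporal>
</subject>
""")
print(subject_tags(heading))
```

For the example record this yields a single tag, "United States -- History -- Revolution, 1775-1783", instead of the lone "History" the translator currently captures.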
Other Aquifer records with multifaceted subjects:
Title: Southern women in the recent educational movement in the South
(topic/temporal/geographic)
Title: A history of Williams College (corporate name/topic)
Title: The memorial life of General William Tecumseh Sherman (personal/topic,
geographic/topic/temporal)
Some of the Aquifer records don't separate out the facets into separate subelements, but
just have a string with dashes. From an example title, Letter to Adelina from Juanita
Wolfskill:
<subject authority="lcsh">
<geographic>Orange County (Calif.)--History</geographic>
</subject>
Because this example is under the subelement <geographic> it doesn't get mapped to tags
in Zotero.
But if the <topic> subelement had been used (which is how the MODS instructions say to
treat undifferentiated "subject headings"), it would have mapped, dashes and all, as in the
Aquifer example record titled: Ruins of Prager's Department Store:
<subject>
<topic>Earthquakes--California--San Francisco--Photographs</topic>
</subject>
Zotero captures this and two other similar <topic> subjects as Tags that look like this:
Earthquakes--California--San Francisco--Photographs
Note that this example does not identify the subject as an LC subject heading. It follows
the form of printed LCSH but doesn't have the subelement structure.
In summary, the options for Zotero seem to be:
1. Map all types of subject subelements (name, title, topic, chronological, geographic,
and form) separately as Tags.
2. Map multiple subelements of subjects having authority="lcsh" as a string, with each
subelement in the sequence separated from others by two dashes.
3. Allow "typing" of Tags so that topic tags, name tags, geographic tags, time period tags
can be together (probably a "new feature").
tableOfContents
Issue #11. The translator does not appear to make use of this element at all. If there is a
Zotero "abstractNote" field, why not have a "contentsNote" field? Or, if the size of some
tables of contents might be an issue, could it at least get "pushed" to the Note tab?
An example of an Aquifer ASHO record with a TOC is the title "The people of the
Eastern Orthodox churches, the separated churches of the east, and other Slavs: report of
the Commission Appointed by the Missionary Department of New England to Consider
the Work of Co-operating with the Eastern Orthodox Churches, the Separated Churches
of the East, and Other Slavs"
targetAudience
Issue #12. There appears to be no Zotero field to map the content of targetAudience field
to. This is not a major issue, but if adding a Zotero field is not feasible, could this be a
separate kind of Tag, in which case the element could be mapped to the Tags tab?
Example Aquifer record with a targetAudience field:
titleInfo
Issue #13. It appears that the current code creates a Zotero newItem.title for each MODS
titleInfo/title, if the @type is not equal to "abbreviated". However, when there are
multiple titleInfo elements, only one seems to get picked to appear in the brief list and the
"title" box when viewing in the Zotero plugin. Zotero might even prefer a titleInfo
element with an attribute - but it shouldn't.
The titleInfo element without any type attribute is the actual title of the work and should
be the primary one to capture and display - and although MODS does not require it,
standard practice is to always have at least one "typeless" titleInfo element (and usually
only one). If there's a way to capture and use titles with type attributes (translated,
alternative, uniform as well as abbreviated) elsewhere, that would be nice, but one of
these should not get to be the "title" if a type-less titleInfo exists, just because the
translator encounters it in a certain order (last?). Please change the logic to prefer a
titleInfo element with no type attribute as the source for title.
An Aquifer record's multiple titleInfo elements:
<titleInfo>
<nonSort>The </nonSort>
<title>Constitution of the United States of America</title>
<subTitle>
as proposed by the Convention, held at Philadelphia, September 17, 1787, and since
ratified by the several states : with the several amendments thereto
</subTitle>
</titleInfo>
<titleInfo type="uniform">
<title>Constitution</title>
</titleInfo>
The only title showing up in Zotero is "Constitution".
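The requested logic, preferring a titleInfo with no type attribute regardless of the order in which elements are encountered, might look like this in Python. The function name is hypothetical, subTitle is left out to keep the sketch short, and the XML is parsed without a namespace.

```python
import xml.etree.ElementTree as ET

def pick_title(mods_el):
    """Prefer a type-less titleInfo; never use an abbreviated one as the title."""
    infos = mods_el.findall("titleInfo")
    typeless = [t for t in infos if t.get("type") is None]
    fallback = [t for t in infos if t.get("type") != "abbreviated"]
    candidates = typeless or fallback
    if not candidates:
        return None
    chosen = candidates[0]
    non_sort = chosen.findtext("nonSort") or ""  # carries its own trailing space
    title = (chosen.findtext("title") or "").strip()
    return non_sort + title

# The uniform title is placed first here to show that order does not matter.
record = ET.fromstring("""
<mods>
  <titleInfo type="uniform"><title>Constitution</title></titleInfo>
  <titleInfo>
    <nonSort>The </nonSort>
    <title>Constitution of the United States of America</title>
  </titleInfo>
</mods>
""")
print(pick_title(record))
```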
typeOfResource
Issue #14. There is no Zotero type for "sheet music", so "book" is being used. We want
to request a "sheet music" type. A Metadata Working Group member has agreed to work
on the request for Zotero elements for this type; that information will be furnished later.
Aquifer records for sheet music may be found in several collections of sheet music
(Music for the Nation: American Sheet Music, 1820-1860, 1870-1885, Musical Scores,
and the Starr Sheet Music Collection).
Title: Who tied the can on the old dog's tail? Did you tie the can on the old dog's tail?
Title: If I should take a notion to jump into the ocean.
Issue #15. Zotero's mapping to its itemType element does not take MODS
typeOfResource values into account at all. These are more likely to be present than
genre elements and are required in many contexts, including Aquifer. While a
one-to-one mapping isn't possible for all Zotero types, the following could help provide
some better defaults than the all-purpose "book" if other data doesn't supply a different
type.
- text: could map to Zotero itemTypes.book by default (could also be
  periodical, newspaper, thesis, letter, but those mappings would have to come
  from genre)
- cartographic: could map to itemTypes.map by default
- notated music: need an itemType for this! book by default...
- sound recording, or
  o sound recording-musical
  o sound recording-nonmusical
  all 3 could map to Zotero itemTypes.audioRecording
- still image: could map to itemTypes.artWork by default, although that's too
  specific (since there's not a generic "image" itemType for Zotero)
- moving image: could map to itemTypes.videoRecording by default
- three dimensional object: (uh, probably won't encounter one of these on the
  web; perhaps itemTypes.artWork would be a better guess than itemTypes.text)
- software, multimedia: could map to itemTypes.computerProgram
- mixed material: not sure there's a Web equivalent of this ("collections") and
  probably won't get any, but the closest match is itemTypes.webpage
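The defaults listed above can be expressed as a lookup table. The Python sketch below is illustrative only: the names are hypothetical, the item type strings follow the spellings used in this memo rather than a checked Zotero list, and a type already derived from genre is assumed to take precedence.

```python
# typeOfResource value -> default item type, as proposed in the list above.
TYPE_OF_RESOURCE_DEFAULTS = {
    "text": "book",
    "cartographic": "map",
    "notated music": "book",  # until a sheet music type exists
    "sound recording": "audioRecording",
    "sound recording-musical": "audioRecording",
    "sound recording-nonmusical": "audioRecording",
    "still image": "artWork",
    "moving image": "videoRecording",
    "three dimensional object": "artWork",
    "software, multimedia": "computerProgram",
    "mixed material": "webpage",
}

def default_item_type(type_of_resource, type_from_genre=None):
    """Use the genre-derived type when there is one, else the default table."""
    if type_from_genre:
        return type_from_genre
    key = (type_of_resource or "").strip().lower()
    return TYPE_OF_RESOURCE_DEFAULTS.get(key, "book")

print(default_item_type("cartographic"))
print(default_item_type("text", type_from_genre="newspaperArticle"))
```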
Example Aquifer records with typeOfResource:
text:
- Biographical sketches of the founder and principal alumni of the Log college (book;
captures as Book)
- Witness log (2 page handwritten document; captures as Book)
- The New England home magazine (a serial; captures as Journal article)
cartographic:
- Peninsula between Delaware & Chesopeak Bays (a map; captures as Book)
notated music:
- Oh! You beautiful doll, you great, big beautiful doll! (sheet music; captures as Book)
sound recording-nonmusical:
- Title: Poetry reading and Creator: Frost, Robert, 1874-1963 (streaming audio; captures
as Book)
still image:
- Flowering Currant at Boonville Mendocino county (digitized photograph; captures as
Book)
moving image:
- Visitin' 'round at Coolidge Corners (streaming video; captures as Film)
- Chavela Vargas en vivo en El Hábito (versión sin editar) (streaming video; captures as
Book)
software, multimedia:
Aquifer doesn't contain any true examples of this type right now, although
some records are mis-coded as that type (but are really electronic texts).
mixed material (the MODS type for collections, such as archival collections):
- W. Stewart Evans Collection, 1967-1979
Other issues:
Issue #16. There's a "TODO: thesis type" note in the code section that deals with
Zotero itemTypes. We feel it would be very useful to map to this Zotero type from
MODS if possible. Note that the MARC Genre Terms
(http://www.loc.gov/marc/sourcecode/genre/genrelist.html) which maps to various
MARC fixed field values, contains the term "thesis"; MODS records using that
vocabulary in the genre element will have an authority attribute "marcgt". This would be
one "hook" in a MODS record that could map to a "thesis" itemType; there may be other
possibilities.
Aquifer records for theses:
The development of Chicago and vicinity as a manufacturing center prior to 1880 (does
not contain a genre element with "thesis", but contains a "thesis note"; captured as Book)
The progress of the fire in San Francisco April 18th-21st, 1906 : as shown by an analysis
of original documents (contains a genre element with "academic dissertations" and a
"thesis note"; captured as Book)
I wasn't able to find an Aquifer example of a genre element containing "thesis".
Presence of the word at the beginning of a note element may be a more likely marker at
present, especially for MODS records mapped from MARC. Presence of "dissertation"
in the genre field might also be useful.
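The three "hooks" discussed under Issue #16 can be combined in a small test. The Python sketch below is an assumption-laden illustration: the function name is hypothetical, the sample record and its note text are invented, and the XML is parsed without a namespace.

```python
import xml.etree.ElementTree as ET

def looks_like_thesis(mods_el):
    """Heuristic check for the three thesis markers described above."""
    for genre in mods_el.findall("genre"):
        text = (genre.text or "").strip().lower()
        if genre.get("authority") == "marcgt" and text == "thesis":
            return True
        if "dissertation" in text:
            return True
    for note in mods_el.findall("note"):
        if (note.text or "").strip().lower().startswith("thesis"):
            return True
    return False

# Invented sample record for illustration.
sample = ET.fromstring("""
<mods>
  <genre>academic dissertations</genre>
  <note>Thesis (Ph.D.)</note>
</mods>
""")
print(looks_like_thesis(sample))
```

Because the note-based test is only a prefix match, it could produce false positives; treating it as a weaker signal than the genre-based tests may be advisable.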