Using the NASA Thesaurus to
Support the Indexing of
Streaming Media
Gail Hodge
Information International Associates, Inc.
Janet Ormes & Patrick Healey
NASA Goddard Space Flight Center Library
Historic Context
• The Library has collected and circulated the
Center’s colloquia on audio or video since 1967
• A catalog of these holdings has been posted on
the Library’s web site since 2001
• Patrons were required to come to the Library, which
limited the accessibility of recorded colloquia
• Streaming Media Center Project began in 2001 as
part of the Library’s response to Knowledge
Management initiatives
Introducing the GSFC Media
Center
Streaming Media
• Streaming media
– Video that is encoded for delivery across the
internet/intranet
• Encoding
– Computer processing of video to a format for web
casting
• Web casting
– The act of delivering audio and video content across the
internet/intranet
– Can be delivered live or on-demand
The Goddard Library Streaming Media Center
• The Streaming Media Center is now available from
the Library website (http://library.gsfc.nasa.gov)
• Can be included in personalized portals
• Library has collected >350 hours of video
– >100 hours indexed
• Currently broadcasting 2 hours daily for the Earth
Observing Systems Knowledge Management Pilot
Access Issues
• Current Needs
– Need to know the overall topic of the video
– More likely to remember the topic, presenter, date or
series
• Permanent Access
– Less likely that users will remember the video’s
metadata
– More likely that users will want specific information
– Terminology may change over time
Indexing Video Content
• Video indexing is similar to a back-of-the-book
index for specific information
• Entering a keyword leads you to the specific
location of the subject
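The back-of-the-book analogy can be sketched as a small inverted index that maps each spoken word to the times at which it occurs. This is only an illustration: the transcript lines, the `build_index` helper, and the timestamps are hypothetical and are not taken from the Library's software.

```python
from collections import defaultdict

def build_index(transcript):
    """Map each lowercase word to the times (in seconds) at which it is spoken."""
    index = defaultdict(list)
    for seconds, text in transcript:
        for word in text.lower().split():
            word = word.strip(".,;:!?\"'")  # drop surrounding punctuation
            if word and seconds not in index[word]:
                index[word].append(seconds)
    return dict(index)

# Hypothetical timestamped transcript: (start time in seconds, recognized text)
transcript = [
    (0, "Welcome to the colloquium"),
    (95, "The aurora on Jupiter is driven by its magnetosphere"),
    (410, "Ultraviolet images show the aurora clearly"),
]

index = build_index(transcript)
print(index["aurora"])  # [95, 410]
```

Looking up a keyword returns the offsets a player could seek to, much as an index entry returns page numbers.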
Features of Selected Software
• Compares recognized speech with stored
default terminology
• Uses speaker inflection to identify
meaningful intervals
• Indexing and Search components included
Incorporation of NASA Thesaurus
• Added specific scientific terminology
• Incorporated terms and their NTs, RTs and
UF/USE relationships
• Used text of Astrophysics Data System to provide
terms in grammatical structures
• Provides query expansion and improves relevancy
Query Expansion
“Saturn Moons”
+ Ios
+ Triton
Or
“Scatha Satellite”
+ P78-2 Satellite
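The expansion shown above can be sketched as a lookup over thesaurus relationships (NT, RT, UF/USE). This is a minimal sketch under stated assumptions: the `THESAURUS` dictionary and `expand_query` function are hypothetical, the relationship types assigned to each entry are guesses, and the entries simply mirror the slide's examples without being checked against the actual NASA Thesaurus.

```python
# Toy thesaurus. Entries mirror the slide's examples; the relationship type
# assigned to each one is an assumption, not verified thesaurus data.
THESAURUS = {
    "saturn moons": {"NT": ["ios", "triton"], "RT": [], "UF": []},
    "scatha satellite": {"NT": [], "RT": [], "UF": ["p78-2 satellite"]},
}

# USE is the inverse of UF: a non-preferred term points to its preferred term.
USE = {alt: pref for pref, entry in THESAURUS.items() for alt in entry["UF"]}

def expand_query(query):
    """Return the preferred term followed by its UF, NT and RT terms."""
    term = query.strip().lower()
    preferred = USE.get(term, term)  # follow a USE reference if one exists
    entry = THESAURUS.get(preferred, {})
    expanded = [preferred]
    for relation in ("UF", "NT", "RT"):
        for related in entry.get(relation, []):
            if related not in expanded:
                expanded.append(related)
    return expanded

print(expand_query("Scatha Satellite"))  # ['scatha satellite', 'p78-2 satellite']
print(expand_query("P78-2 Satellite"))   # ['scatha satellite', 'p78-2 satellite']
print(expand_query("Saturn Moons"))      # ['saturn moons', 'ios', 'triton']
```

A query on either form of a term then matches the same set of index entries, and a term missing from the thesaurus is returned unchanged.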
Query Expansion (Illustrated)
Sample search (“aurora”) on the same one-hour lecture, entitled “Jupiter’s
Aurora”. One file was indexed using the NASA Thesaurus; the other was
indexed using a more basic scientific word list.
Benefits
• GREATER overall relevance understanding
• MORE relevant content found (2+ min vs. 20 sec)
• Ignores IRRELEVANT content (speech recognition errors)
Relevance Interval Creation
• Relevance Interval Creation links related
concepts within a media file; these links
determine the Relevance Intervals
• External knowledge from the thesaurus
improves the accuracy of the creation
process because the explicit knowledge in
the text is incomplete
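The slides do not describe how the software computes its intervals. One plausible heuristic, shown here purely as an assumption, is to merge nearby occurrences of the query term and its thesaurus-related terms into a single playable interval. The `relevance_intervals` function, its `max_gap` and `padding` parameters, and the hit times are all hypothetical.

```python
def relevance_intervals(hits, max_gap=30.0, padding=5.0):
    """Merge hit times (seconds) that lie within max_gap of each other.

    Returns (start, end) tuples, each widened by `padding` seconds.
    """
    intervals = []
    for t in sorted(hits):
        if intervals and t - intervals[-1][1] <= max_gap:
            intervals[-1][1] = t      # extend the current interval
        else:
            intervals.append([t, t])  # start a new interval
    return [(max(0.0, start - padding), end + padding) for start, end in intervals]

# Hypothetical hit times: the query term alone vs. the query plus related terms
query_only = [95.0, 110.0]
with_related = [95.0, 110.0, 130.0, 150.0, 175.0, 900.0]

print(relevance_intervals(query_only))    # [(90.0, 115.0)]
print(relevance_intervals(with_related))  # [(90.0, 180.0), (895.0, 905.0)]
```

In this toy case the related terms lengthen the first interval because the speaker keeps discussing the topic without repeating the query word, which is the kind of gap that external thesaurus knowledge is meant to fill.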
Relevance Interval (Illustrated)
Sample search (“aurora”) on the same one-hour lecture, entitled “Jupiter’s
Aurora”. One file was indexed using the NASA Thesaurus; the other was
indexed using a more basic scientific word list.
Benefits
• GREATER overall relevance understanding
• MORE relevant content found (2+ min vs. 20 sec)
• Ignores IRRELEVANT content (speech recognition errors)
Benefits
• Identify relevant pieces of content within a
longer video
• Stream more relevant, specific information
intervals to users
• Minimize manual processing
• Ultimately improve reuse of information
and increase opportunities for knowledge
sharing