Digital Libraries
Spring 2006, 1 March
Bharat Mehra
IS 520 (Organization and Representation of Information)
School of Information Sciences
University of Tennessee
Digital Libraries
What does the digital library concept mean to you
• as a user
• as an information professional
• as an author
Is the Web a digital library? Why? Why not?
Your definition or notion?
Digital Libraries

What is the role of a librarian or
information professional? How has this
role changed in the context of digital
libraries?
The Web: Implications for DLs

Ubiquitous information source: Why is the web “a much more engaging medium and teacher” than textbooks or a local librarian?
Identify pros and cons for specific situations in the different quadrants.
Finding Information on the Web

Web directories for browsing
Yahoo! -- human indexers/catalogers
classificatory structure

Web search engines for querying
AltaVista, Google -- robots
automatically generated indexes

Combination of directory and engine
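
The "automatically generated indexes" of a search engine can be pictured as an inverted index that maps each word to the pages containing it. The sketch below is a toy illustration only: the page names and texts are invented, and real engines add crawling, ranking, and much more.

```python
from collections import defaultdict

# Hypothetical pages a robot might have fetched (invented for illustration).
pages = {
    "page1": "digital libraries organize information",
    "page2": "search engines index the web",
    "page3": "digital search tools for libraries",
}

def build_inverted_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every word in the query (Boolean AND)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

index = build_inverted_index(pages)
print(sorted(search(index, "digital libraries")))
```

A directory, by contrast, would place each page under a category chosen by a human indexer rather than deriving entries from the words on the page.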
Paradigm shift: Classic IR vs. Web IR
Collection: professionals, selection policy, polling (robot)
Representation: description, access points, full text, metadata
Search: algorithms, master file, inverted indexes, non Boolean, proprietary
Interface: good functionality, simplistic, complex, trade off
Digital Library Features
• community based users
• extension and enhancement of classic IRs
• digital resources are multimedia: text, images, sounds, etc.
• technical capabilities for creating, searching, and using information
• distributed using networks (the Web, etc.)
Digital Library Features
• content of digital libraries includes data, metadata that describe various aspects of the data
• links (or relations) to other data or metadata (internal or external)
• context portals to support individual users’ information needs and work tasks
Digital Library Projects
• Digital Libraries Initiatives phase II <http://www.dli2.nsf.gov/>
• LC American Memory Website <http://memory.loc.gov/>
• standards <http://lcweb.loc.gov/standards/metadata.html>
Example Digital Libraries
• The National Science Digital Library http://nsdl.org/
• Library portals extend and serve classrooms, offices, laboratories, homes, and public spaces.
Information Theory (for DLs)
Joseph Goguen: A theory of information should
• Be useful for understanding and designing info systems (or DLs)
• Address the meanings that users give to events, including social and political nuances
• Address ethical issues
• Account for the fact that different individuals and groups can construe meanings in very different ways
Joseph Goguen, “Towards a Social Ethical Theory of Information” in Social
Science Research, Technical Systems and Cooperative Work, edited by Geoffrey
Bowker, Les Gasser, Leigh Star and William Turner. (Erlbaum, 1997).
Goguen’s Info Qualities Relevant to DLs
1. Situated: Info can only be fully understood in relation to the particular, concrete situation in which it actually occurs
2. Local: Interpretations are constructed in some particular context, including a particular time, place, and group
3. Emergent: Info cannot be fully understood at the level of the individual, that is at the level of the individual psychology, because it arises through ongoing interactions with other people/technologies
4. Contingent: Interpretation of info depends upon current situation, which may include the current interpretation of prior events
5. Embodied: Info is tied to documents/bodies in particular situations, so that the particular way that bodies are embedded in a situation may be essential to some interpretations
6. Vague: In practice, info is only elaborated to the degree that it is useful to do so; the rest is grounded in intangible knowledge
7. Open: Info cannot in general be given a final and complete form, but must remain open to revision in the light of future developments
• “Wet” information: strongly situated, less mobile
• “Dry” information: weakly situated, more mobile
Joseph Goguen, “Towards a Social Ethical Theory of Information” in Social
Science Research, Technical Systems and Cooperative Work, edited by Geoffrey
Bowker, Les Gasser, Leigh Star and William Turner. (Erlbaum, 1997).
Issues of Text Representation in DLs
Storing textual materials is related to their:
• Structure (characters, words, paragraphs, headings): Represented by mark-up, e.g., Standard Generalized Markup Language (SGML)
• Appearance (choice of format, size of font, margins, line spacing, how headings are represented, location of figures): Page-description languages precisely describe the appearance, e.g., TeX, PostScript, Portable Document Format (PDF)
Alternative renderings of a single document
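
One way to picture "alternative renderings of a single document" is to keep only structural mark-up in the stored text and decide the appearance when it is displayed. The sketch below is illustrative: the element names (doc, heading, para) and the two rendering functions are invented for this example, and it uses a small XML fragment rather than full SGML.

```python
import xml.etree.ElementTree as ET

# A tiny document marked up by structure only (element names are invented).
source = """<doc>
  <heading>Digital Libraries</heading>
  <para>Structure is recorded in the mark-up.</para>
  <para>Appearance is decided at rendering time.</para>
</doc>"""

def render_plain(root):
    """Render for a plain-text display: upper-case heading, blank line between blocks."""
    lines = []
    for element in root:
        if element.tag == "heading":
            lines.append(element.text.upper())
        elif element.tag == "para":
            lines.append(element.text)
    return "\n\n".join(lines)

def render_html(root):
    """Render the same structure as HTML for a browser."""
    tags = {"heading": "h1", "para": "p"}
    parts = [f"<{tags[e.tag]}>{e.text}</{tags[e.tag]}>" for e in root if e.tag in tags]
    return "\n".join(parts)

root = ET.fromstring(source)
print(render_plain(root))
print(render_html(root))
```

The stored text never changes; only the rendering function does.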
Converting Text
Scanning: Optical character
recognition
Encoding characters: ASCII, Unicode
Document type definitions (DTDs) in
the Text Encoding Initiative (TEI),
Encoded Archival Description (EAD)
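
The difference between ASCII and Unicode shows up as soon as a text contains accented letters. The sample word below is an arbitrary choice for illustration; UTF-8 is used as one common Unicode encoding.

```python
text = "r\u00e9sum\u00e9"  # "resume" with two accented e characters

# Unicode (UTF-8) can encode the text; accented letters take two bytes each.
utf8_bytes = text.encode("utf-8")
print(len(text), len(utf8_bytes))  # number of characters vs. number of bytes

# ASCII covers only 128 characters, so the same text cannot be encoded.
try:
    text.encode("ascii")
except UnicodeEncodeError:
    ascii_ok = False
else:
    ascii_ok = True
print(ascii_ok)

# Text limited to plain English letters is byte-for-byte identical in both.
same = "library".encode("ascii") == "library".encode("utf-8")
print(same)
```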
Three General Types of Metadata
1. Object-descriptor metadata (Dublin Core)
Designed to describe global characteristics of
entire objects with external references
2. Internal/Structural Metadata (HTML, XML, RDF)
Designed to describe internal semantic
structure of objects with internal and external
references
3. Display Metadata (HTML, StyleSheets)
Designed to describe how objects or parts of
objects should be visualized or displayed. Not
necessarily related to semantic structure
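
As an illustration of object-descriptor metadata, the sketch below builds a small Dublin Core record for this lecture. The wrapper element name "record" and the helper function are assumptions made for the example; the title, creator, and date come from the title slide.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)

def dublin_core_record(fields):
    """Build an XML record with one dc: element per (name, value) pair."""
    record = ET.Element("record")  # wrapper element name is an assumption
    for name, value in fields:
        element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        element.text = value
    return record

record = dublin_core_record([
    ("title", "Digital Libraries"),
    ("creator", "Bharat Mehra"),
    ("date", "2006-03-01"),
    ("language", "en"),
])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

The record describes the object as a whole and says nothing about its internal structure or how it should be displayed.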
What is a Database?
A database is a collection of data that is
organized so that its contents can easily be
accessed, managed and updated. The most
prevalent type of database is the relational
database, a tabular database in which data
is defined so that it can be reorganized and
accessed in a number of different ways. A
distributed database is one that can be
dispersed or replicated among different
points in a network.
Relational Databases
A database system in which the
database is organized and
accessed according to the
relationships between data items
without the need for any
consideration of physical
orientation and relationship.
Relationships between data items
are expressed by means of tables.
Features of Databases
• Collection of data stored together as a unit
• Databases are useful for storing data and making it available for retrieval
• Within the database, data is organized into different tables
• Each table has columns and rows. Indexes on tables provide speedy access to data
• Information in the database can be retrieved, modified, or deleted using a query language like SQL
• Some common database systems are Oracle, SQL Server, DB2, Sybase, etc.
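
These features can be tried out with a small sketch. It uses SQLite through Python's built-in sqlite3 module, which is an assumption of this example rather than one of the systems named above, and the table, column, and member names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# A table with columns and rows, plus an index for speedy access by city.
conn.execute("CREATE TABLE member (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("CREATE INDEX idx_member_city ON member (city)")

conn.executemany(
    "INSERT INTO member (name, city) VALUES (?, ?)",
    [("Ana", "Knoxville"), ("Ben", "Nashville"), ("Cy", "Knoxville")],
)

# Retrieve
knoxville = [row[0] for row in conn.execute(
    "SELECT name FROM member WHERE city = ? ORDER BY name", ("Knoxville",))]

# Modify
conn.execute("UPDATE member SET city = ? WHERE name = ?", ("Memphis", "Ben"))

# Delete
conn.execute("DELETE FROM member WHERE name = ?", ("Cy",))

remaining = conn.execute("SELECT name, city FROM member ORDER BY name").fetchall()
print(knoxville)
print(remaining)
```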
Relational Database Model
• Data is presented as a collection of relations
• Each relation is depicted as a table
• Columns are attributes
• Rows represent entities
• Every table has a set of attributes that, taken together as a "key" (technically, a "superkey"), uniquely identify each entity
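
The key idea can be checked mechanically on sample data. The function and the employee rows below are invented for illustration. Note that passing the check on one set of rows only shows that the data do not contradict the key; a real key is a rule that must hold for every possible row.

```python
def is_superkey(rows, attributes):
    """Return True if no two rows share the same values for the given attributes."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attributes)
        if key in seen:
            return False
        seen.add(key)
    return True

# Hypothetical rows of an employee table.
employees = [
    {"emp_id": 1, "name": "Lee", "dept": "Sales"},
    {"emp_id": 2, "name": "Lee", "dept": "IT"},
    {"emp_id": 3, "name": "Kim", "dept": "IT"},
]

print(is_superkey(employees, ["emp_id"]))        # emp_id alone identifies each row
print(is_superkey(employees, ["name"]))          # two rows share the name "Lee"
print(is_superkey(employees, ["name", "dept"]))  # the combination is unique here
```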
Relational Database Model
Views in a database
Company maintains a database of its employees
• Other attributes of its employees: age, salary, emergency contacts, appraisal, etc.
• Different needs for different applications of the database: e.g., company may need to make available demographic data to a governmental agency
• Only some attributes need be supplied, and others ought not to be, so as to protect privacy: different views can be provided into the same data
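
A view of this kind can be sketched as follows, again assuming SQLite through Python's sqlite3 module. The table layout, view name, and employee data are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE employee (
    id INTEGER PRIMARY KEY, name TEXT, age INTEGER,
    salary REAL, emergency_contact TEXT)""")
conn.executemany(
    "INSERT INTO employee (name, age, salary, emergency_contact) VALUES (?, ?, ?, ?)",
    [("Lee", 34, 52000.0, "555-0100"), ("Kim", 45, 61000.0, "555-0101")],
)

# The view exposes only the demographic column an outside agency needs;
# name, salary, and emergency contact stay out of sight.
conn.execute("CREATE VIEW employee_demographics AS SELECT id, age FROM employee")

cursor = conn.execute("SELECT * FROM employee_demographics ORDER BY id")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()
print(view_columns)
print(view_rows)
```

The underlying table is stored once; each application queries the view that suits its needs.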
Database Design
• Identify entities that we are dealing with, their various attributes, and their relationships
• An entity is some object with a real or conceptual existence in the world -- tofu, Advanced Java Class, Guggenheim Museum, Elaine, company
• Attribute is a property of an entity -- address, size, mother, age
• A relational column is an attribute
• A relationship defines roles in which entities work together -- "Bill WORKS-FOR Motorola", "jbs TEACHES advanced-java"
• RDBMSs represent relationships as tables
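
The "Bill WORKS-FOR Motorola" example can be sketched with one table per entity type and a third table for the relationship. The table and column names are assumptions made for this illustration, and SQLite via Python's sqlite3 module stands in for a full RDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT);
-- The WORKS-FOR relationship is itself a table linking the two entities.
CREATE TABLE works_for (
    person_id  INTEGER REFERENCES person(id),
    company_id INTEGER REFERENCES company(id),
    PRIMARY KEY (person_id, company_id)
);
INSERT INTO person  VALUES (1, 'Bill');
INSERT INTO company VALUES (1, 'Motorola');
INSERT INTO works_for VALUES (1, 1);
""")

# A join follows the relationship table to bring the entities back together.
rows = conn.execute("""
    SELECT person.name, company.name
    FROM works_for
    JOIN person  ON person.id  = works_for.person_id
    JOIN company ON company.id = works_for.company_id
""").fetchall()
print(rows)
```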
Database Design as ER Diagrams
Rectangles represent entity types, diamonds relationship
types, and ovals attributes. Underlined attribute names
represent keys
Rectangles: Object/concept nouns
Diamonds: Verbs
Ovals: Characteristics
Functions: Join
Microsoft Access provides a graphical user interface that
makes it very easy to define and manipulate databases.
E.g., membership records in an organization
Access allows you to define and then store a set of queries and give these queries names that are meaningful to you. Note the Tables and Queries tabs in particular (Reports is useful for generating hardcopy output, such as mailing labels).
Tables in Microsoft Access
Final Projects
o Two-student teams work on projects for DiscoverET.org or develop their own
o Each team will present final results to the class during a public forum and produce a document of the project
o Information Organization and Representation Portfolio (IORP)
o Includes analysis and/or commentary related to class topics: Intellectual works and their manifestations, metadata standards in various environments, cataloging and authority control, metadata coding and crosswalks, digital library development, subject access and vocabulary control, concept mapping, indexing and abstracting, classification systems, cognitive category analysis, system design
o Evaluation based on: Creativity of project outcomes (recommendations/solutions proposed), Relevance and practicality of implementation, Thoroughness and examination of details
Final Project General Guidelines
Purpose is to apply knowledge to real life situations and to gain hands-on experience.
o I. You must sign up for the project and work in a two-student team.
o II. Each group must schedule a meeting with the instructor to discuss the project no later than the due date indicated in schedule.
o III. Each group must document the process and activities. Turn in your project documentation including the following parts:
   • Introduction: Topic description and project goals; members
   • Specific tasks that are distributed among members
   • The final product plus description and examples (this is the main part of the document)
   • Conclusions and experiences (summarize what you have learned and your thoughts; you may add what you would do if you did it again)
Final Projects: Road Map/TOC/Outline for
the Information Organization Portfolio
I. Introduction
   • What is your project? Expectations, Required elements, etc.
   • Issues/concerns specific to your project topic that play a role in developing an IOP
II. Class topics and their relationship to your project
   3-5 key considerations about each topic that is significant in developing an IOP on the specific project
III. Case-Studies and their Critique based on class topics or more
   List of web resources (DL or web portal) with short description and location
   3 or more case studies as relevant
   Comparative analysis
IS 520~Mehra
Final Projects: Road Map/TOC/Outline for the
Information Organization Portfolio
IV. Design Solutions/Templates
   • Design solutions reflecting key aspects
   • Web design solutions
   • Analysis of designs
V. Recommendations
VI. Future Considerations
VII. Documentation Report
Final Project Examples
1. On the existing DiscoverET.org website, develop an IORP for presenting community-based information for a selected subject category “Health.”
• Do a case-analysis of existing content and representation scheme(s) on the website and provide alternative design solutions.
• Do a case-analysis of existing content and representation scheme(s) on websites of other community networks and provide alternative design solutions.
• Your IORP should include a comprehensive collection of website listings on that subject, a classification scheme for representation of information, and various design solutions for the presentation of content, amongst other aspects.
• Also, identify elements in an organizational plan for an IR system that includes metadata schemes, menu options, and searching capabilities.
Final Project Examples
2. On the existing DiscoverET.org website, develop an IORP for presenting community-based information for a selected subject category “Tourism.”
• Do a case-analysis of existing content and representation scheme(s) on the website and provide alternative design solutions.
• Do a case-analysis of existing content and representation scheme(s) on websites of other community networks and provide alternative design solutions.
• Your IORP should include a comprehensive collection of website listings on that subject, a classification scheme for representation of information, and various design solutions for the presentation of content, amongst other aspects.
• Also, identify elements in an organizational plan for an IR system that includes metadata schemes, menu options, and searching capabilities.
Final Project Examples
3. For the existing DiscoverET.org website, develop an IORP for presenting community-based information for a new subject category of “Diversity Resources.”
• Do a case-analysis of existing content and representation scheme(s) (related to “Diversity”) on the website and provide alternative design solutions.
• Do a case-analysis and critique of existing content and representation scheme(s) on selected websites/web portals (other community networks) on the subject site and provide alternative design solutions.
• Your IORP should include a comprehensive collection of website listings on that subject, a classification scheme for representation of information, and various design solutions for the presentation of content, amongst other aspects.
• Also, identify elements in an organizational plan for an IR system that includes metadata schemes, menu options, and searching capabilities.
Final Project Examples
4. Select one county in Tennessee and develop an IORP
for presenting community-based information for the county.
• Do a case-analysis of existing content and representation scheme(s) on the website and provide alternative design solutions.
• Your IORP should include a comprehensive collection of website listings for that county, a classification scheme for representation of information, and various design solutions for the presentation of content, amongst other aspects.
• Also, identify elements in an organizational plan for an IR system that includes metadata schemes, menu options, and searching capabilities.
• Provide a test-bed for implementation based on one selected county from the adjoining states, or select from the following website: URL: http://www.discoveret.org/index.php?p=DirCountySearch
Final Project Examples
5. Based on a study of the use of wikis in existing and emerging community-based web portals, develop an IORP for presenting community-based interactive communication and information-sharing tools via development of wikis on the DiscoverET.org website.
• Do a case-analysis and critique of existing content and representation scheme(s) on selected websites/web portals (other community networks) that have wikis and provide alternative design solutions.
• Your IORP should include a comprehensive collection of website listings on that subject, a classification scheme for representation of information, and various design solutions for the presentation of content, amongst other aspects.
• Also, identify elements in an organizational plan for an IR system that includes metadata schemes, menu options, and searching capabilities. Evaluate the forms of interaction taking place via the different wikis in the different settings.
• Present the pros and cons based upon your analysis while you make recommendations for the DiscoverET.org website. Present summary reports for use of wikis as community-based interactive communication and information-sharing tools that include design options and an implementation plan for application.
Final Project Examples
6. Based on a study of the use of interactive databases for organizing, representing, and managing community-based information in representative case examples, provide a scheme for a community client (Fish) at DiscoverET.org that wants to develop a system to keep track of its activities/events and organize its work and human resources (time schedules, working responsibilities, etc.).
• Based on case-analysis and critique of existing content and representation scheme(s) in databases on selected websites/web portals (other community networks), identify what kinds of databases the client can use, and discuss the pros and cons of each, cost-benefit ratios, etc.
• Your IORP should include a comprehensive collection of database examples, identification of entities and attributes for your designed database, classification scheme for representation of information, and various design solutions for the presentation of content, amongst other aspects.
• Also, identify elements in an organizational plan for an IR system that includes metadata schemes, menu options, and searching capabilities.
For the DiscoverET.org website
1. Present community-based information for a selected subject category “Health”
2. Present community-based information for a selected subject category
“Tourism”: Pam, Suzanne
3. Present community-based information for a new subject category “Diversity
Resources”: Hannah, Deborah
4. Select one county in Tennessee and develop an IORP for presenting
community-based information for the county: Sara, Christa
5. Study of the use of wikis in existing and emerging community-based web
portals: Margaret, Emily
6. Study of the use of interactive databases for organizing, representing, and
managing community-based information in representative case examples:
Bridger, Roger
Critical Reflection 7

In pairs identify a subject domain and select at
least five items to form a template design for a
digital library. Brainstorm various topics/aspects
covered in class that will be pertinent for creating
an effective information organization and
representation scheme for your digital library.
Design a database for your collection and identify
key entities, attributes, and relationships. Present
an ER Diagram to reflect some aspects of your
database design.
Critical Reflection
• Goals for the metadata and users: Are you clear about what you want to achieve with this metadata? Are you clear about your users’ use of the resources?
• Granularity: What level of granularity is most appropriate to the items and user needs?
• Sources of info: Is it clear or even stated where you get your information? For example, if title is a field, is the cataloger told where to find that info? With a videotape, for instance, do you look on the label? The box?
• Complexity of record creation: Are special skills required to formulate the records? Are the records designed to be created by the info ‘publisher’ or centrally by service providers?
• Content: The content of different metadata record formats can be compared from aspects of structure and syntax, but perhaps most important is an evaluation of the usefulness and purpose of the info within them. How useful are the records you have created?
• Works well or not: What fields or characteristics work well (or do not work well) in describing your objects?
• Tweaking: How could/should the metadata be “tweaked” to accommodate your needs?