Digital Libraries
Spring 2006, 1 March
Bharat Mehra
IS 520 (Organization and Representation of Information)
School of Information Sciences, University of Tennessee

Digital Libraries
What does the digital library concept mean to you
• as a user
• as an information professional
• as an author
Is the Web a digital library? Why? Why not? Your definition or notion?

Digital Libraries
• What is the role of a librarian or information professional?
• How has this role changed in the context of digital libraries?

The Web: Implications for DLs
• Ubiquitous information source: Why is the web “a much more engaging medium and teacher” than textbooks or a local librarian?
• Identify pros and cons for specific situations in the different quadrants?

Finding Information on the Web
• Web directories for browsing: Yahoo! -- human indexers/catalogers, classificatory structure
• Web search engines for querying: AltaVista, Google -- robots, automatically generated indexes
• Combination of directory and engine

Paradigm shift: Classic IR vs. Web IR
• Collection: professionals, selection policy; polling (robot)
• Representation: description, access points; full text, metadata
• Search algorithms: master file, inverted indexes; non-Boolean, proprietary
• Interface: good functionality; simplistic; complex trade-off

Digital Library Features
• community based users
• extension and enhancement of classic IRs
• digital resources are multimedia: text, images, sounds, etc.
• technical capabilities for creating, searching, and using information
• distributed using networks (the Web, etc.)
Digital Library Features
Content of digital libraries includes:
• data
• metadata that describe various aspects of the data
• links (or relations) to other data or metadata (internal or external)
• context portals to support individual users’ information needs and work tasks

Digital Library Projects
• Digital Libraries Initiatives phase II <http://www.dli2.nsf.gov/>
• LC American Memory Website <http://memory.loc.gov/>
• standards <http://lcweb.loc.gov/standards/metadata.html>

Example Digital Libraries
• The National Science Digital Library http://nsdl.org/
• Library portals extend and serve classrooms, offices, laboratories, homes, and public spaces.

Information Theory (for DLs)
Joseph Goguen: A theory of information should
• Be useful for understanding and designing info systems (or DLs)
• Address the meanings that users give to events, including social and political nuances
• Address ethical issues
• Account for the fact that different individuals and groups can construe meanings in very different ways
Joseph Goguen, “Towards a Social Ethical Theory of Information” in Social Science Research, Technical Systems and Cooperative Work, edited by Geoffrey Bowker, Les Gasser, Leigh Star and William Turner. (Erlbaum, 1997).

Goguen’s Info Qualities Relevant to DLs
1. Situated: Info can only be fully understood in relation to the particular, concrete situation in which it actually occurs
2. Local: Interpretations are constructed in some particular context, including a particular time, place, and group
3. Emergent: Info cannot be fully understood at the level of the individual, that is, at the level of the individual psychology, because it arises through ongoing interactions with other people/technologies
4. Contingent: Interpretation of info depends upon current situation, which may include the current interpretation of prior events
5. Embodied: Info is tied to documents/bodies in particular situations, so that the particular way that bodies are embedded in a situation may be essential to some interpretations
6. Vague: In practice, info is only elaborated to the degree that it is useful to do so; the rest is grounded in intangible knowledge
7. Open: Info cannot in general be given a final and complete form, but must remain open to revision in the light of future developments
• “Wet” information: strongly situated, less mobile
• “Dry” information: weakly situated, more mobile
Joseph Goguen, “Towards a Social Ethical Theory of Information” in Social Science Research, Technical Systems and Cooperative Work, edited by Geoffrey Bowker, Les Gasser, Leigh Star and William Turner. (Erlbaum, 1997).

Issues of Text Representation in DLs
Storing textual materials is related to their:
• Structure (characters, words, paragraphs, headings): represented by mark-up, e.g., Standard Generalized Markup Language
• Appearance (choice of format, size of font, margins, line spacing, how headings are represented, location of figures): page-description languages precisely describe the appearance, e.g., TeX, PostScript, Portable Document Format (PDF)
• Alternative renderings of a single document

Converting Text
• Scanning: Optical character recognition
• Encoding characters: ASCII, Unicode
• Document type definitions (DTDs) in the Text Encoding Initiative (TEI), Encoded Archival Description (EAD)

Three General Types of Metadata
1. Object-descriptor metadata (Dublin Core): designed to describe global characteristics of entire objects with external references
2. Internal/structural metadata (HTML, XML, RDF): designed to describe internal semantic structure of objects with internal and external references
3. Display metadata (HTML, StyleSheets): designed to describe how objects or parts of objects should be visualized or displayed. Not necessarily related to semantic structure

What is a Database?
A database is a collection of data that is organized so that its contents can easily be accessed, managed and updated. The most prevalent type of database is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways. A distributed database is one that can be dispersed or replicated among different points in a network.

Relational Databases
A database system in which the database is organized and accessed according to the relationships between data items without the need for any consideration of physical orientation and relationship. Relationships between data items are expressed by means of tables.
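
As a minimal sketch of the point that relational data "can be reorganized and accessed in a number of different ways," the following uses Python's built-in sqlite3 module. The table name `item`, its columns, and the sample rows are hypothetical and chosen only for illustration; they do not come from the course materials.

```python
import sqlite3

# Hypothetical collection table; all names and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item ("
    " item_id INTEGER PRIMARY KEY,"  # key: uniquely identifies each row
    " title TEXT NOT NULL,"
    " creator TEXT,"
    " year INTEGER)"
)
conn.executemany(
    "INSERT INTO item (item_id, title, creator, year) VALUES (?, ?, ?, ?)",
    [
        (1, "Street map", "Creator A", 1995),
        (2, "Oral history recording", "Creator B", 2001),
        (3, "County photograph", "Creator A", 1988),
    ],
)


def titles_by_creator(connection, creator):
    """One way into the data: titles for a single creator, oldest first."""
    rows = connection.execute(
        "SELECT title FROM item WHERE creator = ? ORDER BY year", (creator,)
    )
    return [row[0] for row in rows]


def count_by_creator(connection):
    """A different way into the same table: number of items per creator."""
    rows = connection.execute(
        "SELECT creator, COUNT(*) FROM item GROUP BY creator"
    )
    return dict(rows)


print(titles_by_creator(conn, "Creator A"))
print(count_by_creator(conn))
```

Both functions read the same table; only the query changes, which is the sense in which the stored data is independent of any one way of accessing it.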

Features of Databases
• Collection of data stored together as a unit
• Databases are useful for storing data and making it available for retrieval
• Within the database, data is organized into different tables
• Each table has columns and rows. Indexes on tables provide speedy access to data
• Information in the database can be retrieved, modified, or deleted using a query language like SQL
• Some common database systems are Oracle, SQL Server, DB2, Sybase, etc.

Relational Database Model
• Data is presented as a collection of relations
• Each relation is depicted as a table
• Columns are attributes
• Rows represent entities
• Every table has a set of attributes that taken together as a "key" (technically, a "superkey") uniquely identifies each entity

Relational Database Model
Views in a database
Company maintains a database of its employees
• Other attributes of its employees: age, salary, emergency contacts, appraisal, etc.
• Different needs for different applications of the database: e.g., company may need to make available demographic data to a governmental agency
• Only some attributes need be supplied, and others ought not to be, so as to protect privacy: different views can be provided into the same data

Database Design
• Identify entities that we are dealing with, their various attributes, and their relationships
• An entity is some object with a real or conceptual existence in the world -- tofu, Advanced Java Class, Guggenheim Museum, Elaine, company
• Attribute is a property of an entity -- address, size, mother, age
• A relational column is an attribute
• A relationship defines roles in which entities work together -- "Bill WORKS-FOR Motorola", "jbs TEACHES advanced-java"
• RDBMSs represent relationships as tables

Database Design as ER Diagrams
Rectangles represent entity types, diamonds relationship types, and ovals attributes.
Underlined attribute names represent keys
• Rectangles: Object/concept nouns
• Diamonds: Verbs
• Ovals: Characteristics

Functions: Join

Microsoft Access provides a graphical user interface that makes it very easy to define and manipulate databases.
• E.g., membership records in an organization
• Access allows you to define and then store a set of queries and give these queries names that are meaningful to you.
• Note the Tables and Queries tabs in particular (Reports is useful for generating hardcopy output, such as mailing labels).

Tables in Microsoft Access

Final Projects
• Two-student teams work on projects for the DiscoverET.org or develop their own
• Each team will present final results to the class during a public forum and produce a document of the project
• Information Organization and Representation Portfolio (IORP)
• Includes analysis and/or commentary related to class topics: intellectual works and their manifestations, metadata standards in various environments, cataloging and authority control, metadata coding and crosswalks, digital library development, subject access and vocabulary control, concept mapping, indexing and abstracting, classification systems, cognitive category analysis, system design
• Evaluation based on: creativity of project outcomes (recommendations/solutions proposed), relevance and practicality of implementation, thoroughness and examination of details

Final Project General Guidelines
Purpose is to apply knowledge to real life situations and to gain hands-on experiences.
I. You must sign up for the project and work in a two-student team.
II. Each group must schedule a meeting with the instructor to discuss the project no later than the due date indicated in schedule.
III. Each group must document the process and activities.
Turn in your project documentation including the following parts:
• Introduction: Topic description and project goals; members
• Specific tasks that are distributed among members
• The final product plus description and examples (this is the main part of the document)
• Conclusions and experiences (summarize what you have learned and your thoughts; you may add what you would do if you were to do it again)

Final Projects: Road Map/TOC/Outline for the Information Organization Portfolio
I. Introduction
• What is your project? Expectations, required elements, etc.
• Issues/concerns specific to your project topic that play a role in developing an IOP
II. Class topics and their relationship to your project
• 3-5 key considerations about each topic that is significant in developing an IOP on the specific project
III. Case-Studies and their Critique based on class topics or more
• List of web resources (DL or web portal) with short description and location
• 3 or more case studies as relevant
• Comparative analysis
IS 520~Mehra

Final Projects: Road Map/TOC/Outline for the Information Organization Portfolio
IV. Design Solutions/Templates
• Design solutions reflecting key aspects
• Web design solutions
• Analysis of designs
V. Recommendations
VI. Future Considerations
VII. Documentation Report

Final Project Examples
1. On the existing DiscoverET.org website, develop an IORP for presenting community-based information for a selected subject category “Health.”
• Do a case-analysis of existing content and representation scheme(s) on the website and provide alternative design solutions.
• Do a case-analysis of existing content and representation scheme(s) on websites of other community networks and provide alternative design solutions.
• Your IORP should include a comprehensive collection of website listings on that subject, a classification scheme for representation of information, and various design solutions for the presentation of content, amongst other aspects.
• Also, identify elements in an organizational plan for an IR system that includes metadata schemes, menu options, and searching capabilities.

Final Project Examples
2. On the existing DiscoverET.org website, develop an IORP for presenting community-based information for a selected subject category “Tourism.”
• Do a case-analysis of existing content and representation scheme(s) on the website and provide alternative design solutions.
• Do a case-analysis of existing content and representation scheme(s) on websites of other community networks and provide alternative design solutions.
• Your IORP should include a comprehensive collection of website listings on that subject, a classification scheme for representation of information, and various design solutions for the presentation of content, amongst other aspects.
• Also, identify elements in an organizational plan for an IR system that includes metadata schemes, menu options, and searching capabilities.

Final Project Examples
3. For the existing DiscoverET.org website, develop an IORP for presenting community-based information for a new subject category of “Diversity Resources.”
• Do a case-analysis of existing content and representation scheme(s) (related to “Diversity”) on the website and provide alternative design solutions.
• Do a case-analysis and critique of existing content and representation scheme(s) on selected websites/web portals (other community networks) on the subject site and provide alternative design solutions.
• Your IORP should include a comprehensive collection of website listings on that subject, a classification scheme for representation of information, and various design solutions for the presentation of content, amongst other aspects.
• Also, identify elements in an organizational plan for an IR system that includes metadata schemes, menu options, and searching capabilities.

Final Project Examples
4.
Select one county in Tennessee and develop an IORP for presenting community-based information for the county.
• Do a case-analysis of existing content and representation scheme(s) on the website and provide alternative design solutions.
• Your IORP should include a comprehensive collection of website listings for that county, a classification scheme for representation of information, and various design solutions for the presentation of content, amongst other aspects.
• Also, identify elements in an organizational plan for an IR system that includes metadata schemes, menu options, and searching capabilities.
• Provide a test-bed for implementation based on selection for one selected county from the adjoining states or select from the following website: URL: http://www.discoveret.org/index.php?p=DirCountySearch

Final Project Examples
5. Based on a study of the use of wikis in existing and emerging community-based web portals, develop an IORP for presenting community-based interactive communication and information-sharing tools via development of wikis on the DiscoverET.org website.
• Do a case-analysis and critique of existing content and representation scheme(s) on selected websites/web portals (other community networks) that have wikis and provide alternative design solutions.
• Your IORP should include a comprehensive collection of website listings on that subject, a classification scheme for representation of information, and various design solutions for the presentation of content, amongst other aspects.
• Also, identify elements in an organizational plan for an IR system that includes metadata schemes, menu options, and searching capabilities.
• Evaluate the forms of interaction taking place via the different wikis in the different settings. Present the pros and cons based upon your analysis while you make recommendations for the DiscoverET.org website.
• Present summary reports on the use of wikis as community-based interactive communication and information-sharing tools, including design options and an implementation plan for application.

Final Project Examples
6. Based on a study of the use of interactive databases for organizing, representing, and managing community-based information in representative case examples, provide a scheme for a community client (Fish) at DiscoverET.org who wants to develop a system to keep track of their activities/events and organize their work and human resources (time schedules, working responsibilities, etc.).
• Based on case-analysis and critique of existing content and representation scheme(s) in databases on selected websites/web portals (other community networks), identify what kind of databases the client can use, and discuss the pros and cons for each, cost-benefit ratios, etc.
• Your IORP should include a comprehensive collection of database examples, identification of entities and attributes for your designed database, classification scheme for representation of information, and various design solutions for the presentation of content, amongst other aspects.
• Also, identify elements in an organizational plan for an IR system that includes metadata schemes, menu options, and searching capabilities.

For the DiscoverET.org website
1. Present community-based information for a selected subject category “Health”
2. Present community-based information for a selected subject category “Tourism”: Pam, Suzanne
3. Present community-based information for a new subject category “Diversity Resources”: Hannah, Deborah
4. Select one county in Tennessee and develop an IORP for presenting community-based information for the county: Sara, Christa
5. Study of the use of wikis in existing and emerging community-based web portals: Margaret, Emily
6.
Study of the use of interactive databases for organizing, representing, and managing community-based information in representative case examples: Bridger, Roger

Critical Reflection 7
• In pairs, identify a subject domain and select at least five items to form a template design for a digital library.
• Brainstorm various topics/aspects covered in class that will be pertinent for creating an effective information organization and representation scheme for your digital library.
• Design a database for your collection and identify key entities, attributes, and relationships.
• Present an ER Diagram to reflect some aspects of your database design.

Critical Reflection
• Goals for the metadata and users: Are you clear about what you want to achieve with this metadata? Are you clear about your users’ use of the resources?
• Granularity: What level of granularity is most appropriate to the items and user needs?
• Sources of info: Is it clear or even stated where you get your information? For example, if title is a field, is the cataloger told where to find that info? For example, with a videotape, do you look on the label? The box?
• Complexity of record creation: Are special skills required to formulate the records? Are the records designed to be created by the info ‘publisher’ or centrally by service providers?
• Content: The content of different metadata record formats can be compared from aspects of structure and syntax, but perhaps most important is an evaluation of the usefulness and purpose of the info within them. How useful are the records you have created?
• Works well or not: What fields or characteristics work well (or do not work well) in describing your objects?
• Tweaking: How could/should the metadata be “tweaked” to accommodate your needs?