information retrieval
mon feb 08 2016
data… & information organization

SPSS Workshop in Odum…
• Monday, February 29
• 2:00 – 3:30 pm
• Davis Library, Room 219 (same lab room)
• introduces SPSS and teaches how to work with data saved in SPSS format
• no registration required
Anyone need an “SPSS Cheat Sheet”?

framework for today’s lecture…
• data
• organizing data
• retrieving data
• tools supporting the process

info organization activity
• in a small group, examine the cards that identify various “documents” in a collection
• organize the document surrogates on the table into some sort of schema – grouping by category (like items with like)
• choose your own organization scheme and hierarchy
• if desired, write on the blank cards to create new categories or uber-categories
• be ready to share your organization method with the class

Structured Data
• information with a high degree of organization
• easy to put into a relational database
• search is simple and straightforward

Unstructured data
• essentially the opposite of structured data
• natural language / free text

STRUCTURED vs unstructured data
It is easy to envision structured data in terms of “tables”:

Employee   Manager   Salary
Smith      Jones     68000
Chang      Smith     65000
Ivy        Smith     50000

Structured data typically allows numerical range queries and exact-match queries (for text), e.g., Salary < 60000 AND Manager = Smith.
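The example query can be run against the three-row table shown above. The sketch below is minimal and uses Python's standard-library `sqlite3` module. The table name `employees` and the lowercase column names are our own assumptions and do not come from the slides.

```python
import sqlite3

# Load the slide's employee table into an in-memory SQLite database.
# Table and column names are hypothetical choices for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee TEXT, manager TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Smith", "Jones", 68000), ("Chang", "Smith", 65000), ("Ivy", "Smith", 50000)],
)

# Numerical range condition on salary AND exact text match on manager.
rows = conn.execute(
    "SELECT employee FROM employees WHERE salary < ? AND manager = ?",
    (60000, "Smith"),
).fetchall()
print(rows)  # [('Ivy',)]
```

The schema is known in advance, so the database returns exactly the rows that satisfy both conditions. It has no notion of ranking them by relevance.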
Relational Databases
• Structured data
• Designed to provide search results with exact answers
• Queries built on a schema of structured fields
• Lack of ranking mechanism (initially)
• We know the schema in advance, so the semantic correlation between queries and data is clear
• We can get exact answers

[Image: tables in an MS Access relational database – each table defines an entity in a social networking site]
[Image: data entry form in an MS Access relational database – used to create each record]

structured vs UNSTRUCTURED data
• typically refers to free text
• email is a good example of unstructured data: it is indexed by date, time, sender, recipient, and subject, but the body of an email remains unstructured
• other examples of unstructured data include books, documents, medical records, and social media posts
[Image: a journal article is an example of unstructured data]

Information Retrieval Systems (compared with Relational Databases)
• Unstructured / semistructured data
• Designed to support full-text search of unstructured natural language
• Ranking mechanism is very important – results must be sorted by relevance in order to satisfy the user’s information need
• We get inexact, estimated answers

[Figure: model of an information retrieval system. The query and the document collection (corpus) each pass through a representation function; the documents are stored in an index; a matching function compares query and index to produce results. Other labels: CATEGORIES, SUBJECT HEADINGS, metadata, KWIC (key word in context).]

What is Metadata?
• Classic definition: data about data
• Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource.
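The IR system model (query, representation function, index, matching function, ranked results, KWIC) can be sketched in a few lines of Python. This is a minimal illustration built on our own assumptions. The names `represent`, `build_index`, `match`, and `kwic` are hypothetical. The representation function is just lowercase word tokenization, and documents are ranked by the summed frequency of the query terms. Real systems use richer weighting such as tf-idf or BM25, so read this as a sketch of the pipeline and not of any particular system.

```python
import re
from collections import Counter, defaultdict


def represent(text):
    """Representation function (simplified): lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(corpus):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in corpus.items():
        for term, tf in Counter(represent(text)).items():
            index[term][doc_id] = tf
    return index


def match(query, index):
    """Matching function: score docs by summed query-term frequency, best first."""
    scores = Counter()
    for term in represent(query):
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    return [doc_id for doc_id, _ in scores.most_common()]


def kwic(text, keyword, width=3):
    """Key word in context: each occurrence with up to `width` words on either side."""
    words = text.split()
    lines = []
    for i, word in enumerate(words):
        if represent(word) == [keyword.lower()]:
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{word}] {right}")
    return lines


# Hypothetical three-document corpus.
corpus = {
    "d1": "Email bodies are unstructured free text.",
    "d2": "Structured data fits in a relational database table.",
    "d3": "Unstructured text needs ranking; ranking sorts results by relevance.",
}
index = build_index(corpus)
ranked = match("unstructured ranking", index)
print(ranked)  # ['d3', 'd1']
print(kwic(corpus["d1"], "unstructured"))
```

Unlike the relational query, this returns an ordered, estimated answer. d3 ranks first because it mentions "ranking" twice. d2 contains neither query term and is not returned.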
(NISO)
• 3 primary “types”:
  – Descriptive
  – Structural
  – Administrative (rights management, preservation)
[Image: digital forensics]

How do we organize a collection of “documents” so that users can find what they need?

from Glushko reading…
• what three types/forms of categorization does Glushko discuss in the Categorization in the Wild piece?
• give a real-world example of a categorization system and briefly describe the purpose behind it (i.e., what problem is it trying to address?)

From Glushko reading…
• Cultural categorization
  – Embodied in culture and language
  – Acquired implicitly through development via parent-child interactions, language, and experience
  – Formal education can build on this, but the nonformal cultural system can often dominate
  – Traditional perspective for thinking and research about categorization
• Individual categorization
  – A system developed by an individual for organizing a personal domain to aid memory, retrieval, or usage
  – Can serve social goals: conveying information, developing a community, managing reputation
  – Has exploded with the advent of social computing, especially in applications based on “tagging”
  – An individual’s system of tags in web applications is sometimes called a “folksonomy”
• Institutional categorization
  – Systems created to serve institutional goals, facilitate sharing of information, and increase interoperability
  – Helps to streamline interactions and transactions so that consistency, fairness, and higher yields can result.
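The three NISO metadata types and tag-based individual categorization can be illustrated with small data structures. The sketch below is hypothetical. `MetadataRecord` and `folksonomy` are our own names, and the field choices are one possible way to sort attributes into descriptive, structural, and administrative groups. They are not a NISO schema. The record values and taggings are invented, and the code requires Python 3.9+ for the `list[str]` annotations.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Hypothetical record grouping fields by the three NISO metadata types."""
    # Descriptive metadata: supports discovery and identification.
    title: str
    creator: str
    subjects: list[str] = field(default_factory=list)
    # Structural metadata: how the parts of the resource fit together.
    parts: list[str] = field(default_factory=list)
    # Administrative metadata: rights management and preservation.
    rights: str = ""
    preservation_format: str = ""


def folksonomy(taggings, user=None):
    """Map tag -> set of resources from (user, resource, tag) triples.

    Pass `user` to keep only one individual's tags. Tags are lowercased so
    "Beach" and "beach" collapse into one category.
    """
    by_tag = defaultdict(set)
    for who, resource, tag in taggings:
        if user is None or who == user:
            by_tag[tag.lower()].add(resource)
    return dict(by_tag)


record = MetadataRecord(
    title="Quarterly Sales Report",
    creator="A. Analyst",
    subjects=["sales", "finance"],
    parts=["summary", "tables", "appendix"],
    rights="internal use only",
    preservation_format="PDF/A",
)

taggings = [
    ("ann", "photo1", "Beach"),
    ("ann", "photo2", "beach"),
    ("bob", "photo2", "vacation"),
]
ann_tags = folksonomy(taggings, user="ann")  # one individual's categories
all_tags = folksonomy(taggings)              # tags pooled across users
print(ann_tags)
print(all_tags)
```

Lowercasing tags is a deliberate design choice. Without it, one person's inconsistent capitalization would split a single personal category in two.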
Let’s look at a database of magazine & journal articles… to see how information is organized, with particular attention to value-added SUBJECT TERMS/HEADINGS (categorization)
…Academic Search Premier
>> UNC Libraries Homepage: http://www.lib.unc.edu/
>> E-Research by Discipline
>> Frequently Used
>> Academic Search Premier
[off-campus: log in with onyen/password]

Handout Activity #2

info organization & search
• We organize to enable retrieval
• The more effort put into organizing information, the more effectively it can be retrieved
• The more effort we put into retrieving information, the less it needs to be organized first
• We need to think in terms of investment: the allocation of costs and benefits between the organizer and the retriever
• The allocation differs according to the relationship between them: who does the work, and who gets the benefit?

final notes…
• Homework #2: Database report
  – sign up for a database, or talk with me about a suggestion
  – next Wednesday: 5-min reports in class
• Wednesday: “Information Retrieval” intro with Dr. Jaime Arguello (required reading prep)
• Wednesday: Data to Story Project – speed date/pitch
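A toy comparison makes the organizer/retriever trade-off concrete. This is a hypothetical sketch. The article records and subject headings are invented for illustration and do not come from Academic Search Premier. `keyword_search` and `subject_search` are our own names, not a database API. The organizer assigned the heading "Sleep deprivation" in advance, so one subject search finds all three articles. A free-text search only finds titles that happen to use the searcher's words.

```python
# Hypothetical mini-collection: an indexer has assigned controlled subject
# headings to each article (the organizer's up-front investment).
articles = [
    {"title": "Coping with loss of sleep in new parents",
     "subjects": {"Sleep deprivation", "Parents"}},
    {"title": "Insomnia and academic performance",
     "subjects": {"Sleep deprivation", "Students"}},
    {"title": "Night shifts and nurse fatigue",
     "subjects": {"Sleep deprivation", "Nurses"}},
]


def keyword_search(query, docs):
    """Free-text substring match on titles: the retriever must guess the wording."""
    q = query.lower()
    return [d["title"] for d in docs if q in d["title"].lower()]


def subject_search(heading, docs):
    """Exact match on an assigned subject heading (controlled vocabulary)."""
    return [d["title"] for d in docs if heading in d["subjects"]]


print(keyword_search("sleep deprivation", articles))  # []
print(keyword_search("sleep", articles))  # ['Coping with loss of sleep in new parents']
print(len(subject_search("Sleep deprivation", articles)))  # 3
```

The retriever could recover the same set without any up-front indexing by trying synonyms such as "insomnia" or "fatigue". That shifts the cost from the organizer to the retriever, which is the allocation the slide describes.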