INFORMATION RETRIEVAL

Framework for today’s lecture:
• data
• organizing data
• retrieving data
• tools supporting the process

STRUCTURED VS UNSTRUCTURED DATA

Structured data
• information with a high degree of organization
• easy to put into a relational database
• search is simple and straightforward

Unstructured data
• essentially the opposite of structured data
• natural language / free text

It is easy to envision structured data in terms of “tables”:

Employee   Manager   Salary
Smith      Jones     50000
Chang      Smith     60000
Ivy        Smith     50000

Structured data typically allows numerical range and exact-match (for text) queries, e.g., Salary < 60000 AND Manager = Smith.

Relational Databases
• Structured data
• Designed to provide search results with exact answers
• Queries built on a schema of structured fields
• Lack of a ranking mechanism (initially)
• We know the schema in advance, so the semantic correlation between queries and data is clear
• We can get exact answers

Example: tables in an MS Access relational database defining a social networking site.
Example: a data entry form in an MS Access relational database, used to create each record.

Unstructured data typically refers to free text. Email is a good example of unstructured data.
An email is indexed by date, time, sender, recipient, and subject, but the body of an email remains unstructured. Other examples of unstructured data include books, documents, medical records, and social media posts; a magazine article is another example.

Information Retrieval Systems
• Unstructured / semistructured data
• Designed to support unstructured, natural-language full-text search
• Ranking mechanism is very important: results must be sorted by relevance in order to satisfy the user’s information need
• We get inexact, estimated answers

[Diagram: the query and the document collection (corpus) each pass through a representation function; the document representations form the index; a matching function compares the query representation with the index to produce results. Also labeled: categories, subject headings, KWIC (key word in context), metadata.]

WHAT IS METADATA?
Classic definition: data about data.
“Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource.” (NISO)
Three primary “types”:
• Descriptive
• Structural
• Administrative (rights management, preservation)

More Metadata: A Cataloging Record
http://search.lib.unc.edu/search?R=UNC b4448196

THE IDEA OF FACETS
• Facets are a way of labeling data
• A kind of metadata (data about data)
• Can be thought of as properties of items

Facets vs. Categories
• Items are placed INTO a category system
• Multiple facet labels are ASSIGNED TO items

Facets
Epicurious example: http://www.epicurious.com/
• Create INDEPENDENT categories (facets)
• Each facet has labels (sometimes arranged in a hierarchy)
• Assign labels from the facets to every item

Example: recipe collection
• Ingredient: Chicken, Bell Pepper
• Cooking Method: Stir-fry, Curry
• Course: Main Course
• Cuisine: Thai

THE IDEA OF FACETS
• Break out all the important concepts into their own facets
• Sometimes the facets are hierarchical
• Assign labels to items from any level of the hierarchy

Example facet hierarchies:
• Preparation Method: Fry, Saute, Boil, Bake, Broil, Freeze
• Desserts: Cakes, Cookies, Dairy (Ice Cream, Sorbet, Flan)
• Fruits: Cherries, Berries (Blueberries, Strawberries), Bananas, Pineapple

USING FACETS
Now there are multiple ways to get to each item, e.g.:
• an item labeled Fruit > Pineapple, Dessert > Cake, and Preparation > Bake
• an item labeled Dessert > Dairy > Sherbet, Fruit > Berries > Strawberries, and Preparation > Freeze

UNC Libraries Online Catalog
http://www.lib.unc.edu/

Let’s look at a database of magazine and journal articles: Academic Search Complete.
UNC Libraries Homepage: http://www.lib.unc.edu/ >> E-Research Tools >> Frequently Used >> Academic Search Complete
[off-campus: log in with onyen/password]

ORGANIZATION / SEARCH
• We organize to enable retrieval.
• The more effort we put into organizing information, the more effectively it can be retrieved.
• The more effort we put into retrieving information, the less it needs to be organized first.
• We need to think in terms of investment: the allocation of costs and benefits between the organizer and the retriever.
• The allocation differs according to the relationship between them: who does the work, and who gets the benefit?
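The exact-match query on the Employee table (Salary < 60000 AND Manager = Smith) can be sketched with Python's built-in sqlite3 module. This is a minimal illustration: the table name `staff` and the function `find_employees` are hypothetical. Each row either satisfies both conditions exactly or is excluded, and no ranking is involved.

```python
import sqlite3

def find_employees(conn, max_salary, manager):
    """Numeric range test plus exact text match: Salary < max_salary AND Manager = manager."""
    rows = conn.execute(
        "SELECT employee FROM staff WHERE salary < ? AND manager = ? ORDER BY employee",
        (max_salary, manager),
    )
    return [name for (name,) in rows]

# In-memory database holding the three rows of the Employee table
# (hypothetical table name "staff")
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (employee TEXT, manager TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [("Smith", "Jones", 50000), ("Chang", "Smith", 60000), ("Ivy", "Smith", 50000)],
)

print(find_employees(conn, 60000, "Smith"))  # ['Ivy']
```

Chang is excluded because 60000 is not strictly less than 60000, and Smith because his manager is Jones. The answer is exact, not estimated.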
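The Query / Representation function / Index / Matching function / Results diagram can be sketched as a tiny ranked-retrieval pipeline. The lecture does not specify these functions, so this sketch rests on assumptions. The lowercase tokenizer, the inverted index, and TF-IDF scoring as the matching function are common textbook choices swapped in for illustration, and the three-document corpus is invented. Unlike the relational query, the output is a list sorted by an estimated relevance score.

```python
import math
import re
from collections import Counter, defaultdict

def represent(text):
    """Representation function (assumed): lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(corpus):
    """Inverted index: term -> {doc_id: term frequency in that document}."""
    index = defaultdict(dict)
    for doc_id, text in corpus.items():
        for term, tf in Counter(represent(text)).items():
            index[term][doc_id] = tf
    return index

def match(query, index, n_docs):
    """Matching function (assumed TF-IDF): (doc_id, score) pairs, most relevant first."""
    scores = defaultdict(float)
    for term in set(represent(query)):
        postings = index.get(term, {})
        if not postings:
            continue  # a term absent from the corpus contributes nothing
        idf = math.log(n_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Sort by descending score; ties broken by doc_id so the order is deterministic
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical three-document corpus
corpus = {
    "d1": "The body of an email remains unstructured free text",
    "d2": "A magazine article is unstructured text",
    "d3": "Salary and manager are structured fields",
}
index = build_index(corpus)
for doc_id, score in match("unstructured email", index, len(corpus)):
    print(doc_id, round(score, 3))
```

Here d1 ranks above d2 because it also matches the rarer term "email", and d3 shares no query term, so it is not returned at all.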
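KWIC (key word in context), one of the labels on the retrieval diagram, shows each occurrence of a keyword with a few surrounding words so a searcher can judge how the term is used. The following is a minimal sketch, assuming a simple whitespace tokenizer and a hypothetical `width` parameter for the number of context words kept on each side.

```python
import string

def kwic(text, keyword, width=3):
    """Return (left context, keyword as written, right context) for each occurrence."""
    words = text.split()
    hits = []
    for i, word in enumerate(words):
        # Compare case-insensitively, ignoring punctuation attached to the word
        if word.strip(string.punctuation).lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            hits.append((left, word, right))
    return hits

text = ("An email is indexed by date, time, sender, recipient, and subject, "
        "but the body of an email remains unstructured.")
for left, word, right in kwic(text, "email", width=2):
    # Right-align the left context so every keyword lines up in one column
    print(f"{left:>20} [{word}] {right}")
```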
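The three primary types of metadata (descriptive, structural, administrative) can be pictured as one record with a section per type. This is only an illustrative data structure. The class name, the keys, and the sample values are hypothetical placeholders, not the NISO element set or the UNC catalog's schema.

```python
from dataclasses import dataclass, field

@dataclass
class MetadataRecord:
    """Hypothetical record grouping metadata by the three primary types."""
    descriptive: dict = field(default_factory=dict)     # identifies the resource (title, creator, subjects)
    structural: dict = field(default_factory=dict)      # how the resource's parts fit together
    administrative: dict = field(default_factory=dict)  # rights management, preservation

# Placeholder values for illustration only
record = MetadataRecord(
    descriptive={"title": "Example Book", "creator": "A. Author",
                 "subjects": ["Information retrieval"]},
    structural={"parts": ["Chapter 1", "Chapter 2"]},
    administrative={"rights": "In copyright", "preservation": "Digitized master on file"},
)
print(record.descriptive["title"], "|", record.administrative["rights"])
```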
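The facet slides can be sketched as follows. Each item carries several labels, and each label is a path in one facet's hierarchy. A query selects items labeled at a given node or anywhere beneath it, and several facets combine by intersection. The labels are the two examples from the Using Facets slide. The item names and the functions `select` and `select_all` are hypothetical.

```python
# Each label is a path in one facet hierarchy: (facet, level 1, level 2, ...)
ITEMS = {
    "item 1": [("Fruit", "Pineapple"), ("Dessert", "Cake"), ("Preparation", "Bake")],
    "item 2": [("Dessert", "Dairy", "Sherbet"),
               ("Fruit", "Berries", "Strawberries"),
               ("Preparation", "Freeze")],
}

def select(items, path):
    """Names of items with a label at `path` or at any level below it."""
    n = len(path)
    return {name for name, labels in items.items()
            if any(label[:n] == path for label in labels)}

def select_all(items, *paths):
    """Combine facets: items that satisfy every path (set intersection), sorted."""
    result = set(items)
    for path in paths:
        result &= select(items, path)
    return sorted(result)

print(select_all(ITEMS, ("Dessert",)))                           # ['item 1', 'item 2']
print(select_all(ITEMS, ("Fruit", "Berries")))                   # ['item 2']
print(select_all(ITEMS, ("Dessert",), ("Preparation", "Bake")))  # ['item 1']
```

In a single category system each item would sit at exactly one node. Here item 2 can be reached from Dessert, Fruit, or Preparation, which is the "multiple ways to get to each item" the slide describes.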