SIMS 296a-3: Current Topics in Information Access
Marti Hearst
Fall ‘98

Today
  Introductions
  Goals and Course Requirements
  Administrivia
  Topics
    What is Information Access
    Current Topics (an outline)
    Intro to IA

Goals
  Become expert on the state-of-the-art in timely topics related to information access
  Begin getting research results.

Course Requirements
  To get S/U credit for the class
    Lead two discussions
    Do the readings
    Attend the meetings

Course Requirements
  To get a grade in the class
    Do the above
    Do one of the following (optionally with the help of a faculty member and/or another student):
      Write a publishable survey paper on an emerging area of information access.
      Do research that should lead to a publishable research paper on a new idea, method, analysis, or vision statement for an emerging area of information access.
      Implement and/or evaluate code to further an information access research project.

Administrivia
  Sign up sheet
  Readings
  Other questions?

Outline
  What is Information Access?
    Goals, Tasks, Types of data
  Standard Information Retrieval
    Assumptions, Techniques, Evaluation
  Current Topics
    Candidate topics

What is Information Access?
  Information Access: The process by which users use information technology to seek, organize, and understand information.
  Focus: information expressed as text.

Information Retrieval
  Task Statement
    Build a system that retrieves documents that users are likely to find relevant to their queries.
  This set of assumptions underlies the field of Information Retrieval.

Information Retrieval Assumptions
  The system has available only preexisting, “canned” text passages.
  Its response is limited to selecting from these passages and presenting them to the user.
  It must select, say, 10 or 20 passages out of millions or billions!

Top 10 Research Issues for IR
  What do people want from IR?
  By Bruce Croft, DLIB Magazine, Nov 95
  Based on observations from work on public-domain systems, including:
    THOMAS
    American Memory Project (Library of Congress)
  The order of importance does not correspond to many IR researchers’ priorities.
  The same can be said for AI researchers.

Top 10 Research Issues for IR
  Bruce Croft, DLIB Magazine, Nov 95. In descending order of importance.
    Integrated Solutions
    Distributed IR
    Efficient, Flexible Indexing and Retrieval
    “Magic” (Effective Vocabulary Expansion)
    Interfaces and Browsing
    Routing and Filtering
    Effective Retrieval
    Multimedia Retrieval
    Information Extraction
    Relevance Feedback

Other Issues
  Mundane issues are important
    Spelling Correction
    Fast display of initial results
  Less important but more interesting from many researchers’ points of view (Bruce Croft, DLIB Magazine, Nov 95):
    Multilingual IR
    Data Mining (in text databases)
    Text Categorization

Matching Tasks, Collections, and Search Systems
  Typical WWW search is not the whole picture.
  Different information needs require:
    different collections
    different search systems and strategies
  Compare:
    general WWW
    newswire and magazines
    medical journal articles

Match Task and Search Type
  WWW Tasks: (from www.cnet.com/Content/Reviews/Compare/Seach/ss1a.html)
    Find how-to pages for Doom.
    Purchase plane tickets and hotel for a trip to Java.
    Find the top five all-time scoring leaders in the National Hockey League.
    Find a recipe for potato latkes.
    Find the tide tables for Maui.
  Characteristics:
    Timely, specific, found via help from human agents and in well-known resources before the WWW.

Match Task and Search Type
  Newswire & Magazine Tasks: (from the TREC collection)
    Find articles on research into cures for osteoporosis.
    Find articles on the effects of recycling of tires on the environment.
    Find information on jail and prison overcrowding and how inmates are forced to cope with those conditions.
    Find discussion of an existing or proposed insurance plan (governmental, commercial or individual) and the coverage it provides for long term care confinements in an institution.
  Characteristics:
    Complex combinations of topics.
    Research-oriented
    Either timely or retrospective

Match Task and Search Type
  MEDLINE Tasks: (From OHSUMED, medir.ohsu.edu/pub/ohsumed)
    Are there adverse effects on lipids when progesterone is given with estrogen replacement therapy?
    Pathophysiology and treatment of disseminated intravascular coagulation.
    Reviews on subdurals in the elderly.
    Effectiveness of etidronate in treating hypercalcemia of malignancy.
  Characteristics
    Research-oriented
    Technical
    Cause and Effect, Implications

The Problem of Information Access
  Main problem: Computers can’t understand natural language.
  Therefore: Information access systems must guide users to information of interest by approximate methods.
  General common methods:
    word match
    topic directories

Why Text is Tough
  Abstract concepts difficult to represent (AI-Complete)
  “Countless” combinations of subtle, abstract relationships among concepts
  Many ways to represent similar concepts
    space ship, flying saucer, UFO, figment of imagination
  Concepts are difficult to visualize
  High dimensionality
    Tens or hundreds of thousands of features

Why Text is Tough
  I saw Pathfinder on Mars with a telescope.
  Pathfinder photographed Mars.
  The Pathfinder photograph mars our perception of a lifeless planet.
  The Pathfinder photograph from Ford has arrived.
  The Pathfinder forded the river without marring its paint job.

Outline
  What is Information Access?
    Goals, Tasks, Types of data
  Standard Information Retrieval
    Assumptions, Techniques, Evaluation
  Current Topics
    Candidate topics
      User Interfaces
      Quality Assessment
      Text Data Mining
      Student suggestions

Tools for Information Access
  User Interfaces (information visualization)
  Information Access (information retrieval)
  Language and Content Analysis
  Task Analysis

Current Topics
  User Interfaces
    Incorporating “personal” information
    Automated “Agents” vs. User Initiated Steps
    Support for the dynamic process of information access
    How to organize large search results
      Categories, clusters, combinations of these
    Question Answering
    Others?

Current Topics
  Quality Assessment
    Issues:
      How to define quality
      Rating methods
      Different fields (medicine, business)
    Techniques
      Visitation patterns and times
      “Social” techniques
      Link structure (co-citation patterns)
      Link structure + content

Current Topics
  Text Data Mining
    Visualizing the contents of large text collections
    Automatically discovering associations within text collections
      Discovering useful patterns
      Spotting anomalies
      *Finding chains of associated information
    *I have a proposal for this

Current Topics
  Cognitive modeling/AI techniques
  Your idea goes here:

For Next Time
  Do background reading
  Think about which topics to pursue
  I will present more background information
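
Supplementary sketch: word match
  The slides name "word match" as a common approximate method and use the Pathfinder sentences to show why text is tough. The Python sketch below is one minimal way to illustrate both points together; it is not a method taken from the course. The function names (tokenize, word_match_rank), the scoring rule (count of query-word occurrences), and the sample query are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_match_rank(query, passages, k=10):
    """Return up to k (passage, score) pairs, best match first.

    Assumed scoring rule: a passage's score is the number of times
    any query word occurs in it. Passages with score 0 are dropped.
    """
    query_terms = set(tokenize(query))
    scored = []
    for index, passage in enumerate(passages):
        counts = Counter(tokenize(passage))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scored.append((score, index))
    # Higher score first; ties keep the original passage order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(passages[i], s) for s, i in scored[:k]]


# The example sentences from the "Why Text is Tough" slide.
PASSAGES = [
    "I saw Pathfinder on Mars with a telescope.",
    "Pathfinder photographed Mars.",
    "The Pathfinder photograph mars our perception of a lifeless planet.",
    "The Pathfinder photograph from Ford has arrived.",
    "The Pathfinder forded the river without marring its paint job.",
]

if __name__ == "__main__":
    for passage, score in word_match_rank("Pathfinder Mars", PASSAGES):
        print(score, passage)
```

  With the hypothetical query "Pathfinder Mars", the sentence that uses "mars" as a verb scores the same as the two sentences about the planet, because matching words carries no notion of word sense. That is the sense in which the system can only guide users by approximate methods.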