
Lecture 16: Knowledge Representation
SIMS 202: Information Organization and Retrieval
Prof. Ray Larson & Prof. Marc Davis
UC Berkeley SIMS
Tuesday and Thursday 10:30 am - 12:00 pm
Fall 2004
Credits to Marti Hearst and Warren Sack for some of the slides in this lecture
IS 202 - FALL 2004 2004.10.21 - SLIDE 1

Agenda
• Review of Last Time
• Knowledge Representation
  – The Vocabulary Problem
  – Commonsense
  – CYC
• Discussion Questions
• Action Items for Next Time

Categorization
• Processes of categorization are fundamental to human cognition
• Categorization is messier than our computer systems would like
• Human categorization is characterized by
  – Family resemblances
  – Prototypes
  – Basic-level categories
• Considering how human categorization functions is important in the design of information organization and retrieval systems

Categorization
• Classical categorization
  – Necessary and sufficient conditions for membership
  – Generic-to-specific monohierarchical structure
• Modern categorization
  – Characteristic features (family resemblances)
  – Centrality/typicality (prototypes)
  – Basic-level categories

Properties of Categorization
• Family Resemblance
  – Members of a category may be related to one another without all members having any property in common
• Prototypes
  – Some members of a category may be “better examples” than others, i.e., “prototypical” members

Basic-Level Categorization
• Perception
  – Overall perceived shape
  – Single mental image
  – Fast identification
• Function
  – General motor program
• Communication
  – Shortest, most commonly used and contextually neutral words
  – First learned by children
• Knowledge Organization
  – Most attributes of category members stored at this level
  – Tends to be in the “middle” of a classification hierarchy

Information Hierarchy
• Wisdom
• Knowledge
• Information
• Data

Today’s Thinkers/Tinkerers
George Furnas http://www.si.umich.edu/~furnas/
Marvin Minsky http://web.media.mit.edu/~minsky/
Doug Lenat http://www.cyc.com/staff.html

The Birth of AI
• Rockefeller-sponsored Institute at Dartmouth College, Summer 1956
  – John McCarthy, Dartmouth (->MIT->Stanford)
  – Marvin Minsky, MIT (geometry)
  – Herbert Simon, CMU (logic)
  – Allen Newell, CMU (logic)
  – Arthur Samuel, IBM (checkers)
  – Alex Bernstein, IBM (chess)
  – Nathan Rochester, IBM (neural networks)
  – Etc.

Definition of AI
“... artificial intelligence [AI] is the science of making machines do things that would require intelligence if done by [humans]” (Minsky, 1963)

The Goals of AI Are Not New
• Ancient Greece
  – Daedalus’ automata
• Judaism’s myth of the Golem
• 18th century automata
  – Singing, dancing, playing chess?
• Mechanical metaphors for mind
  – Clock
  – Telegraph/telephone network
  – Computer

Some Areas of AI
• Knowledge representation
• Programming languages
• Natural language understanding
• Speech understanding
• Vision
• Robotics
• Planning
• Machine learning
• Expert systems
• Qualitative simulation

AI or IA?
• Artificial Intelligence (AI)
  – Make machines as smart as (or smarter than) people
• Intelligence Amplification (IA)
  – Use machines to make people smarter

Furnas: The Vocabulary Problem
• People use different words to describe the same things
  – “If one person assigns the name of an item, other untutored people will fail to access it on 80 to 90 percent of their attempts.”
  – “Simply stated, the data tell us there is no one good access term for most objects.”

The Vocabulary Problem
• How is it that we come to understand each other?
  – Shared context
  – Dialogue
• How can machines come to understand what we say?
  – Shared context?
  – Dialogue?

Vocabulary Problem Solutions?
• Furnas et al.
  – Make the user memorize precise system meanings
  – Have the user and system interact to identify the precise referent
  – Provide infinite aliases to objects
• Minsky and Lenat
  – Give the system “commonsense” so it can understand what the user’s words can mean

Lenat on the Vocabulary Problem
• “The important point is that users will be able to find information without having to be familiar with the precise way the information is stored, either through field names or by knowing which databases exist, and can be tapped.”

Minsky on the Vocabulary Problem
• “To make our computers easier to use, we must make them more sensitive to our needs. That is, make them understand what we mean when we try to tell them what we want. […] If we want our computers to understand us, we’ll need to equip them with adequate knowledge.”

Commonsense
• Commonsense is background knowledge that enables us to understand, act, and communicate
• Things that most children know
• Minsky on commonsense:
  – “Much of our commonsense knowledge information has never been recorded at all because it has always seemed so obvious we never thought of describing it.”

Commonsense Example
• “I want to get inexpensive dog food.”
• The food is not made out of dogs.
• The food is not for me to eat.
• Dogs cannot buy their own food.
• I am not asking to be given dog food.
• I am not saying that I want to understand why some dog food is inexpensive.
• The dog food is not more than $5 per can.

Engineering Commonsense
• Use multiple ways to represent knowledge
• Acquire huge amounts of that knowledge
• Find commonsense ways to reason with it (“knowledge about how to think”)

Multiple Representations
• Minsky
  – “I think this is what brains do instead: Find several ways to represent each problem and to represent the required knowledge. Then when one method fails to solve a problem, you can quickly switch to another description.”
• Furnas
  – “But regardless of the number of commands or objects in a system and whatever the choice of their ‘official’ names, the designer must make many, many alternative verbal access routes to each.”

CYC
• Decades-long effort to build a commonsense knowledge-base
• Storied past
• 100,000 basic concepts
• 1,000,000 assertions about the world
• The validity of Cyc’s assertions is context-dependent (default reasoning)

Cyc Examples
• Cyc can find the match between a user's query for "pictures of strong, adventurous people" and an image whose caption reads simply "a man climbing a cliff"
• Cyc can notice if an annual salary and an hourly salary are inadvertently being added together in a spreadsheet
• Cyc can combine information from multiple databases to guess which physicians in practice together had been classmates in medical school
• When someone searches for "Bolivia" on the Web, Cyc knows not to offer a follow-up question like "Where can I get free Bolivia online?"
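
The caption-matching example above can be illustrated with a toy sketch. It assumes a tiny, hand-written table of background knowledge (`IMPLIES` and `IS_A` are hypothetical names invented here); it is not Cyc's content, API, or inference method, only a way to see how background assertions could let a query match a caption that shares no words with it.

```python
# Hypothetical, hand-written background knowledge for illustration only.
# None of these names or entries come from Cyc.
IMPLIES = {
    "climbing a cliff": {"strong", "adventurous"},
    "running a marathon": {"strong"},
    "reading a book": set(),
}
IS_A = {"man": "person", "woman": "person", "dog": "animal"}


def inferred_traits(caption):
    """Collect the traits implied by any known activity mentioned in the caption."""
    caption = caption.lower()
    traits = set()
    for activity, implied in IMPLIES.items():
        if activity in caption:
            traits |= implied
    return traits


def depicted_kinds(caption):
    """Map caption words to the more general kinds they belong to."""
    return {IS_A[word] for word in caption.lower().split() if word in IS_A}


def search(query_traits, query_kind, captions):
    """Return captions whose inferred kind and traits satisfy the query."""
    return [
        caption
        for caption in captions
        if query_kind in depicted_kinds(caption)
        and query_traits <= inferred_traits(caption)
    ]


captions = [
    "a man climbing a cliff",
    "a dog climbing a cliff",
    "a woman reading a book",
]
print(search({"strong", "adventurous"}, "person", captions))
```

The query terms "strong", "adventurous", and "people" never appear in the matching caption; the match comes entirely from the background table, which is the point of the slide's example.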

Cyc Applications
• Applications currently available or in development
  – Integration of Heterogeneous Databases
  – Knowledge-Enhanced Retrieval of Captioned Information
  – Guided Integration of Structured Terminology (GIST)
  – Distributed AI
  – WWW Information Retrieval
• Potential applications
  – Online brokering of goods and services
  – "Smart" interfaces
  – Intelligent character simulation for games
  – Enhanced virtual reality
  – Improved machine translation
  – Improved speech recognition
  – Sophisticated user modeling
  – Semantic data mining

Cyc’s Top-Level Ontology
• Fundamentals
• Top Level
• Time and Dates
• Types of Predicates
• Spatial Relations
• Quantities
• Mathematics
• Contexts
• Groups
• "Doing"
• Transformations
• Changes Of State
• Transfer Of Possession
• Movement
• Parts of Objects
• Composition of Substances
• Agents
• Organizations
• Actors
• Roles
• Professions
• Emotion
• Propositional Attitudes
• Social
• Biology
• Chemistry
• Physiology
• General Medicine
• Materials
• Waves
• Devices
• Construction
• Financial
• Food
• Clothing
• Weather
• Geography
• Transportation
• Information
• Perception
• Agreements
• Linguistic Terms
• Documentation
http://www.cyc.com/cyc-2-1/toc.html

OpenCYC
• Cyc’s knowledge-base is now coming online
  – http://www.opencyc.org/
• How could Cyc’s knowledge-base affect the design of information organization and retrieval systems?
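
The lecture notes that the validity of Cyc's assertions is context-dependent. A minimal sketch of that idea follows, assuming a simple chain of contexts in which a specific context inherits from a more general one and may override it. The class `ContextKB`, its methods, and the example facts are all hypothetical, and this is a simplified stand-in for default reasoning rather than a description of how Cyc works.

```python
class ContextKB:
    """Toy knowledge base whose assertions hold only within a context.

    Hypothetical illustration: contexts form a chain from specific to
    general, and the most specific context that has an answer wins.
    """

    def __init__(self):
        self._facts = {}    # (context, subject, predicate) -> value
        self._parents = {}  # context -> more general context, or None

    def add_context(self, context, parent=None):
        self._parents[context] = parent

    def assert_fact(self, context, subject, predicate, value):
        self._facts[(context, subject, predicate)] = value

    def ask(self, context, subject, predicate):
        # Walk outward from the given context to more general ones.
        while context is not None:
            key = (context, subject, predicate)
            if key in self._facts:
                return self._facts[key]
            context = self._parents.get(context)
        return None  # nothing known in any enclosing context


kb = ContextKB()
kb.add_context("Everyday")
kb.add_context("Cartoon", parent="Everyday")

# Defaults taken from the dog-food example in the lecture.
kb.assert_fact("Everyday", "dog", "can_buy_food", False)
kb.assert_fact("Everyday", "dog food", "made_of_dogs", False)
# A made-up override that holds only in the more specific context.
kb.assert_fact("Cartoon", "dog", "can_buy_food", True)

print(kb.ask("Everyday", "dog", "can_buy_food"))
print(kb.ask("Cartoon", "dog", "can_buy_food"))
print(kb.ask("Cartoon", "dog food", "made_of_dogs"))
```

The same question gets different answers depending on the context it is asked in, while anything not overridden falls back to the general default.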

Web KR Resources
• OpenCYC
  – http://www.opencyc.org/
• OpenMind
  – http://commonsense.media.mit.edu
• beingmeta
  – http://www.beingmeta.com/technology.fdxml
• Semantic Web
  – http://www.w3.org/2001/sw/

Discussion Questions (Furnas)
• Steve Chan on Furnas
  – The Furnas results indicating the problems of word selection would seem to be related to the motivations behind IR systems that support relevance feedback, as well as IR systems that support search term synonyms; namely, users' search terms may not clearly identify the desired objects. Of the two IR approaches, which one seems closer to the approach suggested by Furnas?
  – The Furnas experiments used only a small number of target objects, but allowed a large number of aliases. We saw in classical IR systems that search methods that worked well on small collections would often have problems on larger collections. Do you believe the aliasing would work well for larger collections of target objects? What kinds of applications might you want to use unlimited aliasing for, and how do they differ from the typical IR document retrieval system?

Discussion Questions (Lenat)
• Rupa Patel on Lenat
  – Can common-sense databases like CYC help solve Furnas's problem of vocabulary usage in systems design?
  – How can common-sense knowledge bases lend insight into natural language ambiguities?
  – In CYC, human “knowledge enterers” are responsible for adding and editing atomic terms, assertions of reason, and contexts. The assertions can be related to one another, and each holds true only in certain contexts.
  – Based on your understanding of CYC, which categorization effects are utilized in the construction of the contexts: prototype effects, classical categorization theory, or polysemy?

Discussion Questions (Minsky)
• Andrew Fiore on Minsky
  – Minsky's claims about how the mind works are not supported by cognitive psychology. In what other useful ways might we view his theories? As philosophy? Merely as history?
  – Humans clearly use a great deal of commonsense information, and although upon demand we can express some of this knowledge in terms of rules, we do not move through the world logically applying one rule after another. (The cognitive burden would overwhelm.) Why, then, represent a commonsense knowledge base in terms of rules?
  – What are the benefits and deficits of this approach compared with a connectionist or associative model of the mind? (Efficiency, effectiveness, model legibility, external validity...)

Assignment 0 Check-In
• Suggested deliverables
  – SIMS email address
  – Focus statement
  – SIMS web site
  – SIMS coursework page

Next Time
• Lexical Relations and WordNet (RRL)

Homework (!)
• Course Reader
  – Word Association Norms, Mutual Information, and Lexicography (Church, Kenneth and Hanks, Patrick)
  – WordNet: An Electronic Lexical Database - Introduction & Ch. 1 (C. Fellbaum, G.A. Miller)
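
As a closing illustration of the unlimited-aliasing idea attributed to Furnas et al. in this lecture, here is a minimal sketch of an alias index in which many user terms can point to the same target and one term can point to several targets. The class `AliasIndex`, its methods, and the sample commands are hypothetical and chosen only for illustration; the sketch says nothing about how well aliasing scales to large collections, which is the open question raised in the discussion.

```python
from collections import defaultdict


class AliasIndex:
    """Toy many-to-many mapping from user terms to target objects.

    Hypothetical illustration of unlimited aliasing: any number of
    terms may name a target, and a term may name several targets.
    """

    def __init__(self):
        self._targets = defaultdict(set)  # normalized term -> set of targets

    def add_alias(self, term, target):
        self._targets[term.strip().lower()].add(target)

    def lookup(self, term):
        """Return every target reachable from the term, in sorted order."""
        return sorted(self._targets.get(term.strip().lower(), set()))


index = AliasIndex()
for term in ("delete", "remove", "erase", "trash"):
    index.add_alias(term, "delete_file")
# One term can be ambiguous between targets, so lookups return a list.
index.add_alias("remove", "uninstall_program")

print(index.lookup("Erase"))
print(index.lookup("remove"))
print(index.lookup("discard"))
```

Returning a list rather than a single target reflects the trade-off in the discussion questions: more aliases make targets easier to reach, but each term then identifies its referent less precisely.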
