Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Computational Language Andrew Hippisley Computational Language Computational language and AI Language engineering: applied computational language Case study: spell checkers Computational language & AI Artificial Intelligence: “the simulation on computer of distinctly human mental functions.” Wilks (1993) Computational language & AI Language integral to intelligent systems Artificial Intelligence Turing Test ELIZA ELIZA Eliza: Hello. I am ELIZA. How can I help you? Me: I am feeling a bit depressed today Eliza: Do you think coming here will help you not to be depressed? Me: yes Eliza: You are sure? Me: a friend told me I should talk to you Eliza: Do you wish to you should talk to me? Me: I am willing to try Eliza: How long have you been willing to try? Me: I have been willing to try for about two weeks Eliza: Can you elaborate on that? Me: a friend said I should try to talk to you about my problem Eliza: Why do you say your problem? Me: I have a problem with depression Eliza: Can you elaborate on that? Me: no, I don't wish to discuss it Eliza: You are being a bit negative Computational language & AI Why language engineering? Language integral to intelligent systems Artifiicial Intelligence Turing Test ELIZA Expert systems: natural language interface, natural language database Computational language & AI Methods shared across systems Finite State Transition Networks (FSTN) Logic Formal rules Probability Data: you know it! Applied computational language History of the field Machine Translation: 1960, 1966, post 1966 Database access Text interpretation Information retrieval Text categorisation Language engineering Information overload Need a way of automatically processing text documents Information extraction Language engineering Information extraction GIDA: system for automatically monitoring financial market sentiment GIDA Language engineering Information overload Need a way of automatically processing text documents Information extraction Summarisation Automatic summarisation (courtesy of Paulo FERNANDES de OLIVEIRA, PhD) • Get information source; • Extract some content from it; • Present the most important part to the user xx xxx xxxx x xx xxxx xxx xx xxx xx xxxxx x xxx xx xxx xx x xxx xx xx xxx x xxx xx xxx x xx x xxxx xxxx xxxx xx xx xxxx xxx xxx xx xx xxxx x xxx xx x xx xx xxxxx x x xx xxx xxxxxx xxxxxx x x xxxxxxx xx x xxxxxx xxxx xx xx xxxxx xxx xx x xx xx xxxx xxx xxxx xx xxx xx xxx xxxx xx xxx x xxxx x xx xxxx xx xxx xxxx xx x xxx xxx xxxx x xxx x xxx xx xx xxxxx x x xx xxxxxxx xx x xxxxxx xxxx xx xx xxxxx xxx xx xxx xx xxxx x xxxxx xx xxxxx x Lexical Cohesion Links Example Sentence 15: "For the stock market this move was so deeply discounted that I don't think it will have a major impact". Sentence 42: Lucent, the most active stock on the New York Stock Exchange, skidded 47 cents to $4.31, after falling to a low at $4.30. Sentence 23: J&J's stock added 83 cents to $65.49. Sentence 26: Flagging stock markets kept merger activity and new stock offerings on the wane, the firm said. Text title: U.S. stocks hold some gains. Collected from Reuters’ Website on 20 March 2002. Lexical Cohesion Bonds Example 17. In other news, Hewlett-Packard said preliminary estimates showed shareholders had approved its purchase of Compaq Computer -- a result unconfirmed by voting officials. 19. In a related vote, Compaq shareholders are expected on Wednesday to back the deal, catapulting HP into contention against International Business Machines for the title of No. 1 computer company. Text title: U.S. stocks hold some gains. Collected from Reuters’ Website on 20 March 2002. Language engineering Information overload Need a way of automatically processing text documents Information extraction Summarisation Translation Retrieve only relevant documents Voice processing Language engineering Two main approaches Symbolic Stochastic Case study spell checkers Spelling dictionaries aim? given a sequence of symbols: 1. identify misspelled strings 2. generate a list of possible ‘candidate’ correct strings 3. select most probable candidate from the list Spelling dictionaries Implementation: Probabilistic framework bayesian rule noisy channel model Spelling dictionaries Types of spelling error actual word errors non-word errors Spelling dictionaries Types of spelling error actual word errors /piece/ instead of /peace/ /there/ instead of /their/ non-word errors Spelling dictionaries Types of spelling error actual word errors /piece/ instead of /peace/ /there/ instead of /their/ non-word errors /graffe/ instead of /giraffe/ Spelling dictionaries Types of spelling error actual word errors non-word errors /piece/ instead of /peace/ /there/ instead of /their/ /graffe/ instead of /giraffe/ of all errors in type written texts, 80% are nonword errors Spelling dictionaries non-word errors Cognitive errors /seperate/ instead of /separate/ phonetically equivalent sequence of symbols has been substituted due to lack of knowledge about spelling conventions Spelling dictionaries non-word errors Cognitive errors Typographic (‘typo’) errors influenced by keyboard e.g. substitution of /w/ for /e/ due to its adjacency on the keyboard /thw/ instead of /the/ Spelling dictionaries non-word errors noisy channel model The actual word has been passed through a noisy communication channel This has distorted the word, thereby changing it in some way The misspelled word is the distorted version of the actual word Aim: recover the actual word by hypothesising about the possible ways in which it could have been distorted Spelling dictionaries non-word errors noisy channel model What are the possible distortions? insertion deletion substitution transposition all of these viewed as transformations that take place in the noisy channel Spelling dictionaries Implementing spelling identification and correction algorithm Spelling dictionaries Implementing spelling identification and correction algorithm STAGE 1: compare each string in document with a list of legal strings; if no corresponding string in list mark as misspelled STAGE 2: generate list of candidates Apply any single transformation to the typo string Filter the list by checking against a dictionary STAGE 3: assign probability values to each candidate in the list STAGE 4: select best candidate Spelling dictionaries STAGE 3 prior probability likelihood given all the words in English, is this candidate more likely to be what the typist meant than that candidate? P(c) = c/N where N is the number of words in a corpus Given, the possible errors, or transformation, how likely is it that error y has operated on candidate x to produce the typo? P(t/c), calculated using a corpus of errors, or transformations Bayesian rule: get the product of the prior probability and the likelihood P(c) X P(t/c) Spelling dictionaries non-word errors Implementing spelling identification and correction algorithm STAGE 1: identify misspelled words STAGE 2: generate list of candidates STAGE 3a: rank candidates for probability STAGE 3b: select best candidate Implement: noisy channel model Bayesian Rule Next week Finite state machines and regular expressions