NLP And The Semantic Web
Dainis Kiusals
[email protected]
COMS E6125 Spring 2010

Natural Language Processing
• 1950s and 1960s – researchers began developing techniques for using computers to process natural language.
• Noam Chomsky studied the ability to capture context. His theory is based on generative grammars: constructs that describe how a sentence is formed. They may be used to create formal grammars through which an input stream of words can be parsed, as a first step toward extracting its meaning. [1]

[sentence]
[noun phrase], [verb phrase]
[determiner], [noun], [verb], [article], [adjective], [adverb]

NLP Issues / Challenges
o Morphology – different forms of words (singular/plural, tense). Example: runs, ran, running
o Syntax – grammatical structure (verbs, nouns). Example: "The company is ready to sell."
o Spelling – different spellings (and misspellings) of words. Example: color/colour, organize/organise
o Text Segmentation – identifying word boundaries
o Word Sense Disambiguation – multiple word meanings. Example: bow (bend forward, weapon, ribbon, front of ship)?

Semantic Web
• Proposed by Tim Berners-Lee (W3C Director) as a method for adding concepts to Web content via semantic annotation.
• The W3C is standardizing the RDF and OWL specifications.
• At the lowest level, concepts are stored as triples, which are defined at higher levels by ontologies. [3] [2]

Keyword Search
Queries are processed only as a statistical analysis of keyword appearance in documents, with some advanced logical features.
Keyword search does not distinguish between different interpretations of a word in a given context in the searched data (corpus), so search results may contain different uses of a word. [4]

NLP / Semantic Search
• Increased relevancy of results vs. keyword search.
– Longer query phrases and questions yield better results.
– Makes use of semantic information to attain better results.
• Users need to change their habits (they are used to keywords).
– NLP Search pages need to encourage use of complex queries.
• Web Search vs. Enterprise Search?
– NLP Search may be better suited for smaller domains.
• Top-Down or Bottom-Up approach?
– Top-Down approach relies more on NLP.
– Creating the Semantic Web (Bottom-Up) will be more costly.

NLP / Semantic Web Relationship
• NLP and the Semantic Web complement each other and will grow together.
• As Semantic Web (RDF and OWL) annotation is added to Web pages, NLP search engines can take advantage of this information.
• NLP processes can be used to automate the generation of content that populates new Semantic Web annotation.
• Global and domain-specific ontologies (which represent concepts and their relationships) combined with NLP techniques define the search process.

Case Study:
1. Founded in San Francisco in 2005 with the goal of creating an NLP search engine.
2. In 2007, obtained exclusive rights to several decades of Xerox/PARC NLP research.
3. Launched its first public software beta in May 2008: an NLP search website covering approx. 2.5 million Wikipedia web pages (also referenced Freebase).
4. Created an innovative user interface that leveraged NLP/semantic search results (e.g. highlighting of relevant phrases/sentences within a larger document).
5. Two months after the public beta, was acquired by Microsoft in order to be incorporated into the Bing search engine.

NLP Search Companies

Resources
1. An Executive's Guide to Information Technology: Principles, Business Models and Terminology, by Robert Plant and Stephen Murrell, Cambridge University Press, 2007.
2. Enterprise 2.0 Implementation, Chapter 13, by Aaron C. Newman and Jeremy Thomas, McGraw-Hill/Osborne, 2009.
3. Encyclopedia of Knowledge Management, RDF and OWL, by David G. Schwartz (ed.), IGI Global, 2006.
4. Semantic Knowledge Management: An Ontology-Based Framework, by Antonio Zilli (ed.) et al., IGI Global, 2009.
5. http://www.parc.com/work/focus-area/NLP/
6. http://www.powerset.com/

Resources/information taken from Full Paper submitted 3/12/10.
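
The Semantic Web slide notes that, at the lowest level, concepts are stored as triples, and the NLP slides use "bow" as an example of a word with several senses. The sketch below is one possible way to picture how the two fit together, assuming a toy in-memory list of (subject, predicate, object) triples. The concept names (`bow_weapon`, `Weapon`, ...), the `label` and `is_a` predicates, and the `match` and `senses` helpers are all hypothetical and chosen only for illustration. They are not taken from RDF, OWL, or any real vocabulary or search engine.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical toy data: four concepts that share the surface word "bow".
TRIPLES: List[Triple] = [
    ("bow_gesture", "label", "bow"),
    ("bow_weapon", "label", "bow"),
    ("bow_ribbon", "label", "bow"),
    ("bow_ship", "label", "bow"),
    ("bow_gesture", "is_a", "Gesture"),
    ("bow_weapon", "is_a", "Weapon"),
    ("bow_ribbon", "is_a", "Decoration"),
    ("bow_ship", "is_a", "ShipPart"),
]


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for subj, pred, obj in triples:
        if (s is None or subj == s) and (p is None or pred == p) and (o is None or obj == o):
            yield (subj, pred, obj)


def senses(word: str) -> List[Tuple[str, str]]:
    """Return (concept, class) pairs for every concept labelled with `word`."""
    found = []
    for concept, _, _ in match(TRIPLES, p="label", o=word):
        for _, _, cls in match(TRIPLES, s=concept, p="is_a"):
            found.append((concept, cls))
    return found


if __name__ == "__main__":
    # A keyword index would see one string, "bow"; the triples expose four concepts.
    for concept, cls in senses("bow"):
        print(f"{concept} is a {cls}")
```

A keyword search would treat every occurrence of "bow" alike, whereas a lookup over annotated concepts can tell the candidate senses apart. Choosing among them for a given query would still need the NLP techniques described in the slides.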