Download Presentation_Hao_Li - Programming Systems Lab

WordNet® and its Java API
♦ Introduction to WordNet
♦ WordNet API for Java

Name: Hao Li
Uni: hl2489

Introduction to WordNet®
1. WordNet® is a large lexical database of English. It is a kind of dictionary, developed by the Cognitive Science Laboratory at Princeton University.
2. In WordNet, nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept.
3. In WordNet, synsets are interlinked by means of conceptual-semantic and lexical relations.
4. WordNet is freely and publicly available for download, and it has APIs for several programming languages. WordNet's structure makes it a useful tool for computational linguistics and natural language processing.

WordNet API for Java (1)
• Method Summary of Class WordNetDatabase
― abstract String[] getBaseFormCandidates(String inflection, SynsetType type)
  Returns lemmas representing word forms that might be present in WordNet.
√ static WordNetDatabase getFileInstance()
  Returns an implementation of this class that can access the WordNet database by searching files on the local file system.
― Synset[] getSynsets(String wordForm)
  Returns all synsets that contain the specified word form or a morphological variation of that word form.
√ Synset[] getSynsets(String wordForm, SynsetType type)
  Returns only the synsets of a particular type (e.g., noun) that contain a word form or morphological variation of that form.
― abstract Synset[] getSynsets(String wordForm, SynsetType type, boolean useMorphology)
  Returns only the synsets of a particular type (e.g., noun) that contain a word form matching the specified text or one of that word form's variants.

WordNet API for Java (2)
• Method Summary of Class Synset
― WordSense[] getAntonyms(String wordForm)
  Returns the antonyms (words with the opposite meaning), if any, associated with a word form in this synset.
√ String getDefinition()
  Retrieve a short description / definition of this concept.
― WordSense[] getDerivationallyRelatedForms(String wordForm)
  Returns word forms that are derivationally related to the one specified.
√ int getTagCount(String wordForm)
  Returns a number that's intended to provide an approximation of how frequently the specified word form is used to represent this meaning relative to how often it's used to represent other meanings.
― SynsetType getType()
  Retrieve the type of synset this object represents.
― String[] getUsageExamples()
  Retrieve sentences showing examples of how this synset is used.
√ String[] getWordForms()
  Retrieve the word forms.

Method used in the project (1)
WordNetDatabase.getSynsets(String wordForm, SynsetType type)
Take the word “pig” as an example:
Synset[0]=Noun@2395406[hog,pig,grunter,squealer,Sus scrofa] - domestic swine
Synset[1]=Noun@10612210[slob,sloven,pig,slovenly person] - a coarse obnoxious person
Synset[2]=Noun@10179649[hog,pig] - a person regarded as greedy and pig-like
Synset[3]=Noun@9879144[bull,cop,copper,fuzz,pig] - uncomplimentary terms for a policeman
Synset[4]=Noun@3935116[pig bed,pig] - mold consisting of a bed of sand in which pig iron is cast
Synset[5]=Noun@3934998[pig] - a crude block of metal (lead or iron) poured from a smelting furnace

Method used in the project (2)
Synset.getDefinition()
Take Synset[0] of the word “pig” as an example:
domestic swine

Method used in the project (3)
Synset.getTagCount(String wordForm)
This is a very useful method. It indicates how frequently the specified word form is used to represent this meaning relative to how often it is used to represent other meanings.
As I understand it, this method has two uses:
(1) Analyse the same word across its different synsets.
(2) Analyse different words of the same synset.

Analyse the same word across its different synsets
Synset.getTagCount(String wordForm)
The result shows which meaning of the word is used more frequently.
For example, the frequency (tag count) of the word “bridge” in the following synset is 4.
Synset[0]=Noun@2898711[bridge,span] - a structure that allows people or vehicles to cross an obstacle such as a river or canal or railway etc.
In another synset of “bridge” it is 1.
Synset[4]=Noun@490569[bridge] - any of various card games based on whist for four players
This means that when people say “bridge”, they are more likely to mean the structure than the card game.

Analyse different words of the same synset
Synset.getTagCount(String wordForm)
The result shows which word expresses a given definition most accurately, with the least word sense ambiguity.
For example, in a synset of the word “java”:
Synset=Noun@7929519[coffee,java] - a beverage consisting of an infusion of ground coffee beans
The frequency of the word “coffee” is 46 and that of the word “java” is 1. This means “coffee” is more representative of the meaning “a beverage consisting of an infusion of ground coffee beans” than “java” is. When people say “coffee”, you can expect that they mean this beverage rather than some other meaning. When people say “java”, they may mean the beverage or the programming language Java.

Conclusion
1. WordNet has two purposes: one is to produce a combination of dictionary and thesaurus that is more intuitively usable, and the other is to support automatic text analysis and artificial intelligence applications.
2. Because of these features, WordNet is now widely used in information systems, for tasks including word sense disambiguation, information retrieval, automatic text classification, automatic text summarization, and even automatic crossword puzzle generation.
It is also used in our project! I will explain our WordNet-based algorithm in the demo next week.

Thank you!
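The two uses of Synset.getTagCount(String wordForm) described above can be sketched in a few lines of Java. This is only an illustration and not the project's algorithm. It does not call the WordNet API: the tag counts are hard-coded from the “bridge” and “coffee”/“java” examples in the slides, standing in for values that getTagCount would return. The class name TagCountSketch, its helper methods and the short sense labels are hypothetical names introduced for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative stand-in for comparing values returned by Synset.getTagCount(String). */
class TagCountSketch {

    /** Returns the key with the highest tag count; ties keep the first entry, empty input gives null. */
    static String mostFrequent(Map<String, Integer> tagCounts) {
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> entry : tagCounts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    /** Use (1): tag counts of the one word "bridge" in two of its synsets (values from the slides). */
    static Map<String, Integer> bridgeSenses() {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("structure", 4);  // Noun@2898711[bridge,span]
        counts.put("card game", 1);  // Noun@490569[bridge]
        return counts;
    }

    /** Use (2): tag counts of each word form inside the single synset Noun@7929519[coffee,java]. */
    static Map<String, Integer> coffeeSynsetWords() {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("coffee", 46);
        counts.put("java", 1);
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("Most frequent sense of \"bridge\": " + mostFrequent(bridgeSenses()));
        System.out.println("Most representative word for the beverage: " + mostFrequent(coffeeSynsetWords()));
    }
}
```

With the real API, the hard-coded maps would presumably be filled by looping over the array returned by WordNetDatabase.getSynsets(String wordForm, SynsetType type) for use (1), and over Synset.getWordForms() for use (2), calling getTagCount on each item.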
