Paradigm based Morphological Analyzers
Dr. Radhika Mamidi

Morphological Analyzers
Morphological analyzers are tools that automatically decompose a word into its root and affixes and give the related features.
Example:
1st stage: identifying morphemes
ate: root = eat, suffix = ed
2nd stage: analyzing morphemes
ate: root = eat, tense = past

Some Applications
• Machine Translation
• Speech Processing

Machine Translation
• A POS tagger gives only the part of speech. More information is needed to translate a word correctly.
• This includes the tense, aspect and mood of verbs, and the gender, number and person of nouns.
Example: [Eng → Hindi translation]
ENGLISH: She went home. HINDI: vaha ghar gayi.
ENGLISH: He went home. HINDI: vaha ghar gayaa.
• The gender of the pronoun is essential for the translation into Hindi.
• The morph analyzer will give the gender information.
Example: [Hindi → Eng translation]
In Hindi, ‘vaha’ can have different senses: ‘he’, ‘she’ or ‘that’.
“vaha ghar gayaa”
If we were to translate this, the extra information on the verb helps us translate the sentence correctly as “He went home”.
• The ‘yaa’ indicates past tense as well as singular number and masculine gender.
• The morph analyzer will give this information.

Speech Processing
• In Text to Speech tools, too, a morph analyzer is essential along with a part-of-speech tagger.
• With extra information on the words, the efficiency increases.
• The intonation, the pauses, the stress etc. can be brought close to the way humans speak.
• This additional information is given by morph analyzers.

Approaches
• Paradigm based
• Finite State based
We will discuss the first approach.

Requirements for building paradigm based Morph Analyzers
• Knowledge of Lexeme and Word forms
• Root and Affix dictionaries
• Paradigm Table
• Paradigm Class
The lexemes are stored in the dictionaries and the word forms as paradigms.

Lexeme and Word form
APPLE: apple, apples
CHURCH: church, churches
BOY: boy, boys
WATCH: watch, watches
SPY: spy, spies
• The word in upper case is called the LEXEME and the inflected forms are its WORD FORMS.
• Lexemes are the headwords in a dictionary.
Another example:
played is a word form of the lexeme PLAY
plays is a word form of the lexeme PLAY(1)
plays is a word form of the lexeme PLAY(2)
where PLAY(1) is a verb and PLAY(2) is a noun. PLAY(1) and PLAY(2) are two different lexemes.

Exercise 1
Give the lexeme of the following word forms:
ate, played, manufactured, glasses, players, bites

Exercise 2
“manufactured” can be a verb in the past tense or an adjective. So it belongs to two different lexemes: MANUFACTURE and MANUFACTURED.
Which of the following words belong to more than one lexeme?
ate, wanted, wrote, written, finished

Root and Affix dictionaries
The root dictionary contains a list of roots or base forms, i.e. the lexemes. Each root is usually stored with its part of speech.
The affix dictionary contains a list of all the affixes in a language. The features of the affixes are stored here, as attribute-value pairs.

Example entries in a dictionary
Root dictionary
eat <root=‘eat’, category=‘verb’>
book <root=‘book’, category=‘verb’>
book <root=‘book’, category=‘noun’>
Affix dictionary
+s <tense = ‘present’>
+ed <tense = ‘past’>
+en <aspect = ‘perfective’>
+ing <aspect = ‘progressive’>

Paradigm table
A paradigm table represents the inflected forms of a particular lexeme. It includes the conjugation of verbs and the declension of nouns, adjectives, pronouns etc.
Example:
APPLE: apple, apples
EAT: eat, eats, ate, eaten, eating
SMART: smart, smarter, smartest

Conjugation of English verbs
• play plays played played playing
• eat eats ate eaten eating
• look looks looked looked looking
• dance dances danced danced dancing
• push pushes pushed pushed pushing

Declension of English nouns
• apple, apples
• boy, boys
• church, churches
• watch, watches
• spy, spies

Declension of English adjectives
• smart, smarter, smartest
• tall, taller, tallest

Exercise 3
• Give the paradigm table for 5 different nouns and 5 different verbs in English.

Paradigm Class
• A paradigm class groups lexemes: it contains a prototypical root and all the roots that inflect in the same way as that root.
• Words which decline or conjugate in exactly the same way fall into one paradigm class.
The English verbs ‘PLAY’ and ‘LOOK’ have the following paradigms:
• play plays played played playing
• look looks looked looked looking
So they belong to the same class. But ‘PUSH’ falls in another class, since it differs in its present tense form: it takes ‘-es’ and not ‘-s’. Its paradigm is as follows:
• push pushes pushed pushed pushing
The English nouns ‘PLAY’ and ‘BOY’ have the following paradigms:
• play plays
• boy boys
So they belong to the same class. But ‘SPY’ falls in another class. Its paradigm is as follows:
• spy spies
A paradigm class is represented by one member of the class.

Class       Members
eat V       eat
play V      play, talk, walk, train
push V      push, fish
play N      play, boy, day
spy N       spy, sky
church N    church, watch

Exercise 4
Which of the following verbs belong to the same paradigm class?
mince, ride, walk, speak, shake, play, dance, take
Which of the following nouns belong to the same paradigm class?
girl, house, dish, book, mouse, beach, flower, pencil
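
To see how the pieces above fit together, here is a minimal Python sketch of a paradigm based analyzer. It is an illustration under stated assumptions, not the implementation used in the course: the names `PARADIGMS`, `ROOT_DICT` and `analyze`, the (strip, add, features) rule format, and the `form` and `number` features are all hypothetical. The class names and members follow the paradigm class table above, and the tense and aspect features follow the affix dictionary entries.

```python
# Paradigm tables (hypothetical encoding): for each paradigm class, a list of
# rules (strip, add, features). A word form is built by removing `strip` from
# the end of the root and appending `add`.
PARADIGMS = {
    "play V": [("", "", {"form": "base"}),
               ("", "s", {"tense": "present"}),
               ("", "ed", {"tense": "past"}),
               ("", "ed", {"aspect": "perfective"}),
               ("", "ing", {"aspect": "progressive"})],
    "push V": [("", "", {"form": "base"}),
               ("", "es", {"tense": "present"}),
               ("", "ed", {"tense": "past"}),
               ("", "ed", {"aspect": "perfective"}),
               ("", "ing", {"aspect": "progressive"})],
    "eat V": [("", "", {"form": "base"}),
              ("", "s", {"tense": "present"}),
              ("eat", "ate", {"tense": "past"}),  # irregular past form
              ("", "en", {"aspect": "perfective"}),
              ("", "ing", {"aspect": "progressive"})],
    # The "number" feature on nouns is an assumption made for this sketch.
    "play N": [("", "", {"number": "singular"}),
               ("", "s", {"number": "plural"})],
    "spy N": [("", "", {"number": "singular"}),
              ("y", "ies", {"number": "plural"})],
    "church N": [("", "", {"number": "singular"}),
                 ("", "es", {"number": "plural"})],
}

# Root dictionary: each root with its category and paradigm class.
ROOT_DICT = {
    "eat": [("verb", "eat V")],
    "play": [("verb", "play V"), ("noun", "play N")],
    "talk": [("verb", "play V")],
    "walk": [("verb", "play V")],
    "train": [("verb", "play V")],
    "push": [("verb", "push V")],
    "fish": [("verb", "push V")],
    "boy": [("noun", "play N")],
    "day": [("noun", "play N")],
    "spy": [("noun", "spy N")],
    "sky": [("noun", "spy N")],
    "church": [("noun", "church N")],
    "watch": [("noun", "church N")],
}


def analyze(word):
    """Return every analysis of `word` as a dict of attribute-value pairs."""
    analyses = []
    for root, entries in ROOT_DICT.items():
        for category, paradigm_class in entries:
            for strip, add, features in PARADIGMS[paradigm_class]:
                if not root.endswith(strip):
                    continue
                # Generate the word form for this rule and compare it to the input.
                form = root[:len(root) - len(strip)] + add
                if form == word:
                    analyses.append({"root": root, "category": category, **features})
    return analyses


if __name__ == "__main__":
    for w in ["plays", "spies", "ate", "churches"]:
        print(w, analyze(w))
```

The sketch generates every form of every root and compares it with the input, which is simple to follow but slow for a large dictionary; a practical analyzer would more likely strip candidate suffixes from the word and then look up the remaining root. Note that ‘plays’ gets two analyses, one for the verb lexeme PLAY(1) and one for the noun lexeme PLAY(2).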