Linguistics
Levels of description

Speech and language
• Language as communication
• Speech vs. text
  – Speech is primary
  – Text is derived
  – Text is not “written speech”
  – Speech is not (usually) spoken text
  – Obviously they are related

Levels of description
• Smallest linguistic “unit” is the phoneme (speech) or (by analogy) the grapheme (text)
• Phonemes combine to form words, or more exactly, morphemes
• Morpheme: smallest meaningful unit of language
• Words combine to form sentences (or utterances) according to the rules of syntax
• Form is related to meaning via semantics
• Pragmatics deals with how language use relates to the real world

Phonetics
• Study of speech sounds
• Humans are the only species that have developed language
  – No dedicated speech organs as such
• Not all sounds humans make are speech sounds, even though some do convey meaning
• Speech sounds combine in arbitrary ways to form words

Phonetics
• Articulatory phonetics is concerned with how speech sounds are produced
• Acoustic phonetics is concerned with the physical properties of the speech signal
• Auditory phonetics is concerned with how speech sounds are perceived
• All are of course related

Possible speech sounds
• Range of sounds possible in human languages
• Consonants vs vowels
• Most consonants are pulmonic egressive
• Consonant sound is determined by place and manner of articulation, plus voicing, and some other features
• Vowel sound is determined by tongue height and position (front/back) plus lip shape (round/spread)

Phonemes
• Huge number of possible distinctions, but not all are significant in any given language
• Differences that are used to distinguish words are phonemic
• Phoneme – group of (similar) sounds perceived by speakers as “the same”
• Other (non-distinctive) differences are between allophones of the same phoneme
• Phonemic distinction in one language may be allophonic in another
• (-etic ~ -emic ~ allo- ~ -ology)

Prosody
• Besides individual speech sounds, other features of speech can carry meaning:
  – Length, volume, pitch
  – Intonation (pitch)
    • Can be syntactic or lexical (in some languages)
  – Stress (combination of all three)
    • Lexical or semantic/pragmatic

Writing and text
• Various writing systems worldwide
• Most familiar is alphabetic
  – Ideally each letter represents a sound (phoneme)
  – Rarely 1:1 mapping
    • Phoneme can have different spellings
    • Individual letter can be different phoneme
    • Some phonemes represented by combination of letters (not always contiguous)
• Other possibilities: consonantal, syllabic, ideographic, and various combinations

Graphemes
• Latin alphabet (as used for English) has 26 letters
• But English has ~50 phonemes
• Phoneme can have different spellings
  – /s/ can be ‘s’, ‘c’, ‘sc’, ‘ss’, …
• Individual letter can be different phoneme
  – ‘c’ can be /s/ or /k/
• Some phonemes represented by combination of letters
  – /θ/ ‘th’, /ʃ/ ‘sh’

Morphology
• Smallest meaningful unit of language is the morpheme
• Some words are single morphemes (meaning can’t be broken down), but many words have constituent parts
• Words usually consist of a root plus affix(es), though some words can have multiple roots
• Lexeme – abstract notion of group of word forms that belong together
  – lexeme ~ root ~ base form ~ dictionary (citation) form

Role of morphology
• Commonly made distinction: inflectional vs derivational
• Inflectional morphology is grammatical
  – number, tense, case, gender
• Derivational morphology concerns word building
  – part-of-speech derivation
  – words with related meaning

Morphological processes
• Affixes: prefix, suffix, infix, circumfix
• Umlaut, ablaut
• Gemination, (partial) reduplication
• Root and pattern
• Stress (or tone) change
• Sandhi

Language typology
• Based on extent to which morphological processes play a role
• Agglutinative – morphological affixes can be stacked up almost indefinitely
  – Implies that list of “possible words” is infinite
• Isolating (analytic) – little or no affixation
• Extent of morphology can interact with syntax: highly inflected languages often have freer word order
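
The point on the Graphemes slide that spelling and sound are rarely in a 1:1 relation can be made concrete with a small sketch. The table below uses only the examples given on that slide, plus the spelling ‘k’ for /k/, which is added here purely for illustration. The names `PHONEME_SPELLINGS` and `invert` are hypothetical, and the table is nowhere near a full description of English.

```python
from collections import defaultdict

# Phoneme -> possible spellings. Entries follow the slide's examples;
# the 'k' spelling of /k/ is an added illustration, not from the slides.
PHONEME_SPELLINGS = {
    "s": ["s", "c", "sc", "ss"],
    "k": ["c", "k"],
    "θ": ["th"],
    "ʃ": ["sh"],
}

def invert(mapping):
    """Turn phoneme -> spellings into spelling -> sorted list of phonemes."""
    inverted = defaultdict(set)
    for phoneme, spellings in mapping.items():
        for spelling in spellings:
            inverted[spelling].add(phoneme)
    return {spelling: sorted(phonemes) for spelling, phonemes in inverted.items()}

SPELLING_PHONEMES = invert(PHONEME_SPELLINGS)

# One phoneme, several spellings.
print(PHONEME_SPELLINGS["s"])          # ['s', 'c', 'sc', 'ss']

# One letter, several phonemes.
print(SPELLING_PHONEMES["c"])          # ['k', 's']

# Phonemes written with more than one letter.
multi_letter = sorted(s for s in SPELLING_PHONEMES if len(s) > 1)
print(multi_letter)                    # ['sc', 'sh', 'ss', 'th']
```

Inverting the table is what exposes the ambiguity: reading text aloud needs the spelling-to-phoneme direction, and there a single letter such as ‘c’ no longer determines its sound without context.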
Morphemes
• Morphemes associated with meaning
• (Like phonemes) not 1:1
• Single morpheme can have various allomorphs
  – Allomorphic variation usually conditioned, either intrinsically, or extrinsically (phonotactics, morphosyntax)
  – Can be “free variation”
• Single form can represent different morphemes
• Often rules of allomorphic variation are systematic

Inflectional morphology
• Grammatical in nature
• Does not carry meaning, other than grammatical meaning
• Highly systematic, though there may be irregularities and exceptions
  – Simplifies lexicon, only exceptions need to be listed
  – Unknown words may be guessable
• Language-specific and sometimes idiosyncratic
• (Mostly) helpful in parsing

Derivational morphology
• Lexical in nature
• Can carry meaning
• Fairly systematic, and predictable up to a point
  – Simplifies description of lexicon: regularly derived words need not be listed
  – Unknown words may be guessable
• But …
  – Apparent derivations have specialised meaning
  – Some derivations missing
• Languages often have parallel derivations which may be translatable

Issues for NLP
• Need scheme to handle morphology
• Can involve ambiguity which must be solved in analysis
• Can contribute to syntactic analysis
  – Morphological analysis identifies the lexeme plus grammatical information associated with inflections
• And vice versa
  – Morphological ambiguity may be resolved by syntactic context
• For many applications it is necessary to deal with just lexemes rather than word-forms and grammatical information: stemming

Morphological processing
• Stemming
• String-handling approaches
  – Regular expressions
  – Mapping onto finite-state automata
• 2-level morphology
  – Mapping between surface form and lexical representation
• Related issues of what is in lexicon
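
The regular-expression approach to stemming, together with the idea that only exceptions need to be listed in the lexicon, can be sketched as follows. This is a minimal illustration under strong assumptions, not a real stemmer: the rule set, the feature labels, the `EXCEPTIONS` mini-lexicon and the function name `analyse` are all hypothetical, and the rules cover only a few English inflectional suffixes.

```python
import re

# Irregular forms are listed explicitly (hypothetical mini-lexicon).
EXCEPTIONS = {
    "mice": ("mouse", "plural"),
    "went": ("go", "past"),
}

# Ordered (pattern, replacement, feature) rules; the first match wins.
# "plural or 3sg" is left ambiguous on purpose: the suffix alone cannot
# tell a plural noun from a third-person singular verb.
RULES = [
    (re.compile(r"(.+)ies$"), r"\1y", "plural or 3sg"),
    (re.compile(r"(.+[sxz]|.+[cs]h)es$"), r"\1", "plural or 3sg"),
    (re.compile(r"(.+)ing$"), r"\1", "progressive"),
    (re.compile(r"(.+)ed$"), r"\1", "past"),
    (re.compile(r"(.+[^s])s$"), r"\1", "plural or 3sg"),
]

def analyse(word):
    """Return (stem, grammatical feature) for a single word form."""
    word = word.lower()
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    for pattern, replacement, feature in RULES:
        if pattern.match(word):
            return pattern.sub(replacement, word), feature
    return word, "base"

print(analyse("parties"))  # ('party', 'plural or 3sg')
print(analyse("boxes"))    # ('box', 'plural or 3sg')
print(analyse("walked"))   # ('walk', 'past')
print(analyse("mice"))     # ('mouse', 'plural')
print(analyse("glass"))    # ('glass', 'base')
print(analyse("sing"))     # ('s', 'progressive')  <- wrong: over-stemmed
```

The last call shows why pure string handling is not enough. Without a lexicon to confirm that the result is a real stem, “sing” is treated as “s” + “-ing”, and forms such as “stopped” or “making” would need extra rules for consonant doubling and a dropped ‘e’. The “plural or 3sg” label is the kind of morphological ambiguity that syntactic context would have to resolve.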