Role of NLP in Linguistics
16-07-2010
Dipti Misra Sharma
Language Technologies Research Centre
International Institute of Information Technology
Hyderabad, India

NLP and Linguistics
• Have similar goals
  – Understanding human language(s)
• NLP relies on the theoretical models provided by linguistics
  – Therefore, NLP definitely needs linguistics
What about linguistics? Does it benefit from NLP?

NLP is useful
• NLP tools can be useful for certain linguistic tasks such as
  – collecting, organizing, and classifying data,
  – providing statistics, etc.
• This saves effort and brings forth facts which help in making generalizations ....
Makes life easier for linguists

NLP and Linguistics Resources
• NLP techniques are useful for creating linguistic resources such as
  – verb frames, transfer grammars, bilingual lexicons, etc.
• Studies in CL have shown the usefulness of NLP techniques in historical linguistics as well (e.g. phylogenetic trees)
Thus, NLP is useful not only for data-related tasks but also for the creation of linguistic resources

What else ?
• NLP researchers and linguists look at language from different perspectives
• NLP researchers look for solutions which provide higher coverage
  – exceptions can be dealt with later
• Linguistic researchers find exceptions more interesting
  – these help identify problem areas for the theory

However
• Resource creation for NLP involves a close study of large-scale real-world data (e.g. linguistic annotation)
• A close look at real-world data often brings up linguistic issues which have theoretical implications

Our experience
Hindi has
• A long list of lexical items
• Historically derived from Sanskrit verb roots
But
• They are categorized as adjectives in Hindi
For example, ‘sthita’ (situated), swiikrita (accepted), sviikaarya (acceptable), likhita (written), kathit (told) ……

However
These ‘adjectives’ of Hindi have modifiers which have argument-like properties, both semantically and syntactically
For example,

dillii  mein  sthit     qutub  miinaar  ek   darshaniiy      sthal  hai
Delhi   in    situated  Qutub  Minar    one  worth-watching  place  is
‘Qutub Minar, situated in Delhi, is a place worth visiting’

unke  dvaaraa  kathit  kahaaniyaan  bahut  pracalit  hain
them  by       told    stories      very   popular   are
‘The stories told by them are very popular’

The issue (1/2)
• Both ‘dillii mein’ and ‘unke dvaaraa’ have appropriate case markers
• ‘mein’ is locative and ‘dvaaraa’ agentive
• These adjectives are historically non-finite verbs
  – However, Hindi grammars no longer analyse them as such
  – They are not morphologically decomposable in Hindi either

The issue (2/2)
Morphological decomposition of sthit (situated) and kathit (told) would lead to a Sanskrit analysis and NOT a Hindi analysis
• Hindi, for example, does not have ‘sthaa’ or ‘kath’ as verb roots
• It doesn’t have ‘ita’ as an active participial suffix either.
• How do we explain the argument-like properties of their modifiers ?

What does it indicate ?
• Linguists understand the relation, but not through a linguistic process of Hindi
• A linguistic process (or at least the roots and suffixes) from Sanskrit will have to be brought in
• Is it that languages have elements which are at different stages of development/evolution ?

Another example
• Indian languages show frequent use of complex predicates
Examples: pratiikshaa karnaa (wait do), kshamaa karnaa (forgive do)
• The problem: when is an NV (noun + verb) sequence a complex predicate and when is it not ?
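As a rough illustration of how NLP tooling might support this kind of question, the sketch below collects adjacent noun + verb pairs from a POS-tagged sample and counts them, so that a linguist can inspect the candidates. It does not decide whether a pair is a complex predicate. The toy token sequences (built only from words quoted in this talk, not real sentences), the tag labels `NN`, `VM`, `JJ`, `INTF`, and the function name `collect_nv_candidates` are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical toy data: each item is a list of (token, tag) pairs.
# The tokens come from the examples in this talk; the tag labels are assumed.
TAGGED_SEQUENCES = [
    [("pratiikshaa", "NN"), ("karnaa", "VM")],
    [("kshamaa", "NN"), ("karnaa", "VM")],
    [("pratiikshaa", "NN"), ("karnaa", "VM")],
    [("bahut", "INTF"), ("pracalit", "JJ"), ("hain", "VM")],
]


def collect_nv_candidates(sequences, noun_tag="NN", verb_tag="VM"):
    """Count adjacent noun + verb pairs as complex-predicate candidates."""
    counts = Counter()
    for sequence in sequences:
        # Pair every token with its right-hand neighbour.
        for (word1, tag1), (word2, tag2) in zip(sequence, sequence[1:]):
            if tag1 == noun_tag and tag2 == verb_tag:
                counts[(word1, word2)] += 1
    return counts


candidates = collect_nv_candidates(TAGGED_SEQUENCES)
for (noun, verb), frequency in candidates.most_common():
    print(noun, verb, frequency)
```

A frequency list like this only narrows down what to look at; applying the diagnostics to each candidate remains a linguistic judgment.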
Complex Predicates
• The problem has long been discussed in the linguistics literature
• Several diagnostics have also been proposed
However,
• Quite a few NV sequences are a single unit semantically
• Syntactically, they fail the diagnostics
The question remains:
Do we consider such cases as ‘complex verbs’ or as instances of ‘verb argument’ ?

Conclusions
• NLP tools and techniques can be useful for linguists
• NLP throws up rich examples which need to be handled
• These examples pose challenges for the theory