CHAPTER 2 Basic English Concepts

This chapter discusses some basics of the English language that are relevant to language analysis. Every language defines a basic alphabet, words, word categories and language formation rules called grammar rules. Words are assigned to categories according to their role as parts of speech. From the point of view of language analysis, the style of a language must be concretely defined before a working parser can be designed for that language. Although there is no hard and fast rule for naming the formal categories, it is customary to give the various parts of speech their traditional names. The grammatical categories (noun, verb, etc.) taught in English literature are informal and are not defined as precisely as a formal grammar requires. In addition, a real parser has to make many more distinctions. Hence, for language processing by computer, the grammar writer must clearly understand the basic word categories of the language, the types of words and other constituents, and the way in which they interact with each other.

In linguistic analysis, Chomsky did pioneering work in the 1960s. He formally defined various grammars, types of grammars, and the features and characteristics of grammars. These are described in detail later in this chapter. As a result of Chomsky’s work on transformational generative grammar, a vast amount of descriptive linguistic analysis has been carried out, and a large body of terminology has grown up which augments the informal set of old-fashioned terms. We now describe the elementary terminology of English grammar.

2.1 FUNDAMENTAL TERMINOLOGY OF ENGLISH GRAMMAR

The well-accepted terminology of English grammar defines the following word categories:

(i) Noun: Traditionally, a noun is considered a naming word. Formally, it is defined as “the name of a person, place or thing”.
However, noun can also be

Sometimes an adverb modifies a complete sentence or a phrase. For example, consider the following sentences:

4. Probably you are wrong. (modifies the complete sentence)
5. I will not read all through this book. (modifies a phrase)

(ix) Adjective: A word which specifies a quality of a noun. It is a describing word. It can be attached to a noun to modify its meaning, or it can be used to assert some attribute of the subject of a sentence, e.g., blue, large, fake, main, etc.

(x) Verb phrase: A verb along with its object constitutes a verb phrase, e.g., “gave a flower to the teacher” in the sentence “She gave a flower to the teacher.”

2.2 SENTENCE

A group of words which makes complete sense is called a sentence. A sentence is created by joining words according to grammar rules, for example, “Adwet is a good boy.” Sentences are of four types:

(i) Assertive: a sentence that makes a statement or assertion, e.g., Humpty Dumpty sat on a wall.
(ii) Interrogative: a sentence that asks a question, e.g., Where do you live?
(iii) Imperative: a sentence that expresses a command or an entreaty, e.g., Be quiet.
(iv) Exclamatory: a sentence that expresses a strong feeling, e.g., How cold the night is! What a shame!

2.2.1 Parts of the Sentence

A sentence is divided into two parts, subject and predicate. The subject is the part which names the person or thing we are speaking about. The part which tells something about the subject is called the predicate. Normally the subject comes before the predicate.

A sentence is made up of various constituents, known as parts of speech. Words are assigned to these parts of speech according to the work they do in the sentence. The parts of speech are:

(i) Noun
(ii) Adjective
(iii) Pronoun
(iv) Verb
(v) Adverb
(vi) Preposition
(vii) Conjunction
(viii) Interjection

The basic terminology of English grammar has been described above.
Now we discuss some other details of these constituents.

Noun: Nouns are of the following types:

(i) Common noun: a name given in common to every person or thing of the same class or kind.
(ii) Proper noun: the name of a particular person, place or thing.
(iii) Collective noun: the name of a number (or collection) of persons or things taken together and spoken of as one whole, e.g., crowd, mob, team, flock, herd, army, etc.
(iv) Abstract noun: usually the name of a quality, action or state considered apart from the object to which it belongs, e.g., goodness, kindness, whiteness, etc.
(v) Countable nouns: the names of objects that we can count, e.g., book, pen.
(vi) Uncountable nouns: the names of things that we cannot count, e.g., milk, oil, sugar, etc.

Adjective: A word used with a noun to describe or point out the person, animal, place or thing which the noun names, or to tell the number or quantity, is called an adjective. Adjectives can be of the following types:

1. Adjective of quality (descriptive adjective): shows the kind or quality of a person or thing, e.g., He is an honest man.
2. Adjective of quantity: shows how much of a thing is meant, e.g., I ate some rice.
3. Numeral adjective: shows how many persons or things are meant, e.g., The hand has five fingers. Few cats like cold water.
4. Demonstrative adjective: points out which person or thing is meant, e.g., This boy is stronger than Harry. Those mangoes are sweet.
5. Interrogative adjective: used with a noun to ask a question, e.g., What manner of man is he? Which way shall we go?
6. Emphasizing adjective: used to emphasize some concept, e.g., I saw it with my own eyes.
7. Exclamatory adjective: used to express exclamation, e.g., What a genius! What folly! What an idea!

Adjectives can have degrees. The degree indicates the extent of the quality denoted by the adjective. There are three degrees: positive, comparative and superlative.
The positive degree is the simple form of the adjective. The comparative degree is used to indicate a comparison between two things. The superlative degree denotes the highest degree of the quality, e.g., strong, stronger, strongest.

Article: The words a, an and the are called articles. They come before a noun. A and an are called indefinite articles because they usually leave indefinite the person or thing spoken of, as in a doctor, an orange. The is called the definite article because it normally points to some particular person or thing.

Pronoun: A word that is used instead of a noun is called a pronoun. Pronouns can be of various types. Personal pronouns such as I, we, you, he, she, it and they indicate person. There are three persons: 1st person, 2nd person and 3rd person.

Verb: A word that tells or asserts something about a person or thing, e.g., Harry laughs. The clock strikes.

Types of verbs: Verbs can be transitive or intransitive. A transitive verb denotes an action which passes over from the subject to an object. An intransitive verb denotes an action which does not pass over to an object, or which expresses a state or being. For example, in “He ran a long distance”, ran is used intransitively.

Most transitive verbs take a single object. But transitive verbs such as give, ask, offer, promise, tell, etc. take two objects: an indirect object, which denotes the person to whom something is given or for whom something is done, and a direct object, which is usually the name of something. For example:

His father gave him (indirect) a watch (direct).
He told me (indirect) a secret (direct).

Most verbs can be used both transitively and intransitively. It is therefore better to say that a verb is used transitively or intransitively than to say that it is transitive or intransitive.
Some verbs, e.g., come, go, fall, die, sleep, lie, denote actions which cannot be done to anything; they can therefore never be used transitively.

2.3 ACTIVE AND PASSIVE VOICE

Voice is the form of a verb which shows whether what is denoted by the subject does something or has something done to it. Active and passive are two ways of framing an English sentence, and they use different verb forms. In the active voice, the verb form shows that the person or thing denoted by the subject does something, that is, is the doer of the action, e.g., Ram helps Hari. The active voice is so called because the person denoted by the subject acts. A verb is in the passive voice when its form shows that something is done to the person or thing denoted by the subject, e.g., Hari is helped by Ram. The passive voice is so called because the person or thing denoted by the subject is not active but passive, that is, suffers or receives some action.

Some sentences in active and passive form are given below:

(i) Sita loves Savitri. Savitri is loved by Sita.
(ii) The mason is building a wall. A wall is being built by the mason.
(iii) The peon opened the gate. The gate was opened by the peon.

An active sentence and its passive counterpart convey the same meaning. Hence, in the context of natural language processing, there are grammars (namely transformational grammars) which convert a sentence in the active voice to the passive voice. Note that when the verb is changed from the active voice to the passive voice, the object of the transitive verb in the active voice becomes the subject of the verb in the passive voice. When a verb that takes both a direct and an indirect object in the active voice is changed to the passive voice, either object may become the subject of the passive verb, while the other is retained.
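The object-to-subject swap described above can be sketched as a toy transformation in Python. This is only an illustration under stated assumptions, not a transformational grammar: the names `VERB_FORMS`, `lower_article` and `to_passive` are hypothetical, the clause is assumed to be already split into subject, verb and object, the object is assumed to be singular, and only the few verb forms listed in the table are known.

```python
# Hypothetical mini-lexicon: finite verb form -> (tense, past participle).
VERB_FORMS = {
    "loves": ("present", "loved"),
    "helps": ("present", "helped"),
    "opened": ("past", "opened"),
}
BE_SINGULAR = {"present": "is", "past": "was"}
ARTICLES = {"the", "a", "an"}


def lower_article(phrase):
    """Turn 'The peon' into 'the peon'; leave proper names untouched."""
    first, _, rest = phrase.partition(" ")
    if rest and first.lower() in ARTICLES:
        return f"{first.lower()} {rest}"
    return phrase


def to_passive(subject, verb, obj):
    """Rewrite an active clause (subject, verb, object) in the passive voice.

    Assumes a singular object and a verb form listed in VERB_FORMS.
    """
    tense, participle = VERB_FORMS[verb]
    # The active object becomes the passive subject.
    new_subject = obj[0].upper() + obj[1:]
    agent = lower_article(subject)
    return f"{new_subject} {BE_SINGULAR[tense]} {participle} by {agent}."


print(to_passive("Sita", "loves", "Savitri"))       # Savitri is loved by Sita.
print(to_passive("The peon", "opened", "the gate"))  # The gate was opened by the peon.
```

A realistic system would obtain the subject, verb and object from a parser and the participle from the lexicon; the lookup table here merely stands in for both.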
An indirect object denotes the person to whom or for whom something is done, while a direct object usually denotes a thing.

2.4 TENSES

Tense is the concept which indicates time. In the literature, time is divided into three parts:

(i) The time which is going on now (present).
(ii) The time before the present, which has already passed (past).
(iii) The time which will come after the present, which has not yet arrived (future).

To represent these three categories of time, language incorporates the concept of tense. The tense of a verb shows the time of an action or an event. Corresponding to the three categories there are three tenses: present tense, past tense and future tense. In English, different verb forms represent these tenses. A verb that refers to present time is said to be in the present tense, a verb that refers to past time is said to be in the past tense, and a verb that refers to future time is said to be in the future tense. For example:

(i) I write this letter to please you.
(ii) I wrote the letter in his very presence.
(iii) I shall write another letter tomorrow.

In language analysis, these verb forms are used to find the time of the event. However, there are many variations of these verb forms in English. Sometimes a past tense may refer to present time, and a present tense may express future time. For example:

I wish I knew the answer. (This is equivalent to saying “I am sorry I don’t know the answer.” It is past tense, present time.)
Let’s wait till he comes. (present tense, future time)

Below we give the chief tenses (active voice, indicative mood) of the verb to love.
Present tense
              Singular number     Plural number
1st person    I love              We love
2nd person    You love            You love
3rd person    He loves            They love

Past tense
              Singular number     Plural number
1st person    I loved             We loved
2nd person    You loved           You loved
3rd person    He loved            They loved

Future tense
              Singular number     Plural number
1st person    I shall/will love   We shall/will love
2nd person    You will love       You will love
3rd person    He will love        They will love

In English, each tense has four forms. For the present tense these are simple present, present continuous, present perfect and present perfect continuous. See the following sentences:

1. I love (Simple present)
2. I am loving (Present continuous)
3. I have loved (Present perfect)
4. I have been loving (Present perfect continuous)

The verbs in all of these sentences refer to present time, and are therefore said to be in the present tense. In sentence 1, the verb shows that the action is mentioned simply, without anything being said about its completeness or incompleteness. In sentence 2, the verb shows that the action is mentioned as incomplete or continuous, that is, as still going on. In sentence 3, the verb shows that the action is mentioned as finished, complete or perfect at the time of speaking. The tense of the verb in sentence 4 is said to be present perfect continuous because the verb shows that the action has been going on continuously and is not completed at the present moment.

Thus, we see that the tense of a verb shows not only the time of an action or event, but also the state of the action referred to. Just as the present tense has four forms, the past tense also has four forms:

1. I loved (Simple past)
2. I was loving (Past continuous)
3. I had loved (Past perfect)
4. I had been loving (Past perfect continuous)

Similarly, the future tense has the following four forms:

1. I shall/will love (Simple future)
2. I shall/will be loving (Future continuous)
3. I shall/will have loved (Future perfect)
4. I shall/will have been loving (Future perfect continuous)

According to English sentence formation rules, a verb agrees with its subject in number and person. There are different verb forms corresponding to different numbers and persons. This requirement of agreement in number and person is used in language analysis to find out whether a sentence is syntactically valid or not.

Besides the main verbs, English has certain verbs known as auxiliary verbs. The verbs be (am, is, was, etc.), have and do, when used with ordinary verbs to make tenses, passive forms, questions and negatives, are called auxiliary verbs. The verbs can, could, may, might, will, would, shall, should, must and ought are called modal verbs. They are used before ordinary verbs and express meanings such as permission, possibility, certainty and necessity. Need and dare can sometimes be used like modal verbs.

2.4.1 Conjugation of the Verb

Any language has a well-defined syntax for its lexicon. The conjugation of a verb shows the various forms it can assume, either by inflection or by combination with parts of other verbs, to mark voice, mood, tense, number and person; to these must be added its infinitives and participles. Below is given the complete conjugation of the verb ‘love’.
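The regularity behind the twelve active-voice forms listed in Section 2.4 can be sketched in Python. This is a minimal illustration, not a full conjugator: the names `participles` and `conjugate` and the two lookup tables are assumptions introduced here, only the subjects I, you, he, we and they are covered, “will” is used throughout for the future, and only regular verbs such as “love” or “walk” are handled.

```python
# Hypothetical helper tables for the auxiliary "be".
BE_PRESENT = {"I": "am", "you": "are", "he": "is", "we": "are", "they": "are"}
BE_PAST = {"I": "was", "you": "were", "he": "was", "we": "were", "they": "were"}


def participles(verb):
    """Return (past participle, present participle) of a regular verb."""
    stem = verb[:-1] if verb.endswith("e") else verb
    return stem + "ed", stem + "ing"


def conjugate(subject, verb, tense, aspect):
    """Build one active-voice form, e.g. ('he', 'love', 'past', 'perfect')."""
    past_part, pres_part = participles(verb)
    third_singular = subject == "he"
    if tense == "present":
        be = BE_PRESENT[subject]
        have = "has" if third_singular else "have"
        simple = verb + "s" if third_singular else verb
    elif tense == "past":
        be = BE_PAST[subject]
        have = "had"
        simple = past_part  # regular verbs: simple past equals past participle
    elif tense == "future":
        be = "will be"
        have = "will have"
        simple = "will " + verb
    else:
        raise ValueError(f"unknown tense: {tense}")
    forms = {
        "simple": simple,
        "continuous": f"{be} {pres_part}",
        "perfect": f"{have} {past_part}",
        "perfect continuous": f"{have} been {pres_part}",
    }
    return f"{subject} {forms[aspect]}"


print(conjugate("he", "love", "present", "simple"))            # he loves
print(conjugate("I", "love", "past", "perfect continuous"))    # I had been loving
```

The choice of “loves” versus “love” and of “am”, “is” or “are” is exactly the number and person agreement that a parser checks when it tests whether a sentence is syntactically valid.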
(i) Tenses

Simple present
Active                    Passive
I love                    I am loved
You love                  You are loved
He loves                  He is loved
They love                 They are loved

Present continuous
Active                    Passive
I am loving               I am being loved
You are loving            You are being loved
He is loving              He is being loved
We are loving             We are being loved
They are loving           They are being loved

Present perfect
Active                    Passive
I have loved              I have been loved
You have loved            You have been loved
He has loved              He has been loved
They have loved           They have been loved

Present perfect continuous
Active                    Passive
I have been loving        ------
You have been loving      ------
We have been loving       ------
They have been loving     ------

Simple past
Active                    Passive
I loved                   I was loved
You loved                 You were loved
He loved                  He was loved
They loved                They were loved

Past continuous
Active                    Passive
I was loving              I was being loved
You were loving           You were being loved
He was loving             He was being loved
They were loving          They were being loved

Past perfect
Active                    Passive
I had loved               I had been loved
You had loved             You had been loved
He had loved              He had been loved
They had loved            They had been loved

(iii) Non-finites

                          Active            Passive
Present infinitive        to love           to be loved
Continuous infinitive     to be loving      ------
Perfect infinitive        to have loved     to have been loved
Present participle        loving            being loved
Perfect participle        having loved      having been loved

2.5 ADVERB

Words which modify the meaning of a verb, an adjective or another adverb are known as adverbs. For example, quickly, very and quite are adverbs in the following sentences:

(i) Rama runs quickly.
(ii) This is a very sweet mango.
(iii) Govind reads quite clearly.

Adverbs can be of the following types:

(i) Adverb of time (which shows when)
(ii) Adverb of frequency (which shows how often)
(iii) Adverb of place (which shows where)
(iv) Adverb of manner (which shows how or in what manner)
(v) Adverb of degree or quantity
(vi) Adverb of affirmation or negation
(vii) Adverb of reason

Besides these, there are many cue phrases, such as however and anyway, which mark a change of theme in the discourse. These have special significance in linguistic analysis, where they are used to analyze the theme of the discourse.

2.6 DICTIONARY FEATURES

We all know that a dictionary is something that provides definitions of words. From the viewpoint of computer storage, the way definitions are stored differs somewhat. The definition of a word as stored in a computer database is important for linguistic analysis, and it is this definition that we describe in this chapter. The definition of a word has the form:

word (category root related-features)

The main objective of defining a word in this way is that the definition should provide everything that might help in parsing and understanding the sentence. A sentence contains different parts of speech, so words need to be classified into categories such as noun, pronoun, etc.

In general, words are categorized into the following categories:

(1) Articles
(2) Nouns
(3) Pronouns
(4) Verbs
(5) Adverbs
(6) Adjectives
(7) Prepositions
(8) Conjunctions
(9) Numbers
(10) Punctuation marks
(11) Wh-words

Let us discuss these categories in a little more detail from the point of view of lexicon storage.

Articles

This category contains only three words: a, an, the. The dictionary definitions of articles (ART) look like:

A (ART A)
AN (ART AN)
THE (ART THE)

Nouns

Nouns are classified as animate or inanimate, and further as singular or plural. Inanimate nouns are further classified into categories such as place, conveyance, time, objects, etc., and animate nouns are further classified into male and female.
Some examples of words are as follows:

RAM (NOUN RAM ANIMATE MALE SINGULAR)
BOY (NOUN BOY ANIMATE MALE SINGULAR)
CAR (NOUN CAR CONVEYANCE SINGULAR)
RESTAURANT (NOUN RESTAURANT PLACE SINGULAR)
SUNRISE (NOUN SUNRISE TIME SINGULAR)

Pronouns

Pronouns have the largest number of subcategories. The first criterion for classification is person, which gives the categories first person, second person and third person. Further criteria are number, gender and role. Some examples of pronouns are as follows:

HE (PRONOUN HE THIRD PERSON MALE SINGULAR NOMINATIVE)
THEY (PRONOUN THEY THIRD PERSON MALE FEMALE NEUTER PLURAL NOMINATIVE)
YOU (PRONOUN YOU SECOND PERSON MALE FEMALE NOMINATIVE ACCUSATIVE SINGULAR PLURAL)
I (PRONOUN I FIRST PERSON MALE FEMALE SINGULAR NOMINATIVE)

Numbers

Numbers can also appear in a sentence. A peculiar feature of numbers is that they have two representations, one in figures and the other in words. Example dictionary entries for number words are:

SIX (NUMBER SIX)
TWENTY (NUMBER TWENTY)

Structure of Dictionary

The dictionary should be structured so that a definition can be retrieved as quickly as possible, i.e., the search time should be reduced to a minimum. One possible method of reducing the search time is as follows:

(i) Break up the whole dictionary according to the first letter of each word. This gives 26 sublists.
(ii) If the dictionary has just 1000 words, there will be on average about 40 words per list (1000/26 is roughly 38), so each search takes less time.

The lexicon serves the purpose of providing tokens to the parser. The words, along with their definitions, remain stored in the dictionary. The dictionary specifies, for each word, its part of speech, any non-default values for its features, and presumably something about its meaning.
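The entry format and the first-letter sublists described above can be sketched as follows. This is an illustrative layout, not the storage scheme of any particular system: the names `ENTRIES`, `build_sublists` and `lookup` are assumptions, and the entries are a handful copied from the examples in this section.

```python
from collections import defaultdict

# A few entries from the text: word -> (category, root, features...).
ENTRIES = {
    "THE": ["ART", "THE"],
    "RAM": ["NOUN", "RAM", "ANIMATE", "MALE", "SINGULAR"],
    "CAR": ["NOUN", "CAR", "CONVEYANCE", "SINGULAR"],
    "RESTAURANT": ["NOUN", "RESTAURANT", "PLACE", "SINGULAR"],
    "SIX": ["NUMBER", "SIX"],
}


def build_sublists(entries):
    """Group entries into sublists keyed by the first letter of each word."""
    sublists = defaultdict(dict)
    for word, features in entries.items():
        sublists[word[0]][word] = features
    return sublists


def lookup(sublists, word):
    """Search only the sublist for the word's first letter; None if absent."""
    word = word.upper()
    return sublists.get(word[0], {}).get(word)


sublists = build_sublists(ENTRIES)
print(lookup(sublists, "car"))   # ['NOUN', 'CAR', 'CONVEYANCE', 'SINGULAR']
print(lookup(sublists, "kite"))  # None
```

In Python a single hash-based `dict` would already give fast lookup without the first-letter split; the two-level layout is kept here only to mirror the sublist scheme described in the text.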
However, in English, as in all other languages, individual words can often be given different prefixes and suffixes. For example, the word “love” can appear in different guises, such as “loves”, “loved”, “loving”, “unloving”, etc. All of these words share one basic word, from which the other forms are derived. From the point of view of computer storage, it would be wasteful for the dictionary to include all of them. A better approach is to have the lexicon use explicit knowledge of the structure of words (their morphology) and have it figure out when a word is simply a variant of one that is already in the dictionary.

However, how these variants are to be recognized needs to be specified. Care must be taken in handling these patterns; for example, an error can arise in the following:

“kiss” → “kis” + “s”

To some degree, such mistakes can be prevented by installing more stringent checks on which endings are allowed in which circumstances. For example, a singular noun ending in “s” will never form its plural by adding “s”, but rather by adding “es”. It also helps to ensure first that the word is not already in the dictionary before attempting to remove any endings, and then that the end product, after the lexicon has removed all the supposed endings, is itself in the dictionary.

If such a procedure is stored in the lexicon, the dictionary can be made reasonably compact, as the morphological unit will take care of the standard cases. Furthermore, having default values for features means that for root words, such as singular nouns, the dictionary need not even indicate that the word is singular, since this is the default case. Such routines are called

name of the concept, information about the deep case structure of the concept, and default values for those cases. For example, the case structure of a concept like “drink” might include the cases agent, object and instrument.
The cases are meant to account for the fact that the concept “drink” includes an agent who performs the action, an object that is drunk, and sometimes an instrument that is used to aid in drinking. So, given the event description “Jatin drank a can of beer”, the agent of the action is Jatin, the object is beer, and the instrument is the can.

The default values associated with the cases are meant to be used as a tool for rejecting aberrant interpretations of text. Thus, whereas the text “Jatin has a coke. He drank.” is interpreted to mean “Jatin drank a coke”, the text “Jatin has a kite. He drank.” is not interpreted to mean “Jatin drank a kite”, since the default value for the object of a drinking event is “liquid” and a kite is not a type of liquid.

The case structure and the default values associated with each case are stored in a 3-tuple. These are sometimes called templates for the event/state concept. The structure of the template is:

[TEMPLATE event/state-concept-name list-of-default-value-pairs]

The template for the concept drink is:

[TEMPLATE drink ((obj liquid) (instr container) (agt animal1) ...)]

In principle, the dictionary contains the case structure for each of its concepts, but in practice this is not necessary. Since NEXUS does not currently use the dictionary to parse sentences, its only use for the case information is to constrain the text interpretation process. Consequently, the default values are added to the dictionary as and when they are needed. The case relations used in NEXUS are primarily derived from Simmons, but also include pieces of the case systems of Fillmore, etc.
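The way default values can reject an aberrant interpretation may be sketched as follows. This is a minimal illustration and not the NEXUS implementation: the names `TEMPLATES`, `IS_A` and `fits_case` are hypothetical, and the small table of type facts is invented for the example, whereas a real system would draw such knowledge from its dictionary.

```python
# Template for "drink", following the text: case -> default value.
TEMPLATES = {
    "drink": {"obj": "liquid", "instr": "container", "agt": "animal1"},
}

# Hypothetical type facts used only for this illustration.
IS_A = {
    "coke": "liquid",
    "beer": "liquid",
    "can": "container",
    "kite": "artifact",
    "Jatin": "animal1",
}


def fits_case(concept, case, filler):
    """True if the filler's type matches the default value for that case."""
    return IS_A.get(filler) == TEMPLATES[concept][case]


print(fits_case("drink", "obj", "coke"))  # True
print(fits_case("drink", "obj", "kite"))  # False
```

With these assumptions, “coke” is accepted as the object of a drinking event while “kite” is rejected, which is the behaviour described for the two “Jatin” texts.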