CHAPTER 2
Basic English Concepts
This chapter discusses some basics of the English language that are relevant
to understanding language analysis. Every language defines a basic
alphabet, words, word categories and language formation rules called grammar
rules. The word categories are formed according to the role words play as parts
of speech. From the language analysis point of view, the style of a language
must be concretely defined in order to design a working parser for that
language. Though there is no hard and fast rule for naming the formal
categories, it is customary to give the various parts of speech their
traditional names. The grammatical categories (noun, verb, etc.) that are
traditionally taught are informal and are not defined as precisely as a formal
grammar requires. In addition, many more distinctions have to be made in a
real parser.
Hence, for language processing by computer, the grammar writer must clearly
understand the basic word categories of the language, the types of words and
other constituents of the language, and the way in which they interact with
each other.
In linguistic analysis, Chomsky did pioneering work in the 1960s. He formally
defined various grammars, types of grammars, and the features and
characteristics of grammars. These are described in detail later in this
chapter. As a result of Chomsky’s work on transformational generative grammar,
a vast amount of fairly descriptive linguistic analysis has been carried out,
and a large repository of terminology has grown up that augments the informal
set of old-fashioned terms. Let us now describe the elementary terminology of
English grammar.
2.1 FUNDAMENTAL TERMINOLOGY OF ENGLISH GRAMMAR
The well-accepted English grammar terminology defines the following word categories:
(i) Noun: Traditionally, a noun is considered a naming word. Formally, it is
defined as “the name of a person, place or thing”. However, a noun can also be
Sometimes, an adverb modifies a complete sentence or a phrase. For example,
consider the following sentences:
4. Probably you are wrong. (modifies the complete sentence)
5. I will not read all through this book. (modifies a phrase)
(ix) Adjective: A word which specifies a quality of a noun; it is a describing
word. It can be attached to a noun to modify its meaning, or it can be used
to assert some attribute of the subject of a sentence, e.g., blue, large, fake,
main, etc.
(x) Verb phrase: A verb along with its object constitutes a verb phrase, e.g.,
“gave a flower to the teacher” in the sentence “She gave a flower to the
teacher.”
2.2 SENTENCE
A group of words that makes complete sense is called a sentence. A sentence
is created by joining words according to grammar rules, for example, Adwet
is a good boy.
Sentences are of four types.
(i) Assertive: Those which make statements or assertions, as
Humpty Dumpty sat on a wall.
(ii) Interrogative: Those which ask questions, as
Where do you live?
(iii) Imperative: Those which express a command or an entreaty,
e.g., Be quiet.
(iv) Exclamatory: Those which express a strong feeling,
e.g., How cold the night is!
What a shame!
2.2.1 Parts of the Sentence
A sentence is divided into two parts, subject and predicate.
The subject is the part which names the person or thing we are speaking about.
The part which tells something about the subject is called the predicate.
Normally the subject comes before the predicate.
A sentence is made up of various constituents, known as parts of speech. Words
are assigned to these according to the work they do in the sentence. The
parts of speech are:
(i) Noun
(ii) Adjective
(iii) Pronoun
(iv) Verb
(v) Adverb
(vi) Preposition
26
Natural Language Processing
(vii) Conjunction
(viii) Interjection
The basic terminology of English grammar has been described above. We now
discuss some further details of these constituents.
Noun: Nouns are of the following types:
(i) Common noun: A name given in common to every person or thing of
the same class or kind.
(ii) Proper noun: The name of a particular person, place or thing.
(iii) Collective noun: The name of a number (or collection) of persons or things
taken together and spoken of as one whole, e.g., crowd, mob, team, flock,
herd, army, etc.
(iv) Abstract noun: Usually the name of a quality, action or state considered
apart from the object to which it belongs, e.g., goodness, kindness,
whiteness, etc.
(v) Countable nouns: The names of objects that we can count, e.g., book,
pen.
(vi) Uncountable nouns: The names of things which we cannot count, e.g.,
milk, oil, sugar, etc.
Adjective: A word used with a noun to describe or point out the person,
animal, place or thing which the noun names, or to tell the number or quantity,
is called an adjective. Adjectives can be of the following types:
1. Adjective of quality or descriptive adjective: It shows the kind or quality
of a person or thing, e.g., He is an honest man.
2. Adjective of quantity: It shows how much of a thing is meant, e.g., I ate
some rice.
3. Numeral adjective: It shows how many persons or things are meant, e.g.,
The hand has five fingers. Few cats like cold water.
4. Demonstrative adjective: It points out which person or thing is meant, e.g.,
This boy is stronger than Harry. Those mangoes are sweet.
5. Interrogative adjective: It is used with a noun to ask a question, e.g.,
What manner of man is he? Which way shall we go?
6. Emphasizing adjective: It is used to emphasize some concept,
e.g., I saw it with my own eyes.
7. Exclamatory adjective: It is used to show exclamation, e.g., What a
genius! What folly! What an idea!
Adjectives can have degrees. The degree indicates the extent of the quality
denoted by the adjective. There are three degrees: positive, comparative and
superlative. The positive degree is the simple form of the adjective. The
comparative degree is used to indicate comparison between two things. The
superlative degree denotes the highest degree of the quality, e.g., strong,
stronger, strongest.
Article: The words a, an and the are called articles. They come before a noun.
A and an are called indefinite articles because they usually leave indefinite
the person or thing spoken of, as in a doctor, an orange.
“The” is called the definite article because it normally points to some
particular person or thing.
Pronoun: A word that is used instead of a noun is called a pronoun. Pronouns
can be of various types. Personal pronouns, such as I, we, he, she, it, they
and you, indicate grammatical person. There are three persons: 1st person, 2nd
person and 3rd person.
Verb: A word that tells or asserts something about a person or thing, for
example, Harry laughs, the clock strikes. Verbs can be of two types,
transitive and intransitive. A transitive verb denotes an action which passes
over from the subject to an object. An intransitive verb denotes an action
which does not pass over to an object, or which expresses a state or being.
For example, in “He ran a long distance” the verb ran is used intransitively.
Most transitive verbs take a single object. But transitive verbs such as give,
ask, offer, promise, tell, etc. take two objects after them: an indirect object,
which denotes the person to whom something is given or for whom something is
done, and a direct object, which is usually the name of something. For example,
His father gave him (indirect) a watch (direct).
He told me (indirect) a secret (direct).
Most verbs can be used both transitively and intransitively. It is therefore
better to say that a verb is used transitively or intransitively than that it
is transitive or intransitive.
Some verbs, e.g., come, go, fall, die, sleep, lie, denote actions which cannot
be done to anything; they can therefore never be used transitively.
2.3 ACTIVE AND PASSIVE VOICE
Voice is the form of a verb which shows whether what is denoted by the subject
does something or has something done to it. Active and passive are two ways
of framing an English sentence. They use different verb forms. In the active
voice the verb form shows that the person or thing denoted by the subject does
something, that is, is the doer of the action,
e.g., Ram helps Hari.
The active voice is so called because the person denoted by the subject
acts.
A verb is in the passive voice when its form shows that something is done to
the person or thing denoted by the subject, e.g., Hari is helped by Ram.
The passive voice is so called because the person or thing denoted by the
subject is not active but passive, that is, suffers or receives some action.
Some sentences in active and passive form are given below:
(i) (a) Sita loves Savitri.
(b) Savitri is loved by Sita.
(ii) (a) The mason is building a wall.
(b) A wall is being built by the mason.
(iii) (a) The peon opened the gate.
(b) The gate was opened by the peon.
Sentences in the active and passive voice convey the same meaning. Hence, in
the context of natural language processing, there are grammars (namely
transformational grammars) which convert a sentence in the active voice to
the passive voice. It should be noted that when the verb is changed
from the active voice to the passive voice, the object of the transitive verb in
the active voice becomes the subject of the verb in the passive voice. When
verbs that take both direct and indirect objects in the active voice are changed
to the passive voice, either object may become the subject of the passive verb,
while the other is retained. An indirect object denotes the person to whom or
for whom something is done, while a direct object usually denotes a thing.
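The object-to-subject swap described above can be illustrated with a small sketch. It is not a transformational grammar: the function name to_passive and the table BE_FORMS are hypothetical, the past participle is supplied by the caller, and only third-person singular subjects in three tenses are covered.

```python
# Hypothetical table: forms of "be" for a third-person singular passive subject.
BE_FORMS = {
    "simple present": "is",
    "present continuous": "is being",
    "simple past": "was",
}

ARTICLES = {"the", "a", "an"}


def to_passive(subject: str, past_participle: str, obj: str, tense: str) -> str:
    """Build the passive counterpart of an active 'subject verb object' sentence."""
    be = BE_FORMS[tense]
    # The object of the active verb becomes the subject of the passive verb.
    passive_subject = obj[0].upper() + obj[1:]
    # Lower-case a leading article so "The mason" reads "by the mason".
    first = subject.split()[0]
    agent = subject
    if first.lower() in ARTICLES:
        agent = first.lower() + subject[len(first):]
    return f"{passive_subject} {be} {past_participle} by {agent}."


print(to_passive("Sita", "loved", "Savitri", "simple present"))
print(to_passive("The mason", "built", "a wall", "present continuous"))
print(to_passive("The peon", "opened", "the gate", "simple past"))
```

A real system would derive the participle and the form of "be" from the lexicon and from the number and person of the new subject, rather than take them as arguments.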
2.4 TENSES
Tense is the concept which indicates time. In the literature, three divisions
are made on the time line:
(i) The time which is presently going on (present).
(ii) The time which is before the present, or the time which has passed (past).
(iii) The time which will come after the present, or the time which has not yet
arrived (future).
To represent these three time categories, language incorporates the concept of
‘tenses’. The tense of a verb shows the time of an action or an event.
Corresponding to the three categories there are three tenses: present tense,
past tense and future tense. In English, different verb forms represent these
tenses. A verb that refers to present time is said to be in the present tense.
A verb that refers to past time is said to be in the past tense, and a verb
that refers to future time is said to be in the future tense.
For example, see the following sentences:
(i) I write this letter to please you.
(ii) I wrote the letter in his very presence.
(iii) I shall write another letter tomorrow.
In language analysis these tense forms of the verb are used to find the time
of the event. However, there are many variations of these verb forms in
English. Sometimes a past tense may refer to present time, and
a present tense may express future time. For example,
I wish I knew the answer. (This sentence is equivalent to saying “I am
sorry I don’t know the answer.” It is past tense, present time.)
Let’s wait till he comes. (present tense, future time)
Below we give the chief tenses (active voice, indicative mood) of the verb to
love.
Present tense
              Singular number     Plural number
1st person    I love              We love
2nd person    You love            You love
3rd person    He loves            They love

Past tense
              Singular number     Plural number
1st person    I loved             We loved
2nd person    You loved           You loved
3rd person    He loved            They loved

Future tense
              Singular number     Plural number
1st person    I shall/will love   We shall/will love
2nd person    You will love       You will love
3rd person    He will love        They will love
In English each tense is further divided into four forms. For the present
tense these are simple present, present continuous, present perfect and
present perfect continuous. See the following sentences:
1. I love (Simple present)
2. I am loving (Present continuous)
3. I have loved (Present perfect)
4. I have been loving (Present perfect continuous)
The verbs in all of these sentences refer to present time, and are therefore
said to be in the present tense. In sentence 1, the verb shows that the action
is mentioned simply, without anything being said about its completeness or
incompleteness.
In sentence 2, the verb shows that the action is mentioned as incomplete or
continuous, that is, as still going on. In sentence 3, the verb shows that the
action is mentioned as finished, complete or perfect at the time of speaking.
The tense of the verb in sentence 4 is said to be present perfect continuous
because the verb shows that the action has been going on continuously and is
not completed at the present moment.
Thus, we see that the tense of a verb shows not only the time of an action or
event, but also the state of the action referred to.
Just as the present tense has four forms, the past tense also has the following
four forms:
1. I loved (Simple past)
2. I was loving (Past continuous)
3. I had loved (Past perfect)
4. I had been loving (Past perfect continuous)
Similarly, the future tense has the following four forms:
1. I shall/will love (Simple future)
2. I shall/will be loving (Future continuous)
3. I shall/will have loved (Future perfect)
4. I shall/will have been loving (Future perfect continuous)
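The twelve forms listed above follow a regular pattern of auxiliary plus participle, which can be written down as a small sketch. The function name tense_forms is hypothetical, the principal parts of the verb are passed in by the caller, and only the first person singular in the active voice is generated.

```python
def tense_forms(base: str, present_participle: str, past: str, past_participle: str) -> dict:
    """Return the twelve active tense forms of a verb for the subject 'I'."""
    return {
        "simple present": f"I {base}",
        "present continuous": f"I am {present_participle}",
        "present perfect": f"I have {past_participle}",
        "present perfect continuous": f"I have been {present_participle}",
        "simple past": f"I {past}",
        "past continuous": f"I was {present_participle}",
        "past perfect": f"I had {past_participle}",
        "past perfect continuous": f"I had been {present_participle}",
        "simple future": f"I shall/will {base}",
        "future continuous": f"I shall/will be {present_participle}",
        "future perfect": f"I shall/will have {past_participle}",
        "future perfect continuous": f"I shall/will have been {present_participle}",
    }


forms = tense_forms("love", "loving", "loved", "loved")
for name, form in forms.items():
    print(f"{form} ({name})")
```

Run in reverse, the same table lets an analyzer read the time and the state of an action off the auxiliary sequence it finds in a sentence.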
According to English sentence formation rules, a verb agrees with its subject
in number and person. There are different verb forms corresponding to different
numbers and persons. This requirement that subject and verb match in number
and person is used in language analysis to find out whether a sentence is
syntactically valid or not.
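A minimal sketch of such an agreement check for the simple present tense is given below. The names Subject, SUBJECTS and agrees are hypothetical, and the rule "add s in the third person singular" is an assumption that holds for regular verbs such as love but not for verbs like be or have.

```python
from typing import NamedTuple


class Subject(NamedTuple):
    person: int   # 1, 2 or 3
    number: str   # "singular" or "plural"


# Hypothetical feature table for the personal pronouns.
SUBJECTS = {
    "i": Subject(1, "singular"),
    "we": Subject(1, "plural"),
    "you": Subject(2, "plural"),   # "you" serves both numbers; the verb form is the same
    "he": Subject(3, "singular"),
    "she": Subject(3, "singular"),
    "it": Subject(3, "singular"),
    "they": Subject(3, "plural"),
}


def expected_present(base: str, subject: Subject) -> str:
    """Simple present form of a regular verb for the given subject."""
    if subject.person == 3 and subject.number == "singular":
        return base + "s"
    return base


def agrees(subject_word: str, verb: str, base: str) -> bool:
    """True if the verb form matches the subject in person and number."""
    return verb == expected_present(base, SUBJECTS[subject_word.lower()])


print(agrees("He", "loves", "love"))    # accepted
print(agrees("They", "loves", "love"))  # rejected
```

A parser can apply a check of this kind to reject "They loves" while accepting "He loves".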
Besides the main verbs, English has certain verbs which are
known as auxiliary verbs. The verbs be (am, is, was, etc.), have and do, when
used with ordinary verbs to make tenses, passive forms, questions and negatives,
are
called auxiliary verbs. The verbs can, could, may, might, will, would, shall, should,
must, and ought are called modal verbs. They are used before ordinary verbs and
express meaning such as permission, possibility, certainty and necessity. Need
and dare can sometimes be used like modal verbs.
2.4.1 Conjugation of the Verb
Any language has a well-defined syntax for its lexicon. The conjugation of a
verb shows the various forms it can assume, either by inflection or by
combination with parts of other verbs, to mark voice, mood, tense, number and
person; to these must be added its infinitives and participles.
The complete conjugation of the verb ‘love’ is given below.
(i) Tenses

Simple present
Active                   Passive
I love                   I am loved
You love                 You are loved
He loves                 He is loved
They love                They are loved

Present continuous
Active                   Passive
I am loving              I am being loved
You are loving           You are being loved
He is loving             He is being loved
We are loving            We are being loved
They are loving          They are being loved

Present perfect
Active                   Passive
I have loved             I have been loved
You have loved           You have been loved
He has loved             He has been loved
They have loved          They have been loved

Present perfect continuous
Active                   Passive
I have been loving       (none)
You have been loving     (none)
We have been loving      (none)
They have been loving    (none)

Simple past
Active                   Passive
I loved                  I was loved
You loved                You were loved
He loved                 He was loved
They loved               They were loved

Past continuous
Active                   Passive
I was loving             I was being loved
You were loving          You were being loved
He was loving            He was being loved
They were loving         They were being loved

Past perfect
Active                   Passive
I had loved              I had been loved
You had loved            You had been loved
He had loved             He had been loved
They had loved           They had been loved
(iii) Non-finites
                         Active           Passive
Present infinitive       to love          to be loved
Continuous infinitive    to be loving     (none)
Perfect infinitive       to have loved    to have been loved
Present participle       loving           being loved
Perfect participle       having loved     having been loved
2.5 ADVERB
Words which modify the meaning of a verb, an adjective or another adverb are
known as adverbs, e.g., quickly, very and quite are adverbs in the following
sentences:
(i) Rama runs quickly.
(ii) This is a very sweet mango.
(iii) Govind reads quite clearly.
Adverbs can be of the following types:
(i) Adverb of time (which shows when)
(ii) Adverb of frequency (which shows how often)
(iii) Adverb of place (which shows where)
(iv) Adverb of manner (which shows how or in what manner)
(v) Adverb of degree or quantity
(vi) Adverb of affirmation or negation
(vii) Adverb of reason
Besides these, there are many cue phrases, such as however and anyway, which
mark a change of theme in the discourse. These have special significance in
linguistic analysis, where they are used to analyze the theme of the discourse.
2.6 DICTIONARY FEATURES
We all know that a dictionary is something that provides definitions of words.
From the computer storage viewpoint, the way definitions are stored differs
somewhat. The definition of a word from the viewpoint of storage in a computer
database is important for linguistic analysis, and it is this definition that
we describe in this chapter.
The definition of a word: a word is defined as
word (category root related-features)
The main objective in defining words this way is that the definitions should
provide everything that might help in parsing and understanding the sentence.
Since a sentence contains different parts of speech, the words need to be
classified into categories such as noun, pronoun, etc.
In general, words are categorized into the following categories:
(1) Articles
(2) Nouns
(3) Pronouns
(4) Verbs
(5) Adverbs
(6) Adjectives
(7) Prepositions
(8) Conjunctions
(9) Numbers
(10) Punctuation marks
(11) Wh-words
Let us discuss these categories in a little more detail from the lexicon
storage point of view.
Articles
This category contains only three words: a, an and the. The dictionary
definitions of articles (ART) look like:
A (ART A), AN (ART AN), THE (ART THE)
Nouns
Nouns are classified as animate or inanimate, and further as singular or
plural. The inanimate nouns are further classified into categories
such as place, conveyance, time, objects, etc., and the animate nouns are
further classified into male and female. Some example entries are as follows:
RAM (NOUN RAM ANIMATE MALE SINGULAR)
BOY (NOUN BOY ANIMATE MALE SINGULAR)
CAR (NOUN CAR CONVEYANCE SINGULAR)
RESTAURANT (NOUN RESTAURANT PLACE SINGULAR)
SUNRISE (NOUN SUNRISE TIME SINGULAR)
Pronouns
Pronouns have the largest number of classifying features. The first criterion
for classification is person, which gives the categories first person,
second person and third person. Further criteria are number, gender and role.
Some examples of pronouns are as follows:
HE (PRONOUN HE THIRD PERSON MALE SINGULAR NOMINATIVE)
THEY (PRONOUN THEY THIRD PERSON MALE FEMALE NEUTER
PLURAL NOMINATIVE)
YOU (PRONOUN YOU SECOND PERSON MALE FEMALE NOMINATIVE
ACCUSATIVE SINGULAR PLURAL)
I (PRONOUN I FIRST PERSON MALE FEMALE SINGULAR NOMINATIVE)
Numbers
Numbers can also appear in a sentence, and a peculiar feature of them is
that they have two representations, one in figures and the other in words.
Example dictionary entries for number words may be:
SIX (NUMBER SIX)
TWENTY (NUMBER TWENTY)
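The entry format used in the examples above, WORD (CATEGORY ROOT FEATURES), maps naturally onto a small record type. The following is only a sketch: the names Entry, parse_entry and LEXICON are hypothetical, and each space-separated token after the root is treated as one feature, so multi-word features such as THIRD PERSON would need extra handling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    category: str      # e.g. NOUN, ART, NUMBER
    root: str          # root form of the word
    features: tuple    # remaining features, possibly empty


def parse_entry(text: str):
    """Parse 'WORD (CATEGORY ROOT FEATURE ...)' into a (word, Entry) pair."""
    word, _, rest = text.partition("(")
    fields = rest.strip().rstrip(")").split()
    return word.strip(), Entry(fields[0], fields[1], tuple(fields[2:]))


LEXICON = dict(
    parse_entry(line)
    for line in [
        "THE (ART THE)",
        "RAM (NOUN RAM ANIMATE MALE SINGULAR)",
        "CAR (NOUN CAR CONVEYANCE SINGULAR)",
        "SIX (NUMBER SIX)",
    ]
)

print(LEXICON["CAR"])
```

Keeping the category and root in fixed positions lets the parser read them without inspecting the feature list.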
Structure of Dictionary
The dictionary should be structured so that a definition can be retrieved as
quickly as possible, i.e., the search time should be reduced to a minimum. One
possible method of reducing the search time is discussed below:
(i) Break up the whole dictionary according to the first letter of the word.
This gives 26 sublists.
(ii) If the dictionary has just 1000 words, there will be on average about 40
words per sublist, so each search takes less time.
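The first-letter scheme can be sketched as follows. The function names build_sublists and lookup are hypothetical, words are assumed to be non-empty, and each sublist is searched linearly to mirror the description above.

```python
from collections import defaultdict


def build_sublists(entries: dict) -> dict:
    """Split a word -> definition mapping into sublists keyed by first letter."""
    sublists = defaultdict(list)
    for word, definition in entries.items():
        key = word.upper()
        sublists[key[0]].append((key, definition))
    return sublists


def lookup(sublists: dict, word: str):
    """Search only the sublist for the word's first letter."""
    key = word.upper()
    for candidate, definition in sublists.get(key[0], []):
        if candidate == key:
            return definition
    return None


sublists = build_sublists({
    "car": "(NOUN CAR CONVEYANCE SINGULAR)",
    "cat": "(NOUN CAT ANIMATE SINGULAR)",
    "six": "(NUMBER SIX)",
})
print(lookup(sublists, "Car"))
print(lookup(sublists, "dog"))
```

With 1000 words spread evenly over 26 sublists, a lookup scans about 38 entries instead of 1000. In practice a hash table or a sorted list with binary search would reduce this further.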
The lexicon serves the purpose of providing tokens to the parser. The words,
along with their definitions, are stored in the dictionary. The dictionary
specifies for each word its part of speech, any non-default values for its
features, and presumably something about its meaning. However, in English, as
in all other languages, individual words can often be given different prefixes
and suffixes. For example, the word “love” can appear in different guises, such
as “loves”, “loved”, “loving”, “unloving”, etc. All of these words have one
basic word and various other derived forms. From the computer storage point of
view, it would be wasteful if the dictionary had to include all of these. A
better approach is to have the lexicon use explicit knowledge of the structure
of words (their morphology) and have it figure out when a word is simply a
variant of one that is already in the dictionary. However, care must be taken
over how these variations are recognized, since a naive rule can produce an
error such as the following:
“kiss” → “kis” + “s”
To some degree, such mistakes can be prevented by installing more stringent
checks on which endings are allowed in which circumstances. For example, a
singular noun ending in “s” will never form its plural by adding “s”, but
rather by adding “es”. It also helps to ensure first that a word is not already
in the dictionary before attempting to remove the endings, and that the end
product, after the lexicon has removed all the supposed endings, is itself in
the dictionary.
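The two safeguards just described (check the dictionary before stripping, and accept a stem only if it is itself in the dictionary) can be sketched as below. The names DICTIONARY, SUFFIXES and find_root are hypothetical, the suffix list is deliberately tiny, restoring a dropped final “e” is an added assumption, and prefixes such as “un” are not handled.

```python
# Hypothetical root dictionary and suffix list for illustration only.
DICTIONARY = {"kiss", "love", "boy"}
SUFFIXES = ("es", "s", "ed", "ing")


def find_root(word: str):
    """Return the dictionary root of a word, or None if no root is found."""
    word = word.lower()
    # Safeguard 1: a word already in the dictionary is never split,
    # so "kiss" is not analyzed as "kis" + "s".
    if word in DICTIONARY:
        return word
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            stem = word[: -len(suffix)]
            # Assumption: also try restoring a dropped final "e" ("loving" -> "love").
            for candidate in (stem, stem + "e"):
                # Safeguard 2: the end product must itself be in the dictionary.
                if candidate in DICTIONARY:
                    return candidate
    return None


for w in ["kiss", "kisses", "loves", "loved", "loving", "boys", "kis"]:
    print(w, "->", find_root(w))
```

Because every accepted stem must be a dictionary word, a wrong split such as “kis” is rejected rather than passed on to the parser.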
If such a procedure is stored in the lexicon, then the dictionary can be made
reasonably compact, as the morphological unit will take care of the standard
cases. Furthermore, having default values for features means that for
root words, such as singular nouns, the dictionary need not even indicate that
the word is singular, since this is the default case. Such routines are called
name of the concept, information about the deep case structure of the concept
and default values for those cases. For example, the case structure of a
concept like “drink” might include the cases agent, object and instrument. The
cases are meant to account for the fact that the concept ‘drink’ includes an
agent who performs the action, an object that is drunk and sometimes an
instrument that is used to aid in drinking. So, given the event description
“Jatin drank a can of beer”, the agent of the action is Jatin, the object is
beer, and the instrument is “can”.
The default values associated with the cases are meant to be used as a
tool for rejecting aberrant interpretations of text. Whereas the text “Jatin
has a coke. He drank.” is interpreted to mean that “Jatin drank a coke”, the
text “Jatin has a kite. He drank.” is not interpreted to mean “Jatin drank a
kite”, since the default value for the object of a drinking event is ‘liquid’
and a kite is not a type of ‘liquid’.
The case structure and the default values associated with
each case are stored in a 3-tuple. These are sometimes called templates for the
event/state concept.
The structure of the template is:
[TEMPLATE event/state-concept-name list-of-default-value-pairs]
The template for the concept drink is:
[TEMPLATE drink ((obj liquid) (instr container) (agt animal1) …..)]
In principle, the dictionary contains the case structure for each of the
concepts in it, but in practice this is not necessary. Since NEXUS currently
does not use the dictionary to parse sentences, its only use for the case
information is to constrain the text interpretation process. Consequently, the
default values are added to the dictionary on a “need to” basis. The case
relations used in NEXUS are primarily derived from Simmons but also include
pieces of the case systems of Fillmore, etc.
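How a template's default values can reject an aberrant interpretation may be sketched as follows. This is not NEXUS code: the names TEMPLATES, ISA, is_a and acceptable are hypothetical, and the small type hierarchy is invented for the example. Only the drink template itself is taken from the text.

```python
# The drink template from the text, as case -> default value pairs.
TEMPLATES = {
    "drink": {"obj": "liquid", "instr": "container", "agt": "animal1"},
}

# Hypothetical type hierarchy: each item maps to its immediate supertype.
ISA = {
    "coke": "liquid",
    "beer": "liquid",
    "can": "container",
    "kite": "toy",
}


def is_a(thing: str, kind: str) -> bool:
    """True if 'thing' equals 'kind' or has it as a supertype in ISA."""
    current = thing
    while current is not None:
        if current == kind:
            return True
        current = ISA.get(current)
    return False


def acceptable(concept: str, case: str, filler: str) -> bool:
    """Accept a case filler only if it matches the template's default value."""
    default = TEMPLATES[concept].get(case)
    return default is None or is_a(filler, default)


print(acceptable("drink", "obj", "coke"))  # coke is a liquid
print(acceptable("drink", "obj", "kite"))  # kite is not a liquid
```

A case with no default value in the template is left unconstrained, which matches the practice of adding default values only when they are needed.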