A Prototype Syntax Checker
for German Learners of English
Paper presented
at
Intelligent Computer Aided Language Learning
ICALL ’91
UMIST - September 1991
Deirdre Mulligan
Cambridge Technology Partners
Cambridge Mass.
U.S.A.
&
Kevin Ryan
Dept. of Computer Science & Information Systems
University of Limerick
IRELAND
e-mail : [email protected]
NOTE
This paper has not been previously submitted for
publication to any other conference or journal
Mulligan & Ryan
ICALL’91
1. Introduction
This paper describes a prototype system that will act as a "style-checker" for written English of native German
speakers. It is a German-specific system in that it detects errors, "typically" made by Germans, which occur due to
the influence of their mother-tongue on their English. The "style-checker" also searches for errors that all learners of
English are prone to make, irrespective of their mother-tongue.
The motivations for this work are many. An increasing number of German-speaking people are studying
English and so must submit essays and assignments for various courses. One of the major problems is that they write
"German-English". A "style-checker" which could detect "German-English" errors could be of great benefit to these
learners.
2. Error Analysis
Although there has been (and still is) a great deal of research in the area of language learning, the questions of how
people actually "learn" their first language and how they master other languages have remained unanswered.
However, many hypotheses and theories have been put forward to try and account for these processes. One
important research approach has been error analysis, where the errors made by second language learners have been
examined, and used to formulate hypotheses about the learning process. This approach has been followed in
developing the prototype system.
It is sometimes assumed that what can be expressed in one language can automatically be expressed in another.
However, this is not always true, since:
"each language articulates or organises the world differently.
Languages do not simply name existing categories, they articulate
their own." (Littlewood, 1984).
This point was much emphasised by the ’Contrastive Analysis Hypothesis’, which was one of the major ideas on error
detection and analysis in the 1960’s. It involved the study of how prior acquisition of one or more languages affected
the process of acquiring other languages (Singleton, 1981). Its aim was to predict the most error-prone
areas in a second language for learners of a given first language background, and it reasoned that if these areas could
be identified and given particular attention in the teaching of the second language in question, then the acquisition of
that language could be eased. According to the contrastive analysis hypothesis these errors were predictable on the
basis of a comparison of the target language and the mother-tongue of the learner. For example, Nemser stated that
it could
"predict where learning difficulties or facilitation will occur on the basis of a comparison of
descriptions of the learners language (base or source language) and the language he is
attempting to learn (target or reference language)"
(Nemser, 1975)
While Eric Kadler felt that
"non-native speakers do make syntactic mistakes as frequently and stubbornly as they make
semantic and morphological mistakes, because they tend to transfer to the foreign language their
native syntactical system as well as their morphological habits and semantic values".
(Kadler, 1970).
So if the learner knows that in his language rule x exists then it is quite possible that he will try to apply this rule in
the target language. On the other hand, cross linguistic influence can be very helpful to someone learning a second
language; in fact, as Corder suggests, the first language provides
"a rather rich and specific set of hypotheses". (Corder, 1970)
However, contrastive analysis suggests that where the languages differ, errors can result. One needs to be careful
not to overstate the case, but the following balanced statement of the claims of Contrastive Analysis was given by
James:
"... Contrastive Analysis has never claimed that L1 interference is the sole source of error. As Lado
put it: ’these differences are the chief source of difficulty in learning a second language,’ and, ’the
most important factor determining ease and difficulty in learning the patterns of a foreign
language is their similarity to or difference from the patterns of the native language’. (Lado, 1964)
"chief source" and "most important" imply that L1 interference is not conceived to
be the only source".
(James, 1971)
It seems reasonable to proceed on the assumption that cross linguistic influence plays a role in the errors discovered
in a second language learner’s target language, although it is by no means the sole cause of errors.
3. System Objectives
As was stated in the introduction, the main objective of this work was to design and implement a system that would
detect syntax errors in English text written by native German speakers. The design involved both practical and
theoretical work. The practical work took the form of data collection from a number of native German speakers.
From this data "typical" German errors and some universal errors were to be selected. These errors were then
analysed to identify their underlying cause. Once this had been done a system was designed and implemented which
could both parse (a range of) correct sentences of English and also process, and report on, a set of possible incorrect
sentences.
Since the objective of this work was to demonstrate the feasibility of the approach, there was no question of
producing a parser for the entire English language. Instead, only a restricted set of English sentences is allowed as
input to the system. The restriction placed on the user is that he/she can only input declarative sentences. Of course
interrogative and imperative sentence types could be added relatively easily by incorporating these into the grammar.
Given the current limitations, however, it is quite possible for erroneous sentences to be accepted or for the system to
respond simply that it cannot handle the input. The range of errors possible in the set of declarative sentences of
English is vast and certainly impossible to enumerate. For this reason a system such as this can only hope to handle a
limited set of errors, in this case the German-specific errors. So unless the error input by the user is a member of the
defined error list, a message will be output saying either that the sentence is acceptable or that it is not part of the grammar.
4. Data Collection
This section details the conduct and results of data collection. The English language exercise books of a number of
German exchange students were studied for recurring error patterns. In general many instances of apparent cross
linguistic influence were identified; however, there were also many instances of apparent developmental (i.e.
language learning) errors. In some cases the error type could have been attributed either to the developmental
process or cross linguistic influence.
Errors occurred frequently in the choice of word order. German word order can be quite different from the English
and it was apparent that some German word order was being used in English sentences. For example, the fact that in
German some conjunctions send the verb to the end of the clause or sentence was a source of many problems. One
sentence written was:
’while this between Bathsheba and Boldwood happens, ...’
The word 'while' is 'während' in German and is one of the conjunctions that sends a verb to the end of the clause or
sentence, so the German version is:
'während dieses zwischen Bathsheba und Boldwood passiert, ...'
This was apparently translated literally by the learner into English. Another word order norm in German is that the
verb should come second in the sentence. Again there appears to be some cross linguistic influence on the word
order in English. For example:
'secondly exist some cultural societies like the traditional-music society'
and in German:
'Zweitens existieren einige ... '
A similar error type that seemed to be cross-linguistic was the misplacing of the negative element ’not’, as in
'I enjoy not the food very much'
which is probably based on
'Ich geniesse das Essen nicht sehr'
There were also "false friends", where a German word was translated to a similar but noticeably different English
word, such as translating 'als' to 'as' (in comparisons) or 'bekommen' to 'become', as well as the misspelling of
English words that resemble German words with a slightly different spelling, notably writing 'befor' instead of 'before'.
Disagreements between determiners and nouns or between nouns and verbs were also commonplace.
Based on the errors observed in the data a system was designed and implemented to deal with the following set of
errors:
1. placing the verb at the end of the clause or sentence if the conjunction is one of those in German that send the verb
to the end of the sentence or clause, i.e. misplaced verb.
2. placing the negative element (not) directly after the finite verb, instead of adding an auxiliary and placing the
negative element after the auxiliary.
3. determiner - noun disagreement.
4. noun - verb disagreement.
5. the phrase "more ... as ..." being used instead of "more ... than ...".
6. specific words being misspelt.
These errors do not purport to be a comprehensive set but were chosen as a good cross-section of the errors
discovered. There are many other errors that could be handled; however, with the time and resources available, the
number of errors being dealt with had to be restricted.
5. Implementation
This system was implemented in Scheme on a P.C. Scheme (a dialect of LISP) is well suited to natural language
processing. It provides excellent support for recursive programming, which is central to natural language parsing
techniques. The system was designed in five main sections: the grammar, the lexicon, the parser, the backtracking
algorithm and the error handling devices. These are related as shown in Figure 1.
[Figure 1. Overall System Structure. Components shown: Input, Preprocessor, Lexicon, Grammar, Parser and
Backtracker, Error Analysis, Spelling Errors, Output.]
A small lexicon of computer related terminology was constructed and a rule-based parser was implemented. It should
be noted that the grammar contained both the rules needed to parse the correct English sentences and an additional
set of ’error’ rules needed to detect the German-specific errors.
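The idea of one grammar holding both correct rules and 'error' rules, together with a backtracking parser that records which rules it used, can be illustrated with a minimal sketch. It is written in Python rather than the Scheme of the original system, and the lexicon entries, rule names (such as `err-neg-after-verb`) and function names are all hypothetical; the paper does not give the actual rules.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical toy lexicon: word -> lexical category.
LEXICON: Dict[str, str] = {
    "the": "det", "his": "det",
    "user": "noun", "data": "noun", "program": "noun",
    "did": "aux", "not": "neg", "input": "verb",
}

# Each nonterminal maps to a list of (rule name, right-hand side).
# Rule names starting with "err-" stand for the additional 'error' rules.
GRAMMAR: Dict[str, List[Tuple[str, List[str]]]] = {
    "s": [("s-1", ["np", "vp"])],
    "np": [("np-1", ["det", "noun"])],
    "vp": [
        ("vp-1", ["verb", "np"]),
        ("vp-neg", ["aux", "neg", "verb", "np"]),
        ("err-neg-after-verb", ["aux", "verb", "neg", "np"]),
    ],
}


def parse(symbol: str, words: List[str], pos: int) -> Iterator[Tuple[int, List[str]]]:
    """Yield (next position, rules used) for every way symbol matches words[pos:]."""
    if symbol not in GRAMMAR:
        # Lexical category: consume one word if its category matches.
        if pos < len(words) and LEXICON.get(words[pos]) == symbol:
            yield pos + 1, []
        return
    # Trying each alternative in turn is what gives the backtracking behaviour.
    for name, rhs in GRAMMAR[symbol]:
        yield from expand(rhs, words, pos, [name])


def expand(rhs: List[str], words: List[str], pos: int,
           used: List[str]) -> Iterator[Tuple[int, List[str]]]:
    """Match the symbols of a right-hand side one after another."""
    if not rhs:
        yield pos, used
        return
    for next_pos, sub_rules in parse(rhs[0], words, pos):
        yield from expand(rhs[1:], words, next_pos, used + sub_rules)


def rules_used(sentence: str) -> Optional[List[str]]:
    """Return the rules of the first complete parse, or None if there is none."""
    words = sentence.lower().split()
    for end, used in parse("s", words, 0):
        if end == len(words):
            return used
    return None
```

With this toy grammar, `rules_used("The user did input not his data")` succeeds only through the error rule, while `rules_used("The user did not input his data")` uses the correct `vp-neg` rule. A sentence outside the grammar yields `None`, corresponding to the case where the system reports that it cannot handle the input.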
6. Error Detection
The system detects errors in three separate passes and keeps a record of all errors detected for subsequent output.
The three passes are described below.
The first process is a ’preprocessor’ and it is applied before the actual parsing of the sentence input by the user. Its
function is to search for any words that are typically misspelt by German learners of English, for example "befor"
and "allthough". The function takes the sentence input by the user and searches through it word by word, looking for
any incorrect spellings. If any errors are encountered, the incorrect version of the word and the correct version are
stored, along with an error message in the global list of error messages. The system then outputs the error message
to the user, indicating which word (or words) was wrong and what the correct version of the word is.
For example:
User Input: ’the user input the program befor he should have’.
Output to the User: ’The following word has been spelled incorrectly,
please re-enter the sentence, using the correct word:
Befor
Before.’
At present the user must re-enter the sentence with the corrected spelling before he can proceed. When this is
completed, the parsing of the sentence can begin. While it is true that these errors could also have been detected
later in the process, this would have meant including the correct and the incorrect words in the lexicon, which
seemed wasteful.
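The preprocessing pass described above amounts to a word-by-word table lookup. The following is a minimal sketch in Python, not the original Scheme code; the table name, function name and message wording are assumptions, and only the two entries "befor" and "allthough" are taken from the paper.

```python
from typing import Dict, List, Tuple

# Table of typical misspellings -> corrections. Only these two entries are
# mentioned in the paper; a real table would presumably be longer.
TYPICAL_MISSPELLINGS: Dict[str, str] = {
    "befor": "before",
    "allthough": "although",
}


def check_spelling(sentence: str, errors: List[str]) -> List[Tuple[str, str]]:
    """Scan the sentence word by word and record each known misspelling.

    Returns (incorrect, correct) pairs and appends one message per pair to
    the shared list of error messages (hypothetical message wording).
    """
    found = []
    for word in sentence.lower().split():
        correct = TYPICAL_MISSPELLINGS.get(word)
        if correct is not None:
            found.append((word, correct))
            errors.append(
                f"'{word}' has been spelled incorrectly; the correct word is '{correct}'"
            )
    return found
```

Because the misspellings live in their own small table, the lexicon used by the parser only needs the correct forms, which is the saving the paragraph above points to.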
The remainder of the error analysis is carried out after the sentence has been parsed successfully and depends on
having a complete set of all the rules that were used in the parse process. This set is called ’rulesused’.
The second process of error detection and analysis involves searching for error rules in ’rulesused’. As mentioned
previously the grammar implemented includes a specific set of incorrect rules for English. If any of these rules were
used during parsing then this implies that an erroneous sentence was input by the user. So if ’rulesused’ contains any
of the error rules a message, related to the specific error rule used, must be added to the list of errors to date. At
present the error rules included in the grammar are the rules that reflect the negative element (not) being in the
incorrect position in the clause or sentence and the rule where the auxiliary has been omitted altogether. Many
other errors could be defined and detected in a similar way.
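The second pass can be pictured as a simple membership test over the list of rules recorded during parsing. The sketch below is in Python and is only illustrative: the rule names are hypothetical, the first message is taken from the example run in Section 7, and the second message is invented for illustration.

```python
from typing import Dict, List

# Hypothetical error-rule names mapped to their messages.
ERROR_RULE_MESSAGES: Dict[str, str] = {
    "err-neg-after-verb": (
        "the negative element (not) is in the incorrect position. "
        "It should occur directly after the auxiliary"
    ),
    # Invented wording; the paper does not show the message for this rule.
    "err-missing-aux": "the auxiliary has been omitted",
}


def report_error_rules(rules_used: List[str], errors: List[str]) -> int:
    """Append a message for every error rule found in rules_used.

    Returns the number of error rules found in this pass.
    """
    count = 0
    for rule in rules_used:
        message = ERROR_RULE_MESSAGES.get(rule)
        if message is not None:
            errors.append(message)
            count += 1
    return count
```

Adding a new detectable error in this scheme means adding one error rule to the grammar and one entry to the message table, which matches the remark that many other errors could be handled in a similar way.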
The third process of error detection is to search specifically for the following errors:
1. Determiner-Noun disagreement.
2. Noun-Verb disagreement.
3. The verb placed incorrectly at the end of a clause or sentence because of an incorrect assumption by the
learner that certain conjunctions send the verb to the end of the clause or sentence. (Misplaced verb)
4. The incorrect use of the phrase: "more ... as ...".
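Two of the checks in this third pass, determiner-noun disagreement and the "more ... as ..." phrase, can be sketched as follows. This Python sketch works on the raw word list, whereas the original system works from the parse, so it is a simplification under that assumption; the word tables and function names are hypothetical. Noun-verb agreement and the misplaced verb are not covered here.

```python
from typing import Dict, List, Tuple

# Hypothetical tables; a real system would read number features from the lexicon.
PLURAL_DETERMINERS = {"these", "those"}
SINGULAR_DETERMINERS = {"this", "that"}
NOUN_NUMBER: Dict[str, str] = {
    "program": "sg", "programs": "pl",
    "technique": "sg", "techniques": "pl",
    "software": "sg",
}


def determiner_noun_errors(words: List[str]) -> List[Tuple[str, str]]:
    """Return (determiner, noun) pairs of adjacent words that disagree in number."""
    errors = []
    for det, noun in zip(words, words[1:]):
        number = NOUN_NUMBER.get(noun)
        if det in PLURAL_DETERMINERS and number == "sg":
            errors.append((det, noun))
        elif det in SINGULAR_DETERMINERS and number == "pl":
            errors.append((det, noun))
    return errors


def has_more_as(words: List[str]) -> bool:
    """True if 'as' follows 'more' with no 'than' in between."""
    if "more" not in words:
        return False
    for word in words[words.index("more") + 1:]:
        if word == "than":
            return False
        if word == "as":
            return True
    return False
```

For the sentence "these program was more efficient as those program" this sketch reports two determiner-noun disagreements and one "more ... as" error, the same three errors the system reports for that input in the example run.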
7. Example Run
This section describes a typical session with the current system. The examples shown cover most of the error types
currently detected. A few examples are also given of errors that the system cannot currently detect, both cases where
the system informs the user that the input is correct, when in fact it contains an error, and cases where the system
rejects a correct sentence because there is no relevant rule in the current grammar. Note : In the following example
user input is shown in courier bold and system response is shown in courier plain.
[1] (startup ’ ())
Please type in your input putting a space between each word and a ’q’,
followed by a carriage return to finish
The user did not input data , although the program data required q
Your input was incorrect, you have made 1 error
"The verb is incorrectly placed at the end of the sentence"
REQUIRED
Would you like to input another sentence?
y
Please type in your input putting a space between each word and a ’q’,
followed by a carriage return to finish
The user did input not his data q
Your input was incorrect, you have made 1 error
"the negative element (not) is in the incorrect position.
It should occur directly after the auxiliary"
Would you like to input another sentence?
y
Please type in your input putting a space between each word and a ’q’,
followed by a carriage return to finish
These program was more efficient as those program q
Your input was incorrect, you have made 3 errors
"The determiner doesn’t agree with the noun, the determiner here must have a
plural noun following it"
THESE
PROGRAM
"the ’more as’ construct is not acceptable in English instead use the
construct ’more than’ "
"the determiner doesn’t agree with the noun, the determiner here must have a
plural noun following it"
THOSE
PROGRAM
Would you like to input another sentence?
y
Please type in your input putting a space between each word and a ’q’,
followed by a carriage return to finish
The techniques are more useful as the software q
Your input was incorrect, you have made 1 error
"the ’more as’ construct is not acceptable in English instead use the
construct ’more than’ "
Would you like to input another sentence?
y
Please type in your input putting a space between each word and a ’q’,
followed by a carriage return to finish
Many problems have had been solved q
Your input is correct.
Would you like to input another sentence?
y
Please type in your input putting a space between each word and a ’q’,
followed by a carriage return to finish
The personal computer was efficient
This sentence is not acceptable by the system’s grammar,
please check it with someone.
Would you like to input another sentence?
n
okay
8. Evaluation and Future Work
The current prototype system is inadequate in many ways and will require considerable extension and improvement
before it can be of use to students in writing correct English. Both the grammar and the lexicon are severely
restricted and while neither could ever be "complete" in any absolute sense it is feasible to extend the lexicon to
cover the majority of terms likely to be encountered. In fact the lexicon could be restructured so that it consists of a
"general" lexicon, which contains the basic vocabulary needed irrespective of the area in question along with a set of
’subject-specific’ lexicons relating to specific areas. A user could then choose as many sub-lexicons as necessary to
handle the sentence types being input.
A feature might also be added so that the lexicon could "learn" new words. This would allow the user to add new
words and relevant information about these words as they are encountered. Such a lexicon design,
analogous to the specialist dictionaries in spell-checkers, would allow users in many areas to use the system. The
incompleteness of the grammar for English is more difficult to remedy. No complete grammar is really feasible and
even if it were, it is likely that the user would want the freedom to create ungrammatical sentences for purposes of
illustration, novelty or dramatic effect. Another, easier, development would be to survey more source data and
thereby increase the coverage of error types.
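The layered lexicon proposed above (a general lexicon, optional subject-specific sub-lexicons, and words "learned" from the user) could be organised along the following lines. This is a sketch in Python of one possible design, not something the prototype implements; the lexicon contents and function names are hypothetical.

```python
from collections import ChainMap
from typing import Dict

# Hypothetical lexicons: word -> category.
GENERAL: Dict[str, str] = {"the": "det", "user": "noun", "was": "verb"}
COMPUTING: Dict[str, str] = {"program": "noun", "software": "noun"}


def build_lexicon(general: Dict[str, str], *sub_lexicons: Dict[str, str]) -> ChainMap:
    """Combine a general lexicon with the chosen subject-specific lexicons.

    Lookups search the learned words first, then each sub-lexicon, then the
    general lexicon. New words are stored in the first (learned) layer only.
    """
    learned: Dict[str, str] = {}
    return ChainMap(learned, *sub_lexicons, general)


def learn(lexicon: ChainMap, word: str, category: str) -> None:
    """Record a new word supplied by the user (ChainMap writes go to the first layer)."""
    lexicon[word] = category
```

A user working on computing texts would call `build_lexicon(GENERAL, COMPUTING)` and could later add a word with `learn(lexicon, "compiler", "noun")` without altering the shared lexicons.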
Finally, the human-computer interface must be greatly improved, making use of windows, highlighting and possibly
sound so that the user’s input and the error messages are clearly related in the display, possibly producing a suggested
correct version of the sentence for the user’s approval. It is anticipated that these improvements will require a
complete re-implementation of the system. One radical change which might also be considered is to drop the parsing
of correct English sentences and only look for the defined errors. This might be done on a monitoring basis so that
the syntax-checker ran in the background and only drew the user’s attention when a possible error was detected.
Aside from improving the current system, it could be used to support a number of types of studies, both longitudinal
and cross-sectional. Longitudinal study, using the same group of learners over time, might identify which errors are
prone to occur at certain stages of the language learning process. Such knowledge could then be incorporated into the
system so that the student’s current level (e.g. total beginner, advanced student etc.) and the relevant error list could
guide the search. Cross-sectional study, collecting all the interactions with users over a fixed time period, would
provide statistical information on the variety and frequency of errors.
Overall it is felt that the limited study carried out so far has shown the feasibility and desirability of this approach to
computer-aided language learning.
Acknowledgement
The authors are indebted to David Singleton of Trinity College Dublin for his advice and information on error
analysis in general and the contrastive analysis hypothesis in particular.
References
Corder 74: Corder S.P., "The Significance of Learner’s Errors", in Error Analysis: Perspectives on Second Language Acquisition, Richards J. (Ed.), 1974
James 71: James C., article in Singleton 1981
Kadler 70: Kadler E.H., Linguistics and Teaching Foreign Languages, 1970
Lado 57: Lado R., article in Singleton 1981
Littlewood 74: Littlewood W., Foreign and Second Language Learning, Cambridge University Press, 1974
Nemser 75: Nemser, article in Singleton 1981
Singleton 81: Singleton D.M., "Language Transfer: A review of some recent research", Centre for Language and Communication Studies, Trinity College Dublin, 1981