Assignment 1: Manual Direct Translation
Rebecca Jonson
GSLT course: Machine Translation 2004

In this assignment I have tried to implement a Prolog program that translates the following Swedish sentence into English:

Sv. Ytterst handlar kampen för sysselsättning om att hålla samman Sverige.

The translation given was the following:

En. Ultimately, the fight for full employment concerns the cohesion of Swedish society.

I started by implementing a very simplistic approach and then tried to improve the program with more advanced direct translation methods. The two attempts are described below, followed by a short discussion of the result and a comparison with Systran. The implementation is in Prolog and the code can be found on my webpage (the system does not yet work with the Swedish letters å, ä and ö, so you need to write au, ae and oe if you run it).

First Attempt (simplistic approach):

Algorithm: First, try to translate bigger parts of the sentence by looking up phrases in the lexicon, including phrases whose words do not follow each other (only two-word look-up is implemented). Then back off to word-to-word translation of the result using the bilingual dictionary. Unknown words are copied. Capital letters in the source language give capital letters in the target language (not implemented). Punctuation is copied (not implemented).

Part of the algorithm was implemented as follows:

    run_trans(Target):-
        read_string(Str),
        string2wordlist(Str, SourceList),
        trans2(SourceList,Tr1),
        trans1(Tr1,Target).

The predicate run_trans reads a string, produces a word list and sends this list to the trans2 predicate, which makes two-word translations. The result is sent to the trans1 predicate, which makes word-to-word translations and outputs the final translation result.

    %word-to-word translation
    %(lookup/2 is not listed in this text; presumably it consults
    % lookupSvEng/2 and copies unknown words)
    trans1([],[]).
    trans1([FirstWord|Rest], Target):-
        lookup([FirstWord],Targ1),
        trans1(Rest, Targ2),
        append(Targ1,Targ2,Target).

    %two-word translation
    trans2([],[]).
    trans2([FirstWord|Rest],Target):-
        (   member(Second, Rest),
            lookupSvEng([FirstWord, Second],Targ1),
            delete(Rest,Second,RestNew)
        ;   Targ1 = [FirstWord],
            RestNew = Rest
        ),
        trans2(RestNew, Targ2),
        append(Targ1,Targ2,Target).

The bilingual dictionary looks as follows (based on Norstedts svensk-engelska, 1992):

    lookupSvEng([ytterst], [ultimately]).
    lookupSvEng([handla, om],[be, about]).
    lookupSvEng([kamp],[fight]).
    lookupSvEng([för],[for]).
    lookupSvEng([sysselsättning],[employment]).
    lookupSvEng([hålla, samman],[keep, together]).
    lookupSvEng([att],[to]).
    lookupSvEng([sverige],['Sweden']).

Result

The first attempt with the program gave the following result:

    | ?- run_trans(T).
    |: Ytterst handlar kampen för sysselsättning om att hålla samman Sverige.
    T = [ultimately,handlar,kampen,for,employment,om,to,keep,together,'Sweden'] ?
    yes

Compared to:

En. Ultimately, the fight for full employment concerns the cohesion of Swedish society.

The result of this model is quite poor. It is not very intelligible, since part of the sentence has not been translated, and it is therefore far from correct English. As seen, no morphological analysis was made, so inflected words were not translated. This is probably the method's biggest drawback, and I will try to improve on it in the second attempt.

Second attempt:

In the second attempt I took the program from the first attempt and, trying to improve the translation, added some steps from the advanced direct translation strategy in the following order.

Step 1 Source text dictionary look-up + morphological analysis

I added a source text dictionary of head words with their part-of-speech category. This was used in the morphological analysis that I added.

    lookupSv(ytterst, adv).
    lookupSv(handla,v).
    lookupSv(kamp, n).
    lookupSv(foer,prep).
    lookupSv(sysselsaettning, n).
    lookupSv(haulla,v).
    lookupSv(samman,prep).
    lookupSv(om,prep).
    lookupSv(att, infm).
    lookupSv(sverige, n).
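The source dictionary above is keyed on transliterated spellings (foer, sysselsaettning, haulla), in line with the note that å, ä and ö have to be typed as au, ae and oe. As an illustration only, the following is a minimal sketch of how a word list could be rewritten into that spelling before look-up. It assumes SWI-Prolog and a UTF-8 source file; the predicates translit/2, normalise_word/2, normalise_chars/2 and normalise_words/2 are hypothetical and not part of the original program.

```prolog
:- use_module(library(lists)).
:- use_module(library(apply)).

% Hypothetical helper, not part of the original program.
% translit(?Letter, ?Replacement): the au/ae/oe convention from the text.
translit('å', [a, u]).
translit('ä', [a, e]).
translit('ö', [o, e]).

% normalise_word(+Word, -Plain): rewrite å, ä and ö inside an atom.
normalise_word(Word, Plain) :-
    atom_chars(Word, Chars),
    normalise_chars(Chars, PlainChars),
    atom_chars(Plain, PlainChars).

% normalise_chars(+Chars, -PlainChars): replace one character at a time.
normalise_chars([], []).
normalise_chars([C|Cs], Out) :-
    (   translit(C, Repl)
    ->  append(Repl, Rest, Out)      % splice the two-letter replacement in
    ;   Out = [C|Rest]               % any other character is copied
    ),
    normalise_chars(Cs, Rest).

% normalise_words(+Words, -PlainWords): apply normalise_word/2 to each word.
normalise_words(Words, PlainWords) :-
    maplist(normalise_word, Words, PlainWords).
```

Under these assumptions, normalise_word('hålla', W) gives W = haulla, which is the form stored in lookupSv/2.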
The morphological analysis worked in the following way for nouns: if the word ends in '-en', delete the suffix and check whether what is left is a word with the category noun. If so, add the feature DEF (for definite) and the shortened word to the word list. The order of the feature and the word is based on the target language structure (e.g. 'the fight' and not 'fight the'). Verbs were handled like this: if the word ends in '-r', delete the suffix and look up whether the rest is a Swedish verb. If so, add the shortened word and the feature PRES to the word list.

The morphological analysis was implemented with the following predicates:

    %morph analysis
    morfanalys([],[]).
    morfanalys([W|WL],MorfList):-
        lookupSv(W,_),
        morfanalys(WL, Morf),
        append([W],Morf,MorfList).
    morfanalys([W|WL],MorfList):-
        getMorf(W,M),
        morfanalys(WL,Morf),
        append(M,Morf, MorfList).

    %%%%looking up if Word ends with -en and is a Swedish noun
    %(atom_codes/2 is used so that the numeric suffix [101,110] = "en"
    % also matches on ISO-conforming systems, where atom_chars/2 returns
    % one-character atoms rather than codes)
    getMorf(Word,Lookup):-
        atom_codes(Word,Chars),
        suffix([101,110],Chars, MorfList),
        atom_codes(Morf,MorfList),
        lookupSv(Morf,n),
        Lookup = [def, Morf].

    %%%%looking up if Word ends with -r and is a Swedish verb
    getMorf(Word, Lookup):-
        atom_codes(Word,Chars),
        suffix([114],Chars,MorfList),
        atom_codes(Morf,MorfList),
        lookupSv(Morf,v),
        Lookup = [Morf, pres].

    %suffix(Suffix, List, Stem): List is Stem followed by Suffix
    suffix(Xs,Xs,[]).
    suffix(Xs,[Y|Ys],Morf):-
        suffix(Xs,Ys, Prefix),
        append([Y],Prefix,Morf).

Result

Adding this step gave the following result:

    | ?- run_trans(T).
    |: Ytterst handlar kampen för sysselsättning om att hålla samman Sverige.
    T = [ultimately, be, about, pres, def, fight, for, employment, to, keep, together, 'Sweden'] ?

As seen, the words 'handlar' and 'kampen' have now been found and given a translation, although the target text still needs some processing to get their complete translation.

Step 2 Synthesis and morphological processing of target text

The synthesis and morphological processing of the target text has been implemented with the predicate targetsynt.
The predicate looks for features in the word list, such as def and pres, left over from the morphological analysis and processes the target text accordingly. One of the rules substitutes the definite article 'the' for every def feature. Another one checks whether there is a verb or verb expression followed by a tense indicator and replaces the verb in the list with the verb in the correct tense form. Apart from this, the target output is now presented as a string (and not as a list) by changing the main predicate run_trans.

    %Looks for definite articles and synthesises 'the'
    targetsynt(WL,Target):-
        member(def,WL),
        substitute(def,WL,the,TL),
        targetsynt(TL,Target).
    %Checks that there are no plural forms
    targetsynt(WL,WL):-
        member(pres,WL),
        member(plur,WL).
    %Checks tense and assumes singular form and inflects the verb to found tense
    targetsynt(WL,Target):-
        sublist([X,pres],WL),
        lookupEng(X,v),
        tenseEng(X,pres,sng,Y),
        delete(WL,pres,WL1),
        substitute(X,WL1,Y,Targ1),
        targetsynt(Targ1,Target).
    %Same as above, for a two-word verb expression such as [be,about,pres]
    targetsynt(WL,Target):-
        sublist([X,_,pres],WL),
        lookupEng(X,v),
        tenseEng(X,pres,sng,Y),
        delete(WL,pres,WL1),
        substitute(X,WL1,Y,Targ1),
        targetsynt(Targ1,Target).
    %Deletes singular form indicators left from morfanalysis
    targetsynt(WL,Target):-
        member(sng,WL),
        delete(WL,sng,Target).
    targetsynt(WL,WL).

    lookupEng(ultimately,adv).
    lookupEng(be,v).
    lookupEng(about,prep).
    lookupEng(fight,n).
    lookupEng(for,prep).
    lookupEng(employment,n).
    lookupEng(keep,v).
    lookupEng(together,adv).
    lookupEng(to,infm).
    lookupEng('Sweden',n).
    lookupEng(the, def).

    tenseEng(be,pres,sng,is).
    tenseEng(be,pres,plur,are).

    | ?- run_trans.
    |: Ytterst handlar kampen för sysselsättning om att hålla samman Sverige
    Target:ultimately is about the fight for employment to keep together Sweden

Step 3 Rearrangement of words and phrases in target text

The rearrangement of words and phrases in the target text is hard without parsing the whole sentence.
A simplified method that could solve the problem in the example sentence would be to identify NP and VP phrases and then check the word order by making sure that the first VP does not come before the first NP. If it does, the two phrases would need to change places. This method has not been implemented, but it would give the following result on the target text:

    Target:ultimately the fight for employment is about to keep together Sweden

Apart from this, some adjustments of the target words depending on their context have been made. I have added a rule that rewrites structures in the target text such as 'prep to verb' (e.g. about to keep) to the form 'prep v+ing' (e.g. about keeping). The predicate is called ingform.

    ingform(WL,TL):-
        pos(WL,POS),
        sublist([(_,prep),(to,infm),(Y,v)],POS),
        delete(WL,to,NWL),
        ing(Y,Ying),
        substitute(Y,NWL,Ying,TL).
    ingform(WL,WL).

    %pos(Words, Tagged): pairs each word with its category, unk if unknown
    pos([],[]).
    pos([W|WL],[(W,C)|POS]):-
        lookupEng(W,C),
        pos(WL,POS).
    pos([W|WL],[(W,C)|POS]):-
        lookupSv(W,C),
        pos(WL,POS).
    pos([W|WL],[(W,unk)|POS]):-
        pos(WL,POS).

    %ing(Verb, VerbIng): appends "ing" (codes 105,110,103) to an English verb
    %(atom_codes/2 is used so that the code list also works on
    % ISO-conforming systems)
    ing(Y,Ying):-
        lookupEng(Y,v),
        atom_codes(Y,L),
        append(L,[105,110,103],TL),
        atom_codes(Ying,TL).

This rule makes the following translation possible:

    | ?- run_trans.
    |: Kampen för att hålla samman Sverige.
    Target:the fight for keeping together Sweden

Step 4 Editing

This step includes some editing of the target text. Some of the rules have been implemented and some have only been thought of. One rule, which inserts a comma after an adverb starting an English sentence, has been added as follows:

    %44 is the character code for ','
    insertcomma(T,TC):-
        lookupEng(T,adv),
        atom_codes(T,L),
        append(L,[44],TL),
        atom_codes(TC,TL).
    insertcomma(T,T).

This gives the following result:

    | ?- run_trans.
    |: Ytterst handlar kampen för sysselsättning om att hålla samman Sverige
    Target:ultimately, is about the fight for employment to keep together Sweden

Another rule should also be added that puts a capital letter on the first word in the sentence.
Finally, the punctuation is added. These rules have not been implemented.

Evaluation/Discussion of Final Result:

The final result from the implementation is:

    Target:ultimately, is about the fight for employment keeping together Sweden

and with the rules that I have described but not implemented, the result is the following:

    Target: Ultimately, the fight for employment is about keeping together Sweden.

The result is not too bad, as it is quite intelligible, and at least the last example (not the implemented one) seems quite accurate and looks like English. It is, though, far from the full-fledged style of the example translation. The quality, I would say, is not too bad for a very simple MT implementation, but the last part of the sentence in particular sounds a bit weird in English. The manual example translation captures the meaning of the Swedish sentence much better.

On the other hand, such a "good" result is only possible thanks to some MT-faking, as the MT processing has been adapted to this specific case. Knowing this, the result is not that good. The dictionary, for example, has been adapted to the translation task, and I have thereby avoided homographs and used a chosen vocabulary. The system does accept other phrases, though, and I have tried not to make things too specific. There are some bugs in the program that do not interfere with the translation process as it is now, but that would cause problems if the system were extended or used for other examples. Due to lack of time, and as the system works for the task it was built for, I have chosen to leave the program as it is.
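The steps described but not implemented (swapping a VP that comes before an NP, capitalising the first word and adding final punctuation) could look roughly like the sketch below. It assumes SWI-Prolog and is tailored to the example sentence. The predicates word_class/2, vp/3, np/3, reorder/2, capitalise/2 and finish/2, as well as the category det, are hypothetical and not part of the original program. As a simplification of the method described, it only swaps a VP that is directly followed by an NP.

```prolog
:- use_module(library(lists)).

% Hypothetical word classes, covering the example sentence only.
word_class(is, v).
word_class(about, prep).
word_class(the, det).
word_class(fight, n).
word_class(for, prep).
word_class(employment, n).

% vp(-Phrase, +Words, -Rest): a verb with an optional preposition.
% The longer pattern is listed first so that "is about" wins over "is".
vp([V, P], [V, P|Rest], Rest) :- word_class(V, v), word_class(P, prep).
vp([V], [V|Rest], Rest) :- word_class(V, v).

% np(-Phrase, +Words, -Rest): det + noun, optionally followed by prep + noun.
np([D, N, P, N2], [D, N, P, N2|Rest], Rest) :-
    word_class(D, det), word_class(N, n),
    word_class(P, prep), word_class(N2, n).
np([D, N], [D, N|Rest], Rest) :- word_class(D, det), word_class(N, n).

% reorder(+Words, -Reordered): swap the first VP directly followed by an NP.
reorder(Words, Reordered) :-
    append(Before, Tail, Words),
    vp(VP, Tail, Tail1),
    np(NP, Tail1, After),
    !,
    append([Before, NP, VP, After], Reordered).
reorder(Words, Words).               % nothing to swap: keep the order

% capitalise(+Word, -Cap): upper-case the first character of an atom.
capitalise(Word, Cap) :-
    atom_chars(Word, [C|Cs]),
    upcase_atom(C, U),
    atom_chars(Cap, [U|Cs]).

% finish(+Words, -Sentence): reorder, capitalise, join and add a full stop.
finish(Words, Sentence) :-
    reorder(Words, [First|Rest]),
    capitalise(First, Cap),
    atomic_list_concat([Cap|Rest], ' ', Body),
    atom_concat(Body, '.', Sentence).
```

Called on the implemented output as a word list, finish(['ultimately,', is, about, the, fight, for, employment, keeping, together, 'Sweden'], S) gives S = 'Ultimately, the fight for employment is about keeping together Sweden.'. The phrase patterns are deliberately minimal, so other sentences would need a larger lexicon and more general NP and VP rules.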
Comparison with Systran translation:

The Systran translation gives:

    Outermost acts the struggle for employment about holding together Sweden

The program implemented for this assignment gives the following translation:

    Target: ultimately, is about the fight for employment keeping together Sweden

The result of the assignment including the manual steps described is:

    Target: Ultimately, the fight for employment is about keeping together Sweden.

Following the Intelligibility Scale proposed by Arnold et al. (1), the last sentence is perfectly intelligible and grammatical, and I would therefore score it quite high. It looks and sounds like good English, although the word choice is not absolutely satisfactory. The output of my program is worse, as it is not immediately intelligible because of the word order error, but it would be easy to fix with some post-editing. Apart from the word order error, the word choices are OK. The Systran translation, I would say, is even harder to understand, and post-editing it seems harder to me. The word choices in the first part of the sentence are the main reason for its unintelligibility and inaccuracy.

It must be said in defence of the Systran system that my system has avoided most of the problems Systran struggles with by being adapted to this specific translation task and by carefully choosing the lexical translations in the bilingual dictionary, avoiding homographs.

The development of my little system has shown me that the direct translation method works, although I doubt that it is the best choice. I think you would need some processing of bigger structures to be able to produce good translations and resolve ambiguity at all its levels.

References

1. D.J. Arnold, Lorna Balkan, Siety Meijer, R. Lee Humphreys and Louisa Sadler, 1994. Machine Translation: an Introductory Guide. Blackwells-NCC, London. ISBN: 1855542-17x. http://www.essex.ac.uk/linguistics/clmt/MTbook/PostScript/