National Yunlin University of Science and Technology
Intelligent Database Systems Lab, N.Y.U.S.T. I. M.

Iterative Translation Disambiguation for Cross-Language Information Retrieval
Advisor: Dr. Hsu
Presenter: Yu-San Hsieh
Authors: Christof Monz and Bonnie J. Dorr
SIGIR 2005, pp. 520-527

Outline
─ Motivation
─ Objective
─ Approach
─ Experiment Result
─ Introduction
─ Experiment
─ Conclusions

Motivation
Many words or phrases in one language can be translated into another language in a number of ways, so translation ambiguity is very common, and it reduces the effectiveness of information retrieval.
Example: Penalty (English) => Elfmeter (soccer) or Strafe (punishment)

Objective
Find a proper distribution of translation probabilities that resolves the translation ambiguity problem.

Approach
Find a proper distribution of translation probabilities.
Computing Term Weight
─ Initialization Step
─ Iteration Step
─ Normalization Step
─ All term weights in a vector
─ Iteration Stop
[Figure: example translation graph with English terms trade, union, europe and German candidates gewerbe, geschaeft, handel, union, gewerkschaft, europa]
ex: w_T^1(t_{i,1} | s_i), values shown: 0.0833, 2*1, 0.2*2, 0.2*2

Approach
Measuring association strength
─ Pointwise mutual information
─ Dice coefficient
─ Log Likelihood ratio

Experiment Result
[Figure: improvement over the baseline; differences for individual queries (topics)]

Introduction
Two techniques for cross-language retrieval
─ Translate the document collection into the query language and apply monolingual retrieval
─ Translate the query into the target language and retrieve with the translated query
Three approaches may be used to produce the translations
─ Machine translation system
─ Dictionary
─ Parallel corpus to estimate the probabilities
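
The term-weight computation listed on the Approach slides (initialization, iteration, normalization, stop) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slides do not show the formulas, so the update rule used here (old weight plus association-weighted support from the candidates of the other source terms) is an assumption. The Dice coefficient, 2*f(a,b) / (f(a) + f(b)), stands in for the association measure, and all frequency counts, the names `disambiguate`, `dice`, `FREQ`, `COOC`, and the thresholds are invented for the example.

```python
# Toy target-language corpus statistics (invented numbers, for illustration only).
FREQ = {"gewerbe": 100, "geschaeft": 100, "handel": 100,
        "union": 100, "gewerkschaft": 100, "europa": 100}
COOC = {frozenset(("handel", "union")): 20,
        frozenset(("handel", "europa")): 20,
        frozenset(("union", "europa")): 20}

# Source term -> translation candidates from a bilingual dictionary (toy data).
CANDIDATES = {"trade": ["gewerbe", "geschaeft", "handel"],
              "union": ["union", "gewerkschaft"],
              "europe": ["europa"]}


def dice(t, t2):
    """Dice coefficient from co-occurrence and single-term frequencies."""
    both = COOC.get(frozenset((t, t2)), 0)
    return 2.0 * both / (FREQ[t] + FREQ[t2])


def disambiguate(candidates, assoc, max_iter=50, tol=1e-4):
    """Iteratively re-weight translation candidates.

    Assumes assoc(t, t2) >= 0, so every normalization total stays positive.
    """
    # Initialization step: uniform weight over each source term's candidates.
    weights = {s: {t: 1.0 / len(ts) for t in ts} for s, ts in candidates.items()}
    for _ in range(max_iter):
        new = {}
        for s, ts in candidates.items():
            row = {}
            for t in ts:
                # Iteration step: support from candidates of all OTHER source terms.
                support = sum(assoc(t, t2) * w2
                              for s2, ws in weights.items() if s2 != s
                              for t2, w2 in ws.items())
                row[t] = weights[s][t] + support
            # Normalization step: the weights of one source term sum to 1.
            total = sum(row.values())
            new[s] = {t: w / total for t, w in row.items()}
        # Iteration stop: change over the whole weight vector falls below tol.
        change = sum(abs(new[s][t] - weights[s][t]) for s in new for t in new[s])
        weights = new
        if change < tol:
            break
    return weights


weights = disambiguate(CANDIDATES, dice)
for source, row in weights.items():
    print(source, {t: round(w, 3) for t, w in row.items()})
```

With these toy counts, candidates that co-occur with the candidates of the other query terms gain weight at the expense of their siblings. Pointwise mutual information can be negative, so using it in place of `dice` would require extra care in the normalization step.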

Introduction
A word in one language can be translated into another language in a number of ways.
─ Penalty (English) => Elfmeter (soccer) or Strafe (punishment)

Introduction
One approach to the word-selection problem is to use co-occurrences between terms.
Problem (a larger number of terms)
─ Data sparseness
Possible remedies
─ Use very large corpora for counting co-occurrence frequencies
─ Use internet search engines
─ Smoothing

Experiment
Test Data
─ CLEF 2003 English to German bilingual data
─ 56 topics selected (title, description, narrative)
Morphological Normalization
─ Source-language words (topics) are normalized to match entries in the bilingual dictionary
─ De-compounding: 5-grams
─ Assign weights to 5-gram substrings

Experiment
Retrieval Model
─ Lnu.ltc weighting scheme
─ Weighted document similarity
Statistical Significance
─ Bootstrap method
─ Bootstrap sample
─ One-tailed significance testing (comparing two retrieval methods)

Experiment
Problem found in the experiment
─ The individual average precision with the Log Likelihood ratio decreases for a number of queries.
Unknown words
─ The original word from the source language is included in the target-language query.
Example: Women’s Conference Beijing
─ "Women" (proper noun) is normalized, but no translation is found (Woman, Women)
─ "Women" is kept in the query and assigned weight = 1
Result
1. "Woman" controls the document similarity.
2. Most top-ranked documents contain "Women" as the only matching term.

Conclusions
Our approach improves retrieval effectiveness compared to the baseline using bilingual dictionary lookup.
Experimental results show that the Log Likelihood Ratio has the strongest positive impact.

My opinion
Advantage:
─ It only requires a bilingual dictionary and a monolingual corpus in the target language.
Disadvantage:
─ Unknown words
Apply:
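
The one-tailed bootstrap test named on the Statistical Significance slide can be sketched as follows. The slides do not specify the exact procedure, so this is one common paired variant and an assumption: the per-topic differences are shifted to mean zero to simulate the null hypothesis, resampled with replacement, and the p-value is the fraction of resampled means that reach the observed mean. The function name `bootstrap_p_value` and the average-precision values are invented for the example.

```python
import random


def bootstrap_p_value(base_ap, new_ap, samples=2000, seed=0):
    """One-tailed paired bootstrap: is new_ap better than base_ap on average?"""
    rng = random.Random(seed)
    diffs = [n - b for b, n in zip(base_ap, new_ap)]
    observed = sum(diffs) / len(diffs)
    # Shift the differences so their mean is 0 (the null hypothesis).
    centered = [d - observed for d in diffs]
    hits = 0
    for _ in range(samples):
        # One bootstrap sample: draw len(diffs) values with replacement.
        resample = [rng.choice(centered) for _ in diffs]
        if sum(resample) / len(resample) >= observed:
            hits += 1
    return hits / samples


# Invented per-topic average precision for a baseline and an improved run.
baseline = [0.20, 0.35, 0.10, 0.50, 0.42, 0.31, 0.27, 0.15]
improved = [0.30, 0.47, 0.19, 0.61, 0.52, 0.39, 0.39, 0.25]

p_value = bootstrap_p_value(baseline, improved)
print("p-value:", p_value)
```

A small p-value suggests the improvement over the baseline is unlikely to be due to chance across topics. Comparing a run with itself gives no evidence of improvement.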