Translation is a repetitive activity. The attempt to automate such a difficult task has been a long-term scientific dream; in the past years research in this field has acquired a growing interest, making some forms of Machine Translation (MT) a reality. Among the several types of approaches in MT, one of the most promising paradigms is MAHT and, in particular, example-Based Machine Translation (EBMT). An EBMT system translates by analogy, using past translations to translate other, similar sourcelanguage sentences into the target language. The basic premise is that, if a previously translated sentence occurs again, the same translation is likely to be correct. In this paper, we propose a solution based on a purely syntactic approach for searching similar sentences and parts of them in an EBMT system; the underlying similarity measure is based on the similarity between sequence of terms such that the sentences most close to a given one are those who maintain most of the original form and contents. The system efficiently retrieves and ranks the most similar sentences available and, when no useful suggestion exists, it proceeds with the retrieval of similar parts. We opted for a design that would require minimal changes to existing databases and whose similarity measure and search algorithms are completely independent from the involved languages. This work has been developed as a joint work with LOGOS S.p.A., a worldwide leader in multilingual document translation.
Searching Similar (Sub)Sentences for Example-Based Machine Translation / Mandreoli, Federica; Martoglia, Riccardo; Tiberio, Paolo. - STAMPA. - (2002), pp. 208-221. (Intervento presentato al convegno Decimo Convegno Nazionale su Sistemi Evoluti per Basi di Dati (SEBD 2002) tenutosi a Portoferraio, Italy nel June 2002).
Searching Similar (Sub)Sentences for Example-Based Machine Translation
MANDREOLI, Federica;MARTOGLIA, Riccardo;TIBERIO, Paolo
2002
Abstract
Translation is a repetitive activity. The attempt to automate such a difficult task has been a long-term scientific dream; in the past years research in this field has acquired a growing interest, making some forms of Machine Translation (MT) a reality. Among the several types of approaches in MT, one of the most promising paradigms is MAHT and, in particular, example-Based Machine Translation (EBMT). An EBMT system translates by analogy, using past translations to translate other, similar sourcelanguage sentences into the target language. The basic premise is that, if a previously translated sentence occurs again, the same translation is likely to be correct. In this paper, we propose a solution based on a purely syntactic approach for searching similar sentences and parts of them in an EBMT system; the underlying similarity measure is based on the similarity between sequence of terms such that the sentences most close to a given one are those who maintain most of the original form and contents. The system efficiently retrieves and ranks the most similar sentences available and, when no useful suggestion exists, it proceeds with the retrieval of similar parts. We opted for a design that would require minimal changes to existing databases and whose similarity measure and search algorithms are completely independent from the involved languages. This work has been developed as a joint work with LOGOS S.p.A., a worldwide leader in multilingual document translation.Pubblicazioni consigliate
I metadati presenti in IRIS UNIMORE sono rilasciati con licenza Creative Commons CC0 1.0 Universal, mentre i file delle pubblicazioni sono rilasciati con licenza Attribuzione 4.0 Internazionale (CC BY 4.0), salvo diversa indicazione.
In caso di violazione di copyright, contattare Supporto Iris