Language Independent Connectivity Strength Features for Phrase Pivot Statistical Machine Translation

Language Independent Connectivity Strength Features for Phrase Pivot Statistical Machine Translation
eBay Authors

An important challenge to statistical machine translation (SMT) is the lack of parallel data for many language pairs. One common solution is to pivot through a third language for which there exist parallel corpora with the source and target languages. Although pivoting is a robust technique, it introduces some low quality translations. In this paper, we present two language-independent features to improve the quality of phrase-pivot based SMT. The features, source connectivity strength and target connectivity strength reflect the quality of projected alignments between the source and target phrases in the pivot phrase table. We show positive results (0.6 BLEU points) on Persian-Arabic SMT as a case study.


Another publication from the same author: Gregor Leusch

International Conference on Natural Language Generation, Santiago de Compostela, Spain, September 2017

Generating titles for millions of browse pages on an e-Commerce site

We present three approaches to generate titles for browse pages in five different languages, namely English, German, French, Italian and Spanish. These browse pages are structured search pages in an e-commerce domain. We first present a rule-based approach to generate these browse page titles. In addition, we also present a hybrid approach which uses a phrase-based statistical machine translation engine on top of the rule-based system to assemble the best title. For the two languages English and German, we have access to a large amount of rule-based generated and human-curated titles. For these languages, we present an automatic post-editing approach which learns how to post-edit the rule-based titles into curated titles.

Another publication from the same category: Machine Translation

MT Summit, Nagoya, Japan, September 2017

Harvesting Polysemous Terms from e-commerce Data to Enhance QA

Silvio Picinini

Polysemous words can be difficult to translate and can affect the quality of Machine Translation (MT) output. Once the MT quality is affected, it has a direct impact on post-editing and on human-assisted machine translation. The presence of these terms increases the risk of errors. We think that these important words can be used to improve and to measure quality of translations. We present three methods for finding these words from e-commerce data, based on Named Entity Recognition, Part-of-Speech and Search Queries.