Gregor Leusch

Research Scientist

Gregor joined Aachen's NLP institute as a student worker while still an undergraduate, and later focused on MT evaluation. He received a Master's degree with a thesis on automatic MT evaluation measures. Continuing as a research assistant, he noticed that MT system combination, where Evgeny did some fundamental work, has much in common with MT evaluation, and so he added it to his research profile. During a research visit at the National Research Council Canada in Ottawa, he investigated novel machine learning approaches to MT system combination.

In 2012, Gregor followed Hassan's call and joined Evgeny at SAIC/Leidos. He helped improve the MT system, added new features, improved its speed and memory footprint, and built dozens of translation models: general MT models as well as models tailored to particular domains, applications, or customers. Even after leaving the academic ivory tower, he continued to work as a reviewer for major conferences and journals such as ACL, COLING, and TALIP. At eBay, he greatly enjoys getting his hands on huge amounts of real data from real people, with all its noise, misspellings, spam, etc., and helping these very same people talk to everyone else, even when they do not share a common language.


Language Independent Connectivity Strength Features for Phrase Pivot Statistical Machine Translation

An important challenge to statistical machine translation (SMT) is the lack of parallel data for many language pairs. One common solution is to pivot through a third language for which there exist parallel corpora with the source and target languages. Although pivoting is a robust technique, it introduces some low-quality translations. In this paper, we present two language-independent features to improve the quality of phrase-pivot based SMT. The features, source connectivity strength and target connectivity strength, reflect the quality of projected alignments between the source and target phrases in the pivot phrase table. We show positive results (0.6 BLEU points) on Persian-Arabic SMT as a case study.
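
As a rough illustration of the idea in this abstract, the sketch below triangulates a source-target phrase table through a pivot language and projects word alignments through the pivot. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the toy data layout, and in particular the way connectivity strength is computed here (the fraction of source or target words covered by at least one projected alignment link) are hypothetical choices made for illustration, since the abstract does not give the exact definitions.

```python
from collections import defaultdict


def pivot_phrase_table(src_piv, piv_tgt):
    """Triangulate phrase probabilities: p(t|s) = sum_p p(p|s) * p(t|p).

    src_piv maps (source, pivot) -> p(pivot|source).
    piv_tgt maps pivot -> {target: p(target|pivot)}.
    Both layouts are assumptions made for this sketch.
    """
    table = defaultdict(float)
    for (s, p), prob_sp in src_piv.items():
        for t, prob_pt in piv_tgt.get(p, {}).items():
            table[(s, t)] += prob_sp * prob_pt
    return dict(table)


def project_alignment(src_piv_links, piv_tgt_links):
    """Compose word alignment links that share a pivot word position."""
    return {
        (i, k)
        for (i, j) in src_piv_links
        for (j2, k) in piv_tgt_links
        if j == j2
    }


def connectivity_strength(links, src_len, tgt_len):
    """Assumed definition: share of source/target words with a projected link."""
    src_cov = len({i for i, _ in links}) / src_len
    tgt_cov = len({k for _, k in links}) / tgt_len
    return src_cov, tgt_cov


# Toy example with placeholder phrases.
src_piv = {("s1", "p1"): 0.5, ("s1", "p2"): 0.5}
piv_tgt = {"p1": {"t1": 1.0}, "p2": {"t1": 0.5, "t2": 0.5}}
table = pivot_phrase_table(src_piv, piv_tgt)

# Two-word source and target phrases; only the first pivot word links through.
links = project_alignment({(0, 0), (1, 1)}, {(0, 0)})
scs, tcs = connectivity_strength(links, src_len=2, tgt_len=2)
```

In this toy case the pair ("s1", "t1") receives probability 0.75 and ("s1", "t2") receives 0.25, while both connectivity values are 0.5 because only one of the two words on each side keeps a link after projection. A pivoted phrase pair with weak connectivity under this assumed measure would be a candidate low-quality translation.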


Proceedings of the 6th International Joint Conference on Natural Language Processing

Selective Combination of Pivot and Direct Statistical Machine Translation Models

In this paper, we propose a selective combination approach of pivot and direct statistical machine translation (SMT) models to improve translation quality. We work with Persian-Arabic SMT as a case study. We show positive results (from 0.4 to 3.1 BLEU on different direct training corpus sizes) in addition to a large reduction of pivot translation model size.

International Conference on Natural Language Generation, Santiago de Compostela, Spain, September 2017

Generating titles for millions of browse pages on an e-Commerce site

We present three approaches to generate titles for browse pages in five different languages, namely English, German, French, Italian and Spanish. These browse pages are structured search pages in an e-commerce domain. We first present a rule-based approach to generate these browse page titles. In addition, we present a hybrid approach which uses a phrase-based statistical machine translation engine on top of the rule-based system to assemble the best title. For English and German, we have access to a large amount of rule-based generated and human-curated titles. For these two languages, we present an automatic post-editing approach which learns how to post-edit the rule-based titles into curated titles.