Bootstrapped Language Identification For Multi-Site Internet Domains

Proceedings of KDD’12, Beijing, China. August 2012
Abstract

We present an algorithm for language identification, in particular of short documents, for the case of an Internet domain with sites in multiple countries with differing languages.

The algorithm is significantly faster than standard language identification methods while providing state-of-the-art identification accuracy. We bootstrap it from language labels inferred from the site alone, a methodology suitable for any supervised language identification algorithm.

We demonstrate the bootstrapping and the algorithm on eBay email data and on Twitter status update data. The algorithm is deployed at eBay as part of the back-office development data repository.
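
The bootstrapping idea in the abstract can be illustrated with a minimal sketch. It assumes each document arrives tagged with the site it was sent on, uses the site's main language as a noisy training label, and then trains a supervised classifier on those labels. The classifier here is a plain character-trigram naive Bayes model chosen for brevity; it is a stand-in, not the paper's algorithm. The SITE_LANGUAGE mapping, the function names, and the sample messages are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical mapping from site to its main language (an assumption for
# illustration; the paper's actual site list is not given in the abstract).
SITE_LANGUAGE = {"ebay.com": "en", "ebay.de": "de", "ebay.fr": "fr"}


def bootstrap_labels(docs):
    """Turn (text, site) pairs into (text, language) pairs using the site alone."""
    return [(text, SITE_LANGUAGE[site]) for text, site in docs]


def char_ngrams(text, n=3):
    """Return the character n-grams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(labeled_docs, n=3):
    """Count character n-grams per language from the bootstrapped labels."""
    counts = defaultdict(Counter)
    for text, language in labeled_docs:
        counts[language].update(char_ngrams(text, n))
    return counts


def classify(text, counts, n=3):
    """Pick the language with the highest add-one-smoothed log-likelihood."""
    vocab_size = len(set().union(*counts.values()))
    best_language, best_score = None, -math.inf
    for language, ngram_counts in counts.items():
        total = sum(ngram_counts.values())
        score = sum(
            math.log((ngram_counts[gram] + 1) / (total + vocab_size))
            for gram in char_ngrams(text, n)
        )
        if score > best_score:
            best_language, best_score = language, score
    return best_language


# Toy (text, site) data standing in for site-tagged messages.
docs = [
    ("Hello, when will you ship the item? Thank you very much.", "ebay.com"),
    ("Please send me the tracking number for my order.", "ebay.com"),
    ("Hallo, wann wird der Artikel verschickt? Vielen Dank.", "ebay.de"),
    ("Bitte senden Sie mir die Sendungsnummer für meine Bestellung.", "ebay.de"),
    ("Bonjour, quand allez-vous envoyer l'article ? Merci beaucoup.", "ebay.fr"),
    ("Veuillez m'envoyer le numéro de suivi de ma commande.", "ebay.fr"),
]

model = train(bootstrap_labels(docs))
print(classify("Vielen Dank für den Artikel", model))
print(classify("Thank you for the item", model))
```

Because the trained model looks only at the text, it can assign a language to a message that differs from its site's main language, which the site-based labels alone cannot do.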

Another publication from the same author: Uwe Mayer

Proceedings of the Sixteenth ACM Conference on Economics and Computation (EC '15). ACM, New York, NY, USA (2015)

Canary in the e-Commerce Coal Mine: Detecting and Predicting Poor Experiences Using Buyer-to-Seller Messages

Dimitriy Masterov, Uwe Mayer, Steve Tadelis

Reputation and feedback systems in online marketplaces are often biased, making it difficult to ascertain the quality of sellers. We use post-transaction, buyer-to-seller message traffic to detect signals of unsatisfactory transactions on eBay. We posit that a message sent after the item was paid for serves as a reliable indicator that the buyer may be unhappy with that purchase, particularly when the message included words associated with a negative experience. The fraction of a seller's message traffic that was negative predicts whether a buyer who transacts with this seller will stop purchasing on eBay, implying that platforms can use these messages as an additional signal of seller quality.

Another publication from the same category: Machine Translation

Copenhagen, Denmark, September 2017

Neural Machine Translation Leveraging Phrase-based Models in a Hybrid Search

Leonard Dahlmann, Evgeny Matusov, Pavel Petrushkov, Shahram Khadivi

In this paper, we introduce a hybrid search for attention-based neural machine translation (NMT). A target phrase learned with statistical MT models extends a hypothesis in the NMT beam search when the attention of the NMT model focuses on the source words translated by this phrase. Phrases added in this way are scored with the NMT model, but also with SMT features including phrase-level translation probabilities and a target language model. Experimental results on German→English news domain and English→Russian e-commerce domain translation tasks show that using phrase-based models in NMT search improves MT quality by up to 2.3% BLEU absolute as compared to a strong NMT baseline.