Drawing Sound Conclusions from Noisy Judgments

WWW '17, Perth, Australia, April 2017
David Goldberg, Andrew Trotman, Xiao Wang, Wei Min, Zongru Wan
eBay Authors

The quality of a search engine is typically evaluated using hand-labeled data sets, where the labels indicate the relevance of documents to queries. Often the number of labels needed is too large to be created by the best annotators, and so less accurate labels (e.g. from crowdsourcing) must be used. This introduces errors in the labels, and thus errors in standard precision metrics (such as P@k and DCG): the lower the quality of the judge, the more errors in the labels, and consequently the less accurate the metric. We introduce equations and algorithms that can adjust the metrics to the values they would have had if there were no annotation errors.

This is especially important when two search engines are compared on their metrics. We give examples where one engine appeared to be statistically significantly better than the other, but the effect disappeared after the metrics were corrected for annotation error. In other words, the evidence supporting a statistical difference was illusory, caused by a failure to account for annotation error.
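
To make the idea of correcting a metric for annotation error concrete, here is a minimal sketch. It is not the paper's method, which the abstract does not spell out. It assumes a simple binary noise model in which a judge marks a truly relevant document as relevant with probability `sensitivity` and a truly irrelevant one as relevant with probability `false_positive_rate`. Both parameter names, the function name, and all numbers are hypothetical. Under that assumption the observed precision is `p * sensitivity + (1 - p) * false_positive_rate`, which can be inverted for the true precision `p`.

```python
def correct_precision(observed, sensitivity, false_positive_rate):
    """Invert observed = p * sensitivity + (1 - p) * false_positive_rate for p.

    Assumed noise model (not taken from the paper): each label is flipped
    independently, with error rates that do not depend on rank or query.
    """
    denom = sensitivity - false_positive_rate
    if denom <= 0:
        raise ValueError("judge must be better than chance")
    p = (observed - false_positive_rate) / denom
    # Clamp, because sampling noise can push the estimate outside [0, 1].
    return min(1.0, max(0.0, p))


# Hypothetical observed P@k values for two engines, labeled by the same noisy judge.
engine_a = correct_precision(0.70, sensitivity=0.85, false_positive_rate=0.20)
engine_b = correct_precision(0.66, sensitivity=0.85, false_positive_rate=0.20)
print(engine_a, engine_b)
```

This sketch adjusts only the point estimate. A significance test on the corrected values would also have to account for uncertainty in the judge's error rates, which is not shown here.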

Another publication from the same author:

IEEE Big Data, Boston MA, Dec 2017

What is Skipped: Finding Desirable Items in E-Commerce Search by Discovering the Worst Title Tokens

Ishita Khan, Prathyusha Senthil Kumar, Daniel Miranda, David Goldberg

Given an e-commerce query, how well the titles of items for sale match the user intent is an important signal for ranking the items. A well-known technique for computing this signal is to use a standard machine-learned model that uses words as features, targets user clicks and predicts a score to rank the titles. In this paper, we introduce an alternate modeling technique that applies to queries that are frequent enough to have historical click data. For each such query we build a parameterized model of user behavior that learns what makes users skip a title. The parameters are different for each query. Specifically, our model predicts how desirable an item's title is to the user query by focusing on the worst tokens in the title. The model is learned offline using maximum likelihood based on user behavioral data, significantly reducing query processing cost. The model's output score is used as a feature in a machine-learned ranker for e-commerce search at eBay. Besides titles, the model design can easily incorporate any attribute of an item, including structured content. We present our new title desirability model, built for nearly 8M queries and recently deployed into the eBay search ecosystem, and demonstrate its significant performance improvement over a baseline click-based Naïve Bayes model through different evaluation approaches, including A/B testing and human judgment. The reported performance is based on eBay's commercial search engine serving millions of queries each day.
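
As a rough illustration of what "focusing on the worst tokens" could mean, the sketch below scores a title by its least desirable token. This is an assumption made for illustration only: the abstract gives neither the paper's parameterization nor its maximum-likelihood fitting procedure. The function name, the per-query score table, and the default score for unseen tokens are all hypothetical.

```python
def title_desirability(title, token_scores, default=0.5):
    """Score a title by its worst token (illustrative assumption, not the paper's model).

    token_scores maps a token to a desirability in [0, 1] for one specific query;
    in a real system these values would be learned offline from click data.
    """
    tokens = title.lower().split()
    if not tokens:
        return default
    # A single undesirable token (e.g. an accessory word) drags the whole title down.
    return min(token_scores.get(token, default) for token in tokens)


# Hypothetical learned scores for the query "iphone 12".
scores = {"iphone": 0.9, "12": 0.9, "new": 0.8, "case": 0.2}

phone_score = title_desirability("New iPhone 12", scores)
case_score = title_desirability("iPhone 12 case", scores)
print(phone_score, case_score)
```

Because the table is built per query and offline, scoring a title at query time is just a lookup and a minimum, which is in keeping with the low query processing cost the abstract mentions.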


Another publication from the same category: Machine Learning and Data Science

IEEE Computing Conference 2018, London, UK

Regularization of the Kernel Matrix via Covariance Matrix Shrinkage Estimation

The kernel trick, formulated as an inner product in a feature space, facilitates powerful extensions to many well-known algorithms. While the kernel matrix involves inner products in the feature space, the sample covariance matrix of the data requires outer products. Therefore, their spectral properties are tightly connected. This allows us to examine the kernel matrix through the sample covariance matrix in the feature space and vice versa. The use of kernels often involves a large number of features compared to the number of observations. In this scenario, the sample covariance matrix is neither well-conditioned nor necessarily invertible, mandating a solution to the problem of estimating high-dimensional covariance matrices under small sample size conditions. We tackle this problem through the use of a shrinkage estimator that offers a compromise between the sample covariance matrix and a well-conditioned matrix (also known as the "target") with the aim of minimizing the mean-squared error (MSE). We propose a distribution-free kernel matrix regularization approach that is tuned directly from the kernel matrix, avoiding the need to address the feature space explicitly. Numerical simulations demonstrate that the proposed regularization is effective in classification tasks.
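
The shrinkage idea in the abstract above can be sketched with a generic linear shrinkage estimator that blends the sample covariance `S` with a scaled-identity target. This is a minimal illustration, not the paper's kernel-based procedure: the scaled-identity target, the function name, and the fixed coefficient `lam=0.3` are assumptions, whereas the paper tunes the regularization from the kernel matrix so as to minimize MSE.

```python
import numpy as np


def shrink_covariance(X, lam):
    """Return (1 - lam) * S + lam * T, with T = (trace(S) / p) * I.

    X has shape (n_observations, n_features); lam in [0, 1] is chosen by hand
    here (an assumption), not estimated from the data.
    """
    S = np.cov(X, rowvar=False)
    p = S.shape[0]
    target = (np.trace(S) / p) * np.eye(p)
    return (1.0 - lam) * S + lam * target


rng = np.random.default_rng(0)
X = rng.standard_normal((10, 50))  # 10 observations, 50 features: n < p

S = np.cov(X, rowvar=False)
S_shrunk = shrink_covariance(X, lam=0.3)

rank_S = np.linalg.matrix_rank(S)              # below 50, so S is singular
min_eig = np.linalg.eigvalsh(S_shrunk).min()   # strictly positive after shrinkage
print(rank_S, min_eig)
```

With fewer observations than features the sample covariance is rank-deficient, while the shrunk estimate has every eigenvalue lifted by at least `lam * trace(S) / p` and is therefore invertible. The trace is preserved by construction.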