The Importance of Features for Statistical Anomaly Detection

HotCloud '15, 7th USENIX Workshop on Hot Topics in Cloud Computing, Santa Clara CA, July 2015
David Goldberg, Yinan Shan
Categories
eBay Authors
Abstract

The theme of this paper is that anomaly detection splits into two parts: developing the right features, and then feeding these features into a statistical system that detects anomalies in the features. Most literature on anomaly detection focuses on the second part. Our goal is to illustrate the importance of the first part. We do this with two real-life examples of anomaly detectors in use at eBay.
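As a minimal illustration of the two-part split described above, the hypothetical Python sketch below keeps a feature step separate from a statistical step. Everything in it is an assumption chosen for illustration: the feature (an error rate, errors divided by requests per interval), the sample data, and the detector (a median/MAD robust z-score with the conventional 3.5 cutoff). It is not one of the eBay detectors discussed in the paper.

```python
import statistics


def ratio_feature(errors, requests):
    """Feature step (hypothetical): turn raw counts into a per-interval error rate."""
    return [e / r if r else 0.0 for e, r in zip(errors, requests)]


def robust_zscores(xs):
    """Statistical step: distance from the median, scaled by the median
    absolute deviation (MAD). 1.4826 * MAD estimates the standard deviation
    for normally distributed data."""
    med = statistics.median(xs)
    mad = statistics.median([abs(x - med) for x in xs])
    # If more than half the values are identical, MAD is 0; use a tiny scale
    # so any deviation from the median is treated as extreme.
    scale = 1.4826 * mad if mad > 0 else 1e-9
    return [(x - med) / scale for x in xs]


def detect(xs, threshold=3.5):
    """Return the indices whose robust z-score exceeds the threshold."""
    return [i for i, z in enumerate(robust_zscores(xs)) if abs(z) > threshold]


# Hypothetical data: interval 5 has a traffic spike, interval 7 a real error burst.
requests = [1000, 1000, 1000, 1000, 1000, 5000, 1000, 1000]
errors = [10, 11, 9, 10, 12, 50, 10, 40]

print(detect(errors))                            # [5, 7]
print(detect(ratio_feature(errors, requests)))   # [7]
```

On the raw error counts the detector also flags interval 5, which only reflects higher traffic; dividing by request volume removes that false alarm while keeping the genuine anomaly at interval 7. The statistical step is identical in both runs; only the feature changes.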

Another publication from the same author:

IEEE Big Data, Boston MA, Dec 2017

What is Skipped: Finding Desirable Items in E-Commerce Search by Discovering the Worst Title Tokens

Ishita Khan, Prathyusha Senthil Kumar, Daniel Miranda, David Goldberg

Given an e-commerce query, how well the titles of items for sale match the user intent is an important signal for ranking the items. A well-known technique for computing this signal is to use a standard machine-learned model that uses words as features, targets user clicks and predicts a score to rank the titles. In this paper, we introduce an alternate modeling technique that applies to queries that are frequent enough to have historical click data. For each such query we build a parameterized model of user behavior that learns what makes users skip a title. The parameters are different for each query. Specifically, our model predicts how desirable an item’s title is to the user query by focusing on the worst tokens in the title. The model is learned offline using maximum likelihood based on user behavioral data, significantly reducing query processing cost. The model’s output score is used as a feature in a machine-learned ranker for e-commerce search at eBay. Besides titles, the model design can easily incorporate any attribute of an item, including structured content. In this paper, we present our new title desirability model built for nearly 8M queries and recently deployed into the eBay search ecosystem, and demonstrate its significant performance improvement over a baseline click-based Naïve Bayes model through different evaluation approaches, including A/B testing and human judgment. The reported performance is based on eBay's commercial search engine serving millions of queries each day.
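The idea of scoring a title by its worst tokens, with separate parameters per query, can be sketched as follows. This is a deliberately simplified, hypothetical model, not the parameterized model from the paper: for a single query it estimates a smoothed click rate for each title token from (tokens, clicked) impressions, then scores a title by the mean of its k lowest token scores. The smoothing constants, the default score for unseen tokens, and the sample impressions are assumptions; the smoothed rate is a posterior mean under a Beta prior rather than a pure maximum-likelihood estimate.

```python
from collections import defaultdict


def fit_token_scores(impressions, prior=1.0, prior_rate=0.5):
    """Fit per-token scores for ONE query (hypothetical model).

    impressions: list of (title_tokens, clicked) pairs observed for the query.
    Returns {token: smoothed click rate}, i.e. (clicks + prior * prior_rate)
    / (shows + prior), which pulls rarely seen tokens toward prior_rate.
    """
    clicks = defaultdict(float)
    shows = defaultdict(float)
    for tokens, clicked in impressions:
        for t in set(tokens):  # count each token once per impression
            shows[t] += 1
            if clicked:
                clicks[t] += 1
    return {t: (clicks[t] + prior * prior_rate) / (shows[t] + prior) for t in shows}


def desirability(title_tokens, scores, default=0.5, k=1):
    """Score a title by its k worst tokens; unseen tokens get `default`."""
    vals = sorted(scores.get(t, default) for t in set(title_tokens))
    if not vals:
        return default
    worst = vals[:k]
    return sum(worst) / len(worst)


# Hypothetical impressions for the single query "iphone 12".
impressions = [
    (["apple", "iphone", "12", "case"], False),
    (["apple", "iphone", "12", "64gb"], True),
    (["iphone", "12", "broken"], False),
    (["apple", "iphone", "12", "128gb"], True),
    (["iphone", "12", "case", "clear"], False),
]
scores = fit_token_scores(impressions)

print(desirability(["apple", "iphone", "12", "256gb"], scores))
print(desirability(["iphone", "12", "case"], scores))  # lower: "case" is often skipped
```

Because a separate score table is fit for each query, the same token (for example "case") can drag a title down for one query while being neutral or desirable for another.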


Another publication from the same category: Other

Information Systems 60: 34-49 (2016)

Aggregated 2D range queries on clustered points

Nieves R. Brisaboa, Guillermo de Bernardo, Roberto Konow, Gonzalo Navarro, Diego Seco

Efficient processing of aggregated range queries on two-dimensional grids is a common requirement in information retrieval and data mining systems, for example in Geographic Information Systems and OLAP cubes. We introduce a technique to represent grids supporting aggregated range queries that requires little space when the data points in the grid are clustered, which is common in practice. We show how this general technique can be used to support two important types of aggregated queries: ranked range queries and counting range queries. Our experimental evaluation shows that this technique can speed up aggregated queries by more than an order of magnitude in some cases, with a small space overhead.
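For context, the hypothetical sketch below answers counting (and, more generally, sum-aggregated) range queries on a 2D grid with a plain prefix-sum table. It is a standard uncompressed baseline, not the clustered representation introduced in the paper: it stores one integer per grid cell regardless of how the points are distributed, which is exactly the space cost the paper's technique aims to reduce on clustered data. The function names and the sample grid are illustrative.

```python
def build_prefix(grid):
    """P[i][j] = sum of grid[r][c] for all r < i and c < j."""
    rows = len(grid)
    cols = len(grid[0]) if grid else 0
    P = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            # Inclusion-exclusion over the three previously filled neighbours.
            P[i + 1][j + 1] = grid[i][j] + P[i][j + 1] + P[i + 1][j] - P[i][j]
    return P


def range_count(P, r1, c1, r2, c2):
    """Sum of cells in rows r1..r2 and columns c1..c2 (inclusive, r1 <= r2, c1 <= c2)."""
    return P[r2 + 1][c2 + 1] - P[r1][c2 + 1] - P[r2 + 1][c1] + P[r1][c1]


# Hypothetical grid: 1 marks a data point; most points sit in one cluster.
grid = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
P = build_prefix(grid)

print(range_count(P, 0, 0, 1, 1))  # 4: the top-left cluster
print(range_count(P, 1, 1, 2, 3))  # 2
```

Each query costs four table lookups, but the table always has (rows + 1) × (cols + 1) entries, even when most cells are empty.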
