Beyond Skylines and Top-k Queries: Representative Databases and e-Commerce Product Search

Tutorial at CIKM-2013
Atish Das Sarma, Ashwin Lall, Nish Parikh, Neel Sundaresan

Skyline queries have been a topic of intense study in the database community for over a decade. Similarly, the top-k retrieval query has been heavily investigated by both the database and web research communities. This tutorial will delve into the background of these two areas and focus on two recent challenges: returning a small set of results to users, and requiring minimal intervention or input from them.
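To make the contrast concrete, here is a minimal Python sketch (our illustration, not material from the tutorial): top-k needs a user-supplied scoring function but returns exactly k results, while the skyline needs no input from the user but can return a large fraction of the data. The product tuples and scoring weights below are invented for illustration.

def top_k(points, k, score):
    """Return the k points maximizing a user-supplied utility function."""
    return sorted(points, key=score, reverse=True)[:k]

def dominates(p, q):
    """p dominates q if p is at least as good in every dimension and
    strictly better in at least one (larger is better here)."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

def skyline(points):
    """Return all points not dominated by any other point (the Pareto frontier)."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Toy catalog of (rating, battery life in hours).
products = [(4.5, 10), (4.0, 12), (3.0, 15), (4.8, 6), (2.0, 20), (4.5, 8)]
print(top_k(products, 2, score=lambda p: 0.7 * p[0] + 0.3 * p[1]))  # needs weights
print(skyline(products))  # no weights, but almost everything survives

With these toy numbers, only one of the six products is dominated, so the skyline returns five results, while top_k returns exactly two but only after the user commits to weights.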

These are the main concerns with skylines and top-k queries, respectively, and they have therefore drawn a great deal of attention in recent years, with several interesting ideas proposed in the research community. This tutorial will cover the current approaches to representative database selection, focusing on both theoretical models and practical aspects from an industry standpoint.

The topics covered in this tutorial include identifying representative subsets of the skyline set, interaction-based approaches, e-commerce product search, and leveraging aggregate user preference statistics.
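As an illustration of the first topic, the following sketch applies one simple distance-based heuristic, greedy max-min selection, to pick k representative skyline points. It is a stand-in for the family of approaches the tutorial surveys, not a specific algorithm from it.

import math

def k_representatives(skyline_points, k):
    """Greedily pick k points, each farthest from those already chosen (max-min heuristic)."""
    reps = [skyline_points[0]]  # arbitrary seed point
    while len(reps) < min(k, len(skyline_points)):
        candidates = [p for p in skyline_points if p not in reps]
        # Take the candidate whose nearest chosen representative is farthest away.
        reps.append(max(candidates, key=lambda p: min(math.dist(p, r) for r in reps)))
    return reps

# Reusing the toy skyline from the earlier sketch:
sky = [(4.5, 10), (4.0, 12), (3.0, 15), (4.8, 6), (2.0, 20)]
print(k_representatives(sky, 3))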


IEEE Computing Conference 2018, London, UK

Regularization of the Kernel Matrix via Covariance Matrix Shrinkage Estimation

The kernel trick, formulated as an inner product in a feature space, facilitates powerful extensions of many well-known algorithms. While the kernel matrix is built from inner products in the feature space, the sample covariance matrix of the data is built from outer products; the two matrices therefore share their nonzero spectrum (up to scaling and centering), so their spectral properties are tightly connected. This allows us to examine the kernel matrix through the sample covariance matrix in the feature space, and vice versa. The use of kernels often involves a number of features that is large compared to the number of observations. In this scenario, the sample covariance matrix is neither well-conditioned nor necessarily invertible, mandating a solution to the problem of estimating high-dimensional covariance matrices under small sample size conditions. We tackle this problem with a shrinkage estimator that offers a compromise between the sample covariance matrix and a well-conditioned matrix (also known as the "target"), with the aim of minimizing the mean-squared error (MSE). We propose a distribution-free kernel matrix regularization approach that is tuned directly from the kernel matrix, avoiding the need to address the feature space explicitly. Numerical simulations demonstrate that the proposed regularization is effective in classification tasks.
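As a rough Python sketch of the shrinkage idea (our illustration, not the paper's estimator): form a convex combination of the kernel matrix with a well-conditioned target. Here the scaled-identity target mu*I and the hand-picked intensity rho = 0.2 are assumptions for the demo; the paper instead tunes the intensity directly from the kernel matrix so as to minimize MSE.

import numpy as np

def shrink_kernel(K, rho):
    """Convex combination of K with the scaled-identity target mu*I,
    where mu is the mean eigenvalue of K."""
    n = K.shape[0]
    mu = np.trace(K) / n              # mean of the spectrum of K
    return (1.0 - rho) * K + rho * mu * np.eye(n)

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 500))    # n = 20 observations, p = 500 features
K = X @ X.T                           # linear-kernel Gram matrix (inner products)
Ks = shrink_kernel(K, rho=0.2)        # rho = 0.2 is an arbitrary illustrative value
print(np.linalg.cond(K), np.linalg.cond(Ks))

Since every eigenvalue of the combination moves toward the mean eigenvalue mu, the shrunk matrix is better conditioned than K for any rho > 0.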