Regret-Minimizing Representative Databases

PVLDB 2010
Atish Das Sarma, Danupon Nanongkai, Ashwin Lall, Richard J. Lipton, Jim Xu

We propose the k-representative regret minimization query (k-regret) as an operation to support multi-criteria decision making. Like top-k, the k-regret query assumes that users have some utility or scoring functions; however, it never asks the users to provide such functions.

Like skyline, it filters out a set of interesting points from a potentially large database based on the users' criteria; however, it never overwhelms the users by outputting too many tuples.

In particular, for any number k and any class of utility functions, the k-regret query outputs k tuples from the database and tries to minimize the maximum regret ratio. This captures how disappointed a user could be had she seen k representative tuples instead of the whole database. We focus on the class of linear utility functions, which is widely applicable.
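
To make the objective concrete, the following sketch estimates the maximum regret ratio of a candidate set under linear utility functions by sampling weight vectors from the simplex. It is an illustrative approximation, not code from the paper: the exact value is a supremum over all non-negative weight vectors and would normally be computed with a linear program per tuple; the attributes are assumed non-negative with "larger is better".

```python
import numpy as np

def max_regret_ratio(db, subset, n_samples=10000, seed=0):
    """Estimate the maximum regret ratio of `subset` w.r.t. `db` under
    linear utility functions f(p) = w . p with non-negative weights.

    db, subset: 2-D arrays of shape (n_points, n_dims). The true value is a
    supremum over all weight vectors; sampling weights uniformly from the
    simplex gives a lower-bound estimate of it.
    """
    rng = np.random.default_rng(seed)
    d = db.shape[1]
    # Sample weight vectors uniformly from the (d-1)-simplex.
    w = rng.dirichlet(np.ones(d), size=n_samples)   # shape (n_samples, d)
    best_db = (w @ db.T).max(axis=1)                # best utility over the full database
    best_sub = (w @ subset.T).max(axis=1)           # best utility over the k representatives
    ratios = (best_db - best_sub) / best_db
    return ratios.max()
```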

The first challenge of this approach is that it is not clear whether the maximum regret ratio would be small, or even bounded. We answer this question affirmatively: we prove that the maximum regret ratio can be bounded, and that this bound is independent of the database size.

Moreover, our extensive experiments on real and synthetic datasets suggest that in practice the maximum regret ratio is reasonably small. Additionally, the algorithms developed in this paper are practical: they run in time linear in the size of the database, and the experiments show that their running time is small when they are run on top of the skyline operation, which means that these algorithms could be integrated into current database systems.
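
For intuition on how such representatives can be chosen, here is a naive greedy heuristic built on the estimator above. It only illustrates the selection problem; it is not one of the linear-time algorithms from the paper, since it re-evaluates the sampled regret ratio for every candidate tuple at every step.

```python
def greedy_k_regret(db, k, n_samples=2000, seed=0):
    """Naive greedy selection of k representative tuples (illustration only).

    Starts from the tuple that is best on the first attribute, then
    repeatedly adds the tuple that most reduces the sampled maximum
    regret ratio. Much slower than the linear-time algorithms described
    in the paper, but it makes the optimized objective explicit.
    """
    chosen = [int(np.argmax(db[:, 0]))]
    while len(chosen) < k:
        best_i, best_r = None, float("inf")
        for i in range(db.shape[0]):
            if i in chosen:
                continue
            r = max_regret_ratio(db, db[chosen + [i]], n_samples, seed)
            if r < best_r:
                best_i, best_r = i, r
        chosen.append(best_i)
    return db[chosen]
```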

Another publication from the same category: Machine Learning and Data Science

IEEE Computing Conference 2018, London, UK

Regularization of the Kernel Matrix via Covariance Matrix Shrinkage Estimation

The kernel trick concept, formulated as an inner product in a feature space, facilitates powerful extensions to many well-known algorithms. While the kernel matrix involves inner products in the feature space, the sample covariance matrix of the data requires outer products. Therefore, their spectral properties are tightly connected. This allows us to examine the kernel matrix through the sample covariance matrix in the feature space and vice versa. The use of kernels often involves a large number of features compared to the number of observations. In this scenario, the sample covariance matrix is neither well-conditioned nor necessarily invertible, mandating a solution to the problem of estimating high-dimensional covariance matrices under small sample size conditions. We tackle this problem through the use of a shrinkage estimator that offers a compromise between the sample covariance matrix and a well-conditioned matrix (also known as the "target") with the aim of minimizing the mean-squared error (MSE). We propose a distribution-free kernel matrix regularization approach that is tuned directly from the kernel matrix, avoiding the need to address the feature space explicitly. Numerical simulations demonstrate that the proposed regularization is effective in classification tasks.
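
As a concrete illustration of the shrinkage idea, a kernel (Gram) matrix can be pulled toward a scaled identity target, K_reg = (1 - alpha) * K + alpha * (trace(K)/n) * I. In this sketch the shrinkage intensity alpha is a hand-set parameter, whereas the paper's contribution is a distribution-free, MSE-motivated way of choosing it directly from the kernel matrix; the function name and the RBF-kernel example below are assumptions made for illustration.

```python
import numpy as np

def shrink_kernel_matrix(K, alpha):
    """Shrink a kernel matrix toward a scaled identity target (illustration only).

    K_reg = (1 - alpha) * K + alpha * mu * I, with mu = trace(K) / n,
    so the regularized matrix stays symmetric and becomes better conditioned.
    """
    n = K.shape[0]
    mu = np.trace(K) / n
    return (1.0 - alpha) * K + alpha * mu * np.eye(n)

# Example (assumed data): few observations, many features, RBF kernel.
X = np.random.default_rng(0).normal(size=(20, 500))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq_dists / X.shape[1])
K_reg = shrink_kernel_matrix(K, alpha=0.2)
```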