On the Embeddability of Random Walk Distances

PVLDB 2013
Adelbert Chang, Atish Das Sarma, Haitao Zheng, Ben Y. Zhao
Abstract

Analysis of large graphs is critical to the ongoing growth of search engines and social networks. One class of queries centers on node affinity, often quantified by random-walk distances between node pairs, including hitting time, commute time, and personalized PageRank (PPR).
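For reference, these distances have standard definitions on a graph with random-walk transition matrix P (the LaTeX notation below is ours, not the paper's): hitting time H(u,v) is the expected number of steps a walk started at u takes to first reach v, commute time is its symmetrized round trip, and the PPR vector of u is the stationary distribution of a walk that restarts at u with probability \alpha.

    H(u,v) = \mathbb{E}[\, T_v \mid X_0 = u \,], \qquad T_v = \min\{ t \ge 0 : X_t = v \}   % hitting time
    C(u,v) = H(u,v) + H(v,u)                                                                % commute time
    \pi_u  = \alpha\, e_u + (1 - \alpha)\, P^{\top} \pi_u                                   % personalized PageRank

Note that H(u,v) and \pi_u are asymmetric in the node pair, which is what makes them harder to embed in an ordinary metric space than shortest-path distances.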

Despite the potential of these "metrics," they are rarely, if ever, used in practice, largely due to their extremely high computational costs. In this paper, we investigate methods to compute random-walk distances scalably and efficiently, by "embedding" graphs into geometric coordinate spaces so that random-walk distances between nodes are approximated by geometric distances between their coordinates.

We show that while existing graph coordinate systems (GCS) can accurately estimate shortest path distances, they produce significant errors when embedding random-walk distances.

Based on our observations, we propose a new graph embedding system that explicitly accounts for per-node graph properties that affect random walks. Extensive experiments on a range of graphs show that our new approach accurately estimates both symmetric and asymmetric random-walk distances.
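To make the embedding idea concrete, here is a minimal sketch in Python (an illustration under our own assumptions, not the authors' system): each node gets an "out" vector and an "in" vector so that asymmetric distances can be modeled, plus a hypothetical per-node offset b[v] standing in for destination-side graph properties; everything is fit by stochastic gradient descent against a training sample of ground-truth distances computed offline.

    # Sketch of a graph coordinate system: d_hat(u, v) = ||out[u] - in[v]|| + b[v].
    # The model and the names embed/query are illustrative assumptions.
    import numpy as np

    def embed(train_pairs, n_nodes, dim=10, lr=0.01, epochs=2000, seed=0):
        """Fit coordinates by SGD on (u, v, true_distance) triples; the true
        distances come from an offline solver (e.g., Monte Carlo walks)."""
        rng = np.random.default_rng(seed)
        out = rng.normal(size=(n_nodes, dim))   # source-side coordinates
        inn = rng.normal(size=(n_nodes, dim))   # destination-side coordinates
        b = np.zeros(n_nodes)                   # per-node destination offset
        for _ in range(epochs):
            for u, v, d in train_pairs:
                diff = out[u] - inn[v]
                norm = np.linalg.norm(diff) + 1e-9
                err = (norm + b[v]) - d          # signed estimation error
                g = 2.0 * err * diff / norm      # gradient of squared error
                out[u] -= lr * g
                inn[v] += lr * g                 # opposite sign: d(diff)/d(inn[v]) = -1
                b[v] -= lr * 2.0 * err
        return out, inn, b

    def query(out, inn, b, u, v):
        # A query touches only two coordinate vectors and one offset:
        # O(dim) work, independent of graph size.
        return np.linalg.norm(out[u] - inn[v]) + b[v]

A symmetric distance such as commute time fits the same machinery by tying each node's out and in vectors together.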

Once a graph is embedded, our system can answer queries between any two nodes in 8 microseconds, orders of magnitude faster than existing methods. Finally, we show that our system produces estimates that can replace ground truth in applications with minimal impact on application output.
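To see why per-query cost is independent of graph size, here is a toy run of the sketch above (the node count and distances are invented for illustration, not data from the paper):

    # Hypothetical 4-node training set; the asymmetric pair (0,1) vs (1,0)
    # is fit by the separate out/in coordinates.
    train = [(0, 1, 3.0), (1, 0, 5.0), (0, 2, 4.0), (2, 3, 2.0)]
    out, inn, b = embed(train, n_nodes=4, dim=4)
    print(query(out, inn, b, 0, 1))  # approaches 3.0 as training converges
    print(query(out, inn, b, 1, 0))  # approaches 5.0

Once coordinates are in memory, each query is a handful of floating-point operations, which is consistent with microsecond-scale lookups.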
