Query term deletion is one of the commonly used strategies for query rewriting. In this paper, we study the problem of query term deletion using large-scale e-commerce search logs. In particular, we focus on queries that do not lead to user clicks, and aim to predict a reduced, better query that does lead to clicks by deleting a term. Accurate prediction of term deletion can potentially help users recover from poor search results and improve the shopping experience.
To achieve this, we use various term-dependent and query-dependent measures as features and build a classifier to predict which term is most likely to be deleted from a given query. Unlike previous work on query term deletion, we compute the features not only based on the query history and the available document collection, but also conditioned on the query category, which captures the high-level context of the query.
We validate our approach using a large collection of query session logs from a leading e-commerce site, and show that it provides promising performance in term deletion prediction and significantly outperforms baselines that rely on query history and corpus-based statistics without incorporating the query context information.
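To make the category-conditioned idea concrete, here is a minimal sketch. It assumes a hypothetical log of (category, original query, reduced query) triples and uses a single count-based feature, the observed deletion rate of a term within a category, in place of the paper's feature set and classifier, which the abstract does not specify. The function names and toy data are illustrative only.

```python
from collections import Counter, defaultdict

def deletion_rates(log):
    """Estimate P(term deleted | category) from hypothetical rewrite logs.

    Each entry is (category, original_query, reduced_query), where the
    reduced query is the original with one term removed.
    """
    seen = defaultdict(Counter)     # category -> term -> occurrences
    deleted = defaultdict(Counter)  # category -> term -> times dropped
    for category, original, reduced in log:
        kept = Counter(reduced.split())
        for term in original.split():
            seen[category][term] += 1
            if kept[term] > 0:
                kept[term] -= 1      # term survives in the reduced query
            else:
                deleted[category][term] += 1
    return {
        cat: {t: deleted[cat][t] / n for t, n in terms.items()}
        for cat, terms in seen.items()
    }

def predict_deletion(query, category, rates, default=0.0):
    """Pick the term with the highest deletion rate in this category."""
    cat_rates = rates.get(category, {})
    return max(query.split(), key=lambda t: cat_rates.get(t, default))

# Toy, made-up log: the same term behaves differently across categories.
log = [
    ("shoes", "cheap red shoes", "red shoes"),
    ("shoes", "red leather shoes", "leather shoes"),
    ("shoes", "red running shoes", "running shoes"),
    ("paint", "cheap red paint", "red paint"),
    ("paint", "glossy red paint", "red paint"),
]
rates = deletion_rates(log)
print(predict_deletion("red suede shoes", "shoes", rates))   # red
print(predict_deletion("glossy red paint", "paint", rates))  # glossy
```

In this toy log, "red" is often dropped from shoe queries but never from paint queries, which is the kind of signal a category-agnostic statistic would average away.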
We describe a completely automated large-scale visual recommendation system for fashion. Existing approaches have primarily relied on purely computational models to solve this problem, ignoring the role of users in the system.
In this paper, we propose to overcome this limitation by incorporating a user-centric design of visual fashion recommendations. Specifically, we propose a technique that augments 'user preferences' in models by exploiting elasticity in fashion choices. We further design a user study on these choices and gather results from the 'wisdom of crowd' for deeper analysis.
The key insights from these results suggest that fashion preferences, when constrained to a particular class, contain important behavioral signals that are often ignored in recommendation design.
Further, the presence of such classes also reflects strong correlations with visual perception, which can be utilized to provide aesthetically pleasing user experiences. Finally, we illustrate that user approval of visual fashion recommendations can be substantially improved by carefully incorporating this user-centric feedback into the system framework.
The focus of this tutorial will be e-commerce product search. Several challenges arise in this context, from both a research standpoint and an application standpoint. We will present various approaches adopted in the industry,
review well-known research techniques developed over the last decade, draw parallels to traditional web search highlighting the new challenges in this setting, and dig deep into some of the algorithmic and technical approaches developed for this context.
One approach that will receive a deep dive into the literature, theoretical techniques, and practical impact is that of quickly identifying the most suitable results from a large database, with settings varying between cold-start users and those for whom personalization is possible.
In this context, top-k and skylines will be discussed specifically as they form a key approach that spans the web, data mining, and database communities and presents a powerful tool for search across multi-dimensional items with clear preferences within each attribute, like product search as opposed to regular web search.
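As a small illustration of the two query types named above, the following sketch computes a skyline (the set of items not dominated by any other item) and a top-k ranking under a weighted sum. The attribute choice (price, shipping days), the lower-is-better convention, and the weights are assumptions made for this example. The quadratic skyline scan is the simplest formulation, not an optimized algorithm.

```python
import heapq

def dominates(a, b):
    """True if a is no worse than b on every attribute and strictly
    better on at least one (lower values are better here)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def skyline(items):
    """Return the items that no other item dominates (O(n^2) scan)."""
    return [p for p in items if not any(dominates(q, p) for q in items)]

def top_k(items, k, weights):
    """Return the k items with the smallest weighted attribute sum."""
    return heapq.nsmallest(
        k, items, key=lambda p: sum(w * x for w, x in zip(weights, p))
    )

# Hypothetical products as (price, shipping_days).
products = [(100, 5), (120, 2), (90, 7), (130, 3), (100, 6)]
print(skyline(products))              # [(100, 5), (120, 2), (90, 7)]
print(top_k(products, 2, (1, 10)))    # [(120, 2), (100, 5)]
```

The skyline needs no weights, only a preference direction per attribute, whereas top-k requires an explicit scoring function. That difference is why the two are often discussed together for multi-attribute product search.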
Journal of the Association for Computing Machinery (JACM) – 2013
Distributed Random Walks
Atish Das Sarma, Danupon Nanongkai, Gopal Pandurangan, Prasad Tetali
Performing random walks in networks is a fundamental primitive that has found applications in many areas of computer science, including distributed computing. In this article, we focus on the problem of sampling random walks efficiently in a distributed network and its applications. Given bandwidth constraints, the goal is to minimize the number of rounds required to obtain random walk samples. All previous algorithms that compute a random walk sample of length ℓ as a subroutine always do so naively, that is, in O(ℓ) rounds.
The main contribution of this article is a fast distributed algorithm for performing random walks, whose time complexity is sublinear in the length of the walk. Our algorithm performs a random walk of length ℓ in Õ(√(ℓD)) rounds (Õ hides polylog n factors, where n is the number of nodes in the network) with high probability on an undirected network, where D is the diameter of the network. For small-diameter graphs, this is a significant improvement over the naive O(ℓ) bound.
Furthermore, our algorithm is optimal within a polylogarithmic factor, as there exists a matching lower bound [Nanongkai et al. 2011]. We further extend our algorithms to efficiently perform k independent random walks in Õ(√(kℓD) + k) rounds. We also show that our algorithm can be applied to speed up the more general Metropolis-Hastings sampling. Our random-walk algorithms can be used to speed up distributed algorithms in applications that use random walks as a subroutine. We present two main applications.
First, we give a fast distributed algorithm for computing a random spanning tree (RST) in an arbitrary (undirected, unweighted) network, which runs in Õ(√(mD)) rounds with high probability (m is the number of edges). Our second application is a fast decentralized algorithm for estimating the mixing time and related parameters of the underlying network. Our algorithm is fully decentralized and can serve as a building block in the design of topologically-aware networks.
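For orientation, the sketch below is a centralized simulation of two ingredients mentioned in this abstract. It is not the sublinear algorithm itself. `naive_walk` forwards a token one hop per round, which is the O(ℓ) baseline. `mh_uniform_transition` builds the standard Metropolis-Hastings chain that targets the uniform distribution using a random-neighbor proposal. The adjacency-list representation, the function names, and the choice of a uniform target are assumptions of this example.

```python
import random

def naive_walk(adj, start, length, rng):
    """Token forwarding: one hop per round, so `length` rounds in total."""
    node, rounds = start, 0
    for _ in range(length):
        node = rng.choice(adj[node])
        rounds += 1
    return node, rounds

def mh_uniform_transition(adj):
    """Metropolis-Hastings transition probabilities for a uniform target.

    Proposal: a uniformly random neighbor v of u.
    Acceptance: min(1, deg(u) / deg(v)).
    Hence P[u][v] = 1 / max(deg(u), deg(v)); rejected proposals stay at u.
    Assumes a simple undirected graph with no self-loops.
    """
    P = {u: {} for u in adj}
    for u, nbrs in adj.items():
        for v in nbrs:
            P[u][v] = 1.0 / max(len(adj[u]), len(adj[v]))
        P[u][u] = 1.0 - sum(P[u].values())
    return P

# Small undirected example graph as adjacency lists.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
end, rounds = naive_walk(adj, 0, 10, random.Random(0))
P = mh_uniform_transition(adj)
print(rounds)    # 10
print(P[3][3])   # self-loop probability at the degree-1 node
```

Because the resulting matrix is symmetric, the uniform distribution is stationary for this chain, whereas the plain walk converges to a distribution proportional to node degree.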
Shaomei Wu, Atish Das Sarma, Alex Fabrikant, Silvio Lattanzi, Andrew Tomkins
In this paper, we consider the natural arrival and departure of users in a social network, and ask whether the dynamics of arrival, which have been studied in some depth, also explain the dynamics of departure, which are not as well studied.
Through study of the DBLP co-authorship network and a large online social network, we show that the dynamics of departure behave differently from the dynamics of formation.
In particular, the probability of departure of a user with few friends may be understood most accurately as a function of the raw number of friends who are active. For users with more friends, however, the probability of departure is best predicted by the overall fraction of the user's neighborhood that is active, independent of size.
We then study global properties of the subgraphs induced by active and inactive users, and show that active users tend to belong to a core that is densifying and is significantly denser than the subgraph of inactive users. Further, the inactive set of users exhibits a higher density and lower conductance than the degree distribution alone can explain. These two aspects suggest that nodes at the fringe are more likely to depart, and that subsequent departures are correlated among neighboring nodes in tightly-knit communities.
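The two graph measures referred to above can be computed as in the following sketch. The abstract does not state the exact definitions used, so this assumes common ones: density as the fraction of possible edges present in the induced subgraph, and conductance as the number of cut edges divided by the smaller of the two side volumes. The toy graph and the active/inactive split are invented for illustration.

```python
def induced_edges(adj, nodes):
    """Number of undirected edges with both endpoints in `nodes`."""
    s = set(nodes)
    return sum(1 for u in s for v in adj[u] if v in s) // 2

def density(adj, nodes):
    """Fraction of possible edges present in the induced subgraph."""
    n = len(set(nodes))
    if n < 2:
        return 0.0
    return induced_edges(adj, nodes) / (n * (n - 1) / 2)

def conductance(adj, nodes):
    """Cut edges divided by the smaller side's volume (sum of degrees)."""
    s = set(nodes)
    cut = sum(1 for u in s for v in adj[u] if v not in s)
    vol_s = sum(len(adj[u]) for u in s)
    vol_rest = sum(len(adj[u]) for u in adj if u not in s)
    denom = min(vol_s, vol_rest)
    return cut / denom if denom else 0.0

# Toy graph: a triangle core (a, b, c) with a sparse fringe (d, e, f).
adj = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d", "f"],
    "d": ["c", "e"], "e": ["d"], "f": ["c"],
}
active, inactive = ["a", "b", "c"], ["d", "e", "f"]
print(density(adj, active))      # 1.0
print(density(adj, inactive))
print(conductance(adj, active))  # 0.5
```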
Unsupervised models can provide supplementary soft constraints to help classify new target data under the assumption that similar objects in the target set are more likely to share the same class label. Such models can also help detect possible differences between training and target distributions,
which is useful in applications where concept drift may take place. This paper describes a Bayesian framework that takes as input class labels from existing classifiers (designed based on labeled data from the source domain),
as well as cluster labels from a cluster ensemble operating solely on the target data to be classified, and yields a consensus labeling of the target data. This framework is particularly useful when the statistics of the target data drift or change from those of the training data.
We also show that the proposed framework is privacy-aware and allows performing distributed learning when data/models have sharing restrictions. Experiments show that our framework can yield superior results to those provided by applying classifier ensembles only.
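The intuition that cluster structure can act as a soft constraint on classifier outputs can be illustrated with a deliberately simplified stand-in. The sketch below is not the Bayesian framework described above. It blends each point's classifier probabilities with the average probabilities of its cluster, using a single clustering rather than an ensemble. The function name, the blending weight `alpha`, and the toy inputs are all assumptions of this example.

```python
def consensus_labels(class_probs, cluster_ids, alpha=0.5):
    """Blend per-point class probabilities with their cluster's mean.

    class_probs: list of per-point probability lists (one entry per class).
    cluster_ids: cluster id of each point, from clustering the target data.
    alpha: weight on the cluster mean (0 ignores the clustering).
    """
    n_classes = len(class_probs[0])
    sums, counts = {}, {}
    for probs, c in zip(class_probs, cluster_ids):
        acc = sums.setdefault(c, [0.0] * n_classes)
        for j, p in enumerate(probs):
            acc[j] += p
        counts[c] = counts.get(c, 0) + 1
    labels = []
    for probs, c in zip(class_probs, cluster_ids):
        blended = [
            (1 - alpha) * p + alpha * sums[c][j] / counts[c]
            for j, p in enumerate(probs)
        ]
        labels.append(max(range(n_classes), key=blended.__getitem__))
    return labels

# Toy inputs: the third point is borderline for the classifier but
# shares a cluster with two confidently class-0 points.
class_probs = [[0.9, 0.1], [0.8, 0.2], [0.45, 0.55], [0.2, 0.8], [0.1, 0.9]]
cluster_ids = [0, 0, 0, 1, 1]
print(consensus_labels(class_probs, cluster_ids, alpha=0.0))  # [0, 0, 1, 1, 1]
print(consensus_labels(class_probs, cluster_ids, alpha=0.5))  # [0, 0, 0, 1, 1]
```

With `alpha=0.0` the output is the classifier's own labeling. With `alpha=0.5` the borderline point is pulled toward the label that dominates its cluster.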