Large-scale data mining, information retrieval and parallel processing systems.
Gyanit Singh is currently a researcher at eBay Research Labs. He completed his master's in computer science at the University of Washington, Seattle. Before that, he pursued his bachelor's in computer science at the Indian Institute of Technology Delhi.
A Study of Query Term Deletion using Large-scale E-commerce Search Logs
In ECIR 2014 (To Appear)
Query term deletion is one of the commonly used strategies for query rewriting. In this paper, we study the problem of query term deletion using large-scale e-commerce search logs. In particular, we focus on queries that do not lead to user clicks and aim to predict, by term deletion, a reduced and better query that can lead to clicks. Accurate prediction of term deletion can potentially help users recover from poor search results and improve their shopping experience. To achieve this, we use various term-dependent and query-dependent measures as features and build a classifier to predict which term is most likely to be deleted from a given query. Unlike previous work on query term deletion, we compute the features not only from the query history and the available document collection, but also conditioned on the query category, which captures the high-level context of the query. We validate our approach on a large collection of query session logs from a leading e-commerce site and show that it provides promising performance in term deletion prediction and significantly outperforms baselines that rely on query history and corpus-based statistics without incorporating query context information.
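To make the idea of a category-conditioned feature concrete, here is a minimal sketch of one such feature: a smoothed per-category deletion rate estimated from reformulation pairs, used on its own to pick the term to drop. This is not the paper's classifier or feature set. The log format, the smoothing constants `alpha`/`beta`, the names `deletion_rates`, `predict_deletion`, and `SAMPLE_LOG`, and the sample data are all hypothetical.

```python
from collections import defaultdict

# Hypothetical log format: (category, original_query, reformulated_query),
# where the reformulation was obtained by deleting one or more terms.
SAMPLE_LOG = [
    ("Cell Phones", "iphone 5 unlocked new", "iphone 5 unlocked"),
    ("Cell Phones", "iphone 5 new", "iphone 5"),
    ("Cell Phones", "galaxy s3 new", "galaxy s3"),
    ("Books", "new moon paperback", "new moon"),
]

def deletion_rates(log, alpha=1.0, beta=2.0):
    """Estimate P(term deleted | term, category) with additive smoothing."""
    seen = defaultdict(int)     # times a term appeared in an original query, per category
    deleted = defaultdict(int)  # times that term was absent from the reformulation
    for category, original, reformulated in log:
        kept = set(reformulated.split())
        for term in set(original.split()):
            seen[(category, term)] += 1
            if term not in kept:
                deleted[(category, term)] += 1
    # Smoothed rate: (deleted + alpha) / (seen + beta).
    return {key: (deleted[key] + alpha) / (seen[key] + beta) for key in seen}

def predict_deletion(query, category, rates, alpha=1.0, beta=2.0):
    """Return the query term with the highest category-conditioned deletion rate.

    Terms never seen in this category fall back to the prior alpha / beta.
    """
    prior = alpha / beta
    return max(query.split(), key=lambda t: rates.get((category, t), prior))
```

In this toy log, "new" is usually dropped from Cell Phones queries but kept in the Books query "new moon", so conditioning on category changes which term the sketch deletes; a history-only statistic would treat "new" the same way in both contexts.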
In Proceedings of the Workshop on Log-based Personalization (the 4th WSCD workshop) at WSDM 2014.
Personalization offers the promise of improving the online search and shopping experience. In this work, we perform a large-scale analysis of a sample of eBay query logs comprising 9.24 billion sessions spanning 12 months (08/2012-07/2013) and address the following topics: (1) what user information is useful for personalization; (2) the importance of per-query personalization; and (3) the importance of recency in query prediction. In this paper, we study these problems and provide some preliminary conclusions.
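One simple way to express "recency in query prediction" is to weight each past query by an exponential time decay, so recent queries count more than old ones. The sketch below assumes that technique; it is an illustration, not the analysis performed in the paper. The history format (query, day) and the names `recency_scores`, `predict_next_query`, `half_life_days`, and `SAMPLE_HISTORY` are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical per-user history: (query, timestamp in days).
SAMPLE_HISTORY = [
    ("vintage camera", 0), ("vintage camera", 1), ("vintage camera", 2),
    ("lego star wars", 88), ("lego star wars", 89),
]

def recency_scores(history, now, half_life_days=30.0):
    """Exponentially decayed query frequency.

    A query issued half_life_days before `now` contributes 0.5; one issued at `now` contributes 1.0.
    """
    decay = math.log(2) / half_life_days
    scores = defaultdict(float)
    for query, t in history:
        scores[query] += math.exp(-decay * (now - t))
    return dict(scores)

def predict_next_query(history, now, half_life_days=30.0):
    """Predict the user's next query as the one with the highest decayed score."""
    scores = recency_scores(history, now, half_life_days)
    return max(scores, key=scores.get)
```

With a 30-day half-life and `now=90`, the two recent "lego star wars" queries outscore the three older "vintage camera" queries. With a very long half-life, the scores approach raw counts and the prediction flips, which is the kind of trade-off a recency study would measure.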
Chapter 20 in Handbook of Statistics, Volume 31: Machine Learning: Theory and Applications, 1st Edition
2013 IEEE International Conference on Big Data (IEEE Big Data 2013 Tutorial)
This tutorial will summarize state-of-the-art approaches in the growing area of large-scale click-stream mining. It will give data scientists, researchers and engineers with diverse backgrounds an opportunity to familiarize themselves with practical platforms, approaches and tools for extracting actionable insights and building products from big and diverse data sources. The organizers will accomplish this goal using three real-life stories from the field (large-scale data initiatives at eBay, one of the world’s largest e-commerce platforms). The tutorial will feature transaction mining, behavior log mining and time-series mining. We will talk about building robust recommendation systems over MapReduce clusters (query suggestions, shipping fee recommendations). The talk will also cover topics such as removing user bias from data, using heuristics to make intractable algorithms practical, and appropriate de-noising and normalization of diverse datasets. The audience is expected to be familiar with MapReduce (preferably Hadoop) and to be working or grappling with data problems. Some basic background in algorithms and statistics would be beneficial. We will present the tutorial through real applications built at eBay, in the form of three case studies:
• Shipping Recommendation System
• Mining large-scale temporal dynamics with Hadoop
• Query Suggestions at scale with Hadoop
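As a rough illustration of how query suggestions can be phrased as a MapReduce job, the sketch below simulates the map, shuffle and reduce phases in plain Python: the mapper emits (query, next query) pairs from each session, and the reducer keeps the most frequent follow-ups. This is a generic co-occurrence approach assumed for illustration, not the system presented in the tutorial. The session format and the names `map_session`, `reduce_query`, `run_job`, and `SAMPLE_SESSIONS` are hypothetical, and a real deployment would run the mapper and reducer on Hadoop rather than in one process.

```python
from collections import Counter, defaultdict
from itertools import chain

# Hypothetical input: each session is the ordered list of queries a user issued.
SAMPLE_SESSIONS = [
    ["ipod", "ipod nano", "ipod nano 8gb"],
    ["ipod", "ipod nano"],
    ["ipod", "ipod touch"],
    ["ipod nano", "ipod nano 8gb"],
]

def map_session(session):
    """Mapper: emit (query, next_query) for consecutive distinct queries in a session."""
    for a, b in zip(session, session[1:]):
        if a != b:
            yield a, b

def reduce_query(query, followers, k=3):
    """Reducer: keep the k most frequent follow-up queries as suggestions."""
    return query, [q for q, _ in Counter(followers).most_common(k)]

def run_job(sessions, k=3):
    # Shuffle: group mapper output by key, as the framework does between map and reduce.
    grouped = defaultdict(list)
    for key, value in chain.from_iterable(map_session(s) for s in sessions):
        grouped[key].append(value)
    return dict(reduce_query(q, f, k) for q, f in grouped.items())
```

On the sample sessions, `run_job` suggests "ipod nano" and then "ipod touch" for "ipod", and "ipod nano 8gb" for "ipod nano". Because the mapper only looks at one session and the reducer only at one key, both steps parallelize naturally across a cluster.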
In Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2013. 829-836. (Best Paper Award Winner)
The popularity of social media sites like Twitter and Facebook opens up interesting research opportunities for understanding the interplay of social media and e-commerce. Until recently, most research on online behavior has studied social media behaviors and e-commerce behaviors independently. In our study, we choose a particular global e-commerce platform (eBay) and a particular global social media platform (Twitter). We quantify the characteristics of the two individual trends as well as the correlations between them. We provide evidence that about 5% of general eBay query streams show strong positive correlations with the corresponding Twitter mention streams, while the percentage jumps to around 25% for trending eBay query streams. Some categories of eBay queries, such as 'Video Games' and 'Sports', are more likely to have strong correlations. We also discover that eBay trends lag Twitter trends for correlated pairs and that the lag differs across categories. We show evidence that celebrities' popularity on Twitter correlates well with their relevant searches and sales on eBay. The correlations and lags provide predictive insights for future applications that might lead to instant merchandising opportunities for both sellers and e-commerce platforms.
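A common way to quantify "eBay trends lag Twitter trends" is to compute the Pearson correlation between the two time series at several time shifts and report the shift with the highest correlation. The sketch below assumes that approach and considers only non-negative lags (the follower trails the leader); the paper's exact correlation and lag methodology may differ. The names `pearson`, `best_lag`, `TWITTER`, and `EBAY` and the toy series are hypothetical.

```python
import math

# Hypothetical daily counts: Twitter mentions and eBay searches for one topic.
TWITTER = [0, 0, 1, 5, 9, 4, 1, 0, 0, 0]
EBAY = [0, 0, 0, 0, 1, 5, 9, 4, 1, 0]  # same shape, two days later

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def best_lag(leader, follower, max_lag):
    """Return (lag, correlation) maximizing corr(leader[t], follower[t + lag]).

    Only lags 0..max_lag are tried, i.e. the follower is assumed to trail the leader.
    """
    best = (0, pearson(leader, follower))
    for lag in range(1, max_lag + 1):
        r = pearson(leader[:-lag], follower[lag:])
        if r > best[1]:
            best = (lag, r)
    return best
```

On the toy series, `best_lag(TWITTER, EBAY, max_lag=3)` finds a lag of 2 with correlation close to 1.0. Note that each shift drops `lag` points from the overlap, so with short series large lags are estimated from very few observations.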
Rewriting null e-commerce queries to recommend products
WWW (Companion Volume) 2012: 73-82
In e-commerce applications, product descriptions are often concise. E-commerce search engines often have to deal with queries that cannot be easily matched to product inventory, resulting in zero recall or null query situations. Null queries arise from differences in buyer and seller vocabulary or from the transient nature of products. In this paper, we describe a system that rewrites null e-commerce queries to find matching products as close to the original query as possible. The system uses query relaxation to rewrite null queries in order to match products. Using eBay as an example of a dynamic marketplace, we show how temporal feedback that respects product category structure, drawn from the repository of expired products, improves the quality of recommended results. The system is scalable and can be run in a high-volume setting. Our experiments show that high-quality product recommendations are achievable for more than 25% of null queries.
Query suggestion for E-commerce sites
WSDM 2011: 765-774, Hong Kong, February 2011
The query suggestion module is an integral part of every search engine. It helps search engine users narrow or broaden their searches. Published work on query suggestion methods has mainly focused on the web domain. However, the module is also popular in the domain of e-commerce for product search. In this paper, we discuss query suggestion and its methodologies in the context of e-commerce search engines. We show that dynamic inventory combined with the long and sparse tail of the query distribution poses unique challenges for building a query suggestion method for an e-commerce marketplace. We compare and contrast the design of query suggestion systems for web search engines and e-commerce search engines. Further, we discuss interesting measures to quantify the effectiveness of our query suggestion methodologies. We also describe the lessons learned from exposing our query suggestion module to a vibrant community of millions of users.
User behavior in zero-recall ecommerce queries
SIGIR 2011: 75-84
User expectation and experience for web search and eCommerce (product) search are quite different. Product descriptions are concise compared to typical web documents, and user expectations are more specific: users want to find the right product. The difference between publisher and searcher vocabulary (in product search, seller and buyer vocabulary), combined with the fact that there are fewer products than web documents to search over, results in a noticeable number of searches that return no results (zero recall searches). In this paper, we describe a study of zero recall searches. Our study focuses on eCommerce search and uses data from a leading eCommerce site's user click stream logs. Our study makes three main contributions: 1) an analysis of the causes of zero recall searches; 2) a study of users' reactions to and recovery from zero recall; and 3) a study of differences between power users' and novice users' behavior in response to zero recall searches.