Information retrieval and search, recommender systems, data mining (big data analytics, user behavior analysis, query log mining)
Nish joined eBay Research Labs in February 2008. At eBay Research Labs, he works on query analysis, recommender systems, and large-scale data processing. Prior to joining eBay Research Labs, he was part of the team that launched Voyager, eBay's next-generation search engine, which supports near real-time indexing of products and serves more than 250M queries a day. Prior to joining eBay, Nish received an M.S. in Computer Science from the University of Southern California and a B.S. in Electrical Engineering from Gujarat University, where he was awarded a gold medal for academic excellence.
In proceedings of the Workshop on Log-based Personalization (the 4th WSCD workshop) at WSDM 2014.
Personalization offers the promise of improving the online search and shopping experience. In this work, we perform a large-scale analysis on a sample of eBay query logs, which involves 9.24 billion session data spanning 12 months (08/2012-07/2013), and address the following topics: (1) what user information is useful for personalization; (2) the importance of per-query personalization; (3) the importance of recency in query prediction. We study these problems and provide some preliminary conclusions.
In ECIR 2014 (To Appear)
Query term deletion is one of the commonly used strategies for query rewriting. In this paper, we study the problem of query term deletion using large-scale e-commerce search logs. In particular, we focus on queries that do not lead to user clicks and aim to predict, by term deletion, a reduced and better query that can lead to clicks. Accurate prediction of term deletion can potentially help users recover from poor search results and improve the shopping experience. To achieve this, we use various term-dependent and query-dependent measures as features and build a classifier to predict which term is the most likely to be deleted from a given query. Unlike previous work on query term deletion, we compute the features not only based on the query history and the available document collection, but also conditioned on the query category, which captures the high-level context of the query. We validate our approach using a large collection of query session logs from a leading e-commerce site, and show that it provides promising performance in term deletion prediction and significantly outperforms baselines that rely on query history and corpus-based statistics without incorporating the query context information.
Tutorial at WWW-2014
The focus of this tutorial will be e-commerce product search. Several challenges appear in this context, from both a research standpoint and an application standpoint. We will present various approaches adopted in the industry, review well-known research techniques developed over the last decade, draw parallels to traditional web search while highlighting the new challenges in this setting, and dig deep into some of the algorithmic and technical approaches developed for this context. A specific approach that will involve a deep dive into literature, theoretical techniques, and practical impact is that of identifying the most suitable results quickly from a large database, with settings varying across cold-start users and those for whom personalization is possible. In this context, top-k and skylines will be discussed specifically, as they form a key approach that spans the web, data mining, and database communities and present a powerful tool for search across multi-dimensional items with clear preferences within each attribute, as in product search as opposed to regular web search.
Categories: Human Computer Interaction
Lightning Talk and Poster @ Extremely Large Databases XLDB 2013.
We present a method for developing a network of e-commerce concepts. We define concepts as collections of terms that represent product entities or commerce ideas that users are interested in. We start by looking at large corpora (billions) of historical eBay buyer queries and seller item titles. We approach the problem of concept extraction from corpora as a market-basket problem by adapting the statistical measures of support and confidence. The concept-centric metadata extraction pipeline is built over a map-reduce framework. We constrain the concepts to be both popular and concise. Evaluation of our algorithm shows that high-precision concept sets can be automatically mined. The system mines the full spectrum of precise e-commerce concepts, ranging all the way from "ipod nano" to "I'm not a plastic bag" and from "wakizashi sword" to "mastodon skeleton". Once the concepts are detected, they are linked into a network using different metrics of semantic similarity between concepts. This leads to a rich network of e-commerce vocabulary. Such a network of concepts can be the basis for powerful applications such as e-commerce search and discovery as well as automatic e-commerce taxonomy generation. We present details about the extraction platform and the algorithms for segmentation of short snippets of e-commerce text as well as detection and linking of concepts.
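The support and confidence measures mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each query or item title is treated as a "basket" of terms; the function name `support_confidence`, the `min_support` threshold, and the toy queries are hypothetical and are not taken from the paper, which runs over a map-reduce framework rather than in memory.

```python
from collections import Counter
from itertools import combinations


def support_confidence(baskets, min_support=2):
    """Compute support and confidence for ordered term pairs.

    baskets: list of token lists (e.g. tokenized queries or item titles).
    Returns {(a, b): (support, confidence)}, where support is the number of
    baskets containing both a and b, and confidence is support / count(a).
    """
    term_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        terms = sorted(set(basket))  # count each term once per basket
        term_counts.update(terms)
        pair_counts.update(combinations(terms, 2))

    rules = {}
    for (a, b), n in pair_counts.items():
        if n < min_support:  # keep only sufficiently popular pairs
            continue
        rules[(a, b)] = (n, n / term_counts[a])
        rules[(b, a)] = (n, n / term_counts[b])
    return rules


# Hypothetical toy data, for illustration only.
queries = [
    "ipod nano 8gb".split(),
    "ipod nano case".split(),
    "ipod touch".split(),
    "nano aquarium".split(),
]
rules = support_confidence(queries)
print(rules[("ipod", "nano")])
```

Here "ipod" and "nano" co-occur in two of the four baskets, so the pair passes the support threshold, while pairs seen only once are dropped.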
In proceedings of The 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2013. 829-836. (Best Paper Award Winner)
The popularity of social media sites like Twitter and Facebook opens up interesting research opportunities for understanding the interplay of social media and e-commerce. Until recently, most research on online behavior has studied social media behaviors and e-commerce behaviors independently. In our study we choose a particular global e-commerce platform (eBay) and a particular global social media platform (Twitter). We quantify the characteristics of the two individual trends as well as the correlations between them. We provide evidence that about 5% of general eBay query streams show strong positive correlations with the corresponding Twitter mention streams, while the percentage jumps to around 25% for trending eBay query streams. Some categories of eBay queries, such as 'Video Games' and 'Sports', are more likely to have strong correlations. We also discover that eBay trends lag Twitter for correlated pairs and that the lag differs across categories. We show evidence that celebrities' popularity on Twitter correlates well with their relevant searches and sales on eBay. The correlations and lags provide predictive insights for future applications that might lead to instant merchandising opportunities for both sellers and e-commerce platforms.
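One common way to measure a correlation and a lag between two time series is to compute the Pearson correlation at several shifts and keep the best one. The sketch below illustrates that idea only; it is an assumption, not the paper's method, and the names `pearson` and `best_lag` as well as the daily counts are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def best_lag(leader, follower, max_lag):
    """Return (lag, r) maximizing the correlation of leader[t] with follower[t + lag]."""
    best = (0, pearson(leader, follower))
    for lag in range(1, max_lag + 1):
        r = pearson(leader[:-lag], follower[lag:])
        if r > best[1]:
            best = (lag, r)
    return best


# Hypothetical daily counts: the second series repeats the first two days later.
twitter_mentions = [1, 2, 8, 3, 1, 1, 2, 9, 2, 1]
ebay_queries = [1, 1, 1, 2, 8, 3, 1, 1, 2, 9]
lag, r = best_lag(twitter_mentions, ebay_queries, max_lag=3)
print(lag, round(r, 3))
```

Both series must have the same length, and `max_lag` should stay well below that length so each shifted comparison still covers enough points.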
2013 IEEE International Conference on Big Data (IEEE Big Data 2013 Tutorial)
This tutorial will summarize state-of-the-art approaches in the growing area of large-scale click-stream mining. It will give data scientists, researchers, and engineers with diverse backgrounds an opportunity to familiarize themselves with practical platforms, approaches, and tools for extracting actionable insights and building products from big and diverse data sources. The organizers will accomplish this goal using three real-life stories from the field (large-scale data initiatives at eBay – one of the world’s largest e-commerce platforms). The tutorial will feature transaction mining, behavior log mining, and time-series mining. We will talk about building robust recommendation systems over map-reduce clusters (query suggestions, shipping fee recommendations). The talk will also cover topics such as removing user bias from data, using heuristics to make intractable algorithms practical, and appropriate de-noising and normalization of diverse data sets. The audience is expected to be familiar with map-reduce (preferably Hadoop) and to be working or grappling with data problems. Some basic background in algorithms and statistics would be beneficial. We will present the tutorial through real applications built at eBay, organized as three case studies:
• Shipping Recommendation System
• Mining large-scale temporal dynamics with Hadoop
• Query Suggestions at scale with Hadoop
CIKM ’13: Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, pages 1137-1146
In this paper, we present QSEGMENT, a real-life query segmentation system for eCommerce queries. QSEGMENT uses frequency data from the query log, which we call buyers' data, and frequency data from product titles, which we call sellers' data. We exploit the taxonomical structure of the marketplace to build domain-specific frequency models. Using such an approach, QSEGMENT performs better than previously described baselines for query segmentation. We also perform a large-scale evaluation using an unsupervised IR metric, which we refer to as user-intent-score. We discuss the overall architecture of QSEGMENT as well as various use cases and interesting observations around segmenting eCommerce queries.
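To make the idea of frequency-based query segmentation concrete, here is a minimal sketch that is not QSEGMENT itself. It assumes a single hypothetical table of phrase counts (`phrase_counts`) in place of the buyers' and sellers' data and the category-specific models described above, and it uses a simple dynamic program that maximizes the summed counts of multi-word segments.

```python
def segment(query, phrase_counts, max_len=3):
    """Split a query into segments that maximize the total phrase count.

    phrase_counts: hypothetical {phrase: frequency} table for multi-word phrases.
    Single words always score 0, so they are used only as filler.
    """
    words = query.split()
    n = len(words)
    # best[i] holds (score, segments) for the first i words.
    best = [(0, [])] + [None] * n
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            phrase = " ".join(words[j:i])
            gain = phrase_counts.get(phrase, 0) if i - j > 1 else 0
            candidate = (best[j][0] + gain, best[j][1] + [phrase])
            if best[i] is None or candidate[0] > best[i][0]:
                best[i] = candidate
    return best[n][1]


# Hypothetical counts, for illustration only.
counts = {"harry potter": 80, "box set": 40, "dvd box": 10}
segments = segment("harry potter dvd box set", counts)
print(segments)
```

With raw counts a longer phrase can never outscore its own sub-phrases, so a practical scorer would usually normalize or weight by segment length; that choice is left out of this sketch.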
Tutorial at CIKM-2013
Skyline queries have been a topic of intense study in the database area for over a decade now. Similarly, the top-k retrieval query has been heavily investigated by both the database and the web research communities. This tutorial will delve into the background of these two areas and focus specifically on the recent challenges of returning a small set of results to users and of requiring minimal intervention or input from them. These are the two main concerns with skylines and top-k, respectively, and they have therefore drawn a great deal of attention in recent years, with several interesting ideas being proposed in the research community. This tutorial will cover the current approaches to representative database selection. We will focus on both the theoretical models and the practical aspects from an industry standpoint. The topics covered in this tutorial will include identifying representative subsets of the skyline set, interaction-based approaches, e-commerce product search, and leveraging aggregate user preference statistics.
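The difference between the two query types can be shown with a minimal sketch. It assumes lower values are better in every attribute, uses a naive quadratic skyline computation, and scores top-k with a user-supplied linear weighting; the listings and weights are hypothetical.

```python
def dominates(p, q):
    """True if p is no worse than q in every attribute and strictly better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(items):
    """Items not dominated by any other item (naive O(n^2) scan)."""
    return [p for p in items if not any(dominates(q, p) for q in items)]


def top_k(items, weights, k):
    """The k items with the lowest weighted sum of attributes."""
    def score(p):
        return sum(w * a for w, a in zip(weights, p))
    return sorted(items, key=score)[:k]


# Hypothetical listings as (price, shipping_days); lower is better for both.
listings = [(100, 5), (120, 2), (90, 7), (130, 3), (100, 6)]
print(skyline(listings))
print(top_k(listings, weights=(1, 10), k=2))
```

The skyline needs no input from the user but can return many items, while top-k returns exactly k items but requires the user to supply weights, which mirrors the two concerns the tutorial addresses.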
Categories: Human Computer Interaction
Chapter 20, Volume 31, Handbook of Statistics, 1st Edition: Machine Learning: Theory and Applications
WWW (Companion Volume) 2012: 73-82
In e-commerce applications, product descriptions are often concise. E-commerce search engines often have to deal with queries that cannot be easily matched to product inventory, resulting in zero-recall or null-query situations. Null queries arise from differences in buyer and seller vocabulary or from the transient nature of products. In this paper, we describe a system that rewrites null e-commerce queries to find matching products as close to the original query as possible. The system uses query relaxation to rewrite null queries in order to match products. Using eBay as an example of a dynamic marketplace, we show that temporal feedback drawn from the repository of expired products, applied in a way that respects product category structure, improves the quality of recommended results. The system is scalable and can be run in a high-volume setting. We show through our experiments that high-quality product recommendations are achievable for more than 25% of null queries.
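A minimal sketch of query relaxation by dropping terms is shown below. It is an illustrative assumption rather than the system described above: it does not use temporal feedback, expired products, or category structure, and the function name `relax_query` and the product titles are hypothetical.

```python
from itertools import combinations


def relax_query(query, titles):
    """Drop as few terms as possible until the query matches at least one title.

    Returns (relaxed_query, matching_titles), or (None, []) if nothing matches.
    A title matches when it contains every remaining query term.
    """
    terms = query.lower().split()
    title_terms = [set(title.lower().split()) for title in titles]
    # Try the longest sub-queries first so the rewrite stays close to the original.
    for keep in range(len(terms), 0, -1):
        for kept in combinations(terms, keep):
            matches = [
                title
                for title, words in zip(titles, title_terms)
                if set(kept) <= words
            ]
            if matches:
                return " ".join(kept), matches
    return None, []


# Hypothetical inventory, for illustration only.
titles = [
    "apple ipod nano 8gb silver",
    "ipod nano armband case",
    "samsung galaxy charger",
]
relaxed, matches = relax_query("ipod nano 16gb purple", titles)
print(relaxed, matches)
```

Enumerating every sub-query is exponential in the number of terms, which is acceptable for short queries in a sketch but would need pruning or an index-backed lookup in a high-volume setting.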