Gyanit joined eBay Research Labs in Fall 2008. At eBay Research Labs, he works on click-stream mining, large-scale data processing, visualizations and algorithms. Prior to joining eBay Research Labs, he received an M.S. in Computer Science from the University of Washington, Seattle and a B.Tech. in Computer Science from the Indian Institute of Technology, Delhi.
The query suggestion module is an integral part of every search engine. It helps search engine users narrow or broaden their searches. Published work on query suggestion methods has mainly focused on the web domain. However, the module is also popular in the domain of e-commerce for product search.
In this paper, we discuss query suggestion and its methodologies in the context of e-commerce search engines. We show that a dynamic inventory, combined with the long and sparse tail of the query distribution, poses unique challenges for building a query suggestion method for an e-commerce marketplace.
We compare and contrast the design of a query suggestion system for web search engines and e-commerce search engines. Further, we discuss interesting measures to quantify the effectiveness of our query suggestion methodologies. We also describe the learning gained from exposing our query suggestion module to a vibrant community of millions of users.
In e-commerce applications, product descriptions are often concise. E-commerce search engines often have to deal with queries that cannot be easily matched to the product inventory, resulting in zero-recall or null-query situations.
Null queries arise from differences in buyer and seller vocabulary or from the transient nature of products. In this paper, we describe a system that rewrites null e-commerce queries to find matching products as close to the original query as possible.
The system uses query relaxation to rewrite null queries so that they match products. Using eBay as an example of a dynamic marketplace, we show how temporal feedback drawn from the repository of expired products, applied in a way that respects the product category structure, improves the quality of recommended results.
The system is scalable and can be run in a high volume setting. We show through our experiments that high quality product recommendations for more than 25% of null queries are achievable.
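As an illustration of the query relaxation idea described above, the following is a minimal sketch, not the system from the paper. It assumes a toy in-memory inventory of product titles and plain AND matching on whitespace tokens; the function names `matches` and `relax_query` are hypothetical, and the temporal feedback and category structure mentioned in the abstract are deliberately left out.

```python
from itertools import combinations


def matches(query_terms, title):
    """A product matches when every query term appears in its title (AND semantics)."""
    title_terms = set(title.lower().split())
    return all(term in title_terms for term in query_terms)


def relax_query(query, inventory, min_terms=1):
    """Drop as few terms as possible until the query recalls at least one product.

    Returns (rewritten_query, matching_titles), or (None, []) when no
    relaxation down to `min_terms` terms recalls anything.
    """
    terms = query.lower().split()
    # Try keeping every term first, then all but one, and so on.
    for keep in range(len(terms), min_terms - 1, -1):
        best = None
        # combinations() preserves the original term order within each subset.
        for subset in combinations(terms, keep):
            hits = [title for title in inventory if matches(subset, title)]
            # Among rewrites of equal length, prefer the one with the highest recall;
            # ties go to the subset generated first.
            if hits and (best is None or len(hits) > len(best[1])):
                best = (" ".join(subset), hits)
        if best is not None:
            return best
    return None, []
```

The subset enumeration is exponential in the query length, which is acceptable for short product queries in a sketch but would need pruning or a ranking model in a high-volume setting.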
This tutorial will summarize state-of-the-art approaches in the growing area of large scale click-stream mining. It will give an opportunity to data scientists, researchers and engineers with diverse backgrounds to familiarize themselves with practical platforms, approaches and tools for extracting actionable insights and building products from big and diverse data sources.
The organizers will accomplish this goal using three real-life stories from the field (large scale data initiatives at eBay – one of the world’s largest e-commerce platforms).
The tutorial will feature transaction mining, behavior log mining and time-series mining. We will talk about building robust recommendation systems over MapReduce clusters (query suggestions, shipping fee recommendations).
The talk will also cover topics such as removing user bias from data, using heuristics to make intractable algorithms practical, and appropriate de-noising and normalization of diverse data sets. The audience is expected to be familiar with MapReduce (preferably Hadoop) and to be working on or grappling with data problems.
Some basic background in algorithms and statistics would be beneficial. We will present the tutorial through real applications built at eBay, in the form of three case studies:
• Shipping Recommendation System
• Mining large-scale temporal dynamics with Hadoop
• Query Suggestions at scale with Hadoop
The popularity of social media sites like Twitter and Facebook opens up interesting research opportunities for understanding the interplay of social media and e-commerce. Until recently, most research on online behavior has treated social media behaviors and e-commerce behaviors independently.
In our study, we choose a particular global e-commerce platform (eBay) and a particular global social media platform (Twitter). We quantify the characteristics of the two individual trends as well as the correlations between them.
We provide evidence that about 5% of general eBay query streams show strong positive correlations with the corresponding Twitter mention streams, while the percentage jumps to around 25% for trending eBay query streams. Some categories of eBay queries, such as 'Video Games' and 'Sports', are more likely to have strong correlations.
We also discover that eBay trends lag Twitter trends for correlated pairs and that the lag differs across categories. We show evidence that celebrities' popularity on Twitter correlates well with related searches and sales on eBay.
The correlations and lags provide predictive insights for future applications that might lead to instant merchandising opportunities for both sellers and e-commerce platforms.
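To make the notion of correlation and lag between two streams concrete, here is a minimal sketch that scans non-negative lags and reports the one with the highest Pearson correlation. It assumes two equal-length series of per-period counts that are longer than `max_lag + 1`; the names `pearson` and `best_lag` are hypothetical, and the study itself may use a different correlation measure or preprocessing.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        return 0.0
    return cov / (dev_x * dev_y)


def best_lag(twitter, ebay, max_lag):
    """Return (lag, r) maximizing the correlation of twitter[t] with ebay[t + lag].

    Only lags >= 0 are scanned, i.e. the eBay stream is assumed to follow Twitter.
    """
    best = (0, pearson(twitter, ebay))
    for lag in range(1, max_lag + 1):
        # Align twitter[t] with ebay[t + lag] by trimming both ends.
        r = pearson(twitter[:-lag], ebay[lag:])
        if r > best[1]:
            best = (lag, r)
    return best
```

Scanning only non-negative lags encodes the assumption that eBay follows Twitter; extending the range to negative lags would test the opposite direction as well.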
Query term deletion is one of the commonly used strategies for query rewriting. In this paper, we study the problem of query term deletion using large-scale e-commerce search logs. In particular, we focus on queries that do not lead to user clicks and aim to predict, by term deletion, a reduced and better query that can lead to clicks. Accurate prediction of term deletion can potentially help users recover from poor search results and improve the shopping experience.
To achieve this, we use various term-dependent and query-dependent measures as features and build a classifier to predict which term is most likely to be deleted from a given query. Unlike previous work on query term deletion, we compute the features not only from the query history and the available document collection, but also conditioned on the query category, which captures the high-level context of the query.
We validate our approach using a large collection of query session logs from a leading e-commerce site, and show that it provides promising performance in term deletion prediction and significantly outperforms baselines that rely on query history and corpus-based statistics without incorporating the query context information.
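The sketch below illustrates what computing a term feature conditioned on the query category might look like. It is an assumption-laden stand-in rather than the paper's method: the log format `(query, category, clicked)`, the two features, and the function names are all hypothetical, and a simple "lowest click rate" rule replaces the trained classifier.

```python
from collections import Counter, defaultdict


def build_term_stats(log):
    """Count, per category, how often each term appears in queries and in clicked queries.

    `log` is an iterable of (query, category, clicked) tuples (hypothetical format).
    """
    seen = defaultdict(Counter)     # category -> term -> queries containing the term
    clicked = defaultdict(Counter)  # category -> term -> clicked queries containing it
    for query, category, was_clicked in log:
        for term in set(query.lower().split()):
            seen[category][term] += 1
            if was_clicked:
                clicked[category][term] += 1
    return seen, clicked


def term_features(term, category, seen, clicked):
    """Two category-conditioned features; the click rate uses add-one smoothing."""
    n = seen[category][term]
    c = clicked[category][term]
    return {"cat_freq": n, "cat_click_rate": (c + 1) / (n + 2)}


def pick_term_to_delete(query, category, seen, clicked):
    """Stand-in for the classifier: choose the term with the lowest click rate in this category."""
    terms = query.lower().split()
    return min(
        terms,
        key=lambda t: term_features(t, category, seen, clicked)["cat_click_rate"],
    )
```

Because the statistics are keyed by category, the same term can look safe to keep in one category and like a deletion candidate in another, which is the kind of context a category-blind baseline cannot capture.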
Personalization offers the promise of improving the online search and shopping experience. In this work, we perform a large-scale analysis of a sample of eBay query logs, involving data from 9.24 billion sessions spanning 12 months (08/2012-07/2013), and address the following topics:
(1) What user information is useful for personalization;
(2) Importance of per-query personalization;
(3) Importance of recency in query prediction.
In this paper, we study these problems and provide some preliminary conclusions.