Gyanit Singh

Data mining (big data analytics, user behavior analysis, query log mining), Information Retrieval and Search, Algorithms.

Gyanit joined eBay Research Labs in Fall 2008. At eBay Research Labs, he works on click-stream mining, large-scale data processing, visualizations and algorithms. Prior to joining eBay Research Labs, he received an M.S. in Computer Science from the University of Washington, Seattle and a B.Tech. in Computer Science from the Indian Institute of Technology, Delhi.

WSDM 2011: 765-774, Hong Kong, February 2011

Query suggestion for E-commerce sites

The query suggestion module is an integral part of every search engine. It helps search engine users narrow or broaden their searches. Published work on query suggestion methods has mainly focused on the web domain. However, the module is also popular in the domain of e-commerce for product search.

In this paper, we discuss query suggestion and its methodologies in the context of e-commerce search engines. We show that a dynamic inventory combined with the long and sparse tail of the query distribution poses unique challenges for building a query suggestion method for an e-commerce marketplace.

We compare and contrast the design of a query suggestion system for web search engines and e-commerce search engines. Further, we discuss interesting measures to quantify the effectiveness of our query suggestion methodologies. We also describe the learning gained from exposing our query suggestion module to a vibrant community of millions of users.

WWW (Companion Volume) 2012: 73-82

Rewriting null e-commerce queries to recommend products

In e-commerce applications, product descriptions are often concise. E-commerce search engines often have to deal with queries that cannot be easily matched to product inventory, resulting in zero-recall or null query situations.

Null queries arise from differences in buyer and seller vocabulary or from the transient nature of products. In this paper, we describe a system that rewrites null e-commerce queries to find matching products as close to the original query as possible.

The system uses query relaxation to rewrite null queries in order to match products. Using eBay as an example of a dynamic marketplace, we show how temporal feedback that respects product category structure, drawn from the repository of expired products, improves the quality of recommended results.

The system is scalable and can be run in a high-volume setting. We show through our experiments that high-quality product recommendations for more than 25% of null queries are achievable.
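As a rough illustration of the query relaxation idea mentioned in this abstract (and not the system the paper describes), one simple form drops terms from a null query until some item matches. The toy inventory, the `search` and `relax` helpers, and the rule of keeping as many original terms as possible are all assumptions made for this sketch; the temporal feedback and category structure used in the paper are not modeled.

```python
from itertools import combinations

# Hypothetical toy inventory: each item title is a set of lowercase terms.
INVENTORY = [
    {"apple", "ipod", "nano", "8gb", "silver"},
    {"apple", "ipod", "touch", "32gb"},
    {"sony", "walkman", "mp3", "player"},
]


def search(terms, inventory=INVENTORY):
    """Return items whose titles contain every query term (AND semantics)."""
    wanted = set(terms)
    return [item for item in inventory if wanted <= item]


def relax(query, inventory=INVENTORY):
    """Drop as few terms as possible until the query recalls an item.

    Returns (rewritten_terms, matches), or (None, []) if nothing matches.
    """
    terms = query.lower().split()
    # Try the full query first, then every subset with one fewer term, etc.
    for keep in range(len(terms), 0, -1):
        for subset in combinations(terms, keep):
            matches = search(subset, inventory)
            if matches:
                return list(subset), matches
    return None, []


# "purple" matches nothing in the toy inventory, so it is the term dropped.
rewritten, matches = relax("apple ipod nano purple 8gb")
```

Enumerating every subset is exponential in query length, so this only suits short queries; a production system would need a cheaper strategy for choosing which terms to drop.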

2013 IEEE International Conference on Big Data (IEEE Big Data 2013 Tutorial)

Large Scale Click-Stream and Transaction Log Mining In Practice

This tutorial will summarize state-of-the-art approaches in the growing area of large scale click-stream mining. It will give an opportunity to data scientists, researchers and engineers with diverse backgrounds to familiarize themselves with practical platforms, approaches and tools for extracting actionable insights and building products from big and diverse data sources.

The organizers will accomplish this goal using three real-life stories from the field (large scale data initiatives at eBay – one of the world’s largest e-commerce platforms).

The tutorial will feature transaction mining, behavior log mining and time-series mining. We will talk about building robust recommendation systems over MapReduce clusters (query suggestions, shipping fee recommendations).

The talk will also cover topics such as removing user bias from data, using heuristics to make intractable algorithms practical, and appropriate de-noising and normalization of diverse data sets. The audience is expected to be familiar with MapReduce (preferably Hadoop) and to be working or grappling with data problems.

Some basic background in algorithms and statistics would be beneficial. We will present the tutorial through real applications built at eBay, in three case studies:

• Shipping Recommendation System
• Mining large-scale temporal dynamics with Hadoop
• Query Suggestions at scale with Hadoop

In proceedings of The 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2013. 829-836. (Best Paper Award Winner)

Chelsea Won, and You Bought a T-shirt: Characterizing the Interplay Between Twitter and E-Commerce

The popularity of social media sites like Twitter and Facebook opens up interesting research opportunities for understanding the interplay of social media and e-commerce. Until recently, most research on online behavior has studied social media behaviors and e-commerce behaviors independently.

In our study, we choose a particular global e-commerce platform (eBay) and a particular global social media platform (Twitter). We quantify the characteristics of the two individual trends as well as the correlations between them.

We provide evidence that about 5% of general eBay query streams show strong positive correlations with the corresponding Twitter mention streams, while the percentage jumps to around 25% for trending eBay query streams. Some categories of eBay queries, such as 'Video Games' and 'Sports', are more likely to have strong correlations.

We also discover that eBay trends lag Twitter for correlated pairs and that the lag differs across categories. We show evidence that celebrities' popularity on Twitter correlates well with related searches and sales on eBay.

The correlations and lags provide predictive insights for future applications that might lead to instant merchandising opportunities for both sellers and e-commerce platforms.
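The notion of one stream lagging another can be made concrete with a small lagged-correlation sketch. This is only an illustration under stated assumptions: the two series are made-up numbers, `pearson` and `best_lag` are hypothetical helper names, and the paper's actual correlation measure and preprocessing are not specified here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def best_lag(twitter, ebay, max_lag):
    """Find the lag (in time steps) at which eBay best tracks earlier Twitter.

    For each lag, twitter[t] is paired with ebay[t + lag].
    Both series are assumed to have the same length.
    """
    scores = {}
    for lag in range(max_lag + 1):
        xs = twitter[: len(twitter) - lag]
        ys = ebay[lag:]
        scores[lag] = pearson(xs, ys)
    lag = max(scores, key=scores.get)
    return lag, scores[lag]


# Made-up daily counts: the eBay series repeats the Twitter spikes 2 days later.
twitter_mentions = [1, 2, 9, 3, 1, 2, 8, 2, 1, 1]
ebay_queries = [5, 5, 6, 7, 14, 8, 6, 7, 13, 7]

lag, r = best_lag(twitter_mentions, ebay_queries, max_lag=4)
```

On this toy data the best lag is 2 steps, because the eBay series was constructed as the Twitter series shifted by two positions plus a constant.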

In ECIR 2014 (To Appear)

A Study of Query Term Deletion using Large-scale E-commerce Search Logs

Query term deletion is one of the commonly used strategies for query rewriting. In this paper, we study the problem of query term deletion using large-scale e-commerce search logs. In particular, we focus on queries that do not lead to user clicks and aim to predict, by term deletion, a reduced and better query that can lead to clicks. Accurate prediction of term deletion can potentially help users recover from poor search results and improve the shopping experience.

To achieve this, we use various term-dependent and query-dependent measures as features and build a classifier to predict which term is most likely to be deleted from a given query. Unlike previous work on query term deletion, we compute the features not only based on the query history and the available document collection, but also conditioned on the query category, which captures the high-level context of the query.

We validate our approach using a large collection of query session logs from a leading e-commerce site, and show that it provides promising performance in term deletion prediction and significantly outperforms baselines that rely on query history and corpus-based statistics without incorporating the query context information.
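To make the idea of a category-conditioned feature more tangible, the sketch below scores each query term by how rare it is among clicked queries in the same category and proposes the rarest term for deletion. This is a deliberately simplified stand-in: the paper trains a classifier over many features, whereas this sketch uses a single hand-picked rarity score. The toy log, the category names, and the helper functions are all hypothetical.

```python
from collections import Counter, defaultdict
from math import log

# Hypothetical toy log: (category, query) pairs for queries that led to clicks.
CLICKED = [
    ("Shoes", "nike running shoes"),
    ("Shoes", "nike shoes mens"),
    ("Shoes", "adidas running shoes"),
    ("Phones", "iphone case"),
    ("Phones", "iphone charger"),
]


def category_term_counts(log_rows):
    """Count, per category, how many clicked queries contain each term."""
    counts = defaultdict(Counter)
    totals = Counter()
    for category, query in log_rows:
        totals[category] += 1
        counts[category].update(set(query.split()))
    return counts, totals


def deletion_candidate(query, category, counts, totals):
    """Pick the term that is rarest among clicked queries in this category."""

    def rarity(term):
        # Smoothed inverse query frequency, computed within the category only.
        return log((1 + totals[category]) / (1 + counts[category][term]))

    return max(query.split(), key=rarity)


counts, totals = category_term_counts(CLICKED)
candidate = deletion_candidate(
    "nike running shoes waterproof", "Shoes", counts, totals
)
```

Here `candidate` is "waterproof", the only term that never appears in a clicked "Shoes" query in the toy log. Conditioning on the category matters because the same term can be common in one category and rare in another.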

In proceedings of the Workshop on Log-based Personalization (the 4th WSCD workshop) at WSDM 2014

A Large Scale Query Logs Analysis for Assessing Personalization Opportunities in E-commerce Sites

Personalization offers the promise of improving the online search and shopping experience. In this work, we perform a large-scale analysis of a sample of eBay query logs, involving data from 9.24 billion sessions spanning 12 months (08/2012-07/2013), and address the following topics:

(1) What user information is useful for personalization;

(2) Importance of per-query personalization;

(3) Importance of recency in query prediction.

In this paper, we study these problems and provide some preliminary conclusions.

Thursday, December 2, 2010

Distributed stream processing

Sunday, September 11, 2011

Parallel data stream processing system

Thursday, March 28, 2013

Recommendations for search queries

Tuesday, January 27, 2015

Query suggestion for e-commerce sites