We strongly believe in open source and giving to our community. We work directly with researchers in academia and seek out new perspectives with our intern and fellowship programs. We generalize our solutions and release them to the world as open source projects. We host discussions and publish our results.


Proceedings of the Sixteenth ACM Conference on Economics and Computation (EC '15). ACM, New York, NY, USA (2015)

Canary in the e-Commerce Coal Mine: Detecting and Predicting Poor Experiences Using Buyer-to-Seller Messages

Dimitriy Masterov, Uwe Mayer, Steve Tadelis

Reputation and feedback systems in online marketplaces are often biased, making it difficult to ascertain the quality of sellers. We use post-transaction, buyer-to-seller message traffic to detect signals of unsatisfactory transactions on eBay. We posit that a message sent after the item was paid for serves as a reliable indicator that the buyer may be unhappy with that purchase, particularly when the message included words associated with a negative experience. The fraction of a seller's message traffic that was negative predicts whether a buyer who transacts with this seller will stop purchasing on eBay, implying that platforms can use these messages as an additional signal of seller quality.
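The seller-level signal described in this abstract, the fraction of a seller's post-transaction messages that are negative, can be illustrated with a minimal sketch. The word list, the function names, and the sample messages below are hypothetical and chosen only for illustration; the paper's actual vocabulary and classification method are not given here.

```python
from collections import defaultdict

# Hypothetical word list; the paper's actual vocabulary is not reproduced here.
NEGATIVE_WORDS = {"refund", "broken", "damaged", "never", "complaint"}

def is_negative(message: str) -> bool:
    """Flag a message if it contains any word from the illustrative list."""
    tokens = {tok.strip(".,!?;:").lower() for tok in message.split()}
    return not tokens.isdisjoint(NEGATIVE_WORDS)

def negative_fraction_by_seller(messages):
    """messages: iterable of (seller_id, text) pairs for post-payment messages.

    Returns {seller_id: fraction of that seller's messages flagged negative}.
    """
    totals = defaultdict(int)
    negatives = defaultdict(int)
    for seller_id, text in messages:
        totals[seller_id] += 1
        if is_negative(text):
            negatives[seller_id] += 1
    return {seller: negatives[seller] / totals[seller] for seller in totals}

# Made-up sample data, for illustration only.
sample = [
    ("seller_a", "Item arrived broken, I want a refund."),
    ("seller_a", "Thanks, great packaging!"),
    ("seller_b", "When will it ship?"),
]
fractions = negative_fraction_by_seller(sample)
```

A simple keyword match like this is only a stand-in; any text classifier could replace `is_negative` without changing the per-seller aggregation.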

In proceedings of the Workshop on Log-based Personalization (the 4th WSCD workshop) at WSDM 2014

A Large Scale Query Logs Analysis for Assessing Personalization Opportunities in E-commerce Sites

Neel Sundaresan, Zitao Liu

Personalization offers the promise of improving the online search and shopping experience. In this work, we perform a large-scale analysis of a sample of eBay query logs, which involves data from 9.24 billion sessions spanning 12 months (08/2012-07/2013), and address the following topics:

(1) What user information is useful for personalization;

(2) Importance of per-query personalization;

(3) Importance of recency in query prediction.

In this paper, we study these problems and provide some preliminary conclusions.


Physician Incentives and Treatment Choices in Heart Attack Management

Dominic Coey

We estimate how physicians’ financial incentives affect their treatment choices in heart attack management, using a large dataset of private health insurance claims. Different insurance plans pay physicians different amounts for the same services, generating the required variation in financial incentives.

We begin by presenting evidence that, unconditionally, plans that pay physicians more for more invasive treatments are associated with a considerably larger fraction of such treatments. To interpret this correlation as causal, we continue by showing that it survives conditioning on a rich set of diagnosis and provider-specific variables.

We perform a host of additional checks verifying that differences in unobservable patient or provider characteristics across plans are unlikely to be driving our results. We find that physicians’ treatment choices respond positively to the payments they receive, and that the response is quite large.

If physicians received bundled payments instead of fee-for-service incentives, for example, heart attack management would become considerably more conservative. Our estimates imply that 20 percent of patients would receive different treatments, physician costs would decrease by 27 percent, and social welfare would increase.

In IEEE Large-scale Data Analysis and Visualization (LDAV) 2012

Visual Analysis of Massive Web Session Data

Zeqian Shen, Jishang Wei, Neel Sundaresan, Kwan-Liu Ma

Tracking and recording users’ browsing behaviors on the web down to individual mouse clicks can create massive web session logs. While such web session data contain valuable information about user behaviors, the ever-increasing data size poses a major challenge for analyzing and visualizing the data.

An efficient data analysis framework requires both powerful computational analysis and interactive visualization. Following the visual analytics mantra "Analyze first, show the important, zoom, filter and analyze further, details on demand", we introduce a two-tier visual analysis system, TrailExplorer2, to discover knowledge from massive log data.

The system supports a visual analysis process iterating between two steps: querying web sessions and visually analyzing the retrieved data. The query happens at the lower tier where terabytes of web session data are processed in a cluster.

At the upper tier, the extracted web sessions, now at a much smaller scale, are visualized on a personal computer for interactive exploration. Our system visualizes a sorted list of web sessions’ temporal patterns and enables data exploration at different levels of detail.

The process of querying, visualizing, and exploring iterates until a satisfactory conclusion is reached. We present two case studies of TrailExplorer2 using real-world session data from eBay to demonstrate the system's effectiveness.

In IEEE Visual Analytics Science and Technology (VAST) 2012

Visual Cluster Exploration of Web Clickstream Data

Jishang Wei, Zeqian Shen, Neel Sundaresan, Kwan-Liu Ma

Web clickstream data are routinely collected to study how users browse the web or use a service. It is clear that the ability to recognize and summarize user behavior patterns from such data is valuable to e-commerce companies. In this paper, we introduce a visual analytics system to explore the various user behavior patterns reflected by distinct clickstream clusters.

In a practical analysis scenario, the system first presents an overview of clickstream clusters using a Self-Organizing Map with Markov chain models.

Then the analyst can interactively explore the clusters through an intuitive user interface. The analyst can either obtain a summary of a selected group of data or further refine the clustering result. We evaluated our system using two different datasets from eBay.

Analysts who were working on the same data have confirmed the system’s effectiveness in extracting user behavior patterns from complex datasets and enhancing their ability to reason.
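The abstract above mentions Markov chain models as part of the cluster overview. As a rough illustration of that ingredient alone, the sketch below estimates first-order transition probabilities from a handful of clickstream sessions. It does not reproduce the Self-Organizing Map or the paper's system; the page names and the function name are assumptions made for this example.

```python
from collections import Counter, defaultdict

def transition_probabilities(sessions):
    """Estimate first-order Markov transition probabilities.

    sessions: list of page-name sequences, one per user session.
    Returns {page: {next_page: probability}} based on observed transitions.
    """
    counts = defaultdict(Counter)
    for session in sessions:
        # Pair each page with the page visited immediately after it.
        for current_page, next_page in zip(session, session[1:]):
            counts[current_page][next_page] += 1
    return {
        page: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for page, followers in counts.items()
    }

# Hypothetical sessions, for illustration only.
sessions = [
    ["home", "search", "item", "checkout"],
    ["home", "search", "search", "item"],
]
probs = transition_probabilities(sessions)
```

A table like `probs` gives one compact summary per group of sessions, which is the kind of representation a clustering step could then compare across groups.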

Journal of the American Society for Information Science and Technology (JASIST), 04/2010, Volume 61, Issue 7, p.1487-1501, 2010

Behaviors, adverse events, and dispositions: An empirical study of online discretion and information control

Coye Cheshire, Judd Antin, Elizabeth Churchill

The authors develop hypotheses about three key correlates of attitudes about discretionary online behaviors and control over one’s own online information: frequency of engaging in risky online behaviors, experience of an online adverse event, and the disposition to be more or less trusting and cautious of others.

Through an analysis of survey results, they find that online adverse events do not necessarily relate to greater overall Web discretion, but they do significantly associate with users’ perceptions of Web information control.

However, the frequencies with which individuals engage in risky online activities and behaviors significantly associate with both online discretion and information control. In addition, general dispositions to trust and be cautious are strongly related to prudent Internet behavior and attitudes about managing personal online information.

The results of this study have clear consequences for our understanding of behaviors and attitudes that might lead to greater online social intelligence, or the ability to make prudent decisions in the presence of Internet uncertainties and risks. Implications for theory and practice are discussed.

In IEEE VisWeek Discovery Exhibition, Salt Lake City, Utah, USA, 2010

Trail Explorer: Understanding User Experience in Webpage Flows

Zeqian Shen, Neel Sundaresan

Trail Explorer is a visual analytics tool for better understanding of user experiences in webpage flows. It enables exploration and discovery of user session data. This paper presents two case studies of Trail Explorer in use with real data.

In 12th International Workshop on Agent Mediated Electronic Commerce (AMEC-10) Toronto, Canada, May 2010

Modeling Seller Listing Strategies

Quang Duong, Neel Sundaresan, Zeqian Shen

Online markets have enjoyed explosive growth and emerged as an important research topic in the field of electronic commerce. Researchers have mostly focused on studying consumer behavior and experience, while largely neglecting the seller side of these markets.

Our research addresses the problem of examining strategies sellers employ in listing their products on online marketplaces. In particular, we introduce a Markov chain model that captures and predicts seller listing behavior based on their present and past actions, their relative positions in the market, and market conditions. These features distinguish our approach from existing models that usually overlook the importance of historical information, as well as sellers’ interactions.

We choose to examine successful sellers on eBay, one of the most prominent online marketplaces, and empirically test our model framework using eBay’s data for fixed-price items collected over a period of four and a half months.

This empirical study entails comparing our most complex history-dependent model’s predictive power against that of a semi-random behavior baseline model and our own history-independent model. The outcomes exhibit differences among sellers in their listing strategies for different products, and validate our models’ capability in capturing seller behavior. Furthermore, the incorporation of historical information on seller actions in our model proves to improve its predictions of future behavior.

In Proceedings of Eurographics/IEEE VGTC Symposium on Visualization, May 2007, pp. 83-90

Path Visualization for Adjacency Matrices

Zeqian Shen, Kwan-Liu Ma

For displaying a dense graph, an adjacency matrix is superior to a node-link diagram because it is more compact and free of visual clutter. A node-link diagram, however, is far better for the task of path finding because a path can be easily traced by following the corresponding links, provided that the links are not heavily crossed or tangled.

We augment adjacency matrices with path visualization and associated interaction techniques to facilitate path finding.

Our design is visually pleasing, and also effectively displays multiple paths based on the design commonly found in metro maps. We illustrate and assess the key aspects of our design with the results obtained from two case studies and an informal user study.
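To make the path-finding difficulty concrete, the sketch below finds a shortest path in a small graph stored as an adjacency matrix and lists the matrix cells that the path occupies. Each hop moves from one cell's column to the row with the same index, which is why a path is hard to follow by eye in a plain matrix. This is a generic breadth-first search written for illustration, not the visualization technique from the paper; the example graph and function name are assumptions.

```python
from collections import deque

def shortest_path(adjacency, source, target):
    """Breadth-first search over a 0/1 adjacency matrix (list of lists).

    Returns the list of node indices from source to target, or None
    if the target cannot be reached.
    """
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the predecessor links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor, connected in enumerate(adjacency[node]):
            if connected and neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

# Hypothetical 4-node chain graph: 0 - 1 - 2 - 3.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
path = shortest_path(adjacency, 0, 3)
# (row, column) cells of the matrix that a path overlay would need to connect.
cells = list(zip(path, path[1:]))
```

Here the path visits cells (0, 1), (1, 2), and (2, 3), which are scattered across different rows and columns of the matrix rather than joined by a visible line.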