We strongly believe in open source and in giving back to our community. We work directly with researchers in academia and seek out new perspectives with our intern and fellowship programs. We generalize our solutions and release them to the world as open source projects. We host discussions and publish our results.


Computer Supported Cooperative Work, Volume 20, pp. 497-528, 2011

Computer Interaction Analysis: Toward an Empirical Approach to Understanding User Practice and Eye Gaze in GUI-Based Interaction

Robert Moore, Elizabeth Churchill

Today's personal computers enable complex forms of user interaction. Unlike older mainframe computers that required batch processing, personal computers enable real-time user control on a one-to-one basis.

Such user interaction involves mixed initiative, logic, language and pointing gestures, features reminiscent of interaction with another human. Yet there are also major differences between computer interaction and human interaction, such as computers' inability to stray from scripts or to adapt to the idiosyncrasies of particular recipients or situations.

Given these similarities and differences, can we study computer interaction using methods similar to those for studying human interaction? If so, are the findings from the analysis of human interaction also useful in understanding computer interaction?

In this paper, we explore these questions and outline a novel methodological approach for examining human-computer interaction, which we call "computer interaction analysis." We build on earlier approaches to human interaction with a computer and adapt them to the latest technologies for computer screen capture and eye tracking.

In doing so, we propose a new transcription notation scheme that is designed to represent the interweaving streams of input actions, display events and eye movements. Finally we demonstrate the approach with concrete examples involving the phenomena of placeholding, repair and referential practices.

Proceedings of the International Conference on Machine Learning (ICML), 2011.

SampleRank: Training Factor Graphs with Atomic Gradients

Michael Wick, Khashayar Rohanimanesh, Kedar Bellare, Aron Culotta, Andrew McCallum

We present SampleRank, an alternative to contrastive divergence (CD) for estimating parameters in complex graphical models. SampleRank harnesses a user-provided loss function to distribute stochastic gradients across an MCMC chain.

As a result, parameter updates can be computed between arbitrary MCMC states. SampleRank is not only faster than CD, but also achieves better accuracy in practice (up to 23% error reduction on noun-phrase coreference).
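To make the idea concrete, here is a minimal, illustrative sketch of a SampleRank-style update between two MCMC states — not the authors' implementation. All names (`samplerank_step`, the toy feature and loss functions) are hypothetical, and the update shown is a simple perceptron-style "atomic gradient" toward the lower-loss state.

```python
def samplerank_step(weights, features, loss, state_a, state_b, lr=0.1):
    """One SampleRank-style update between two MCMC states.

    `features(s)` maps a state to a sparse feature dict; `loss(s)` is the
    user-provided loss. If the model's score ordering of the two states
    disagrees with the loss ordering, nudge the weights toward the
    lower-loss state (a perceptron-style "atomic gradient").
    """
    def score(s):
        return sum(weights.get(f, 0.0) * v for f, v in features(s).items())

    # The state with lower loss should receive the higher model score.
    better, worse = (state_a, state_b) if loss(state_a) < loss(state_b) else (state_b, state_a)
    if score(better) <= score(worse):  # model disagrees with the loss
        fb, fw = features(better), features(worse)
        for f in set(fb) | set(fw):
            weights[f] = weights.get(f, 0.0) + lr * (fb.get(f, 0.0) - fw.get(f, 0.0))
    return weights
```

In a full sampler this update can fire between arbitrary states of the MCMC chain, which is what makes the gradients cheap relative to contrastive divergence.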

Proceedings of the fifth ACM conference on Recommender systems, 2011. [Best short paper award]

Utilizing related products for post-purchase recommendation in e-commerce

Jian Wang, Badrul Sarwar, Neel Sundaresan


ICDE 2011

Representative skylines using threshold-based preference distributions

Atish Das Sarma, Ashwin Lall, Danupon Nanongkai, Richard J. Lipton, Jim Xu


Knowledge, Technology & Policy, Volume 23, Issue 3, pp. 311-331, 09/2010

General and Familiar Trust in Websites

Coye Cheshire, Judd Antin, Karen Cook, Elizabeth Churchill

When people rely on the web to gather and distribute information, they can build a sense of trust in the websites with which they interact. Understanding the correlates of trust in most websites (general website trust) and trust in websites that one frequently visits (familiar website trust) is crucial for constructing better models of risk perception and online behavior.

We conducted an online survey of active Internet users and examined the associations between the two types of web trust and several independent factors: information technology competence, adverse online events, and general dispositions to be trusting or cautious of others.

Using a series of nested ordered logistic regression models, we find positive associations between general trust, general caution, and the two types of web trust.

The positive effect of information technology competence erases the effect of general caution for general website trust but not for familiar website trust, providing evidence that general trust and self-reported competence are stronger associates of general website trust than broad attitudes about prudence. Finally, the experience of an adverse online event has a strong, negative association with general website trust, but not with familiar website trust.

We discuss several implications for online behavior and suggest website policies that can help users make informed decisions about interacting with potentially risky websites.

PODC 2010

Efficient distributed random walks with applications

Atish Das Sarma, Danupon Nanongkai, Gopal Pandurangan, Prasad Tetali


WSDM 2010 (Invited to TIST Special Issue)

Ranking mechanisms in twitter-like forums

Anish Das Sarma, Atish Das Sarma, Sreenivas Gollapudi, Rina Panigrahy

We study the problem of designing a mechanism to rank items in forums by making use of user reviews such as thumb and star ratings. We compare mechanisms in which forum users rate individual posts with mechanisms in which users are asked to perform a pairwise comparison and state which of two posts is better.

The main metric used to evaluate a mechanism is ranking accuracy versus the cost of reviews, where cost is measured as the average number of reviews used per post. We show that for many reasonable probability models, there is no thumb (or star) based ranking mechanism that can produce approximately accurate rankings with a bounded number of reviews per item.

On the other hand, we provide a review mechanism based on pairwise comparisons that achieves approximate rankings with bounded cost. We have implemented a system, shoutvelocity, a Twitter-like forum in which items (the analogue of tweets) are rated by comparisons. For each new item, the user who posts it is required to compare two previous entries.

This ensures that over a sequence of n posts we get at least n comparisons, i.e., one review per item on average. Our mechanism uses this sequence of comparisons to obtain a ranking estimate. It ensures that every item is reviewed at least once and that winning entries are reviewed more often, yielding better estimates of the top items.
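As a rough illustration of the flow described above, the sketch below (all names are ours, not the system's) simulates a forum where each new post triggers one pairwise review of two earlier entries and items are ranked by win counts. For brevity it picks the two entries uniformly at random, whereas the actual mechanism also biases selection toward winning entries.

```python
import random

def run_forum(items, judge, seed=0):
    """Each new post triggers one pairwise review of two earlier entries,
    so n posts yield roughly n comparisons (one review per item on
    average). `judge(a, b)` returns whichever entry the reviewer prefers.
    """
    rng = random.Random(seed)
    posts, wins = [], {}
    for item in items:
        if len(posts) >= 2:
            a, b = rng.sample(posts, 2)
            winner = judge(a, b)
            wins[winner] = wins.get(winner, 0) + 1
        posts.append(item)
        wins.setdefault(item, 0)
    # Ranking estimate: items with more comparison wins come first.
    return sorted(posts, key=lambda p: -wins[p])
```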

PVLDB 2010

Regret-Minimizing Representative Databases

Atish Das Sarma, Danupon Nanongkai, Ashwin Lall, Richard J. Lipton, Jim Xu

We propose the k-representative regret minimization query (k-regret) as an operation to support multi-criteria decision making. Like top-k, the k-regret query assumes that users have some utility or scoring functions; however, it never asks the users to provide such functions.

Like skyline, it filters out a set of interesting points from a potentially large database based on the users' criteria; however, it never overwhelms the users by outputting too many tuples.

In particular, for any number k and any class of utility functions, the k-regret query outputs k tuples from the database and tries to minimize the maximum regret ratio. This captures how disappointed a user could be had she seen k representative tuples instead of the whole database. We focus on the class of linear utility functions, which is widely applicable.

The first challenge of this approach is that it is not clear if the maximum regret ratio would be small, or even bounded. We answer this question affirmatively. Theoretically, we prove that the maximum regret ratio can be bounded and this bound is independent of the database size.

Moreover, our extensive experiments on real and synthetic datasets suggest that in practice the maximum regret ratio is reasonably small. The algorithms developed in this paper are also practical: they run in time linear in the size of the database, and the experiments show that their running time is small when they run on top of the skyline operation, which means they could be integrated into current database systems.
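The maximum regret ratio defined above is easy to state in code. Below is a small illustrative sketch (function and variable names are ours, not the paper's) that evaluates it for a candidate k-subset over a finite sample of linear utility functions; the paper bounds the ratio over the whole class, which this sampling only approximates.

```python
def max_regret_ratio(db, subset, utilities):
    """Maximum regret ratio of `subset` w.r.t. `db` over linear utility
    functions, each given as a weight vector. For one utility u the
    regret ratio is 1 - (best score in subset) / (best score in db)."""
    def score(point, u):
        return sum(w * x for w, x in zip(u, point))

    worst = 0.0
    for u in utilities:
        best_all = max(score(p, u) for p in db)
        best_sub = max(score(p, u) for p in subset)
        if best_all > 0:
            worst = max(worst, 1.0 - best_sub / best_all)
    return worst
```

For instance, over the utilities (1, 0), (0, 1), and (0.5, 0.5), the single representative (0.9, 0.9) of the database {(1, 0), (0, 1), (0.9, 0.9)} has maximum regret ratio 0.1: no user loses more than 10% of her best achievable score.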

WSDM 2010

A sketch-based distance oracle for web-scale graphs

Atish Das Sarma, Sreenivas Gollapudi, Marc Najork, Rina Panigrahy

We study the fundamental problem of computing distances between nodes in large graphs such as the web graph and social networks. Our objective is to be able to answer distance queries between pairs of nodes in real time.

Since the standard shortest path algorithms are expensive, our approach moves the time-consuming shortest-path computation offline, and at query time only looks up precomputed values and performs simple and fast computations on these precomputed values.

More specifically, during the offline phase we compute and store a small "sketch" for each node in the graph, and at query-time we look up the sketches of the source and destination nodes and perform a simple computation using these two sketches to estimate the distance.
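As a simplified illustration of the offline/online split described above, the sketch below uses single-node landmarks rather than the paper's seed sets (all function names are ours): the offline phase runs BFS from a few random seeds and each node stores its distances to them; a query then upper-bounds d(u, v) via a common seed using the triangle inequality.

```python
import collections
import random

def bfs_distances(graph, source):
    """Single-source BFS distances in an unweighted graph (adjacency dict)."""
    dist = {source: 0}
    queue = collections.deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def build_sketches(graph, num_seeds=3, seed=0):
    """Offline phase: BFS from a few random seed nodes; each node's
    sketch maps seed -> distance to that seed."""
    rng = random.Random(seed)
    seeds = rng.sample(sorted(graph), min(num_seeds, len(graph)))
    per_seed = {s: bfs_distances(graph, s) for s in seeds}
    return {v: {s: d[v] for s, d in per_seed.items() if v in d} for v in graph}

def estimate_distance(sketch_u, sketch_v):
    """Query phase: upper-bound d(u, v) through a common seed s, since
    d(u, v) <= d(u, s) + d(s, v) by the triangle inequality."""
    common = set(sketch_u) & set(sketch_v)
    return min(sketch_u[s] + sketch_v[s] for s in common) if common else float("inf")
```

On a path graph 0-1-2-3 every seed lies on the unique shortest path, so the estimate for the pair (0, 3) is exact; in general graphs the estimate is only an upper bound whose quality depends on seed placement.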