We strongly believe in open source and giving to our community. We work directly with researchers in academia and seek out new perspectives with our intern and fellowship programs. We generalize our solutions and release them to the world as open source projects. We host discussions and publish our results.


Knowledge, Technology & Policy, Volume 23, Issue 3, pp. 311-331, September 2010

General and Familiar Trust in Websites

Coye Cheshire, Judd Antin, Karen Cook, Elizabeth Churchill

When people rely on the web to gather and distribute information, they can build a sense of trust in the websites with which they interact. Understanding the correlates of trust in most websites (general website trust) and trust in websites that one frequently visits (familiar website trust) is crucial for constructing better models of risk perception and online behavior.

We conducted an online survey of active Internet users and examined the associations between the two types of web trust and several independent factors: information technology competence, adverse online events, and general dispositions to be trusting or cautious of others.

Using a series of nested ordered logistic regression models, we find positive associations between general trust, general caution, and the two types of web trust.

The positive effect of information technology competence erases the effect of general caution for general website trust but not for familiar website trust, providing evidence that general trust and self-reported competence are stronger associates of general website trust than broad attitudes about prudence. Finally, the experience of an adverse online event has a strong, negative association with general website trust, but not with familiar website trust.

We discuss several implications for online behavior and suggest website policies that can help users make informed decisions about interacting with potentially risky websites.

PODC 2010

Efficient distributed random walks with applications

Atish Das Sarma, Danupon Nanongkai, Gopal Pandurangan, Prasad Tetali

No abstract available.

WSDM 2010 (Invited to TIST Special Issue)

Ranking mechanisms in twitter-like forums

Anish Das Sarma, Atish Das Sarma, Sreenivas Gollapudi, Rina Panigrahy

We study the problem of designing a mechanism to rank items in forums by making use of the user reviews such as thumb and star ratings. We compare mechanisms where forum users rate individual posts and also mechanisms where the user is asked to perform a pairwise comparison and state which one is better.

The main metric used to evaluate a mechanism is ranking accuracy versus the cost of reviews, where cost is measured as the average number of reviews used per post. We show that for many reasonable probability models, no thumb (or star) based ranking mechanism can produce approximately accurate rankings with a bounded number of reviews per item.

On the other hand, we provide a review mechanism based on pairwise comparisons which achieves approximate rankings with bounded cost. We have implemented a system, shoutvelocity, a Twitter-like forum in which items (the analogues of tweets) are rated using comparisons. For each new item, the user who posts the item is required to compare two previous entries.

This ensures that over a sequence of n posts, we get at least n comparisons requiring one review per item on average. Our mechanism uses this sequence of comparisons to obtain a ranking estimate. It ensures that every item is reviewed at least once and winning entries are reviewed more often to obtain better estimates of top items.
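The comparison protocol above can be sketched in a few lines. This is a minimal illustration, not the paper's exact estimator: the `judge` callback stands in for a human reviewer, and ranking by raw win count is a simplification of the paper's ranking estimate. Sampling one contestant from the current top half reflects the property that winning entries are reviewed more often.

```python
import random

def add_post(posts, wins, comparisons, judge):
    """Add a new post; its author must compare two previous entries.

    `judge(a, b)` returns the preferred post id (a hypothetical stand-in
    for a human reviewer). One contestant is drawn from the current top
    half so that winning entries accumulate more reviews.
    """
    new_id = len(posts)
    posts.append(new_id)
    if new_id >= 2:  # need at least two previous entries to compare
        ranked = sorted(posts[:-1], key=lambda p: -wins.get(p, 0))
        a = random.choice(ranked[: max(1, len(ranked) // 2)])  # a current leader
        b = random.choice([p for p in posts[:-1] if p != a])   # any other entry
        winner = judge(a, b)
        wins[winner] = wins.get(winner, 0) + 1
        comparisons.append((a, b, winner))
    return new_id

def ranking(posts, wins):
    # Over n posts we collect roughly n comparisons, i.e. one review
    # per item on average; rank by win count.
    return sorted(posts, key=lambda p: -wins.get(p, 0))
```

Because each new post triggers exactly one comparison, the review cost stays at one review per item on average, which is the bounded-cost property the abstract claims for comparison-based mechanisms.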

In IEEE VisWeek Discovery Exhibition, SALT LAKE CITY, UTAH, USA, 2010

Trail Explorer: Understanding User Experience in Webpage Flows

Zeqian Shen, Neel Sundaresan, Zeqian Shen, Neel Sundaresan

Trail Explorer is a visual analytics tool for better underst anding of user experiences in webpage flows. It enables exploration and discovery of user session data. This paper presents two case studies of Trail Explorer in use with real data.

PVLDB 2010

Regret-Minimizing Representative Databases

Atish Das Sarma, Danupon Nanongkai, Ashwin Lall, Richard J. Lipton, Jim Xu

We propose the k-representative regret minimization query (k-regret) as an operation to support multi-criteria decision making. Like top-k, the k-regret query assumes that users have some utility or scoring functions; however, it never asks the users to provide such functions.

Like skyline, it filters out a set of interesting points from a potentially large database based on the users' criteria; however, it never overwhelms the users by outputting too many tuples.

In particular, for any number k and any class of utility functions, the k-regret query outputs k tuples from the database and tries to minimize the maximum regret ratio. This captures how disappointed a user could be had she seen k representative tuples instead of the whole database. We focus on the class of linear utility functions, which is widely applicable.

The first challenge of this approach is that it is not clear if the maximum regret ratio would be small, or even bounded. We answer this question affirmatively. Theoretically, we prove that the maximum regret ratio can be bounded and this bound is independent of the database size.

Moreover, our extensive experiments on real and synthetic datasets suggest that in practice the maximum regret ratio is reasonably small. Additionally, the algorithms developed in this paper are practical: they run in time linear in the size of the database, and the experiments show that their running time is small when they run on top of the skyline operation, which means that these algorithms could be integrated into current database systems.
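To make the regret-ratio definition concrete, here is a small sketch for linear utility functions. The Monte Carlo sampling of utility vectors is an illustrative stand-in (the paper bounds the exact maximum); tuples are plain coordinate vectors and all names are hypothetical.

```python
import random

def regret_ratio(subset, db, util):
    """Regret ratio of `subset` under one linear utility vector:
    (best score in db - best score in subset) / best score in db."""
    best_all = max(sum(u * x for u, x in zip(util, t)) for t in db)
    best_sub = max(sum(u * x for u, x in zip(util, t)) for t in subset)
    return 0.0 if best_all == 0 else (best_all - best_sub) / best_all

def max_regret_ratio(subset, db, n_samples=2000, seed=0):
    """Estimate the maximum regret ratio by sampling random non-negative
    linear utilities (a sampling approximation, not the paper's exact
    maximization)."""
    rng = random.Random(seed)
    d = len(db[0])
    worst = 0.0
    for _ in range(n_samples):
        util = [rng.random() for _ in range(d)]
        worst = max(worst, regret_ratio(subset, db, util))
    return worst
```

For example, with `db = [(1, 0), (0, 1), (0.6, 0.6)]`, the 2-point subset `[(1, 0), (0, 1)]` only disappoints a user whose utility weights are nearly equal, and its maximum regret ratio is at most 1/6.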

WSDM 2010

A sketch-based distance oracle for web-scale graphs

Atish Das Sarma, Sreenivas Gollapudi, Marc Najork, Rina Panigrahy

We study the fundamental problem of computing distances between nodes in large graphs such as the web graph and social networks. Our objective is to be able to answer distance queries between pairs of nodes in real time.

Since the standard shortest path algorithms are expensive, our approach moves the time-consuming shortest-path computation offline, and at query time only looks up precomputed values and performs simple and fast computations on these precomputed values.

More specifically, during the offline phase we compute and store a small "sketch" for each node in the graph, and at query-time we look up the sketches of the source and destination nodes and perform a simple computation using these two sketches to estimate the distance.
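The offline/online split described above can be sketched as follows. This is a simplified seed-set construction (BFS from randomly sampled seed sets of doubling size, combining two sketches through common seeds at query time), assuming unweighted undirected graphs; parameters like `rounds` are illustrative, not the paper's tuned values.

```python
import random
from collections import deque

def bfs_distances(graph, sources):
    """Multi-source BFS: distance to, and identity of, the nearest source."""
    dist, near = {}, {}
    q = deque()
    for s in sources:
        dist[s], near[s] = 0, s
        q.append(s)
    while q:
        u = q.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v], near[v] = dist[u] + 1, near[u]
                q.append(v)
    return dist, near

def build_sketches(graph, rounds=3, seed=0):
    """Offline phase: for seed sets of size 1, 2, 4, ..., record each
    node's nearest seed and its distance. The records form the sketch."""
    rng = random.Random(seed)
    nodes = list(graph)
    sketches = {v: [] for v in nodes}
    size = 1
    for _ in range(rounds):
        sources = rng.sample(nodes, min(size, len(nodes)))
        dist, near = bfs_distances(graph, sources)
        for v in nodes:
            if v in dist:
                sketches[v].append((near[v], dist[v]))
        size *= 2
    return sketches

def estimate_distance(sketches, u, v):
    """Query phase: look up the two sketches and route through a common
    seed w, estimating dist(u, v) as min over w of d(u, w) + d(w, v)."""
    du = dict(sketches[u])
    best = None
    for w, d in sketches[v]:
        if w in du and (best is None or du[w] + d < best):
            best = du[w] + d
    return best
```

By the triangle inequality the estimate never undershoots the true distance, and the query touches only the two precomputed sketches, which is what makes real-time answers feasible.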

In 12th International Workshop on Agent Mediated Electronic Commerce (AMEC-10) Toronto, Canada, May 2010

Modeling Seller Listing Strategies

Quang Duong, Neel Sundaresan, Zeqian Shen

Online markets have enjoyed explosive growth and emerged as an important research topic in the field of electronic commerce. Researchers have mostly focused on studying consumer behavior and experience, while largely neglecting the seller side of these markets.

Our research addresses the problem of examining the strategies sellers employ in listing their products on online marketplaces. In particular, we introduce a Markov chain model that captures and predicts seller listing behavior based on their present and past actions, their relative positions in the market, and market conditions. These features distinguish our approach from existing models, which usually overlook the importance of historical information as well as sellers' interactions.

We choose to examine successful sellers on eBay, one of the most prominent online marketplaces, and empirically test our model framework using eBay’s data for fixed-priced items collected over a period of four and a half months.

This empirical study entails comparing the predictive power of our most complex history-dependent model against that of a semi-random baseline model and our own history-independent model. The outcomes exhibit differences between sellers in their listing strategies for different products, and validate our models' capability to capture seller behavior. Furthermore, incorporating historical information on seller actions proves to improve the model's predictions of future behavior.
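The history-dependent idea can be illustrated with a first-order Markov chain fit to sequences of seller actions. This is a deliberate simplification of the paper's model (which also conditions on market position and conditions); the action names here are purely hypothetical.

```python
from collections import defaultdict

def fit_markov(sequences):
    """Estimate first-order transition probabilities P(next | current)
    from observed seller action sequences (action names illustrative)."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    model = {}
    for cur, nxts in counts.items():
        total = sum(nxts.values())
        model[cur] = {a: c / total for a, c in nxts.items()}
    return model

def predict(model, current):
    """Most likely next action given the current one, or None if unseen."""
    if current not in model:
        return None
    return max(model[current], key=model[current].get)
```

A richer, history-dependent variant would condition on a window of past actions plus market features rather than the single current action, which is the direction the paper's comparison of models explores.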

TCS 2011 (through invitation from best papers of TAMC 2009)

Best-order streaming model

Atish Das Sarma, Richard J. Lipton, Danupon Nanongkai

We study a new model of computation, called best-order stream, for graph problems. Roughly, it is a proof system where a space-limited verifier has to verify a proof sequentially (i.e., it reads the proof as a stream). Moreover, the proof itself is just a specific ordering of the input data.

This model is closely related to many models of computation in other areas such as data streams, communication complexity, and proof checking, and could be used in applications such as cloud computing.

In this paper we focus on graph problems where the input is a sequence of edges. We show that even under this model, checking some basic graph properties deterministically requires linear space in the number of nodes. We also show that, in contrast with this, randomized verifiers are powerful enough to check many graph properties in polylogarithmic space.
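As a toy illustration of the model (not an example from the paper), consider verifying connectivity when the prover is allowed to choose the edge ordering: a valid proof orders the edges so that every edge after the first touches an already-seen vertex. The verifier below uses space linear in the number of nodes, consistent with the deterministic lower bound mentioned above; the randomized verifiers in the paper do much better.

```python
def verify_connected(n, edge_stream):
    """Best-order verifier for connectivity on an n-node graph.

    The proof is the edge ordering itself: each streamed edge (after
    the first) must touch a previously seen vertex. Accept iff the
    ordering is valid and all n vertices appear. Space: O(n), the set
    of seen vertices.
    """
    seen = set()
    for u, v in edge_stream:
        if seen and u not in seen and v not in seen:
            return False  # ordering is not a valid proof
        seen.update((u, v))
    return len(seen) == n
```

A dishonest or careless prover that streams a disconnected piece early is rejected immediately, while any spanning-tree-first ordering of a connected graph is accepted.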

Neural Information Processing Systems (NIPS), 2009

Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference

Michael Wick, Khashayar Rohanimanesh, Sameer Singh, Andrew McCallum

Large, relational factor graphs with structure defined by first-order logic or other languages give rise to notoriously difficult inference problems. Because unrolling the structure necessary to represent distributions over all hypotheses has exponential blow-up, solutions are often derived from MCMC.

However, because of limitations in the design and parameterization of the jump function, these sampling-based methods suffer from local minima: the system must transition through lower-scoring configurations before arriving at a better MAP solution. This paper presents a new method of explicitly selecting fruitful downward jumps by leveraging reinforcement learning (RL).

Rather than setting parameters to maximize the likelihood of the training data, parameters of the factor graph are treated as a log-linear function approximator and learned with methods of temporal difference (TD); MAP inference is performed by executing the resulting policy on held out test data.

Our method allows efficient gradient updates since only factors in the neighborhood of variables affected by an action need to be computed—we bypass the need to compute marginals entirely. Our method yields dramatic empirical success, producing new state-of-the-art results on a complex joint model of ontology alignment, with a 48% reduction in error over state-of-the-art in that domain.
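The temporal-difference training described above can be sketched as a single TD(0) update on a linear scorer. This is a bare-bones illustration under simplifying assumptions: `phi(s)` stands for the features of the factors touched by a jump, and the learning rate and discount are arbitrary; the paper's full procedure (policy execution, neighborhood-local factor computation) is not reproduced here.

```python
def td_update(theta, phi_s, phi_s_next, reward, alpha=0.1, gamma=1.0):
    """One TD(0) update for a linear value function V(s) = theta . phi(s).

    phi_s / phi_s_next are feature vectors of the configurations before
    and after a jump; only these local features enter the gradient, which
    is why no marginals need to be computed.
    """
    v_s = sum(t * f for t, f in zip(theta, phi_s))
    v_next = sum(t * f for t, f in zip(theta, phi_s_next))
    delta = reward + gamma * v_next - v_s  # TD error
    return [t + alpha * delta * f for t, f in zip(theta, phi_s)]
```

At inference time, MAP search simply executes the greedy policy induced by the learned scores, accepting jumps the approximator predicts will lead to higher-scoring configurations.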