We strongly believe in open source and giving back to our community. We work directly with researchers in academia and seek out new perspectives with our intern and fellowship programs. We generalize our solutions and release them to the world as open source projects. We host discussions and publish our results.


Journal of the American Society for Information Science and Technology, 09/2011, 2011

Automatic identification of personal insults on social news sites

Sara Owsley Sood, Elizabeth Churchill, Judd Antin

As online communities grow and the volume of user-generated content increases, the need for community management also rises. Community management has three main purposes: to create a positive experience for existing participants, to promote appropriate, socionormative behaviors, and to encourage potential participants to make contributions.

Research indicates that the quality of content a potential participant sees on a site is highly influential; off-topic, negative comments with malicious intent are a particularly strong barrier to participation or set the tone for encouraging similar contributions.

A problem for community managers, therefore, is the detection and elimination of such undesirable content. As a community grows, this undertaking becomes more daunting. Can an automated system aid community managers in this task? In this paper, we address this question through a machine learning approach to automatic detection of inappropriate negative user contributions.

Our training corpus is a set of comments from a news commenting site that we tasked Amazon Mechanical Turk workers with labeling. Each comment is labeled for the presence of profanity, insults, and the object of the insults.

Support vector machines trained on these data are combined with relevance and valence analysis systems in a multistep approach to the detection of inappropriate negative user contributions. The system shows great potential for semiautomated community management.
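To make the multistep idea concrete, here is a minimal sketch of how an insult score, a relevance check and a valence check might be chained to flag comments for a moderator. It is not the authors' system: the lexicon, the hand-set weights standing in for a trained SVM's decision function, the bag-of-words cosine relevance measure and both thresholds are all hypothetical placeholders.

```python
import math
import re
from collections import Counter

# Hypothetical lexicon and weights. In the paper these roles are played by
# trained support vector machines and separate relevance/valence systems.
NEGATIVE_WORDS = {"idiot", "stupid", "moron", "pathetic", "hate"}
INSULT_WEIGHTS = {"you": 0.8, "idiot": 1.5, "stupid": 1.2, "moron": 1.5}
INSULT_BIAS = -1.0


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def relevance(comment_tokens, article_tokens):
    return cosine(Counter(comment_tokens), Counter(article_tokens))


def valence(tokens):
    """Fraction of tokens found in the negative lexicon (0 means neutral)."""
    return sum(t in NEGATIVE_WORDS for t in tokens) / len(tokens) if tokens else 0.0


def insult_score(tokens):
    """Linear decision function w.x + b, standing in for a trained SVM."""
    counts = Counter(tokens)
    return sum(INSULT_WEIGHTS.get(t, 0.0) * c for t, c in counts.items()) + INSULT_BIAS


def flag_comment(comment, article, min_relevance=0.1, max_valence=0.2):
    """Return the reasons (possibly none) to send a comment to a moderator."""
    tokens = tokenize(comment)
    reasons = []
    if insult_score(tokens) > 0:
        reasons.append("insult")
    # Off-topic AND negative: low similarity to the article plus high negativity.
    if relevance(tokens, tokenize(article)) < min_relevance and valence(tokens) > max_valence:
        reasons.append("off-topic negative")
    return reasons
```

Returning reasons rather than deleting comments matches the semiautomated framing: the system narrows what a community manager has to read, and a person makes the final call.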

WSDM 2011: 765-774, Hong Kong, February 2011

Query suggestion for E-commerce sites

Mohammad Al Hasan, Nish Parikh, Gyanit Singh, Neel Sundaresan

A query suggestion module is an integral part of every search engine. It helps search engine users narrow or broaden their searches. Published work on query suggestion methods has mainly focused on the web domain, but the module is also popular in the domain of e-commerce for product search.

In this paper, we discuss query suggestion and its methodologies in the context of e-commerce search engines. We show that dynamic inventory, combined with the long and sparse tail of the query distribution, poses unique challenges for building a query suggestion method for an e-commerce marketplace.

We compare and contrast the design of a query suggestion system for web search engines and e-commerce search engines. Further, we discuss interesting measures to quantify the effectiveness of our query suggestion methodologies. We also describe the lessons learned from exposing our query suggestion module to a vibrant community of millions of users.
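The dynamic-inventory problem can be illustrated with a small sketch: mine follow-up queries from search sessions, then drop any suggestion that currently matches no live listings. This is an assumed design for illustration only; the session-reformulation counting and the `inventory_count` callback are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def build_suggestions(sessions):
    """Count reformulations: for each query, which query the user typed next."""
    follow = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            if current != nxt:
                follow[current][nxt] += 1
    return follow


def suggest(query, follow, inventory_count, k=3):
    """Top-k follow-up queries that still match at least one live listing.

    `inventory_count` is a hypothetical callback returning how many items
    currently match a query; it is evaluated at serving time because an
    e-commerce inventory changes constantly.
    """
    ranked = follow.get(query, Counter()).most_common()
    return [q for q, _ in ranked if inventory_count(q) > 0][:k]
```

For example, if sessions show "ipod" followed by "ipod nano" twice and by "ipod touch" and "ipod classic" once each, but no "ipod touch" listings are live, the sketch suggests only "ipod nano" and "ipod classic". A web search engine can usually skip this serving-time filter because the indexed pages rarely disappear as quickly as marketplace listings.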

Computer Supported Cooperative Work, Volume 20, p.497-528, 2011

Computer Interaction Analysis: Toward an Empirical Approach to Understanding User Practice and Eye Gaze in GUI-Based Interaction

Robert Moore, Elizabeth Churchill

Today's personal computers enable complex forms of user interaction. Unlike older mainframe computers that required batch processing, personal computers enable real-time user control on a one-to-one basis.

Such user interaction involves mixed initiative, logic, language and pointing gestures, features reminiscent of interaction with another human. Yet there are also major differences between computer interaction and human interaction, such as computers' inability to stray from scripts or to adapt to the idiosyncrasies of particular recipients or situations.

Given these similarities and differences, can we study computer interaction using methods similar to those for studying human interaction? If so, are the findings from the analysis of human interaction also useful in understanding computer interaction?

In this paper, we explore these questions and outline a novel methodological approach for examining human-computer interaction, which we call "computer interaction analysis." We build on earlier approaches to human interaction with a computer and adapt them to the latest technologies for computer screen capture and eye tracking.

In doing so, we propose a new transcription notation scheme that is designed to represent the interweaving streams of input actions, display events and eye movements. Finally, we demonstrate the approach with concrete examples involving the phenomena of placeholding, repair and referential practices.
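The core mechanical step behind such a transcript is merging several time-stamped streams into a single timeline. The sketch below shows only that step; the `Event` fields, the stream labels and the one-line-per-event layout are hypothetical and do not reproduce the paper's notation scheme.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    time_ms: int                                  # events are ordered by time only
    stream: str = field(compare=False)            # e.g. "input", "display", "gaze"
    description: str = field(compare=False)


def interleave(*streams):
    """Merge per-stream event lists (each already sorted by time) into one timeline."""
    return list(heapq.merge(*streams))


def render(events):
    """One transcript line per event: time, stream label, description."""
    return [f"{e.time_ms:>6} {e.stream:<7} {e.description}" for e in events]
```

Merging, rather than sorting all events afresh, keeps each stream's internal order intact and runs in linear time once the individual streams are sorted. This matters when screen capture and eye tracking produce long recordings.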

Proceedings of the International Conference on Machine Learning (ICML), 2011.

Training Factor Graphs with Atomic Gradients

Michael Wick, Khashayar Rohanimanesh, Kedar Bellare, Aron Culotta, Andrew McCallum

We present SampleRank, an alternative to contrastive divergence (CD) for estimating parameters in complex graphical models. SampleRank harnesses a user-provided loss function to distribute stochastic gradients across an MCMC chain.

As a result, parameter updates can be computed between arbitrary MCMC states. SampleRank is not only faster than CD, but also achieves better accuracy in practice (up to 23% error reduction on noun-phrase coreference).
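Below is a minimal sketch of an atomic update in the spirit of SampleRank as described above. The loss decides which of two neighboring MCMC states should rank higher, and a perceptron-style gradient is applied when the model's scores disagree. This assumes sparse feature dictionaries and a fixed, precomputed chain. The function names, the margin and the learning rate are illustrative, and the real method interleaves these updates with the sampler's own proposals and acceptance decisions.

```python
def dot(w, phi):
    """Model score of a state: weights . features (sparse dicts)."""
    return sum(w.get(f, 0.0) * v for f, v in phi.items())


def samplerank_update(w, phi_a, phi_b, loss_a, loss_b, lr=1.0, margin=1.0):
    """One atomic update between two states; returns True if w changed.

    If the lower-loss state does not outscore the other by `margin`, move
    the weights toward its features and away from the other state's.
    """
    if loss_a == loss_b:
        return False  # the loss expresses no preference, so no gradient
    better, worse = (phi_a, phi_b) if loss_a < loss_b else (phi_b, phi_a)
    if dot(w, better) - dot(w, worse) >= margin:
        return False  # the model already ranks the pair correctly
    for f in set(better) | set(worse):
        w[f] = w.get(f, 0.0) + lr * (better.get(f, 0.0) - worse.get(f, 0.0))
    return True


def train(chain, features, loss, w=None, lr=1.0):
    """Apply an update between each consecutive pair of states in `chain`.

    Here `chain` is a fixed list of states for illustration; in practice the
    states come from an MCMC sampler driven by the current weights.
    """
    w = {} if w is None else w
    for s, s_next in zip(chain, chain[1:]):
        samplerank_update(w, features(s), features(s_next), loss(s), loss(s_next), lr)
    return w
```

Because each update needs only the two states' features and losses, no partition function or full chain run is required. This is what lets updates happen between arbitrary MCMC states.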

Proceedings of the Fifth ACM Conference on Recommender Systems, 2011. [Best short paper award]

Utilizing related products for post-purchase recommendation in e-commerce

Jian Wang, Badrul Sarwar, Neel Sundaresan

No Information

ICDE 2011

Representative skylines using threshold-based preference distributions

Atish Das Sarma, Ashwin Lall, Danupon Nanongkai, Richard J. Lipton, Jim Xu

No Information

Knowledge, Technology & Policy, 09/2010, Volume 23, Issue 3, p.311-331, 2010

General and Familiar Trust in Websites

Coye Cheshire, Judd Antin, Karen Cook, Elizabeth Churchill

When people rely on the web to gather and distribute information, they can build a sense of trust in the websites with which they interact. Understanding the correlates of trust in most websites (general website trust) and trust in websites that one frequently visits (familiar website trust) is crucial for constructing better models of risk perception and online behavior.

We conducted an online survey of active Internet users and examined the associations between the two types of web trust and several independent factors: information technology competence, adverse online events, and general dispositions to be trusting or cautious of others.

Using a series of nested ordered logistic regression models, we find positive associations between general trust, general caution, and the two types of web trust.

The positive effect of information technology competence erases the effect of general caution for general website trust but not for familiar website trust, providing evidence that general trust and self-reported competence are stronger associates of general website trust than broad attitudes about prudence. Finally, the experience of an adverse online event has a strong, negative association with general website trust, but not with familiar website trust.
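The abstract does not state the model specification. As a reference point, the standard proportional-odds (ordered logit) form for an ordinal trust rating Y with categories 1, ..., J and predictor vector x is:

```latex
P(Y \le j \mid \mathbf{x}) = \frac{1}{1 + \exp\!\left(-\left(\theta_j - \mathbf{x}^\top \boldsymbol{\beta}\right)\right)},
\qquad j = 1, \dots, J-1, \qquad \theta_1 < \theta_2 < \dots < \theta_{J-1}
```

Under this sign convention, a positive coefficient in beta shifts probability toward higher trust categories. "Nested" means each model adds a block of predictors to the previous one. In a hypothetical sequence where model 1 contains general trust and general caution and model 2 adds information technology competence, an effect is "erased" when the general-caution coefficient is clearly nonzero in model 1 but shrinks toward zero once competence enters model 2. The exact order in which the paper adds predictors is not given here.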

We discuss several implications for online behavior and suggest website policies that can help users make informed decisions about interacting with potentially risky websites.

PODC 2010

Efficient distributed random walks with applications

Atish Das Sarma, Danupon Nanongkai, Gopal Pandurangan, Prasad Tetali

No Information

WSDM 2010 (Invited to TIST Special Issue)

Ranking mechanisms in Twitter-like forums

Anish Das Sarma, Atish Das Sarma, Sreenivas Gollapudi, Rina Panigrahy

We study the problem of designing a mechanism to rank items in forums by making use of user reviews such as thumb and star ratings. We compare mechanisms where forum users rate individual posts with mechanisms where the user is asked to perform a pairwise comparison and state which post is better.

The main metric used to evaluate a mechanism is ranking accuracy versus the cost of reviews, where the cost is measured as the average number of reviews used per post. We show that for many reasonable probability models, there is no thumb-based (or star-based) ranking mechanism that can produce approximately accurate rankings with a bounded number of reviews per item.

On the other hand, we provide a review mechanism based on pairwise comparisons that achieves approximate rankings with bounded cost. We have implemented a system, shoutvelocity, which is a Twitter-like forum in which items (i.e., tweets in Twitter) are rated by using comparisons. For each new item, the user who posts the item is required to compare two previous entries.

This ensures that over a sequence of n posts, we get at least n comparisons while requiring only one review per item on average. Our mechanism uses this sequence of comparisons to obtain a ranking estimate. It ensures that every item is reviewed at least once and that winning entries are reviewed more often to obtain better estimates of the top items.
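The following toy model shows the comparison-on-post mechanism: each new post obliges its author to judge two earlier posts. The pair-selection rule and the ranking estimate are assumptions chosen only to mirror the two stated properties. Favoring the least-reviewed post keeps every post in rotation, and sampling the partner in proportion to 1 + wins makes winners recur. The class and method names are hypothetical. The paper's actual estimator is not described in this abstract.

```python
import random
from collections import defaultdict


class ComparisonForum:
    """Toy forum where posting requires comparing two earlier posts."""

    def __init__(self, seed=0):
        self.posts = []
        self.wins = defaultdict(int)
        self.reviews = defaultdict(int)
        self.rng = random.Random(seed)

    def _pick_pair(self):
        # The least-reviewed post is always included, so no post is starved;
        # its partner is drawn with weight 1 + wins, so winners are re-reviewed.
        least = min(self.posts, key=lambda p: self.reviews[p])
        others = [p for p in self.posts if p != least]
        weights = [1 + self.wins[p] for p in others]
        return least, self.rng.choices(others, weights=weights)[0]

    def post(self, item, judge):
        """Add `item`; once two posts exist, `judge(a, b)` must return the better one."""
        if len(self.posts) >= 2:
            a, b = self._pick_pair()
            winner = judge(a, b)
            self.wins[winner] += 1
            self.reviews[a] += 1
            self.reviews[b] += 1
        self.posts.append(item)

    def win_rate(self, p):
        return self.wins[p] / self.reviews[p] if self.reviews[p] else 0.0

    def ranking(self):
        """Posts ordered by observed win rate, best first."""
        return sorted(self.posts, key=self.win_rate, reverse=True)

    def comparisons_per_post(self):
        if not self.posts:
            return 0.0
        return sum(self.reviews.values()) / 2 / len(self.posts)
```

In this sketch, every post after the second triggers exactly one comparison. The review cost therefore stays at about one comparison per post however long the forum runs, which is the bounded-cost property the abstract contrasts with thumb or star ratings.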