We strongly believe in open source and giving to our community. We work directly with researchers in academia and seek out new perspectives with our intern and fellowship programs. We generalize our solutions and release them to the world as open source projects. We host discussions and publish our results.


Journal of the American Society for Information Science and Technology (JASIST), 2012

Automatic Identification of Personal Insults on Social News Sites

Sara OwsleySood, Elizabeth Churchill, Judd Antin

As online communities grow and the volume of user-generated content increases, the need for community management also rises. Community management has three main purposes: to create a positive experience for existing participants, to promote appropriate, socionormative behaviors,

and to encourage potential participants to make contributions. Research indicates that the quality of content a potential participant sees on a site is highly influential; off-topic, negative comments with malicious intent are a particularly strong boundary to participation or set the tone for encouraging similar contributions. A problem for community managers, therefore, is the detection and elimination of such undesirable content. As a community grows, this undertaking becomes more daunting. Can an automated system aid community managers in this task?

In this paper, we address this question through a machine learning approach to automatic detection of inappropriate negative user contributions. Our training corpus is a set of comments from a news commenting site that we tasked Amazon Mechanical Turk workers with labeling. Each comment is labeled for the presence of profanity, insults, and the object of the insults. Support vector machines trained on these data are combined with relevance and valence analysis systems in a multistep approach to the detection of inappropriate negative user contributions.

The system shows great potential for semiautomated community management.

WSDM 2012

Online selection of diverse results

Debmalya Panigrahi, Atish Das Sarma, Gagan Aggarwal, Andrew Tomkins

The phenomenal growth in the volume of easily accessible information via various web-based services has made it essential for service providers to provide users with personalized representative summaries of such information.

Further, online commercial services including social networking and micro-blogging websites, e-commerce portals, leisure and entertainment websites, etc. recommend interesting content to users that is simultaneously diverse on many different axes such as topic, geographic specificity, etc.

The key algorithmic question in all these applications is the generation of a succinct, representative, and relevant summary from a large stream of data coming from a variety of sources. In this paper, we formally model this optimization problem, identify its key structural characteristics, and use these observations to design an extremely scalable and efficient algorithm.

We analyze the algorithm using theoretical techniques to show that it always produces a nearly optimal solution. In addition, we perform large-scale experiments on both real-world and synthetically generated datasets, which confirm that our algorithm performs even better than its analytical guarantees in practice, and also outperforms other candidate algorithms for the problem by a wide margin.


Interactive regret minimization

Danupon Nanongkai, Ashwin Lall, Atish Das Sarma, Kazuhisa Makino

We study the notion of regret ratio proposed in [19] Nanongkai et al. [VLDB10] to deal with multi-criteria decision making in database systems. The regret minimization query proposed in [19] Nanongkai et al. was shown to have features of both skyline and top-k:

it does not need information from the user but still controls the output size. While this approach is suitable for obtaining a reasonably small regret ratio, it is still open whether one can make the regret ratio arbitrarily small. Moreover, it remains open whether reasonable questions can be asked to the users in order to improve efficiency of the process.

In this paper, we study the problem of minimizing regret ratio when the system is enhanced with interaction. We assume that when presented with a set of tuples the user can tell which tuple is most preferred.

Under this assumption, we develop the problem of interactive regret minimization where we fix the number of questions and tuples per question that we can display, and aim at minimizing the regret ratio. We try to answer two questions in this paper:

(1) How much does interaction help? That is, how much can we improve the regret ratio when there are interactions?

(2) How efficient can interaction be? In particular, we measure how many questions we have to ask the user in order to make her regret ratio small enough.

We answer both questions from both theoretical and practical standpoints. For the first question, we show that interaction can reduce the regret ratio almost exponentially. To do this, we prove a lower bound for the previous approach (thereby resolving an open problem from [19] Nanongkai et al.), and develop an almost-optimal upper bound that makes the regret ratio exponentially smaller.

Our experiments also confirm that, in practice, interactions help in improving the regret ratio by many orders of magnitude. For the second question, we prove that when our algorithm shows a reasonable number of points per question, it only needs a few questions to make the regret ratio small.

Thus, interactive regret minimization seems to be a necessary and sufficient way to deal with multi-criteria decision making in database systems.

Second International ICST Conference, AMMA 2011. Vol. 80, Lecture Notes of the Institute for Computer Science, Social Informatics and Telecommunications Engineering. Springer, 2012. (Revised Selected Papers.)

Auctions, Market Mechanisms, and Their Applications

Sanmay Das, Sebastien Lahaie, Boleslaw Szymanski

No Information

Design, Guidelines, and Requirements, Personal and Ubiquitous Computing, 2012

Live Mobile Collaboration for Video Production

Marco deSa, David Shamma, Elizabeth Churchill

Traditional cameras and video equipment are gradually losing the race with smart-phones and small mobile devices that allow video, photo and audio capturing on the go. Users are now quickly creating movies and taking photos whenever and wherever they go, particularly at concerts and events.

Still, in-situ media capturing with such devices poses constraints to any user, especially amateur ones. In this paper, we present the design and evaluation of a mobile video capture suite that allows for cooperative ad-hoc production. Our system relies on ad-hoc in-situ collaboration offering users the ability to switch between streams and cooperate with each other in order to capture better media with mobile devices.

Our main contribution arises from the description of our design process focusing on the prototyping approach and the qualitative analysis that followed. Furthermore, we contribute with lessons and design guidelines that emerged and apply to in-situ design of rich video collaborative experiences and with the elicitation of functional and usability requirements for collaborative video production using mobile devices.


Bayes-Nash Equilibria of the Generalized Second-Price Auction

Renato Gomes

We develop a Bayes–Nash analysis of the generalized second-price (GSP) auction, the multi-unit auction used by search engines to sell sponsored advertising positions. Our main result characterizes the efficient Bayes–Nash equilibrium of the GSP and provides a necessary

and sufficient condition that guarantees existence of such an equilibrium. With only two positions, this condition requires that the click–through rate of the second position is sufficiently smaller than that of the first.

When an efficient equilibrium exists, we provide a necessary and sufficient condition for the auction revenue to decrease as click–through rates increase. Interestingly, under optimal reserve prices, revenue increases with the click–through rates of all positions. Further, we prove that no inefficient equilibrium of the GSP can be symmetric.

Our results are in sharp contrast with the previous literature that studied the GSP under complete information.

WWW 2012

Your two weeks of fame and your grandmother’s

James Cook, Atish Das Sarma, Alex Fabrikant, Andrew Tomkins

Did celebrity last longer in 1929, 1992 or 2009? We investigate the phenomenon of fame by mining a collection of news articles that spans the twentieth century, and also perform a side study on a collection of blog posts from the last 10 years.

By analyzing mentions of personal names, we measure each person's time in the spotlight, and watch the distribution change from a century ago to a year ago. We expected to find a trend of decreasing durations of fame as news cycles accelerated and attention spans became shorter.

Instead, we find a remarkable consistency through most of the period we study. Through a century of rapid technological and societal change, through the appearance of Twitter, communication satellites and the Internet, we do not observe a significant change in typical duration of celebrity.

We also study the most famous of the famous, and find different results depending on our method for measuring duration of fame. With a method that may be thought of as measuring a spike of attention around a single narrow news story, we see the same result as before:

stories last as long now as they did in 1930. A second method, which may be thought of as measuring the duration of public interest in a person, indicates that famous people's presence in the news is becoming longer rather than shorter, an effect most likely driven by the wider distribution and higher volume of media in modern times.

Similar studies have been done with much shorter timescales specifically in the context of information spreading on Twitter and similar social networking site. However, to the best of our knowledge, this is the first massive scale study of this nature that spans over a century of archived data, thereby allowing us to track changes across decades.