Publications

We strongly believe in open source and in giving back to our community. We work directly with researchers in academia and seek out new perspectives through our intern and fellowship programs. We generalize our solutions and release them to the world as open source projects. We host discussions and publish our results.

Publications

Information Economics and Policy, Volume 24, Issue 1, Pages 3–14, March 2012 (and also NBER Working Paper #16507)

Supply Responses to Digital Distribution: Recorded Music and Live Performances

Chris Nosko, Julie Mortimer, Alan Sorensen


Personal and Ubiquitous Computing, 2012

Live Mobile Collaboration for Video Production: Design, Guidelines, and Requirements

Marco de Sá, David Shamma, Elizabeth Churchill

Traditional cameras and video equipment are gradually losing the race with smartphones and small mobile devices that allow video, photo, and audio capture on the go. Users are now quickly creating movies and taking photos whenever and wherever they go, particularly at concerts and events.

Still, in-situ media capture with such devices poses challenges for any user, especially amateurs. In this paper, we present the design and evaluation of a mobile video capture suite that allows for cooperative ad-hoc production. Our system relies on ad-hoc in-situ collaboration, offering users the ability to switch between streams and cooperate with one another in order to capture better media with mobile devices.

Our main contribution is the description of our design process, focusing on the prototyping approach and the qualitative analysis that followed. We also contribute the lessons and design guidelines that emerged, which apply to the in-situ design of rich collaborative video experiences, along with functional and usability requirements for collaborative video production using mobile devices.

WSDM 2012

Online selection of diverse results

Debmalya Panigrahi, Atish Das Sarma, Gagan Aggarwal, Andrew Tomkins

The phenomenal growth in the volume of easily accessible information via various web-based services has made it essential for service providers to provide users with personalized representative summaries of such information.

Further, online commercial services, including social networking and micro-blogging websites, e-commerce portals, and leisure and entertainment websites, recommend content to users that is simultaneously diverse along many different axes, such as topic and geographic specificity.

The key algorithmic question in all these applications is the generation of a succinct, representative, and relevant summary from a large stream of data coming from a variety of sources. In this paper, we formally model this optimization problem, identify its key structural characteristics, and use these observations to design an extremely scalable and efficient algorithm.

We analyze the algorithm using theoretical techniques to show that it always produces a nearly optimal solution. In addition, we perform large-scale experiments on both real-world and synthetically generated datasets, which confirm that our algorithm performs even better than its analytical guarantees in practice, and also outperforms other candidate algorithms for the problem by a wide margin.
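The abstract does not reproduce the paper's algorithm. As a rough illustration of the underlying problem of selecting diverse representatives from a stream, here is a minimal greedy sketch; the distance threshold, the data, and all names are illustrative assumptions, not taken from the paper:

```python
def diverse_summary(stream, k, min_dist, distance):
    """Greedily keep up to k items from the stream that are pairwise
    at least min_dist apart under the given distance function.
    A toy stand-in for diversity-aware selection, not the paper's method."""
    summary = []
    for item in stream:
        if len(summary) >= k:
            break
        # Keep the item only if it is far from everything kept so far.
        if all(distance(item, kept) >= min_dist for kept in summary):
            summary.append(item)
    return summary

# Toy usage: 1-D points, absolute difference as the distance.
points = [0.0, 0.1, 0.9, 2.5, 2.6, 5.0]
print(diverse_summary(points, k=3, min_dist=1.0,
                      distance=lambda a, b: abs(a - b)))
# -> [0.0, 2.5, 5.0]
```

Near-duplicate points (0.1, 0.9, 2.6) are skipped, so the summary spreads out across the range; the paper's contribution is doing this kind of selection scalably and with provable guarantees in an online setting.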

Second International ICST Conference, AMMA 2011. Vol. 80, Lecture Notes of the Institute for Computer Science, Social Informatics and Telecommunications Engineering. Springer, 2012. (Revised Selected Papers.)

Auctions, Market Mechanisms, and Their Applications

Sanmay Das, Sebastien Lahaie, Boleslaw Szymanski


Journal of the American Society for Information Science and Technology (JASIST), 2012

Automatic Identification of Personal Insults on Social News Sites

Sara Owsley Sood, Elizabeth Churchill, Judd Antin

As online communities grow and the volume of user-generated content increases, the need for community management also rises. Community management has three main purposes: to create a positive experience for existing participants, to promote appropriate, socionormative behaviors, and to encourage potential participants to make contributions. Research indicates that the quality of content a potential participant sees on a site is highly influential; off-topic, negative comments with malicious intent can be a particularly strong barrier to participation, or can set the tone by encouraging similar contributions. A problem for community managers, therefore, is the detection and elimination of such undesirable content. As a community grows, this undertaking becomes more daunting. Can an automated system aid community managers in this task?

In this paper, we address this question through a machine learning approach to automatic detection of inappropriate negative user contributions. Our training corpus is a set of comments from a news commenting site that we tasked Amazon Mechanical Turk workers with labeling. Each comment is labeled for the presence of profanity, insults, and the object of the insults. Support vector machines trained on these data are combined with relevance and valence analysis systems in a multistep approach to the detection of inappropriate negative user contributions.

The system shows great potential for semiautomated community management.
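The paper's trained SVMs, relevance analysis, and valence analysis are not reproduced in the abstract. Purely as a toy illustration of a multistep filter of the same general shape, the sketch below first flags profanity via a lexicon and then checks whether the comment is directed at another person; the lexicon, the second-person cue, and all labels are placeholder assumptions, not the paper's components:

```python
# Stage 1: a tiny hand-made profanity lexicon (illustrative only).
PROFANITY = {"idiot", "moron", "stupid"}
# Stage 2: second-person cues as a crude proxy for "directed at a person".
DIRECTED_CUES = {"you", "your", "you're"}

def classify_comment(text):
    """Two-stage toy classifier: profane + directed -> personal insult."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    has_profanity = bool(words & PROFANITY)
    directed = bool(words & DIRECTED_CUES)
    if has_profanity and directed:
        return "personal insult"
    if has_profanity:
        return "profane"
    return "clean"

print(classify_comment("You are an idiot!"))    # -> personal insult
print(classify_comment("What an idiot move."))  # -> profane
print(classify_comment("Great article."))       # -> clean
```

The real system replaces both hand-made stages with learned models (SVMs over labeled Mechanical Turk data), which is what lets it generalize beyond a fixed word list.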

WWW (Companion Volume) 2012: 73-82

Rewriting null e-commerce queries to recommend products

Nish Parikh, Neel Sundaresan

In e-commerce applications, product descriptions are often concise. E-commerce search engines often have to deal with queries that cannot easily be matched to product inventory, resulting in zero-recall, or null query, situations.

Null queries arise from differences in buyer and seller vocabulary or from the transient nature of products. In this paper, we describe a system that rewrites null e-commerce queries to find matching products as close to the original query as possible.

The system uses query relaxation to rewrite null queries in order to match products. Using eBay as an example of a dynamic marketplace, we show how temporal feedback that respects product category structure, drawn from the repository of expired products, improves the quality of recommended results.

The system is scalable and can be run in a high-volume setting. Our experiments show that high-quality product recommendations are achievable for more than 25% of null queries.
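The production system also uses temporal feedback and eBay's category structure, which the sketch below omits. It only illustrates the core query-relaxation idea: when the full query has zero recall, drop terms a few at a time and return the first rewrite that matches, preferring rewrites closest to the original query. All data and names here are hypothetical:

```python
from itertools import combinations

def relax_query(terms, inventory):
    """Try the full query first; on zero recall, drop one term
    (then two, ...) and return the first relaxation that matches,
    together with its hits. A toy sketch, not eBay's system."""
    for keep in range(len(terms), 0, -1):
        for subset in combinations(terms, keep):
            hits = [title for title in inventory
                    if all(t in title.lower() for t in subset)]
            if hits:
                return list(subset), hits
    return [], []

inventory = ["red leather wallet", "blue canvas wallet", "red silk scarf"]
query = ["red", "vintage", "wallet"]
print(relax_query(query, inventory))
# -> (['red', 'wallet'], ['red leather wallet'])
```

The full query has zero recall because no listing mentions "vintage"; dropping that one term recovers a close match instead of returning an empty results page.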


Bayes-Nash Equilibria of the Generalized Second-Price Auction

Renato Gomes

We develop a Bayes-Nash analysis of the generalized second-price (GSP) auction, the multi-unit auction used by search engines to sell sponsored advertising positions. Our main result characterizes the efficient Bayes-Nash equilibrium of the GSP and provides a necessary and sufficient condition that guarantees existence of such an equilibrium. With only two positions, this condition requires that the click-through rate of the second position be sufficiently smaller than that of the first.

When an efficient equilibrium exists, we provide a necessary and sufficient condition for the auction revenue to decrease as click-through rates increase. Interestingly, under optimal reserve prices, revenue increases with the click-through rates of all positions. Further, we prove that no inefficient equilibrium of the GSP can be symmetric.

Our results are in sharp contrast with the previous literature, which studied the GSP under complete information.
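For readers unfamiliar with the mechanism the paper analyzes, the sketch below computes the standard GSP allocation and payments: slots go to the highest bidders in order, and each slot winner pays the next-highest bid per click. This illustrates only the pricing rule itself, not the paper's Bayes-Nash analysis; the bids and click-through rates are made up:

```python
def gsp_outcome(bids, ctrs):
    """Standard generalized second-price rule: rank bidders by bid,
    winner of slot i pays the bid of the bidder ranked i+1 per click.
    Expected payment = price per click * slot click-through rate."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    outcome = {}
    for slot, ctr in enumerate(ctrs):
        bidder, _ = ranked[slot]
        # Price per click is the next-highest bid (0 if none remains).
        price = ranked[slot + 1][1] if slot + 1 < len(ranked) else 0.0
        outcome[bidder] = {"slot": slot + 1,
                           "price_per_click": price,
                           "expected_payment": price * ctr}
    return outcome

bids = {"a": 4.0, "b": 3.0, "c": 1.0}
print(gsp_outcome(bids, ctrs=[0.5, 0.2]))
# a wins slot 1 at 3.0 per click; b wins slot 2 at 1.0 per click.
```

The paper's two-position existence condition concerns exactly these slot click-through rates: an efficient Bayes-Nash equilibrium requires the second slot's rate to be sufficiently smaller than the first's.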

WWW 2012

Your two weeks of fame and your grandmother’s

James Cook, Atish Das Sarma, Alex Fabrikant, Andrew Tomkins

Did celebrity last longer in 1929, 1992 or 2009? We investigate the phenomenon of fame by mining a collection of news articles that spans the twentieth century, and also perform a side study on a collection of blog posts from the last 10 years.

By analyzing mentions of personal names, we measure each person's time in the spotlight, and watch the distribution change from a century ago to a year ago. We expected to find a trend of decreasing durations of fame as news cycles accelerated and attention spans became shorter.

Instead, we find a remarkable consistency through most of the period we study. Through a century of rapid technological and societal change, through the appearance of Twitter, communication satellites and the Internet, we do not observe a significant change in typical duration of celebrity.

We also study the most famous of the famous, and find different results depending on our method for measuring duration of fame. With a method that may be thought of as measuring a spike of attention around a single narrow news story, we see the same result as before: stories last as long now as they did in 1930. A second method, which may be thought of as measuring the duration of public interest in a person, indicates that famous people's presence in the news is becoming longer rather than shorter, an effect most likely driven by the wider distribution and higher volume of media in modern times.

Similar studies have been done at much shorter timescales, specifically in the context of information spreading on Twitter and similar social networking sites. However, to the best of our knowledge, this is the first massive-scale study of this nature that spans over a century of archived data, thereby allowing us to track changes across decades.

