Publications

Publications
Publications
We strongly believe in open source and giving to our community. We work directly with researchers in academia and seek out new perspectives with our intern and fellowship programs. We generalize our solutions and release them to the world as open source projects. We host discussions and publish our results.

Publications

Design, Guidelines, and Requirements, Personal and Ubiquitous Computing, 2012

Live Mobile Collaboration for Video Production

Marco deSa, David Shamma, Elizabeth Churchill

Traditional cameras and video equipment are gradually losing the race with smart-phones and small mobile devices that allow video, photo and audio capturing on the go. Users are now quickly creating movies and taking photos whenever and wherever they go, particularly at concerts and events.

Still, in-situ media capturing with such devices poses constraints to any user, especially amateur ones. In this paper, we present the design and evaluation of a mobile video capture suite that allows for cooperative ad-hoc production. Our system relies on ad-hoc in-situ collaboration offering users the ability to switch between streams and cooperate with each other in order to capture better media with mobile devices.

Our main contribution arises from the description of our design process focusing on the prototyping approach and the qualitative analysis that followed. Furthermore, we contribute with lessons and design guidelines that emerged and apply to in-situ design of rich video collaborative experiences and with the elicitation of functional and usability requirements for collaborative video production using mobile devices.

Keywords
in IEEE Large-scale Data Analysis and Visualization (LDAV) 2012

Visual Analysis of Massive Web Session Data

Zeqian Shen, Jishang Wei, Neel Sundaresan, Kwan-Liu Ma, Zeqian Shen, Jishang Wei, Neel Sundaresan, Kwan-Liu Ma

Tracking and recording users’ browsing behaviors on the web down to individual mouse clicks can create massive web session logs.While such web session data contain valuable information about user behaviors, the ever-increasing data size has placed a big challenge to analyzing and visualizing the data.

An efficient data analysis framework requires both powerful computational analysis and interactive visualization. Following the visual analytics mantra "Analyze first, show the important, zoom, filter and analyze further, details on demand", we introduce a two-tier visual analysis system, TrailExplorer2, to discover knowledge from massive log data.

The system supports a visual analysis process iterating between two steps: querying web sessions and visually analyzing the retrieved data. The query happens at the lower tier where terabytes of web session data are processed in a cluster.

At the upper tier, the extracted web sessions with much smaller scale are visualized on a personal computer for interactive exploration. Our system visualizes a sorted list of web sessions’ temporal patterns and enables data exploration at different levels of details.

The query visualization exploration process iterates until a satisfactory conclusion is achieved. We present two case studies of TrailExplorer2 using real world session data from eBay to demonstrate the system's effectiveness.

Keywords
Categories
in IEEE Visual Analytics Science and Technology (VAST) 2012

Visual Cluster Exploration of Web Clickstream Data

Jishang Wei, Zeqian Shen, Neel Sundaresan, Kwan-Liu Ma, Jishang Wei, Zeqian Shen, Neel Sundaresan, Kwan-Liu Ma

Web clickstream data are routinely collected to study how users browse the web or use a service. It is clear that the ability to recognize and summarize user behavior patterns from such data is valuable to e-commerce companies. In this paper, we introduce a visual analytics system to explore the various user behavior patterns reflected by distinct clickstream clusters.

In a practical analysis scenario, the system first presents an overview of clickstream clusters using a Self-Organizing Map with Markov chain models.

Then the analyst can interactively explore the clusters through an intuitive user interface. He can either obtain summarization of a selected group of data or further refine the clustering result. We evaluated our system using two different datasets from eBay.

Analysts who were working on the same data have confirmed the system’s effectiveness in extracting user behavior patterns from complex datasets and enhancing their ability to reason.

Keywords
Categories
Proceedings of KDD’12, Beijing, China. August 2012

Bootstrapped Language Identification For Multi-Site Internet Domains

We present an algorithm for language identification, in particular of short documents, for the case of an Internet domain with sites in multiple countries with differing languages.

The algorithm is significantly faster than standard language identification methods, while providing state-of-the-art identification. We bootstrap the algorithm based on the language identification based on the site alone, a methodology suitable for any supervised language identification algorithm.

We demonstrate the bootstrapping and algorithm on eBay email data and on Twitter status updates data. The algorithm is deployed at eBay as part of the back-office development data repository.

STOC 2011 (Invited and accepted to SICOMP)

Distributed Verification and Hardness of Distributed Approximation

Atish Das Sarma, Stephan Holzer, Liah Kor, Amos Korman, Danupon Nanongkai, Gopal Pandurangan, David Peleg, Roger Wattenhofer, Atish Das Sarma, Stephan Holzer, Liah Kor, Amos Korman, Danupon Nanongkai, Gopal Pandurangan, David Peleg, Roger Wattenhofer

We study the verification problem in distributed networks, stated as follows. Let H be a subgraph of a network G where each vertex of G knows which edges incident on it are in H. We would like to verify whether H has some properties, e.g., if it is a tree or if it is connected (every node knows in the end of the process whether H has the specified property or not). We would like to perform this verification in a decentralized fashion via a distributed algorithm. The time complexity of verification is measured as the number of rounds of distributed communication.

In this paper we initiate a systematic study of distributed verification, and give almost tight lower bounds on the running time of distributed verification algorithms for many fundamental problems such as connectivity, spanning connected subgraph, and s-t cut verification.

We then show applications of these results in deriving strong unconditional time lower bounds on the hardness of distributed approximation for many classical optimization problems including minimum spanning tree, shortest paths, and minimum cut.

Many of these results are the first non-trivial lower bounds for both exact and approximate distributed computation and they resolve previous open questions. Moreover, our unconditional lower bound of approximating minimum spanning tree (MST) subsumes and improves upon the previous hardness of approximation bound of Elkin [STOC 2004] as well as the lower bound for (exact) MST computation of Peleg and Rubinovich [FOCS 1999]. Our result implies that there can be no distributed approximation algorithm for MST that is significantly faster than the current exact algorithm, for any approximation factor.

Our lower bound proofs show an interesting connection between communication complexity and distributed computing which turns out to be useful in establishing the time complexity of exact and approximate distributed computation of many problems.

Keywords
PVLDB 2011 (Invited to VLDB Journal Special Issue)

Personalized Social Recommendations – Accurate or Private?

Atish Das Sarma, Ashwin Machanavajjhala, Aleksandra Korolova

With the recent surge of social networks such as Facebook, new forms of recommendations have become possible -- recommendations that rely on one's social connections in order to make personalized recommendations of ads, content, products, and people. Since recommendations may use sensitive information, it is speculated that these recommendations are associated with privacy risks. The main contribution of this work is in formalizing trade-offs between accuracy and privacy of personalized social recommendations.

We study whether "social recommendations", or recommendations that are solely based on a user's social network, can be made without disclosing sensitive links in the social graph. More precisely, we quantify the loss in utility when existing recommendation algorithms are modified to satisfy a strong notion of privacy, called differential privacy. We prove lower bounds on the minimum loss in utility for any recommendation algorithm that is differentially private.

We then adapt two privacy preserving algorithms from the differential privacy literature to the problem of social recommendations, and analyze their performance in comparison to our lower bounds, both analytically and experimentally.

We show that good private social recommendations are feasible only for a small subset of the users in the social network or for a lenient setting of privacy parameters.

Keywords
PODC 2011

A tight unconditional lower bound on distributed random walk computation

Atish Das Sarma, Danupon Nanongkai, Gopal Pandurangan, Atish Das Sarma, Danupon Nanongkai, Gopal Pandurangan

No Information

Keywords
Volume 18, Issue 1, 2011

Rat, rational, or seething cauldron of desire: designing the shopper, interactions

Elizabeth Churchill

In this article, I look at how our shopping experience, both online and offline, is designed.

Keywords
in Proceedings of the fourth ACM international conference on Web search and data mining (WSDM), 2011.

eBay: an E-commerce Marketplace as a Complex Network

Zeqian Shen, Neel Sundaresan, Zeqian Shen, Neel Sundaresan

Commerce networks involve buying and selling activities among individuals or organizations. As the growing of the Internet and e-commerce, it brings opportunities for obtaining real world online commerce networks, which are magnitude larger than before.

Getting a deeper understanding of e-commerce networks, such as the eBay marketplace, in terms of what structure they have, what kind of interactions they afford, what trust and reputation measures exist, and how they evolve has tremendous value in suggesting business opportunities and building effective user applications.

In this paper, we modeled the eBay network as a complex network. We analyzed the macroscopic shape of the network using degree distribution and the bow-tie model. Networks of different eBay categories are also compared.

The results suggest that the categories vary from collector networks to retail networks. We also studied the local structures of the networks using motif profiling. Finally, patterns of preferential connections are visually analyzed using Auroral diagrams.

Keywords

Pages