Publications

Publications
Publications
We strongly believe in open source and giving to our community. We work directly with researchers in academia and seek out new perspectives with our intern and fellowship programs. We generalize our solutions and release them to the world as open source projects. We host discussions and publish our results.

Publications

in IEEE Visual Analytics Science and Technology (VAST) 2012

Visual Cluster Exploration of Web Clickstream Data

Jishang Wei, Zeqian Shen, Neel Sundaresan, Kwan-Liu Ma, Jishang Wei, Zeqian Shen, Neel Sundaresan, Kwan-Liu Ma

Web clickstream data are routinely collected to study how users browse the web or use a service. It is clear that the ability to recognize and summarize user behavior patterns from such data is valuable to e-commerce companies. In this paper, we introduce a visual analytics system to explore the various user behavior patterns reflected by distinct clickstream clusters.

In a practical analysis scenario, the system first presents an overview of clickstream clusters using a Self-Organizing Map with Markov chain models.

Then the analyst can interactively explore the clusters through an intuitive user interface. He can either obtain summarization of a selected group of data or further refine the clustering result. We evaluated our system using two different datasets from eBay.

Analysts who were working on the same data have confirmed the system’s effectiveness in extracting user behavior patterns from complex datasets and enhancing their ability to reason.

Keywords
Categories
Proceedings of KDD’12, Beijing, China. August 2012

Bootstrapped Language Identification For Multi-Site Internet Domains

We present an algorithm for language identification, in particular of short documents, for the case of an Internet domain with sites in multiple countries with differing languages.

The algorithm is significantly faster than standard language identification methods, while providing state-of-the-art identification. We bootstrap the algorithm based on the language identification based on the site alone, a methodology suitable for any supervised language identification algorithm.

We demonstrate the bootstrapping and algorithm on eBay email data and on Twitter status updates data. The algorithm is deployed at eBay as part of the back-office development data repository.

STOC 2011 (Invited and accepted to SICOMP)

Distributed Verification and Hardness of Distributed Approximation

Atish Das Sarma, Stephan Holzer, Liah Kor, Amos Korman, Danupon Nanongkai, Gopal Pandurangan, David Peleg, Roger Wattenhofer, Atish Das Sarma, Stephan Holzer, Liah Kor, Amos Korman, Danupon Nanongkai, Gopal Pandurangan, David Peleg, Roger Wattenhofer

We study the verification problem in distributed networks, stated as follows. Let H be a subgraph of a network G where each vertex of G knows which edges incident on it are in H. We would like to verify whether H has some properties, e.g., if it is a tree or if it is connected (every node knows in the end of the process whether H has the specified property or not). We would like to perform this verification in a decentralized fashion via a distributed algorithm. The time complexity of verification is measured as the number of rounds of distributed communication.

In this paper we initiate a systematic study of distributed verification, and give almost tight lower bounds on the running time of distributed verification algorithms for many fundamental problems such as connectivity, spanning connected subgraph, and s-t cut verification.

We then show applications of these results in deriving strong unconditional time lower bounds on the hardness of distributed approximation for many classical optimization problems including minimum spanning tree, shortest paths, and minimum cut.

Many of these results are the first non-trivial lower bounds for both exact and approximate distributed computation and they resolve previous open questions. Moreover, our unconditional lower bound of approximating minimum spanning tree (MST) subsumes and improves upon the previous hardness of approximation bound of Elkin [STOC 2004] as well as the lower bound for (exact) MST computation of Peleg and Rubinovich [FOCS 1999]. Our result implies that there can be no distributed approximation algorithm for MST that is significantly faster than the current exact algorithm, for any approximation factor.

Our lower bound proofs show an interesting connection between communication complexity and distributed computing which turns out to be useful in establishing the time complexity of exact and approximate distributed computation of many problems.

Keywords
PVLDB 2011 (Invited to VLDB Journal Special Issue)

Personalized Social Recommendations – Accurate or Private?

Atish Das Sarma, Ashwin Machanavajjhala, Aleksandra Korolova

With the recent surge of social networks such as Facebook, new forms of recommendations have become possible -- recommendations that rely on one's social connections in order to make personalized recommendations of ads, content, products, and people. Since recommendations may use sensitive information, it is speculated that these recommendations are associated with privacy risks. The main contribution of this work is in formalizing trade-offs between accuracy and privacy of personalized social recommendations.

We study whether "social recommendations", or recommendations that are solely based on a user's social network, can be made without disclosing sensitive links in the social graph. More precisely, we quantify the loss in utility when existing recommendation algorithms are modified to satisfy a strong notion of privacy, called differential privacy. We prove lower bounds on the minimum loss in utility for any recommendation algorithm that is differentially private.

We then adapt two privacy preserving algorithms from the differential privacy literature to the problem of social recommendations, and analyze their performance in comparison to our lower bounds, both analytically and experimentally.

We show that good private social recommendations are feasible only for a small subset of the users in the social network or for a lenient setting of privacy parameters.

Keywords
PODC 2011

A tight unconditional lower bound on distributed random walk computation

Atish Das Sarma, Danupon Nanongkai, Gopal Pandurangan, Atish Das Sarma, Danupon Nanongkai, Gopal Pandurangan

No Information

Keywords
Volume 18, Issue 1, 2011

Rat, rational, or seething cauldron of desire: designing the shopper, interactions

Elizabeth Churchill

In this article, I look at how our shopping experience, both online and offline, is designed.

Keywords
in Proceedings of the fourth ACM international conference on Web search and data mining (WSDM), 2011.

eBay: an E-commerce Marketplace as a Complex Network

Zeqian Shen, Neel Sundaresan, Zeqian Shen, Neel Sundaresan

Commerce networks involve buying and selling activities among individuals or organizations. As the growing of the Internet and e-commerce, it brings opportunities for obtaining real world online commerce networks, which are magnitude larger than before.

Getting a deeper understanding of e-commerce networks, such as the eBay marketplace, in terms of what structure they have, what kind of interactions they afford, what trust and reputation measures exist, and how they evolve has tremendous value in suggesting business opportunities and building effective user applications.

In this paper, we modeled the eBay network as a complex network. We analyzed the macroscopic shape of the network using degree distribution and the bow-tie model. Networks of different eBay categories are also compared.

The results suggest that the categories vary from collector networks to retail networks. We also studied the local structures of the networks using motif profiling. Finally, patterns of preferential connections are visually analyzed using Auroral diagrams.

Keywords
SIGIR 2011: 75-84

User behavior in zero-recall ecommerce queries

Gyanit Singh, Nish Parikh, Neel Sundaresan

User expectation and experience for web search and eCommerce (product) search are quite different. Product descriptions are concise as compared to typical web documents. User expectation is more specific to find the right product.

The difference in the publisher and searcher vocabulary (in case of product search the seller and the buyer vocabulary) combined with the fact that there are fewer products to search over than web documents result in observable numbers of searches that return no results (zero recall searches).

In this paper we describe a study of zero recall searches. Our study is focused on eCommerce search and uses data from a leading eCommerce site's user click stream logs.

There are 3 main contributions of our study: 1) The cause of zero recall searches; 2) A study of user's reaction and recovery from zero recall; 3) A study of differences in behavior of power users versus novice users to zero recall searches.

Keywords
Interacting with Computers, 10/2011, Volume 23, Issue 5, p.iii-xi, 2011

Feminism and HCI: New Perspectives

Shaowen Bardzell, Elizabeth Churchill

As a word and as a set of theories and practices, feminism is a poorly understood concept. However, feminist perspectives have a lot in common with user- and value-centered design processes such as those espoused within the field of Human Computer Interaction.

Examples include consideration of alternative viewpoints, considerations of agency (who get to say/do what and under what circumstances) and the development of reflective and reflexive methods for understanding how, when, where and why people do what they do.

In the ''Feminism and HCI: New Perspectives'' special issue, we have invited researchers and practitioners to reflect on the ways in which feminist thinking, theory, and practice can and does have an impact on the field of Human Computer Interaction.

This introductory editorial offers more background to our view that there is great value to understanding the actual and potential impact of feminist thinking on HCI, followed by a precis of each paper. We close with some observations regarding common themes, points of contention and possibilities for future work.

Keywords

Pages