We strongly believe in open source and giving to our community. We work directly with researchers in academia and seek out new perspectives with our intern and fellowship programs. We generalize our solutions and release them to the world as open source projects. We host discussions and publish our results.


Proceedings of the 2008 IAPR Workshop on Cognitive Information Processing. pp. 126-131. Santorini, Greece. 2008

Generalized Statistical Methods for Unsupervised Minority Class Detection in Mixed Data Sets

Cecile Levasseur, Uwe Mayer, Brandon Burdge, Ken Kreutz-Delgado

Minority class detection is the problem of detecting the occurrence of rare key events differing from the majority of a data set. This paper considers the problem of unsupervised minority class detection for multidimensional data that are highly non-Gaussian, mixed (continuous and/or discrete), noisy, and nonlinearly related, such as occur, for example, in fraud detection in typical financial data.

A statistical modeling approach is proposed which is a subclass of graphical model techniques. It exploits the properties of exponential family distributions and generalizes techniques from classical linear statistics into a framework referred to as Generalized Linear Statistics (GLS). The methodology exploits the split between the data space and the parameter space for exponential family distributions and solves a nonlinear problem by using classical linear statistical tools applied to data that has been mapped into the parameter space.

A fraud detection technique utilizing low-dimensional information learned by using an Iteratively Reweighted Least Squares (IRLS)-based approach to GLS is proposed in the parameter space for data of mixed type. ROC curves for an initial simulation on synthetic data are presented, which give predictions for results on actual financial data sets.
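As a rough illustration of the IRLS idea mentioned above, the sketch below fits a one-feature logistic model (a Bernoulli exponential-family model) by repeatedly solving a weighted least-squares problem in the parameter space. It is the textbook IRLS update for a generalized linear model, not the GLS algorithm from the paper; the function name `irls_logistic` and the toy data are assumptions made for this example.

```python
import math

def irls_logistic(xs, ys, iterations=25):
    """Fit y ~ sigmoid(b0 + b1 * x) by iteratively reweighted least squares.

    Textbook IRLS for a Bernoulli GLM with one feature and an intercept;
    an illustrative stand-in, not the paper's GLS procedure.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        sw = swx = swxx = swz = swxz = 0.0
        for x, y in zip(xs, ys):
            eta = b0 + b1 * x                   # natural parameter
            mu = 1.0 / (1.0 + math.exp(-eta))   # mean in data space
            w = mu * (1.0 - mu)                 # IRLS weight (variance)
            z = eta + (y - mu) / w              # working response
            sw += w
            swx += w * x
            swxx += w * x * x
            swz += w * z
            swxz += w * x * z
        # Solve the 2x2 weighted normal equations for (b0, b1).
        det = sw * swxx - swx * swx
        b0 = (swxx * swz - swx * swxz) / det
        b1 = (sw * swxz - swx * swz) / det
    return b0, b1

# Hypothetical toy data: labels overlap, so the fit has a finite optimum.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [0, 0, 1, 0, 1, 1]
b0, b1 = irls_logistic(xs, ys)
print(b0, b1)
```

Each pass is an ordinary linear least-squares solve; only the weights and working responses change between passes, which is how a nonlinear fit is reduced to classical linear tools.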

Proceedings of the International Conference on Electronic Commerce ICEC2009. 2009

Discovering Clues for Review Quality from Author’s Behaviors on E-commerce Sites

Shen Huang, Dan Shen, Wei Feng, Yongzheng Zhang, Catherine Baudin

With the number of online reviews growing rapidly, it is increasingly difficult to digest all the information within limited time. To help users efficiently get concise information about a product, researchers have studied algorithms for automated opinion summarization.

However, users might expect to further read detailed high-quality reviews in addition to a review outline. This raises another interesting problem not well studied yet: how to discover high quality product reviews? Previous research examined various properties of a product review to predict its quality.

In this paper, we further explore this topic by incorporating another information resource: the behavior of review authors in an e-commerce community. First, we perform a high-level analysis on two kinds of data: product reviews and deal transactions. According to the results of this analysis, three features, including personal reputation, seller degree and expertise degree, are studied to assess the quality of a review from a credibility and expertise perspective.

Our analysis shows that these features are strongly related to review quality and that they can help uncover review spamming by sellers. Furthermore, we propose a simulation model based on the above findings. The model is able to generate the basic properties of the review community, especially when the above three features are taken into account.

In IEEE VAST 2008 Symposium Challenge, 2008

Visual Analytics of Cell Phone Data using MobiVis and OntoVis

Carlos D. Correa, Tarik Crnovrsanin, Christopher Muelder, Zeqian Shen, Ryan Armstrong, James Shearer, Kwan-Liu Ma

MobiVis is a visual analytics tool that aids in processing and understanding complex relational data, such as social networks. At the core of the tool is the ability to filter complex networks structurally and semantically, which helps us discover clusters and patterns in the organization of social networks.

Semantic filtering is obtained via an ontology graph, based on another visual analytics tool, called OntoVis. In this summary, we describe how these tools were used to analyze one of the mini-challenges of the 2008 VAST challenge.

KDD 2008

Bypass rates: reducing query abandonment using negative inferences

Atish Das Sarma, Sreenivas Gollapudi, Samuel Ieong

We introduce a new approach to analyzing click logs by examining both the documents that are clicked and those that are bypassed, that is, documents returned higher in the ordering of the search results but skipped by the user.

This approach complements the popular click-through rate analysis, and helps to draw negative inferences in the click logs. We formulate a natural objective that finds sets of results that are unlikely to be collectively bypassed by a typical user.

This is closely related to the problem of reducing query abandonment. We analyze a greedy approach to optimizing this objective, and establish theoretical guarantees of its performance.

We evaluate our approach on a large set of queries, and demonstrate that it compares favorably to the maximal marginal relevance approach on a number of metrics including mean average precision and mean reciprocal rank.
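For readers unfamiliar with the two metrics named above, the following minimal sketch computes mean reciprocal rank and mean average precision using their standard definitions. The function names and the two toy queries are assumptions made for illustration; they are not taken from the paper's evaluation.

```python
def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant document, or 0.0 if none is returned."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def average_precision(ranked, relevant):
    """Mean of precision@k taken at each rank k that holds a relevant document."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)

def mean_over_queries(metric, runs):
    """Average a per-query metric over (ranking, relevant-set) pairs."""
    return sum(metric(ranked, relevant) for ranked, relevant in runs) / len(runs)

# Hypothetical toy runs: (ranked result list, set of relevant documents).
runs = [
    (["d1", "d2", "d3"], {"d1", "d3"}),
    (["d4", "d5", "d6"], {"d5"}),
]
mrr = mean_over_queries(reciprocal_rank, runs)
mean_ap = mean_over_queries(average_precision, runs)
print(mrr, mean_ap)
```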

Memory Studies, 09/2008, Volume 1, Issue 3, p.295-310, 2008

Technologies of Memory: Key Issues and Critical Perspectives

Nancy Van House, Elizabeth Churchill

Past, present and emerging technologies of memory are important concerns for memory studies. What is remembered individually and collectively depends in part on technologies of memory and socio-technical practices, which are changing radically. We identify specific concerns about developments in digital memory capture, storage and retrieval. Decisions are being made now that may have far-reaching consequences.

Systems are being designed based on models and metaphors in which human memory works much like the computer.

We bring to this discussion a critical perspective from science and technology studies (STS) and a grounding in human-computer interaction (HCI) and computer-supported cooperative work (CSCW).

We argue that, while these developments are significant for memory studies research, even more important is the need for memory studies to remind and inspire designers of what is possible and useful, and help expand the understanding of human memory on which these systems are based.

In Proceedings of Eurographics/IEEE VGTC Symposium on Visualization, May 2007, pp. 83-90

Path Visualization for Adjacency Matrices

Zeqian Shen, Kwan-Liu Ma

For displaying a dense graph, an adjacency matrix is superior to a node-link diagram because it is more compact and free of visual clutter. A node-link diagram, however, is far better for the task of path finding because a path can be easily traced by following the corresponding links, provided that the links are not heavily crossed or tangled.

We augment adjacency matrices with path visualization and associated interaction techniques to facilitate path finding.

Our design is visually pleasing, and also effectively displays multiple paths based on the design commonly found in metro maps. We illustrate and assess the key aspects of our design with the results obtained from two case studies and an informal user study.
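To make the path-finding task concrete, here is a minimal sketch that stores a graph as an adjacency matrix and recovers a shortest path with breadth-first search. It illustrates only the underlying data structure and task, not the visualization or interaction techniques of the paper; the function name `shortest_path` and the five-node example graph are assumptions.

```python
from collections import deque

def shortest_path(matrix, source, target):
    """Return a shortest path (list of node indices) or None if unreachable.

    matrix[i][j] is truthy when there is an edge from node i to node j.
    """
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent links back to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        # Scanning a matrix row is how neighbors are enumerated here.
        for neighbor in range(len(matrix)):
            if matrix[node][neighbor] and neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None

# Hypothetical undirected graph: a chain 0-1-2-3 plus an isolated node 4.
matrix = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
print(shortest_path(matrix, 0, 3))
print(shortest_path(matrix, 0, 4))
```

In the matrix, the cells that make up a path are scattered across different rows and columns, which hints at why tracing a path by eye is harder there than in a node-link diagram.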

The Review of Agricultural Economics, 29(3):446-493. (2007)

The Productivity Argument for Investing in Young Children

James J. Heckman

This paper presents a productivity argument for investing in disadvantaged young children. For such investment, there is no equity-efficiency tradeoff.