Publications

We strongly believe in open source and giving back to our community. We work directly with researchers in academia and seek out new perspectives with our intern and fellowship programs. We generalize our solutions and release them to the world as open source projects. We host discussions and publish our results.

NIPS 2008 Workshop on Probabilistic Programming, Dec 13, Whistler, Canada

FACTORIE: Efficient Probabilistic Programming for Relational Factor Graphs via Imperative Declarations of Structure, Inference and Learning

Andrew McCallum, Khashayar Rohanimanesh, Michael Wick, Karl Schultz, Sameer Singh

CMPSCI Technical Report, UM-CS-2008-040, University of Massachusetts, December 2008

Reinforcement Learning for MAP Inference in Large Factor Graphs

Khashayar Rohanimanesh, Michael Wick, Sameer Singh, Andrew McCallum

Large, relational factor graphs with structure defined by first-order logic or other languages give rise to notoriously difficult inference problems. Because unrolling the structure necessary to represent distributions over all hypotheses has exponential blow-up, solutions are often derived from MCMC.

However, because of limitations in the design and parameterization of the jump function, these sampling-based methods suffer from local minima—the system must transition through lower-scoring configurations before arriving at a better MAP solution.

This paper presents a new method of explicitly selecting fruitful downward jumps by leveraging reinforcement learning (RL) to model delayed reward with a log-linear function approximation of residual future score improvement.

Our method provides dramatic empirical success, producing new state-of-the-art results on a complex joint model of ontology alignment, with a 48% reduction in error over the previous state of the art in that domain.
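The abstract describes choosing jumps with a log-linear approximation of residual future score improvement. The sketch below illustrates only the general idea using textbook Q-learning with a linear function approximator (the exponent of a log-linear model). The jump feature vectors, learning rate `alpha`, discount `gamma`, and epsilon-greedy selection are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

# Hypothetical setup (not from the paper): each candidate jump a from the
# current configuration s is summarized by a feature vector phi(s, a), and
# Q(s, a) = w . phi(s, a) approximates the residual future score improvement.


def q_value(w: np.ndarray, phi: np.ndarray) -> float:
    """Linear estimate of future score improvement for one jump."""
    return float(w @ phi)


def select_jump(w: np.ndarray, candidates: np.ndarray, epsilon: float,
                rng: np.random.Generator) -> int:
    """Epsilon-greedy choice among candidate jumps (one feature row per jump)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(candidates)))
    return int(np.argmax(candidates @ w))


def q_learning_update(w: np.ndarray, phi: np.ndarray, reward: float,
                      next_candidates: np.ndarray, alpha: float,
                      gamma: float) -> np.ndarray:
    """One Q-learning step with linear function approximation.

    `reward` is the immediate change in model score caused by the jump and may
    be negative (a downward jump). Bootstrapping from the best next jump lets a
    downward jump still earn a high Q-value if it leads to better states.
    """
    if len(next_candidates) > 0:
        target = reward + gamma * float(np.max(next_candidates @ w))
    else:
        target = reward  # no further jumps: terminal state
    td_error = target - q_value(w, phi)
    return w + alpha * td_error * phi


# Small usage example with made-up numbers.
w = np.zeros(3)
phi = np.array([1.0, 0.0, 0.5])
next_candidates = np.array([[0.0, 1.0, 0.0], [1.0, 1.0, 0.0]])
w = q_learning_update(w, phi, reward=-1.0, next_candidates=next_candidates,
                      alpha=0.1, gamma=0.9)
```

Because the target includes the discounted value of the best follow-up jump, a jump that lowers the score now can still be preferred once the learned weights predict a larger improvement later, which is the behavior the abstract motivates.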

CMPSCI Technical Report, UM-CS-2009-008, University of Massachusetts, December 2008

Inference and Learning in Large Factor Graphs with Adaptive Proposal Distributions

Khashayar Rohanimanesh, Michael Wick, Andrew McCallum

Large templated factor graphs with complex structure that changes during inference have been shown to provide state-of-the-art experimental results in tasks such as identity uncertainty and information integration. However, inference and learning in these models are notoriously difficult.

This paper formalizes, analyzes and proves convergence for the SampleRank algorithm, which learns extremely efficiently by calculating approximate parameter estimation gradients from each proposed MCMC jump. Next we present a parameterized, adaptive proposal distribution, which greatly increases the number of accepted jumps.

We combine these methods in experiments on a real-world information extraction problem and demonstrate that the adaptive proposal distribution requires 27% fewer jumps than a more traditional proposer.
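The idea of learning from each proposed MCMC jump can be illustrated with a perceptron-style update: whenever the model ranks the current and proposed configurations differently from a ground-truth objective, the weights move toward the configuration the objective prefers. This is a minimal sketch under stated assumptions; the toy binary configuration, the features (the configuration vector itself), the single-flip proposer, and Metropolis acceptance at temperature 1 are all hypothetical and are not the paper's setup. It also leaves out the adaptive proposal distribution.

```python
import numpy as np


def samplerank_update(w, phi_cur, phi_prop, f_cur, f_prop, lr=1.0):
    """Perceptron-style SampleRank update for one proposed jump (sketch).

    phi_cur / phi_prop: feature vectors of the current and proposed configurations.
    f_cur / f_prop: ground-truth objective (e.g. accuracy) of each configuration.
    Weights change only when the model's ranking of the pair disagrees with
    the objective's ranking.
    """
    model_diff = float(w @ (phi_prop - phi_cur))
    truth_diff = f_prop - f_cur
    if truth_diff > 0 and model_diff <= 0:
        return w + lr * (phi_prop - phi_cur)
    if truth_diff < 0 and model_diff >= 0:
        return w + lr * (phi_cur - phi_prop)
    return w


def train_toy(truth, steps=200, seed=0):
    """Hypothetical toy: binary configurations whose features are the values themselves."""
    rng = np.random.default_rng(seed)
    n = len(truth)
    w = np.zeros(n)
    y = np.zeros(n)  # start from the all-zero configuration

    def accuracy(cfg):
        return float(np.mean(cfg == truth))

    for _ in range(steps):
        y_prop = y.copy()
        i = rng.integers(n)
        y_prop[i] = 1.0 - y_prop[i]  # proposer: flip one variable
        # Learn from the proposed jump whether or not it is accepted.
        w = samplerank_update(w, y, y_prop, accuracy(y), accuracy(y_prop))
        # Metropolis acceptance under the current model (temperature 1).
        delta = float(w @ (y_prop - y))
        if delta >= 0 or rng.random() < np.exp(delta):
            y = y_prop
    return w


truth = np.array([1.0, 0.0, 1.0, 1.0])
w_learned = train_toy(truth)
```

In this toy run every update pushes a weight only in the direction the objective prefers, so once each variable has been proposed at least once, the sign of each weight matches whether that variable is 1 or 0 in the truth.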

DMAI 2008, 11th IEEE International Conference on Computer and Information Technology, Khulna, Bangladesh (December 2008)

A Unifying Viewpoint of some Clustering Techniques Using Bregman Divergences and Extensions to Mixed Data Sets

Cecile Levasseur, Brandon Burdge, Ken Kreutz-Delgado, Uwe Mayer

We present a general viewpoint using Bregman divergences and exponential family properties that contains as special cases the following three algorithms: 1) exponential family Principal Component Analysis (exponential PCA), 2) Semi-Parametric exponential family Principal Component Analysis (SP-PCA) and 3) Bregman soft clustering.

This framework is equivalent to a mixed data-type hierarchical Bayes graphical model assumption with latent variables constrained to a low-dimensional parameter subspace. We show that within this framework exponential PCA and SP-PCA are similar to the Bregman soft clustering technique with the addition of a linear constraint in the parameter space.

We implement the resulting modifications to SP-PCA and Bregman soft clustering for mixed (continuous and/or discrete) data sets, and add a nonparametric estimation of the point-mass probabilities to exponential PCA. Finally, we compare the relative performances of the three algorithms in a clustering setting for mixed data sets.
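Bregman soft clustering alternates an E-step, where the posterior of cluster h for a point x is proportional to π_h exp(−d_φ(x, μ_h)), with an M-step in which each cluster representative is the posterior-weighted mean of the data, which is the optimal representative for every Bregman divergence. The sketch below is a generic version of that standard algorithm, not the authors' code. Handling mixed data by summing a squared-Euclidean (Gaussian) term over continuous columns and a generalized I-divergence (Poisson) term over count columns is an assumption made for illustration; it works because a sum of Bregman divergences over separate coordinates is again a Bregman divergence. The helper names are hypothetical.

```python
import numpy as np


def sq_euclidean(x, mu):
    """Bregman divergence for phi(x) = ||x||^2 / 2 (unit-variance Gaussian)."""
    return 0.5 * np.sum((x - mu) ** 2, axis=-1)


def generalized_i_div(x, mu):
    """Bregman divergence for phi(x) = sum(x log x - x) (Poisson); needs mu > 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        x_log_ratio = np.where(x > 0, x * np.log(x / mu), 0.0)
    return np.sum(x_log_ratio - x + mu, axis=-1)


def make_mixed_divergence(is_count):
    """Hypothetical mixed-data divergence: Gaussian columns plus count columns."""
    is_cont = ~is_count

    def divergence(x, mu):
        return (sq_euclidean(x[..., is_cont], mu[..., is_cont])
                + generalized_i_div(x[..., is_count], mu[..., is_count]))

    return divergence


def bregman_soft_clustering(X, init_mu, divergence, iters=100):
    """EM-style Bregman soft clustering; returns priors, means, posteriors."""
    X = np.asarray(X, dtype=float)
    mu = np.array(init_mu, dtype=float)
    n, k = X.shape[0], mu.shape[0]
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: p(h | x_i) is proportional to pi_h * exp(-d(x_i, mu_h)).
        D = divergence(X[:, None, :], mu[None, :, :])  # shape (n, k)
        logits = np.log(pi)[None, :] - D
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(logits)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: for any Bregman divergence the best representative is the
        # posterior-weighted arithmetic mean of the points.
        weights = post.sum(axis=0)
        pi = weights / n
        mu = (post.T @ X) / weights[:, None]
    return pi, mu, post


# Usage: column 0 is continuous, column 1 holds counts (made-up data).
X = np.array([[0.0, 1], [0.2, 2], [-0.1, 1], [5.0, 20], [5.3, 22], [4.8, 19]])
div = make_mixed_divergence(np.array([False, True]))
pi, mu, post = bregman_soft_clustering(X, init_mu=X[[0, 3]], divergence=div)
labels = post.argmax(axis=1)
```

The initial means are passed in explicitly to keep the example deterministic; in practice they would come from random or k-means++-style seeding.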

Submitted to the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2008

A Unified Approach for Schema Matching, Coreference and Canonicalization

Michael Wick, Khashayar Rohanimanesh, Karl Schultz, Andrew McCallum

The automatic consolidation of database records from many heterogeneous sources into a single repository requires solving several information integration tasks. Although tasks such as coreference, schema matching, and canonicalization are closely related, they are most commonly studied in isolation.

Systems that do tackle multiple integration problems traditionally solve each independently, allowing errors to propagate from one task to another. In this paper, we describe a discriminatively-trained model that reasons about schema matching, coreference, and canonicalization jointly.

We evaluate our model on a real-world data set of people and demonstrate that simultaneously solving these tasks reduces errors over a cascaded or isolated approach.

Our experiments show that a joint model substantially improves over systems that either solve each task in isolation or use the conventional cascade. We demonstrate nearly a 50% error reduction for coreference and a 40% error reduction for schema matching.

Proceedings of the 1st International Workshop on Data Mining and Artificial Intelligence (DMAI 2008) held in conjunction with the 11th IEEE International Conference on Computer and Information Technology. Khulna, Bangladesh. December 2008

A Unifying Viewpoint of some Clustering Techniques Using Bregman Divergences and Extensions to Mixed Data Sets

Uwe Mayer, Cécile Levasseur, Brandon Burge, Ken Kreutz-Delgado

We present a general viewpoint using Bregman divergences and exponential family properties that contains as special cases the following three algorithms: 1) exponential family Principal Component Analysis (exponential PCA), 2) Semi-Parametric exponential family Principal Component Analysis (SP-PCA) and 3) Bregman soft clustering.

This framework is equivalent to a mixed data-type hierarchical Bayes graphical model assumption with latent variables constrained to a low-dimensional parameter subspace. We show that within this framework exponential PCA and SP-PCA are similar to the Bregman soft clustering technique with the addition of a linear constraint in the parameter space.

We implement the resulting modifications to SP-PCA and Bregman soft clustering for mixed (continuous and/or discrete) data sets, and add a nonparametric estimation of the point-mass probabilities to exponential PCA. Finally, we compare the relative performances of the three algorithms in a clustering setting for mixed data sets.
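For readers unfamiliar with the terms, the standard definition of a Bregman divergence, two common instances, and its correspondence with regular exponential families (the kind of property such a unifying framework relies on) are summarized below. This is textbook background, not notation taken from the paper; φ is the convex generator, ψ the log-partition function, and φ = ψ* its Legendre conjugate.

```latex
d_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla\phi(y),\, x - y \rangle,
  \qquad \phi \text{ strictly convex and differentiable}

\phi(x) = \tfrac{1}{2}\lVert x \rVert^2
  \;\Rightarrow\; d_\phi(x, y) = \tfrac{1}{2}\lVert x - y \rVert^2
  \quad \text{(Gaussian)}

\phi(x) = \sum_j \bigl(x_j \log x_j - x_j\bigr)
  \;\Rightarrow\; d_\phi(x, y) = \sum_j \Bigl(x_j \log \frac{x_j}{y_j} - x_j + y_j\Bigr)
  \quad \text{(Poisson)}

p_{\psi,\theta}(x) = \exp\bigl(\langle x, \theta \rangle - \psi(\theta)\bigr)\, p_0(x)
  = \exp\bigl(-d_\phi(x, \mu)\bigr)\, b_\phi(x),
  \qquad \mu = \nabla\psi(\theta),\quad \phi = \psi^{*},\quad b_\phi(x) = e^{\phi(x)}\, p_0(x)
```

The last line follows from φ(μ) = ⟨μ, θ⟩ − ψ(θ) and θ = ∇φ(μ): substituting gives ⟨x, θ⟩ − ψ(θ) = φ(x) − d_φ(x, μ), so each exponential family density is determined, up to a factor that does not depend on μ, by a Bregman divergence to its mean.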

In Proceedings of the IEEE Pacific Visualization Symposium, IEEE VGTC, March 2008, pp. 175–182

MobiVis: A Visualization System for Exploring Mobile Data

Zeqian Shen, Kwan-Liu Ma

The widespread use of mobile devices brings opportunities to capture large-scale, continuous information about human behavior. Mobile data has tremendous value, leading to business opportunities, market strategies, security concerns, etc.

Visual analytics systems that support interactive exploration and discovery are needed to extract insight from the data. However, visual analysis of complex social-spatial-temporal mobile data presents several challenges.

We have created MobiVis, a visual analytics tool, which incorporates the idea of presenting social and spatial information in one heterogeneous network. The system supports temporal and semantic filtering through an interactive time chart and ontology graph, respectively, such that data subsets of interest can be isolated for close-up investigation.

"Behavior rings," a compact radial representation of individual and group behaviors, is introduced to allow easy comparison of behavior patterns. We demonstrate the capability of MobiVis with the results obtained from analyzing the MIT Reality Mining dataset.
