Publications

We strongly believe in open source and giving back to our community. We work directly with researchers in academia and seek out new perspectives through our internship and fellowship programs. We generalize our solutions and release them to the world as open-source projects. We host discussions and publish our results.

Proceedings of the International Conference on Electronic Commerce – EC2009. 2009

Improving Product Review Search Experiences in General Search Engines.

Shen Huang, Dan Shen, Wei Feng, Catherine Baudin, Yongzheng Zhang

In the Web 2.0 era, internet users contribute a large amount of online content, and product reviews are a good example. Because such reviews are scattered across shopping sites, weblogs, forums, etc., most people have to rely on general search engines to discover and digest others' comments. While conventional search engines work well in many situations, they are not sufficient for gathering this kind of information.

The reasons include, but are not limited to: 1) the ranking strategy does not incorporate the inherent characteristics of product reviews, e.g., sentiment orientation; 2) the snippets are neither indicative nor descriptive of user opinions. In this paper, we propose a feasible solution to improve the product review search experience.

Based on this approach, we implemented a system named "Improved Product Review Search (IPRS)" on top of a general search engine. Given a query on a product, our system can: 1) automatically identify user opinion segments within a whole article; 2) rank opinions by incorporating both the sentiment orientation and the topics expressed in reviews; 3) generate readable review snippets that indicate user sentiment orientation; 4) let users easily compare products through a visualization of opinions.

The results of both a usability study and an automatic evaluation show that our system helps users quickly understand product reviews within limited time.

Neural Information Processing Systems (NIPS), 2009

Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference

Michael Wick, Khashayar Rohanimanesh, Sameer Singh, Andrew McCallum

Large, relational factor graphs with structure defined by first-order logic or other languages give rise to notoriously difficult inference problems. Because unrolling the structure necessary to represent distributions over all hypotheses has exponential blow-up, solutions are often derived from MCMC.

However, because of limitations in the design and parameterization of the jump function, these sampling-based methods suffer from local minima: the system must transition through lower-scoring configurations before arriving at a better MAP solution. This paper presents a new method of explicitly selecting fruitful downward jumps by leveraging reinforcement learning (RL).

Rather than setting parameters to maximize the likelihood of the training data, we treat the parameters of the factor graph as a log-linear function approximator and learn them with temporal difference (TD) methods; MAP inference is performed by executing the resulting policy on held-out test data.

Our method allows efficient gradient updates, since only factors in the neighborhood of variables affected by an action need to be computed; we bypass the need to compute marginals entirely. Empirically, the method is highly successful, producing new state-of-the-art results on a complex joint model of ontology alignment, with a 48% reduction in error over the previous state of the art in that domain.
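The locality argument above can be illustrated with a small sketch. It is not the authors' implementation: it assumes a hypothetical chain of binary variables with one unary factor per variable and pairwise "agreement" factors between neighbours (feature names `unary_on` and `pair_agree` are made up), actions that flip a single variable, a toy reward equal to the change in the number of correct labels against an all-ones ground truth, and illustrative values for the step size, discount and exploration rate. The action-value is linear in the feature vector φ(s, a), and φ(s, a) is computed only from factors touching the flipped variable, since all other factors cancel.

```python
import random
from typing import Dict, List

Features = Dict[str, float]

def local_features(x: List[int], i: int) -> Features:
    """Feature counts of the hypothetical factors touching x[i]: one unary factor plus chain neighbours."""
    f: Features = {"unary_on": float(x[i]), "pair_agree": 0.0}
    for j in (i - 1, i + 1):
        if 0 <= j < len(x):
            f["pair_agree"] += float(x[i] == x[j])
    return f

def action_features(x: List[int], i: int) -> Features:
    """phi(s, a) for the action 'flip x[i]': factors not touching x[i] are unchanged and cancel."""
    before = local_features(x, i)
    flipped = x[:i] + [1 - x[i]] + x[i + 1:]
    after = local_features(flipped, i)
    return {k: after[k] - before[k] for k in before}

def q_value(w: Features, phi: Features) -> float:
    """Linear action-value estimate Q(s, a) = w . phi(s, a) over sparse features."""
    return sum(w.get(k, 0.0) * v for k, v in phi.items())

def td_update(w: Features, phi: Features, reward: float, next_phis: List[Features],
              alpha: float = 0.1, gamma: float = 0.5) -> float:
    """One Q-learning-style TD(0) step; only weights of features present in phi change."""
    target = reward + gamma * max((q_value(w, p) for p in next_phis), default=0.0)
    delta = target - q_value(w, phi)
    for k, v in phi.items():
        w[k] = w.get(k, 0.0) + alpha * delta * v
    return delta

def train(n: int = 5, episodes: int = 200, steps: int = 10, eps: float = 0.2, seed: int = 0) -> Features:
    """Toy training loop; the hypothetical ground truth is all ones, so the reward is
    +1 when a flip sets a variable to 1 and -1 otherwise."""
    rng = random.Random(seed)
    w: Features = {}
    for _ in range(episodes):
        x = [rng.randint(0, 1) for _ in range(n)]
        for _ in range(steps):
            phis = [action_features(x, a) for a in range(n)]
            if rng.random() < eps:
                i = rng.randrange(n)  # explore
            else:
                i = max(range(n), key=lambda a: q_value(w, phis[a]))  # exploit
            reward = 1.0 if x[i] == 0 else -1.0
            x[i] = 1 - x[i]
            td_update(w, phis[i], reward, [action_features(x, a) for a in range(n)])
    return w

def greedy_inference(w: Features, x: List[int], max_steps: int = 20) -> List[int]:
    """MAP-style inference by running the greedy policy; stops when no flip has positive Q."""
    x = list(x)
    for _ in range(max_steps):
        phis = [action_features(x, a) for a in range(len(x))]
        i = max(range(len(x)), key=lambda a: q_value(w, phis[a]))
        if q_value(w, phis[i]) <= 0.0:
            break  # heuristic stopping rule (an assumption of this sketch)
        x[i] = 1 - x[i]
    return x
```

For example, `greedy_inference(train(), [0] * 5)` runs the learned policy from an all-zeros start. The features in this toy are deliberately coarse, so the result illustrates the control flow (local feature deltas, TD updates, no marginals) rather than the reported accuracy.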

PVLDB 2009

Randomized Multi-pass Streaming Skyline Algorithms

Atish Das Sarma, Ashwin Lall, Danupon Nanongkai, Jim Xu

We consider external algorithms for skyline computation without preprocessing. Our goal is to develop an algorithm with a good worst-case guarantee that also performs well on average.

Due to the nature of disks, it is desirable that such algorithms access the input as a stream (even if in multiple passes). Using randomization, a tool that has proved useful in many applications, we present an efficient multi-pass streaming algorithm, RAND, for skyline computation. As far as we are aware, RAND is the first randomized skyline algorithm in the literature.

RAND is near-optimal for the streaming model, which we prove via a simple lower bound. Additionally, our algorithm is distributable and can handle partially ordered domains on each attribute.
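To make "skyline" and "multi-pass streaming" concrete, here is a small deterministic sketch. It is not RAND (it uses no randomization); it only shows the dominance test and a memory-bounded access pattern in which each phase makes three sequential passes over the unresolved points. It assumes smaller attribute values are better, that points are distinct, and that at most `window` candidates fit in memory; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q: no worse on every attribute and strictly better on one (smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def multipass_skyline(stream: List[Point], window: int) -> Tuple[List[Point], int]:
    """Deterministic multi-pass skyline holding at most `window` candidates in memory.

    Assumes distinct points. Each phase makes three sequential passes over the
    points not yet resolved; returns (skyline, number of passes).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    remaining = list(stream)
    skyline: List[Point] = []
    passes = 0
    while remaining:
        # Pass 1: greedily keep up to `window` mutually non-dominated candidates.
        cand: List[Point] = []
        for p in remaining:
            if any(dominates(c, p) for c in cand):
                continue
            cand = [c for c in cand if not dominates(p, c)]
            if len(cand) < window:
                cand.append(p)
        # Pass 2: a candidate is a skyline point only if no remaining point dominates it.
        confirmed = list(cand)
        for q in remaining:
            confirmed = [c for c in confirmed if not dominates(q, c)]
        # Pass 3: drop every candidate (confirmed or dominated) and every point
        # dominated by a confirmed skyline point.
        resolved = set(cand)
        remaining = [p for p in remaining
                     if p not in resolved and not any(dominates(s, p) for s in confirmed)]
        skyline.extend(confirmed)
        passes += 3
    return skyline, passes
```

With `window=2` on the points (1, 4), (2, 2), (4, 1), (3, 3), (5, 5), the first phase confirms (1, 4) and (2, 2) and discards the two dominated points; the second phase confirms (4, 1), for six passes in total.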

Finally, we demonstrate the robustness of RAND through extensive experiments on both real and synthetic datasets. RAND is comparable to existing algorithms in the average case and, in addition, tolerates simple modifications of the data, whereas other algorithms degrade considerably under such variations.

Proceedings of the Nineteenth IEEE Int’l Workshop on Machine Learning for Signal Processing, Grenoble, France. September 2009

Classifying non-Gaussian and Mixed Data Sets in their Natural Parameter Space

Cécile Levasseur, Uwe Mayer, Ken Kreutz-Delgado

We consider the problem of both supervised and unsupervised classification for multidimensional data that are non-Gaussian and of mixed types (continuous and/or discrete). An important subclass of graphical model techniques, called Generalized Linear Statistics (GLS), is used to capture the underlying statistical structure of these complex data.

GLS exploits the properties of exponential family distributions, which are assumed to describe the data components, and constrains latent variables to a lower dimensional parameter subspace.

Based on the latent variable information, classification is performed in the natural parameter subspace with classical statistical techniques. The benefits of decision making in parameter space are illustrated with examples of text categorization on categorical data and classification of mixed-type data.
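To make "natural parameter space" concrete, the block below lists the standard one-parameter exponential-family forms for three component types that can appear in mixed data. The final line is an assumed notation for a low-dimensional constraint (an affine map from a latent vector of dimension q < d), written in the spirit of the description above rather than as the paper's exact parameterization.

```latex
% One-parameter exponential family for a single data component x
p(x \mid \theta) = h(x)\,\exp\!\big(\theta x - G(\theta)\big),
\qquad \mathbb{E}[x \mid \theta] = G'(\theta)

% Natural parameter \theta and cumulant function G for common component types
\text{Bernoulli}(p):\quad \theta = \log\frac{p}{1-p},\qquad G(\theta) = \log\!\big(1 + e^{\theta}\big)
\text{Poisson}(\lambda):\quad \theta = \log\lambda,\qquad G(\theta) = e^{\theta}
\text{Gaussian}(\mu,\ \sigma^2 = 1):\quad \theta = \mu,\qquad G(\theta) = \theta^2/2

% Assumed latent constraint for a d-dimensional sample x[k] (hypothetical notation)
\boldsymbol{\theta}[k] = \mathbf{a}[k]\,\mathbf{V} + \mathbf{b},
\qquad \mathbf{a}[k] \in \mathbb{R}^{1 \times q},\;
\mathbf{V} \in \mathbb{R}^{q \times d},\;
\mathbf{b} \in \mathbb{R}^{1 \times d},\; q < d
```

Under this reading, each component of a mixed-type sample keeps its own family and cumulant function, and classification operates on the shared low-dimensional coordinates rather than on the raw continuous and discrete values.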

As a text document preprocessing tool, an extension of the feature selection algorithm based on conditional mutual information maximization from binary to categorical data is presented.
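The conditional mutual information maximization criterion (commonly abbreviated CMIM) picks, at each step, the feature whose worst-case information about the class, conditioned on any already selected feature, is largest. The sketch below is a generic greedy version using plug-in entropy estimates over observed value combinations, which works for categorical values with any number of levels; it is not the authors' extension, whose details are not given here, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence

def entropy(*cols: Sequence) -> float:
    """Plug-in joint entropy, in bits, of one or more aligned categorical columns."""
    n = len(cols[0])
    counts = Counter(zip(*cols))
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def mutual_info(y: Sequence, x: Sequence) -> float:
    """I(Y; X) = H(Y) + H(X) - H(Y, X)."""
    return entropy(y) + entropy(x) - entropy(y, x)

def cond_mutual_info(y: Sequence, x: Sequence, z: Sequence) -> float:
    """I(Y; X | Z) = H(Y, Z) + H(X, Z) - H(Y, X, Z) - H(Z)."""
    return entropy(y, z) + entropy(x, z) - entropy(y, x, z) - entropy(z)

def cmim_select(features: List[Sequence], y: Sequence, k: int) -> List[int]:
    """Greedy CMIM: score_i = min(I(Y; X_i), min over selected j of I(Y; X_i | X_j))."""
    score = [mutual_info(y, f) for f in features]
    selected: List[int] = []
    for _ in range(min(k, len(features))):
        # Ties go to the lowest index, since max() returns the first maximal item.
        best = max((i for i in range(len(features)) if i not in selected),
                   key=lambda i: score[i])
        selected.append(best)
        # Tighten every unselected feature's score against the newly chosen one.
        for i in range(len(features)):
            if i not in selected:
                score[i] = min(score[i], cond_mutual_info(y, features[i], features[best]))
    return selected
```

For instance, with a four-valued class y = 2a + b and candidate features [a, a copy of a, b], selecting two features returns [0, 2]: once a is chosen, its copy carries no additional information about y, while b still does.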

NIPS 2008 Workshop on Probabilistic Programming, Dec 13, Whistler, Canada

FACTORIE: Efficient Probabilistic Programming for Relational Factor Graphs via Imperative Declarations of Structure, Inference and Learning

Andrew McCallum, Khashayar Rohanimanesh, Michael Wick, Karl Schultz, Sameer Singh

CMPSCI Technical Report, UM-CS-2008-040, University of Massachusetts, December 2008

Reinforcement Learning for MAP Inference in Large Factor Graphs

Khashayar Rohanimanesh, Michael Wick, Sameer Singh, Andrew McCallum

Large, relational factor graphs with structure defined by first-order logic or other languages give rise to notoriously difficult inference problems. Because unrolling the structure necessary to represent distributions over all hypotheses has exponential blow-up, solutions are often derived from MCMC.

However, because of limitations in the design and parameterization of the jump function, these sampling-based methods suffer from local minima—the system must transition through lower-scoring configurations before arriving at a better MAP solution.

This paper presents a new method of explicitly selecting fruitful downward jumps by leveraging reinforcement learning (RL) to model delayed reward with a log-linear function approximation of residual future score improvement.

Empirically, our method is highly successful, producing new state-of-the-art results on a complex joint model of ontology alignment, with a 48% reduction in error over the previous state of the art in that domain.
