We strongly believe in open source and giving to our community. We work directly with researchers in academia and seek out new perspectives with our intern and fellowship programs. We generalize our solutions and release them to the world as open source projects. We host discussions and publish our results.


The Twenty-First International Conference on Machine Learning (ICML04), Banff, Canada, July 4-8, 2004

Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data

Khashayar Rohanimanesh, Robert Platt, Sridhar Mahadevan, Roderic Grupen

In sequence modeling, we often wish to represent complex interactions between labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when long-range dependencies exist.

We present dynamic conditional random fields (DCRFs), a generalization of linear-chain conditional random fields (CRFs) in which each time slice contains a set of state variables and edges—a distributed state representation as in dynamic Bayesian networks (DBNs)—and parameters are tied across slices.

Since exact inference can be intractable in such models, we perform approximate inference using several schedules for belief propagation, including tree-based reparameterization (TRP). On a natural-language chunking task, we show that a DCRF performs better than a series of linear-chain CRFs, achieving comparable performance using only half the training data.

NIPS workshop on Syntax, Semantics, and Statistics, Vancouver, Canada, December 2003

Dynamic Conditional Random Fields for Jointly Labeling Multiple Sequences

Andrew McCallum, Khashayar Rohanimanesh, Charles Sutton

Conditional random fields (CRFs) for sequence modeling have several advantages over joint models such as HMMs, including the ability to relax strong independence assumptions made in those models, and the ability to incorporate arbitrary overlapping features. Previous work has focused on linear-chain CRFs, which correspond to finite-state machines and have efficient exact inference algorithms.
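As background for the exact inference referred to above, the normalizer of a linear-chain CRF can be computed with the standard forward (sum-product) recursion. The sketch below is purely illustrative and not code from the paper; the score arrays and function name are assumptions.

```python
import numpy as np

def crf_log_partition(unary, trans):
    """Forward algorithm for a linear-chain CRF.

    unary: (T, S) array of per-position label scores (log potentials).
    trans: (S, S) array of label-transition scores.
    Returns log Z, the log normalizer that exact inference yields in a chain.
    """
    alpha = unary[0]  # forward log-messages at position 0
    for t in range(1, len(unary)):
        # log-sum-exp over the previous label, for each current label
        alpha = unary[t] + np.logaddexp.reduce(alpha[:, None] + trans, axis=0)
    return np.logaddexp.reduce(alpha)
```

With all scores zero, every label sequence has equal weight, so log Z is simply the log of the number of sequences, which makes the recursion easy to sanity-check.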

Often, however, we wish to label sequence data in multiple interacting ways—for example, performing part-of-speech tagging and noun phrase segmentation simultaneously, increasing joint accuracy by sharing information between them.

We present dynamic conditional random fields (DCRFs), which are CRFs in which each time slice has a set of state variables and edges—a distributed state representation as in dynamic Bayesian networks—and parameters are tied across slices. (They could also be called conditionally trained Dynamic Markov Networks.) Since exact inference can be intractable in these models, we perform approximate inference using the tree-based reparameterization framework (TRP). We also present empirical results comparing DCRFs with linear-chain CRFs on natural language data.

NIPS 2003

Sparse greedy minimax probability machine classification

The Minimax Probability Machine Classification (MPMC) framework [Lanckriet et al., 2002] builds classifiers by minimizing the maximum probability of misclassification, and gives direct estimates of the probabilistic accuracy bound Ω.

The only assumption MPMC makes is that good estimates of the means and covariance matrices of the classes exist. However, as with Support Vector Machines, MPMC is computationally expensive and requires extensive cross-validation experiments to choose kernels and kernel parameters that give good performance.

In this paper we address the computational cost of MPMC by proposing an algorithm that constructs nonlinear sparse MPMC (SMPMC) models by incrementally adding basis functions (i.e. kernels) one at a time, greedily selecting the next one that maximizes the accuracy bound Ω. SMPMC automatically chooses both kernel parameters and feature weights without using computationally expensive cross validation.

Therefore the SMPMC algorithm simultaneously addresses the problem of kernel selection and feature selection (i.e. feature weighting), based solely on maximizing the accuracy bound Ω. Experimental results indicate that we can obtain reliable bounds Ω, as well as test set accuracies that are comparable to state-of-the-art classification algorithms.
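The bound the abstract refers to can be illustrated in the linear case: for a decision direction a, Lanckriet et al.'s worst-case accuracy is kappa^2 / (1 + kappa^2), where kappa depends on the class means and covariances. The sketch below is an illustration under that assumption, not the paper's sparse greedy algorithm; the function and variable names are made up.

```python
import numpy as np

def mpmc_bound(X_pos, X_neg, a):
    """Worst-case accuracy bound for a linear decision direction `a`:
    kappa = a^T (mu1 - mu2) / (sqrt(a^T S1 a) + sqrt(a^T S2 a)),
    bound = kappa^2 / (1 + kappa^2)  (Lanckriet et al., 2002)."""
    mu1, mu2 = X_pos.mean(axis=0), X_neg.mean(axis=0)
    S1 = np.cov(X_pos, rowvar=False)
    S2 = np.cov(X_neg, rowvar=False)
    kappa = a @ (mu1 - mu2) / (np.sqrt(a @ S1 @ a) + np.sqrt(a @ S2 @ a))
    kappa = max(kappa, 0.0)  # a direction that points the wrong way gets bound 0
    return kappa ** 2 / (1 + kappa ** 2)
```

A greedy procedure in this spirit would score each candidate basis function by the bound it achieves and keep the best one, which is the selection criterion the abstract describes.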

International Conference on Machine Learning (ICML), August 2003

Fast query-optimized kernel machine classification via incremental approximate nearest support vectors

Dennis DeCoste, D. Mazzoni

Support vector machines (and other kernel machines) offer robust modern machine learning methods for nonlinear classification. However, relative to other alternatives (such as linear methods, decision trees and neural networks), they can be orders of magnitude slower at query-time.

Unlike existing methods that attempt to speed up query-time, such as reduced set compression (e.g. (Burges, 1996)) and anytime bounding (e.g. (DeCoste, 2002)), we propose a new and efficient approach based on treating the kernel machine classifier as a special form of k nearest-neighbor.

Our approach improves upon a traditional k-NN by determining at query-time a good k for each query, based on pre-query analysis guided by the original robust kernel machine. We demonstrate effectiveness on high-dimensional benchmark MNIST data, observing a greater than 100-fold reduction in the number of SVs required per query (amortized over all 45 pairwise MNIST digit classifiers), with no extra test errors (in fact, it happens to make 4 fewer).
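The core idea, treating a kernel machine as a form of k-NN, can be sketched as follows: with an RBF kernel, the support vectors nearest the query contribute the largest kernel values, so truncating the decision sum to the k nearest SVs approximates the full output. This is a minimal illustration under that assumption, not the paper's query-adaptive choice of k; all names are illustrative.

```python
import numpy as np

def rbf(X, Z, gamma=0.5):
    """Gaussian (RBF) kernel matrix between rows of X and rows of Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def full_decision(q, SV, coef, b):
    """Exact kernel-machine output: sum_i coef_i * K(sv_i, q) + b."""
    return rbf(q[None], SV)[0] @ coef + b

def knn_decision(q, SV, coef, b, k):
    """Approximate output using only the k support vectors nearest to q."""
    d2 = ((SV - q) ** 2).sum(axis=1)
    idx = np.argsort(d2)[:k]
    return rbf(q[None], SV[idx])[0] @ coef[idx] + b
```

For queries far from the decision boundary, a very small k already reproduces the sign of the full decision, which is why per-query tuning of k can give large speedups.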

SIAM International Conference on Data Mining (SIAM DM03), May 2003

Anytime query-tuned kernel machines via Cholesky factorization

Dennis DeCoste

Kernel machines (including support vector machines) offer powerful new methods for improving the accuracy and robustness of fundamental data mining operations on challenging (e.g. high-dimensional) data, including classification, regression, dimensionality reduction, and outlier detection.

However, a key tradeoff to this power is that kernel machines typically compute their outputs in terms of a large fraction of the training data, making it difficult to scale them up to train and run over massive data sets typically tackled in data mining contexts.

We recently demonstrated 2- to 64-fold query-time speedups of SVM and Kernel Fisher classifiers via a new computational geometry method for anytime output bounds [4]. This new paper refines our approach in two key ways.

First, we introduce a simple linear algebra formulation based on standard Cholesky factorization, yielding simpler equations and lower computational overhead. Second, this new formulation suggests new methods for achieving additional speedups, including tuning on query samples. We demonstrate effectiveness on three benchmark datasets.
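The anytime-bounding idea can be illustrated generically: evaluate the kernel expansion term by term and stop as soon as a bound on the unevaluated remainder fixes the sign of the output. The sketch below uses the crude bound |K| <= 1 available for an RBF kernel, not the paper's tighter Cholesky-derived bounds, and every name in it is an assumption.

```python
import numpy as np

def anytime_classify(q, SV, alpha, b, gamma=0.5):
    """Evaluate f(q) = sum_i alpha_i * K(sv_i, q) + b incrementally,
    stopping once the remaining terms can no longer flip the sign.
    The remainder is bounded via |K| <= 1; the paper instead derives
    tighter bounds from a Cholesky factor of the kernel matrix."""
    order = np.argsort(-np.abs(alpha))                 # big coefficients first
    tail = np.abs(alpha[order])[::-1].cumsum()[::-1]   # tail[m] = sum_{i>=m} |alpha|
    partial = b
    for m, i in enumerate(order):
        partial += alpha[i] * np.exp(-gamma * ((SV[i] - q) ** 2).sum())
        remainder = tail[m + 1] if m + 1 < len(order) else 0.0
        if abs(partial) > remainder:                   # sign is now certain
            return np.sign(partial), m + 1             # label, kernels evaluated
    return np.sign(partial), len(order)
```

Visiting large-coefficient support vectors first makes the remainder bound shrink quickly, so easy queries terminate after very few kernel evaluations.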

Proceedings of KDD-2003, pp. 717-722, Washington, DC, 2003

Experimental Design for Solicitation Campaigns

Uwe Mayer, Armand Sarkissian

Data mining techniques are routinely used by fundraisers to select those prospects from a large pool of candidates who are most likely to make a financial contribution. These techniques often rely on statistical models based on trial performance data.

This trial performance data is typically obtained by soliciting a smaller sample of the possible prospect pool. Collecting this trial data involves a cost; therefore the fundraiser is interested in keeping the trial size small while still collecting enough data to build a reliable statistical model that will be used to evaluate the remainder of the prospects.

We describe an experimental design approach to optimally choose the trial prospects from an existing large pool of prospects. Prospects are clustered to render the problem practically tractable. We modify the standard D-optimality algorithm to prevent repeated selection of the same prospect cluster, since each prospect can only be solicited at most once. We assess the benefits of this approach on the KDD-98 data set by comparing the performance of the model based on the optimal trial data set with that of a model based on a randomly selected trial data set of equal size.
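A greedy D-optimal selection without repetition, in the spirit the abstract describes, can be sketched using the rank-one determinant update det(A + x x^T) = det(A) * (1 + x^T A^{-1} x): at each step pick the remaining cluster center maximizing x^T A^{-1} x, then drop it from the pool. This is a generic textbook sketch under stated assumptions, not the paper's modified algorithm; all names are illustrative.

```python
import numpy as np

def greedy_d_optimal(centers, n_select, ridge=1e-6):
    """Greedily build a D-optimal design over cluster centers, each
    selectable at most once (mirroring the one-solicitation constraint).
    A = ridge*I accumulates x x^T for the chosen centers; maximizing
    x^T A^{-1} x maximizes the determinant gain at each step."""
    d = centers.shape[1]
    A = ridge * np.eye(d)
    pool = list(range(len(centers)))
    chosen = []
    for _ in range(n_select):
        A_inv = np.linalg.inv(A)
        gains = [centers[i] @ A_inv @ centers[i] for i in pool]
        best = pool[int(np.argmax(gains))]
        chosen.append(best)
        pool.remove(best)                      # no repeated selection
        A += np.outer(centers[best], centers[best])
    return chosen
```

On a toy pool, the procedure favors large, mutually orthogonal centers, which is exactly the spread a D-optimal design rewards.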

IEEE Transactions on Neural Networks, 2003

An efficient method for computing leave-one-out error in support vector machines with Gaussian kernels

In this paper, we give an efficient method for computing the leave-one-out (LOO) error for support vector machines (SVMs) with Gaussian kernels quite accurately. It is particularly suitable for iterative decomposition methods of solving SVMs.

The importance of various steps of the method is illustrated in detail by showing the performance on six benchmark datasets. The new method often leads to speedups of 10-50 times compared to standard LOO error computation. It has good promise for use in hyperparameter tuning and model comparison.

The Sixteenth Annual Conference on Neural Information Processing Systems (NIPS02), Vancouver, Canada, December 2002

Learning to Take Concurrent Actions

Khashayar Rohanimanesh, Sridhar Mahadevan

We investigate a general semi-Markov Decision Process (SMDP) framework for modeling concurrent decision making, where agents learn optimal plans over concurrent temporally extended actions.

Proceedings of the Sixth International Conference on Computers and Information Technology. Dec. 2002

Incremental Singular Value Decomposition Algorithms for Highly Scalable Recommender Systems

Badrul Sarwar, George Karypis, Joseph Konstan, John Riedl
