Publications

We strongly believe in open source and in giving back to our community. We work directly with researchers in academia and seek out new perspectives through our intern and fellowship programs. We generalize our solutions and release them to the world as open source projects. We host discussions and publish our results.

Publications

NIPS 2003

Sparse greedy minimax probability machine classification

The Minimax Probability Machine Classification (MPMC) framework [Lanckriet et al., 2002] builds classifiers by minimizing the maximum probability of misclassification, and gives direct estimates of the probabilistic accuracy bound Ω.

The only assumption that MPMC makes is that good estimates of the means and covariance matrices of the classes exist. However, as with Support Vector Machines, MPMC is computationally expensive and requires extensive cross validation experiments to choose kernels and kernel parameters that give good performance.

In this paper we address the computational cost of MPMC by proposing an algorithm that constructs nonlinear sparse MPMC (SMPMC) models by incrementally adding basis functions (i.e. kernels) one at a time, greedily selecting the next one that maximizes the accuracy bound Ω. SMPMC automatically chooses both kernel parameters and feature weights without using computationally expensive cross validation.

Therefore the SMPMC algorithm simultaneously addresses the problem of kernel selection and feature selection (i.e. feature weighting), based solely on maximizing the accuracy bound Ω. Experimental results indicate that we can obtain reliable bounds Ω, as well as test set accuracies that are comparable to state-of-the-art classification algorithms.
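
As an illustration of the greedy loop described above (not the authors' implementation), the sketch below assumes a hypothetical helper fit_mpmc(bases, X, y) that fits an MPMC model on the given basis functions and returns the model together with its accuracy bound Ω; the candidate pool and kernel parameterization are likewise placeholders.

```python
# Illustrative sketch of greedy sparse MPMC construction.
# `fit_mpmc` is a hypothetical helper: it fits an MPMC model using the given
# basis functions (kernels centered on training points) and returns
# (model, omega), where omega is the probabilistic accuracy bound.

def greedy_smpmc(candidate_bases, X, y, max_bases=20):
    selected, best_model, best_omega = [], None, 0.0
    for _ in range(max_bases):
        best_candidate = None
        for basis in candidate_bases:
            if basis in selected:
                continue
            model, omega = fit_mpmc(selected + [basis], X, y)
            if omega > best_omega:              # keep the basis that most
                best_omega = omega              # improves the bound
                best_candidate, best_model = basis, model
        if best_candidate is None:              # no basis improves the bound
            break
        selected.append(best_candidate)         # greedy: commit and continue
    return best_model, selected, best_omega
```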

International Conference on Machine Learning (ICML), August 2003

Fast query-optimized kernel machine classification via incremental approximate nearest support vectors

Dennis DeCoste, D. Mazzoni

Support vector machines (and other kernel machines) offer robust modern machine learning methods for nonlinear classification. However, relative to other alternatives (such as linear methods, decision trees and neural networks), they can be orders of magnitude slower at query-time.

Unlike existing methods that attempt to speed up query-time, such as reduced set compression (e.g. (Burges, 1996)) and anytime bounding (e.g. (DeCoste, 2002)), we propose a new and efficient approach based on treating the kernel machine classifier as a special form of k nearest-neighbor.

Our approach improves upon a traditional k-NN by determining at query-time a good k for each query, based on pre-query analysis guided by the original robust kernel machine. We demonstrate effectiveness on high-dimensional benchmark MNIST data, observing a greater than 100-fold reduction in the number of SVs required per query (amortized over all 45 pairwise MNIST digit classifiers), with no extra test errors (in fact, it happens to make 4 fewer).
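
A toy sketch of the core idea only (the paper's stopping rule comes from pre-query analysis of the original kernel machine, which is not reproduced here): evaluate the decision function over support vectors ordered by kernel similarity to the query, growing k until the partial output clears an assumed confidence margin. Here sv_coef holds the signed dual coefficients alpha_i * y_i, and margin and k_step are placeholder values.

```python
import numpy as np

def rbf(X, x, gamma=0.05):
    # Gaussian kernel values between each row of X and a single query x.
    return np.exp(-gamma * np.sum((X - x) ** 2, axis=1))

def nearest_sv_classify(sv_X, sv_coef, bias, x, margin=0.5, k_step=8):
    """Classify query x using an incrementally growing set of the nearest
    support vectors; `margin` and `k_step` are illustrative placeholders."""
    k_vals = rbf(sv_X, x)
    order = np.argsort(-k_vals)            # most similar support vectors first
    partial = bias
    for start in range(0, len(order), k_step):
        idx = order[start:start + k_step]
        partial += np.dot(sv_coef[idx], k_vals[idx])
        if abs(partial) > margin:          # confident enough: stop growing k
            break
    return np.sign(partial)
```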

SIAM International Conference on Data Mining (SDM03), May 2003

Anytime query-tuned kernel machines via Cholesky factorization

Dennis DeCoste

Kernel machines (including support vector machines) offer powerful new methods for improving the accuracy and robustness of fundamental data mining operations on challenging (e.g. high-dimensional) data, including classification, regression, dimensionality reduction, and outlier detection.

However, a key tradeoff to this power is that kernel machines typically compute their outputs in terms of a large fraction of the training data, making it difficult to scale them up to train and run over massive data sets typically tackled in data mining contexts.

We recently demonstrated 2- to 64-fold query-time speedups of SVM and Kernel Fisher classifiers via a new computational geometry method for anytime output bounds [4]. This new paper refines our approach in two key ways.

First, we introduce a simple linear algebra formulation based on standard Cholesky factorization, yielding simpler equations and lower computational overhead. Second, this new formulation suggests new methods for achieving additional speedups, including tuning on query samples. We demonstrate effectiveness on three benchmark datasets.
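
The bound-tightening and query-tuning machinery is specific to the paper, but the linear algebra building block it names is standard: Cholesky-factoring the support-vector kernel matrix yields explicit coordinates whose inner products reproduce the kernel exactly. A minimal sketch (the jitter term is an assumption for numerical safety):

```python
import numpy as np

def sv_coordinates(K_sv, jitter=1e-10):
    """Cholesky factorization K_sv = L @ L.T of the support-vector kernel
    matrix.  The rows of L act as explicit feature-space coordinates for the
    support vectors: their pairwise inner products reproduce K_sv."""
    L = np.linalg.cholesky(K_sv + jitter * np.eye(len(K_sv)))
    return L
```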

Proceedings of KDD-2003, pp. 717-722, Washington, DC, 2003

Experimental Design for Solicitation Campaigns

Uwe Mayer, Armand Sarkissian

Data mining techniques are routinely used by fundraisers to select those prospects from a large pool of candidates who are most likely to make a financial contribution. These techniques often rely on statistical models based on trial performance data.

This trial performance data is typically obtained by soliciting a smaller sample of the possible prospect pool. Collecting this trial data involves a cost; therefore the fundraiser is interested in keeping the trial size small while still collecting enough data to build a reliable statistical model that will be used to evaluate the remainder of the prospects.

We describe an experimental design approach to optimally choose the trial prospects from an existing large pool of prospects. Prospects are clustered to render the problem practically tractable. We modify the standard D-optimality algorithm to prevent repeated selection of the same prospect cluster, since each prospect can only be solicited at most once. We assess the benefits of this approach on the KDD-98 data set by comparing the performance of the model based on the optimal trial data set with that of a model based on a randomly selected trial data set of equal size.
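
A minimal sketch of one way to realize this scheme (a greedy variant, not necessarily the authors' exact modification of the D-optimality algorithm): each candidate row represents a prospect cluster, the determinant gain is scored via the matrix determinant lemma, and a cluster is removed from the pool once selected so it cannot be solicited twice. The ridge term is an assumption to keep the information matrix invertible.

```python
import numpy as np

def greedy_d_optimal(candidates, n_select, ridge=1e-6):
    """Greedily build a D-optimal trial design from cluster-representative
    feature rows.  At each step pick the unselected row x maximizing
    x^T (X^T X)^{-1} x, since
    det(X^T X + x x^T) = det(X^T X) * (1 + x^T (X^T X)^{-1} x)."""
    d = candidates.shape[1]
    info = ridge * np.eye(d)                  # regularized information matrix
    remaining = list(range(len(candidates)))
    chosen = []
    for _ in range(n_select):
        inv = np.linalg.inv(info)
        gains = [candidates[i] @ inv @ candidates[i] for i in remaining]
        best = remaining[int(np.argmax(gains))]
        chosen.append(best)
        remaining.remove(best)                # each cluster solicited at most once
        x = candidates[best]
        info += np.outer(x, x)
    return chosen
```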

IEEE Transactions on Neural Networks, 2003

An efficient method for computing leave-one-out error in support vector machines with Gaussian kernels

In this paper, we give an efficient method for computing the leave-one-out (LOO) error for support vector machines (SVMs) with Gaussian kernels quite accurately. It is particularly suitable for iterative decomposition methods of solving SVMs.

The importance of various steps of the method is illustrated in detail by showing the performance on six benchmark datasets. The new method often leads to speedups of 10-50 times compared to standard LOO error computation. It has good promise for use in hyperparameter tuning and model comparison.
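
For reference, the baseline being accelerated is the brute-force LOO computation sketched below, using scikit-learn's SVC; the paper's efficient method itself is not reproduced, and the C and gamma values are arbitrary placeholders.

```python
import numpy as np
from sklearn.svm import SVC

def loo_error_brute_force(X, y, C=1.0, gamma=0.1):
    """Leave-one-out error of an RBF-kernel SVM by retraining once per
    held-out example: the expensive computation that fast LOO methods avoid."""
    n = len(y)
    errors = 0
    for i in range(n):
        mask = np.arange(n) != i
        clf = SVC(C=C, kernel="rbf", gamma=gamma)
        clf.fit(X[mask], y[mask])
        errors += int(clf.predict(X[i:i + 1])[0] != y[i])
    return errors / n
```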

The Sixteenth Annual Conference on Neural Information Processing Systems (NIPS 2002), Vancouver, Canada, December 2002

Learning to Take Concurrent Actions

Khashayar Rohanimanesh, Sridhar Mahadevan

We investigate a general semi-Markov Decision Process (SMDP) framework for modeling concurrent decision making, where agents learn optimal plans over concurrent temporally extended actions.

Proceedings of the Sixth International Conference on Computers and Information Technology, December 2002

Incremental Singular Value Decomposition Algorithms for Highly Scalable Recommender Systems

Badrul Sarwar, George Karypis, Joseph Konstan, John Riedl

No information

International Conference on Machine Learning (ICML), July 2002

Anytime interval-valued outputs for kernel machines: fast support vector machine classification via distance geometry

Dennis DeCoste

Classifying M query examples using a support vector machine containing L support vectors traditionally requires exactly M * L kernel computations.

We introduce a computational geometry method for which classification cost becomes roughly proportional to each query's difficulty (e.g. distance from the discriminant hyperplane). It produces exactly the same classifications, while typically requiring vastly fewer kernel computations.
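
The paper's distance-geometry bounds are not reproduced here, but the sketch below illustrates the interval-valued idea with a much cruder bound: for a Gaussian kernel every kernel value lies in (0, 1], so each unevaluated term alpha_i * y_i * K(x_i, x) can be bracketed, and evaluation stops as soon as the output interval excludes zero, at which point the label is exactly the one the fully evaluated classifier would produce.

```python
import numpy as np

def anytime_svm_classify(sv_X, sv_coef, bias, x, gamma=0.05):
    """Interval-valued SVM evaluation (simplified illustration).
    sv_coef[i] holds alpha_i * y_i.  With a Gaussian kernel, each term
    sv_coef[i] * K(x_i, x) lies between min(0, sv_coef[i]) and
    max(0, sv_coef[i]); stop once the bracket around the output excludes 0."""
    lo = bias + np.sum(np.minimum(sv_coef, 0.0))   # worst case for all terms
    hi = bias + np.sum(np.maximum(sv_coef, 0.0))   # best case for all terms
    partial = bias
    for i in range(len(sv_coef)):
        term = sv_coef[i] * np.exp(-gamma * np.sum((sv_X[i] - x) ** 2))
        partial += term
        lo += term - min(sv_coef[i], 0.0)          # swap bound for actual value
        hi += term - max(sv_coef[i], 0.0)
        if lo > 0 or hi < 0:                       # sign is decided: stop early
            break
    return np.sign(partial)                        # same label as full evaluation
```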

Machine Learning Journal (MLJ), Volume 46(1-3), 2002

Training invariant support vector machines

Dennis DeCoste, B. Schoelkopf

Practical experience has shown that in order to obtain the best possible performance, prior knowledge about invariances of a classification problem at hand ought to be incorporated into the training procedure.

We describe and review all known methods for doing so in support vector machines, provide experimental results, and discuss their respective merits.

One of the significant new results reported in this work is our recent achievement of the lowest reported test error on the well-known MNIST digit recognition benchmark task, with SVM training times that are also significantly faster than previous SVM methods.
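
One common way to incorporate such prior knowledge is to train on "virtual" examples, i.e. transformed copies of the data added to the training set so the classifier sees the invariance directly. The sketch below is a generic illustration rather than code from the paper: it generates 1-pixel translations of a digit image to encode translation invariance.

```python
import numpy as np

def one_pixel_shifts(image):
    """Return the four 1-pixel translations of a 2-D image (zero-padded).
    Training on such virtual examples is a standard way to encode
    translation invariance into an SVM."""
    shifts = []
    for dy, dx in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        shifted = np.zeros_like(image)
        src = image[max(-dy, 0): image.shape[0] - max(dy, 0),
                    max(-dx, 0): image.shape[1] - max(dx, 0)]
        shifted[max(dy, 0): max(dy, 0) + src.shape[0],
                max(dx, 0): max(dx, 0) + src.shape[1]] = src
        shifts.append(shifted)
    return shifts
```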
