Publications

Publications
Publications
We strongly believe in open source and giving to our community. We work directly with researchers in academia and seek out new perspectives with our intern and fellowship programs. We generalize our solutions and release them to the world as open source projects. We host discussions and publish our results.

Publications

Proc. Of the Asia Pacific Symposium on Information Visualization (APVIS 06), 2006

BiblioViz: A System for Visualization Bibliography Information

Zeqian Shen, Michael Ogawa, Soon Tee Teoh, Kwan-Liu Ma, Zeqian Shen, Michael Ogawa, Soon Tee Teoh, Kwan-Liu Ma

The InfoVis 2004 contest led to the development of several bibliography visualization systems. Even though each of these systems o#ers some unique views of the bibliography data, there is no single best system o#ering all the desired views. We have thus studied how to consolidate the desirable functionalities of these systems into a cohesive design.

We have also designed a few novel visualization methods. This paper presents our findings and creation: BiblioViz, a bibliography visualization system that gives the maximum number of views of the data using a minimum number of visualization constructs in a unified fashion.

Keywords
Categories
IEEE Transactions on Visualization and Computer Graphics, 12, 6, (Nov 2006), pp. 1429-1439

Visual Analysis of Large Heterogeneous Social Networks by Semantic and Structural Abstraction

Zeqian Shen, Kwan-Liu Ma, Tina Eliassi-Rad, Zeqian Shen, Kwan-Liu Ma, Tina Eliassi-Rad

Social network analysis is an active area of study beyond sociology. It uncovers the invisible relationships between actors in a network and provides understanding of social processes and behaviors.

It has become an important technique in a variety of application areas such as the Web, organizational studies, and homeland security. This paper presents a visual analytics tool, OntoVis, for understanding large, heterogeneous social networks, in which nodes and links could represent different concepts and relations, respectively. These concepts and relations are related through an ontology (a.k.a. a schema).

OntoVis is named such because it uses information in the ontology associated with a social network to semantically prune a large, heterogeneous network. In addition to semantic abstraction, OntoVis also allows users to do structural abstraction and importance ltering to make large networks manageable and to facilitate analytic reasoning. All these unique capabilities of OntoVis are illustrated with several case studies.

Keywords
Categories
In Finis Welch and Eric Hanushek, eds., Handbook on the Economics of Education, Vol. 1, Amsterdam: Elsevier, pp. 697-812. (2006)

Interpreting the Evidence on Life Cycle Skill Formation

Flavio Cunha, James J.Heckman, Lance Lochner, Dimitriy Masterov

This paper presents economic models of child development that capture the essence of recent findings from the empirical literature on skill formation. The goal of this essay is to provide a theoretical framework for interpreting the evidence from a vast empirical literature, for guiding the next generation of empirical studies, and for formulating policy. Central to our analysis is the concept that childhood has more than one stage. We formalize the concepts of self-productivity and complementarity of human capital investments and use them to explain the evidence on skill formation. Together, they explain why skill begets skill through a multiplier process.

Skill formation is a life cycle process. It starts in the womb and goes on throughout life. Families play a role in this process that is far more important than the role of schools. There are multiple skills and multiple abilities that are important for adult success.

Abilities are both inherited and created, and the traditional debate about nature versus nurture is scientifically obsolete. Human capital investment exhibits both self-productivity and complementarity. Skill attainment at one stage of the life cycle raises skill attainment at later stages of the life cycle (self-productivity). Early investment facilitates the productivity of later investment (complementarity).

Early investments are not productive if they are not followed up by later investments (another aspect of complementarity). This complementarity explains why there is no equity-efficiency trade-off for early investment. The returns to investing early in the life cycle are high. Remediation of inadequate early investments is difficult and very costly as a consequence of both self-productivity and complementarity.

Keywords
Categories
SIAM Data Mining Conference, April, 2006

Automated Knowledge Discovery from Simulator

M. Burl, Dennis DeCoste, B. Enke, D. Mazzoni, W. Merline, L. Scharenbroich, M. Burl, Dennis DeCoste, B. Enke, D. Mazzoni, W. Merline, L. Scharenbroich

In this paper, we explore one aspect of knowledge discovery from simulators, the landscape characterization problem, where the aim is to identify regions in the input/ parameter/model space that lead to a particular output behavior.

Large-scale numerical simulators are in widespread use by scientists and engineers across a range of government agencies, academia, and industry; in many cases, simulators provide the only means to examine processes that are infeasible or impossible to study otherwise.

However, the cost of simulation studies can be quite high, both in terms of the time and computational resources required to conduct the trials and the manpower needed to sift through the resulting output. Thus, there is strong motivation to develop automated methods that enable more e±cient knowledge extraction.

Keywords
Journal of Machine Learning Research (JMLR): Special Topic on Machine Learning and Optimization. 7(Jul):1493–1515, 2006

Building Support Vector Machines with Reduced Classifier Complexity

Sathiya Keerthi, Olivier Chapelle, Dennis DeCoste

Support vector machines (SVMs), though accurate, are not preferred in applications requiring great classification speed, due to the number of support vectors being large. To overcome this problem we devise a primal method with the following properties:

(1) it decouples the idea of basis functions from the concept of support vectors;
(2) it greedily finds a set of kernel basis functions of a specified maximum size (dmax) to approximate the SVM primal cost function well;
(3) it is efficient and roughly scales as O(ndmax2) where n is the number of training examples; and,
(4) the number of basis functions it requires to achieve an accuracy close to the SVM accuracy is usually far less than the number of SVM support vectors.

Keywords
KDD 2006

Naive filterbots for robust cold-start recommendations

Seung-Taek Park, David Pennock, Omid Madani, Nathan Good, Dennis DeCoste, Seung-Taek Park, David Pennock, Omid Madani, Nathan Good, Dennis DeCoste

The goal of a recommender system is to suggest items of interest to a user based on historical behavior of a community of users. Given detailed enough history, item-based collaborative filtering (CF) often performs as well or better than almost any other recommendation method. However, in cold-start situations—where a user, an item, or the entire system is new—simple non-personalized recommendations often fare better. We improve the scalability and performance of a previous approach to handling cold-start situations that uses filterbots, or surrogate users that rate items based only on user or item attributes.

We show that introducing a very small number of simple filterbots helps make CF algorithms more robust. In particular, adding just seven global filterbots improves both user-based and item-based CF in cold-start user, cold-start item, and cold-start system settings.

Performance is better when data is scarce, performance is no worse when data is plentiful, and algorithm efficiency is negligibly affected. We systematically compare a non-personalized baseline, user-based CF, item-based CF, and our bot-augmented user- and item-based CF algorithms using three data sets (Yahoo! Movies, MovieLens, and Each-Movie) with the normalized MAE metric in three types of cold-start situations.

The advantage of our “naïve filterbot” approach is most pronounced for the Yahoo! data, the sparsest of the three data sets.

Keywords
ECML/PKDD 2006

Ensembles of Nearest Neighbor Forecasts

Dragomir Yankov, Dennis DeCoste, Eamonn Keogh, Dragomir Yankov, Dennis DeCoste, Eamonn Keogh

Nearest neighbor forecasting models are attractive with their simplicity and the ability to predict complex nonlinear behavior. They rely on the assumption that observations similar to the target one are also likely to have similar outcomes. A common practice in nearest neighbor model selection is to compute the globally optimal number of neighbors on a validation set, which is later applied for all incoming queries.

For certain queries, however, this number may be suboptimal and forecasts that deviate a lot from the true realization could be produced. To address the problem we propose an alternative approach of training ensembles of nearest neighbor predictors that determine the best number of neighbors for individual queries.

We demonstrate that the forecasts of the ensembles improve significantly on the globally optimal single predictors.

Keywords
International Conference on Machine Learning (ICML), June 2006

Collaborative Prediction Using Ensembles of Maximum Margin Matrix Factorizations

Dennis DeCoste

Fast gradient-based methods for Maximum Margin Matrix Factorization (MMMF) were recently shown to have great promise (Rennie & Srebro, 2005), including significantly outperforming the previous state-of-the-art methods on some standard collaborative prediction benchmarks (including MovieLens).

In this paper, we investigate ways to further improve the performance of MMMF, by casting it within an ensemble approach. We explore and evaluate a variety of alternative ways to define such ensembles.

We show that our resulting ensembles can perform significantly better than a single MMMF model, along multiple evaluation metrics. In fact, we find that ensembles of partially trained MMMF models can sometimes even give better predictions in total training time comparable to a single MMMF model.

Keywords
Encyclopedia of Information Science and Technology (II): 907-911. Idea Group. 2005

Distributed Recommender Systems for Internet Commerce

Badrul Sarwar, Joseph Konstan, John Riedl

No information

Keywords
The Twenty-Second International Conference on Machine Learning (ICML05), Bonn, Germany, 7-11 August, 2005

Coarticulation: An Approach for Generating Concurrent Plans in Markov Decision Processes

Khashayar Rohanimanesh, Sridhar Mahadevan

We study an approach for performing concurrent activities in Markov decision processes (MDPs) based on the coarticulation framework. We assume that the agent has multiple degrees of freedom (DOF) in the action space which enables it to perform activities simultaneously.

We demonstrate that one natural way for generating concurrency in the system is by coarticulating among the set of learned activities available to the agent. In general due to the multiple DOF in the system, often there exists a redundant set of admissible sub-optimal policies associated with each learned activity.

Such flexibility enables the agent to concurrently commit to several subgoals according to their priority levels, given a new task defined in terms of a set of prioritized subgoals. We present efficient approximate algorithms for computing such policies and for generating concurrent plans. We also evaluate our approach in a simulated domain.

Keywords

Pages