From Text to Cases: Machine Aided Text Categorization for Capturing Business Reengineering Cases

AAAI Workshop on Textual Case-Based Reasoning, 1998
Catherine Baudin, Scott Waterman
Abstract

Sharing business experience, such as client engagements, proposals, or best practices, is an important part of knowledge management within large business organizations. While full-text search is a first step toward accessing textual material that describes corporate experience, it does not highlight important concepts, nor the similarities between business practices that are structured or operated differently.

Conceptual indexing languages, on the other hand, are high-level indexing schemes based on taxonomies of domain concepts, designed to provide a common language for describing, retrieving, and comparing cases.
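To make the idea of a taxonomy as a shared language for comparing cases concrete, here is a minimal sketch of one common approach, offered as an illustrative assumption rather than the indexing scheme used in this work. Each case is indexed by concepts from a hypothetical taxonomy fragment (`PARENT`), each concept set is expanded with its ancestors, and two cases are compared by the Jaccard overlap of the expanded sets. Cases indexed under different concepts can therefore still match through shared parent concepts. The names `PARENT`, `expand`, and `similarity` are hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical taxonomy fragment: concept -> parent concept (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "Business Process": None,
    "Order Management": "Business Process",
    "Order Fulfillment": "Order Management",
    "Order Entry": "Order Management",
    "Customer Service": "Business Process",
}


def expand(concepts: Set[str], parent: Dict[str, Optional[str]]) -> Set[str]:
    """Return the given concepts together with all of their ancestors."""
    closed: Set[str] = set()
    for concept in concepts:
        node: Optional[str] = concept
        # Walk up to the root, stopping early at a node already expanded
        # (its ancestors are then already in the set).
        while node is not None and node not in closed:
            closed.add(node)
            node = parent[node]
    return closed


def similarity(a: Set[str], b: Set[str], parent: Dict[str, Optional[str]]) -> float:
    """Jaccard overlap of the ancestor-expanded concept sets of two cases."""
    ea, eb = expand(a, parent), expand(b, parent)
    union = ea | eb
    return len(ea & eb) / len(union) if union else 0.0


print(similarity({"Order Fulfillment"}, {"Order Entry"}, PARENT))
print(similarity({"Order Fulfillment"}, {"Customer Service"}, PARENT))
```

Here, cases indexed under Order Fulfillment and Order Entry share two of four expanded concepts (similarity 0.5), while Order Fulfillment and Customer Service share only the root concept (0.25).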

However, the effective use of these high-level languages is limited because they require users to be able to describe cases in terms of an often large body of controlled vocabulary. The main challenge in using CBR and data mining technology to access and analyze corporate knowledge lies not in designing sophisticated inference mechanisms, but in representing large bodies of qualitative information, available as text, in a form suitable for reuse.

This knowledge representation task consists of mapping textual information to predefined domain models designed by knowledgeable domain experts. We are experimenting with machine-aided text categorization technology to support the creation of quality-controlled repositories of corporate experience in the business domain.
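One simple way to make categorization machine-aided is to have the system propose candidate concepts from the controlled vocabulary and leave the final choice to a human indexer. The sketch below ranks concepts by counting whole-word cue-phrase matches in a case description. The taxonomy, its cue phrases, and the names `TAXONOMY` and `suggest_concepts` are hypothetical, and plain cue-phrase matching is a stand-in chosen for brevity, not the categorization technology used in this work.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical controlled-vocabulary fragment: concept -> cue phrases.
TAXONOMY: Dict[str, List[str]] = {
    "Process/Order Fulfillment": ["order", "fulfillment", "shipping", "inventory"],
    "Process/Customer Service": ["call center", "customer", "complaint", "support"],
    "Technology/Workflow Automation": ["workflow", "automation", "imaging"],
}


def suggest_concepts(
    text: str, taxonomy: Dict[str, List[str]], top_k: int = 3
) -> List[Tuple[str, int]]:
    """Rank concepts by the number of whole-word cue-phrase matches in text.

    The result is a list of suggestions for a human indexer to accept or reject.
    """
    lowered = text.lower()
    scored: List[Tuple[str, int]] = []
    for concept, cues in taxonomy.items():
        hits = sum(
            len(re.findall(r"\b" + re.escape(cue) + r"\b", lowered)) for cue in cues
        )
        if hits > 0:
            scored.append((concept, hits))
    # Highest score first; ties broken alphabetically for a stable ordering.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


case_text = (
    "The client merged its regional call center operations and automated "
    "the order workflow, cutting order fulfillment time."
)
print(suggest_concepts(case_text, TAXONOMY))
```

For the sample description, Process/Order Fulfillment ranks first with three cue matches, followed by the two single-match concepts. Note that "automated" does not match the cue "automation", which is one reason such suggestions still need expert review.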

Another publication from the same category: Machine Learning and Data Science

IEEE Computing Conference 2018, London, UK

Regularization of the Kernel Matrix via Covariance Matrix Shrinkage Estimation

The kernel trick, formulated as an inner product in a feature space, enables powerful extensions of many well-known algorithms. While the kernel matrix involves inner products in the feature space, the sample covariance matrix of the data involves outer products, so their spectral properties are tightly connected: for centered data, the two matrices share their nonzero eigenvalues up to a constant scale factor. This allows us to examine the kernel matrix through the sample covariance matrix in the feature space and vice versa. Kernels often induce a number of features that is large compared to the number of observations. In this setting, the sample covariance matrix is neither well-conditioned nor necessarily invertible, which calls for estimating high-dimensional covariance matrices under small-sample-size conditions. We tackle this problem with a shrinkage estimator that offers a compromise between the sample covariance matrix and a well-conditioned matrix (also known as the "target"), with the aim of minimizing the mean-squared error (MSE). We propose a distribution-free kernel matrix regularization approach that is tuned directly from the kernel matrix, avoiding the need to address the feature space explicitly. Numerical simulations demonstrate that the proposed regularization is effective in classification tasks.
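A minimal sketch of the general idea of shrinking a kernel matrix toward a well-conditioned target, assuming a linear kernel and a scaled-identity target mu * I with mu = trace(K) / n. Unlike the approach described above, the shrinkage intensity `rho` is passed in by hand rather than tuned to minimize the MSE. The helpers `linear_kernel`, `shrink_to_scaled_identity`, and `det2` are illustrative names, not part of the original work.

```python
from typing import List

Matrix = List[List[float]]


def linear_kernel(X: Matrix) -> Matrix:
    """Gram matrix K[i][j] = <x_i, x_j>; a linear kernel is assumed for simplicity."""
    return [[float(sum(a * b for a, b in zip(xi, xj))) for xj in X] for xi in X]


def shrink_to_scaled_identity(K: Matrix, rho: float) -> Matrix:
    """Return (1 - rho) * K + rho * mu * I with mu = trace(K) / n.

    rho is chosen by the caller here; it is NOT the MSE-optimal intensity.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    n = len(K)
    mu = sum(K[i][i] for i in range(n)) / n
    return [
        [(1.0 - rho) * K[i][j] + (rho * mu if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


def det2(M: Matrix) -> float:
    """Determinant of a 2 x 2 matrix, used only to check invertibility in the demo."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


# n = 2 observations, p = 3 features: more features than observations.
X = [[1.0, 2.0, 0.0], [2.0, 4.0, 0.0]]
K = linear_kernel(X)
K_shrunk = shrink_to_scaled_identity(K, rho=0.5)

print(det2(K))         # rank-deficient kernel matrix
print(K_shrunk)
print(det2(K_shrunk))
```

With two observations of three features, K = [[5.0, 10.0], [10.0, 20.0]] is singular (determinant 0). Shrinking with rho = 0.5 gives [[8.75, 5.0], [5.0, 16.25]], whose determinant is 117.1875, while the trace stays at 25 because the scaled-identity target has the same trace as K.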

Keywords