From Text to Cases: Machine Aided Text Categorization for Capturing Business Reengineering Cases

AAAI Workshop on Textual Case-Based Reasoning, 1998
Catherine Baudin, Scott Waterman
Abstract

Sharing business experience, such as client engagements, proposals, or best practices, is an important part of the knowledge management task within large business organizations. While full-text search is a first step toward accessing textual material that describes corporate experience, it does not highlight important concepts or reveal similarities between business practices that are structured or operated differently.

Conceptual indexing languages, on the other hand, are high-level indexing schemes based on taxonomies of domain concepts, designed to provide a common language in which to describe, retrieve, and compare cases.

However, the effective use of these high-level languages is limited by the fact that they require users to describe cases in terms of an often large body of controlled vocabulary. The main challenge in using case-based reasoning (CBR) and data mining technology to access and analyze corporate knowledge is not designing sophisticated inference mechanisms, but representing for reuse the large bodies of qualitative information that exist only in textual form.

This knowledge representation task is the process of mapping textual information onto predefined domain models designed by knowledgeable domain experts. We are experimenting with machine-aided text categorization technology to support the creation of quality-controlled repositories of corporate experience in the business domain.
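The abstract does not describe the system's internals, but the general idea of machine-aided categorization can be illustrated with a minimal sketch. The taxonomy, concept names, and keyword lists below are hypothetical and stand in for a controlled indexing vocabulary; the sketch only shows how free-text case descriptions might be mapped to suggested concepts, with a domain expert confirming or correcting the suggestions.

```python
# Minimal sketch (hypothetical taxonomy and keywords, not the authors' system):
# suggest controlled-vocabulary concepts for a free-text case description,
# leaving the final indexing decision to a domain expert.

CONCEPT_TAXONOMY = {
    "Process Redesign": ["reengineer", "workflow", "streamline", "process"],
    "Customer Relationship": ["client", "customer", "satisfaction", "engagement"],
    "Cost Reduction": ["cost", "saving", "overhead", "budget"],
}

def suggest_concepts(case_text, taxonomy=CONCEPT_TAXONOMY, min_hits=1):
    """Return taxonomy concepts whose keywords appear in the case text,
    ranked by the number of keyword matches."""
    text = case_text.lower()
    scored = []
    for concept, keywords in taxonomy.items():
        hits = sum(1 for kw in keywords if kw in text)
        if hits >= min_hits:
            scored.append((concept, hits))
    return [concept for concept, _ in sorted(scored, key=lambda item: -item[1])]

if __name__ == "__main__":
    description = ("The client engagement focused on reengineering the order "
                   "fulfillment workflow to reduce overhead costs.")
    print(suggest_concepts(description))
    # -> ['Process Redesign', 'Cost Reduction', 'Customer Relationship']
```

In a machine-aided workflow of this kind, the ranked suggestions serve only as candidates; the expert's accept/reject decisions are what keep the repository quality-controlled.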
