Machine Learning and Data Science

Machine Learning and Data Science
This is an overview of all content belonging to this term.

Results

Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pages 805–814, Jeju, Republic of Korea, 8-14 July 2012. c 2012 Association for Computational Linguistics

Structuring E-Commerce Inventory

Large e-commerce enterprises feature millions of items entered daily by a large variety of sellers. While some sellers provide rich, structured descriptions of their items, a vast majority of them provide unstructured natural language descriptions. In the paper we present a 2 steps method for structuring items into descriptive properties. The first step consists in unsupervised property discovery and extraction. The second step involves supervised property synonym discovery using a maximum entropy based clustering algorithm. We evaluate our method on a year worth of ecommerce
data and show that it achieves excellent precision with good recall.

Keywords
40th International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2015

Switching to and Combining Offline-Adapted Cluster Acoustic Models based on Unsupervised Segment Classification

The performance of automatic speech recognition system degrades significantly when the incoming audio differs from training data. Maximum likelihood linear regression has been widely used for unsupervised adaptation, usually in a multiple-pass recognition process. Here we present a novel adaptation framework for which the offline, supervised, high-quality adaptation is applied to clustered channel/speaker conditions that are defined with automatic and manual clustering of the training data. Upon online recognition, each speech segment is classified into one of the training clusters in an unsupervised way, and the corresponding top acoustic models are used for recognition. Recognition lattice outputs are combined. Experiments are performed on the Wall Street Journal data, and a 37.5% relative reduction of Word Error Rate is reported. The proposed approach is also compared with a general speaker adaptive training approach.

Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Correcting Keyboard Layout Errors and Homoglyphs in Queries

Keyboard layout errors and homoglyphs in cross-language queries impact our ability to correctly interpret user information needs and offer relevant results. We present a machine learning approach to correcting these errors, based largely on character-level n-gram features. We demonstrate superior performance over rule-based methods, as well as a significant reduction in the number of queries that yield null search results.