Keyboard layout errors and homoglyphs in cross-language queries impair our ability to correctly interpret user information needs and offer relevant results. We present a machine learning approach to correcting these errors, based largely on character-level n-gram features. We demonstrate superior performance over rule-based methods, as well as a significant reduction in the number of queries that yield null search results.
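To make the two ingredients named above concrete, the following is a minimal sketch of a keyboard-layout remapping and of character-level n-gram extraction. It is not the method of the paper. The language pair (US QWERTY versus the Russian ЙЦУКЕН layout), the helper names `switch_layout` and `char_ngrams`, and the `^`/`$` boundary markers are all assumptions chosen for illustration.

```python
from collections import Counter

# Assumption for illustration: lowercase key positions of US QWERTY and the
# Russian ЙЦУКЕН layout, listed row by row so that index i is the same key.
LATIN = "qwertyuiop[]asdfghjkl;'zxcvbnm,."
CYRILLIC = "йцукенгшщзхъфывапролджэячсмитьбю"

TO_CYRILLIC = str.maketrans(LATIN, CYRILLIC)
TO_LATIN = str.maketrans(CYRILLIC, LATIN)


def switch_layout(text: str, table: dict) -> str:
    """Re-read a query as if it had been typed on the other layout (hypothetical helper)."""
    return text.lower().translate(table)


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams, with ^ and $ marking the string boundaries."""
    padded = f"^{text}$"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


query = "ntktajy"                               # typed with the wrong layout active
candidate = switch_layout(query, TO_CYRILLIC)   # "телефон"
features_original = char_ngrams(query)
features_candidate = char_ngrams(candidate)
```

In a setup like this, the n-gram counts of the query as typed and of its remapped candidate could serve as input features for a classifier that decides which reading is more plausible. How the features are actually combined and which model is trained are not specified here.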
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Correcting Keyboard Layout Errors and Homoglyphs in Queries