Jean-David Ruvini

Research Scientist
Biography

Jean-David (JD) joined eBay Research Lab in 2007. Before coming to eBay, Jean-David worked at Shopping.com Research Lab, where he contributed to the design and improvement of Shopping.com's classification and attribute extraction technologies. Prior to that, Jean-David spent five years at the research lab of Bouygues (a French conglomerate with telco, television, construction and water supply subsidiaries), working on machine learning projects. Jean-David obtained his PhD in Computer Science (Intelligent User Interfaces) from the University of Montpellier, France, in 2000.

Publications
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pages 805–814, Jeju, Republic of Korea, 8–14 July 2012. © 2012 Association for Computational Linguistics

Structuring E-Commerce Inventory

Large e-commerce enterprises feature millions of items entered daily by a large variety of sellers. While some sellers provide rich, structured descriptions of their items, a vast majority of them provide unstructured natural language descriptions. In this paper we present a two-step method for structuring items into descriptive properties. The first step consists of unsupervised property discovery and extraction. The second step involves supervised property synonym discovery using a maximum-entropy-based clustering algorithm. We evaluate our method on a year's worth of e-commerce data and show that it achieves excellent precision with good recall.

Proceedings of NAACL-HLT 2015, pages 160–167, Denver, Colorado, May 31 – June 5, 2015. © 2015 Association for Computational Linguistics

Distributed Word Representations Improve NER for e-Commerce

This paper presents a case study of using distributed word representations, word2vec in particular, to improve the performance of Named Entity Recognition (NER) in the e-commerce domain. We also demonstrate that distributed word representations trained on a smaller amount of in-domain data are more effective than word vectors trained on a very large amount of out-of-domain data, and that their combination gives the best results.

Patents