Uwe Mayer
Research Scientist
Research Interests
Parallel and distributed computing, text mining, policy violation and fraud detection, recommendations and merchandising, data mining, and machine learning.
Prior to joining eBay, Uwe was a senior research scientist at Yahoo and a director of Analytic Science at FICO. He has been a professor of mathematics at universities in both the U.S. and Germany.

Uwe received his MA and PhD in mathematics from the University of Utah, where he was a Fulbright scholar, with an extended research stay at the Institute for Advanced Study in Princeton. He completed his undergraduate studies in Germany with a double major in mathematics and computer science. Bringing his academic career full circle, from computer science to mathematics and back to computing, Uwe has also co-advised a PhD student in data mining at the University of California, San Diego.
Publications
Uwe Mayer, Nish Parikh, Gyanit Singh
2013 IEEE International Conference on Big Data (IEEE Big Data 2013 Tutorial)
Abstract
This tutorial will summarize state-of-the-art approaches in the growing area of large-scale clickstream mining. It will give data scientists, researchers, and engineers with diverse backgrounds an opportunity to familiarize themselves with practical platforms, approaches, and tools for extracting actionable insights and building products from big and diverse data sources. The organizers will accomplish this goal using three real-life stories from the field: large-scale data initiatives at eBay, one of the world’s largest e-commerce platforms. The tutorial will feature transaction mining, behavior log mining, and time-series mining. We will talk about building robust recommendation systems over MapReduce clusters (query suggestions, shipping fee recommendations). The talk will also cover topics such as removing user bias from data, using heuristics to make intractable algorithms practical, and appropriate de-noising and normalization of diverse data sets. The audience is expected to be familiar with MapReduce (preferably Hadoop) and to be working or grappling with data problems; some basic background in algorithms and statistics would be beneficial. We will present the tutorial through three case studies of real applications built at eBay:
• Shipping Recommendation System
• Mining large-scale temporal dynamics with Hadoop
• Query Suggestions at scale with Hadoop
Uwe Mayer
Proceedings of KDD’12, Beijing, China. August 2012
Abstract
We present an algorithm for language identification, in particular of short documents, for an Internet domain with sites in multiple countries and differing languages. The algorithm is significantly faster than standard language identification methods while providing state-of-the-art identification accuracy. We bootstrap the algorithm from language labels derived from the site alone, a methodology suitable for any supervised language identification algorithm. We demonstrate the bootstrapping and the algorithm on eBay email data and on Twitter status updates. The algorithm is deployed at eBay as part of the back-office development data repository.
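The abstract does not describe the classifier itself, so the following is only a generic sketch of the bootstrapping idea, not the deployed eBay algorithm: each training message is labeled with the dominant language of the site it came from, and those noisy labels train a standard character-trigram naive Bayes classifier (a swapped-in textbook baseline). The site names, the `SITE_LANGUAGE` map, and the example messages are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical mapping from site to its dominant language. These are noisy
# bootstrap labels derived from the site alone, not ground truth.
SITE_LANGUAGE = {"site.de": "de", "site.fr": "fr", "site.com": "en"}


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesLanguageId:
    """Multinomial naive Bayes over character n-grams with add-one smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.counts = defaultdict(Counter)  # language -> n-gram counts
        self.totals = Counter()             # language -> total n-gram count
        self.docs = Counter()               # language -> number of documents
        self.vocab = set()

    def fit(self, texts, labels):
        for text, lang in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts[lang].update(grams)
            self.totals[lang] += len(grams)
            self.docs[lang] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        grams = char_ngrams(text, self.n)
        n_docs = sum(self.docs.values())
        vocab_size = len(self.vocab)
        best_lang, best_score = None, -math.inf
        for lang in self.counts:
            # Log prior plus smoothed log likelihood of every n-gram.
            score = math.log(self.docs[lang] / n_docs)
            denom = self.totals[lang] + vocab_size
            for g in grams:
                score += math.log((self.counts[lang][g] + 1) / denom)
            if score > best_score:
                best_lang, best_score = lang, score
        return best_lang


# Bootstrap: label each training message with its site's language.
messages = [
    ("site.de", "Ist der Artikel noch verfügbar? Vielen Dank"),
    ("site.de", "Wann wird das Paket versendet?"),
    ("site.fr", "Est-ce que l'article est encore disponible? Merci"),
    ("site.fr", "Quand le colis sera-t-il envoyé?"),
    ("site.com", "Is the item still available? Thank you"),
    ("site.com", "When will the package be shipped?"),
]
texts = [text for _, text in messages]
labels = [SITE_LANGUAGE[site] for site, _ in messages]
model = NaiveBayesLanguageId().fit(texts, labels)
```

On this toy model, `model.predict("Vielen Dank für das Paket")` should return `"de"`. The point of the design is that no message was hand-labeled: the site supplies the label, and any supervised identifier could replace the naive Bayes step.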
Uwe Mayer, Cécile Levasseur, Ken Kreutz-Delgado
Proceedings of the Nineteenth IEEE Int’l Workshop on Machine Learning for Signal Processing, Grenoble, France. September 2009
Abstract
We consider both supervised and unsupervised classification of multidimensional data that are non-Gaussian and of mixed types (continuous and/or discrete). An important subclass of graphical model techniques called Generalized Linear Statistics (GLS) is used to capture the underlying statistical structure of these complex data. GLS exploits the properties of exponential family distributions, which are assumed to describe the data components, and constrains latent variables to a lower-dimensional parameter subspace. Based on the latent variable information, classification is performed in the natural parameter subspace with classical statistical techniques. The benefits of decision making in parameter space are illustrated with examples of text categorization (categorical data) and mixed-type data classification. As a text document preprocessing tool, we present an extension of the conditional mutual information maximization feature selection algorithm from binary to categorical data.
Uwe Mayer, Cécile Levasseur, Brandon Burge, Ken Kreutz-Delgado
Proceedings of the 2008 IAPR Workshop on Cognitive Information Processing. pp. 126-131. Santorini, Greece. 2008
Abstract
Minority class detection is the problem of detecting the occurrence of rare key events that differ from the majority of a data set. This paper considers unsupervised minority class detection for multidimensional data that are highly non-Gaussian, mixed (continuous and/or discrete), noisy, and nonlinearly related, as occurs, for example, in fraud detection in typical financial data. We propose a statistical modeling approach that is a subclass of graphical model techniques. It exploits the properties of exponential family distributions and generalizes techniques from classical linear statistics into a framework referred to as Generalized Linear Statistics (GLS). The methodology exploits the split between the data space and the parameter space for exponential family distributions and solves a nonlinear problem by applying classical linear statistical tools to data that have been mapped into the parameter space. We propose a fraud detection technique in the parameter space for data of mixed type that uses low-dimensional information learned with an Iteratively Reweighted Least Squares (IRLS) based approach to GLS. ROC curves for an initial simulation on synthetic data are presented, which give an indication of expected results on actual financial data sets.
Uwe Mayer, Cécile Levasseur, Brandon Burge, Ken Kreutz-Delgado
Proceedings of the 1st International Workshop on Data Mining and Artificial Intelligence (DMAI 2008) held in conjunction with the 11th IEEE International Conference on Computer and Information Technology. Khulna, Bangladesh. December 2008
Abstract
We present a general viewpoint, based on Bregman divergences and exponential family properties, that contains the following three algorithms as special cases: 1) exponential family Principal Component Analysis (exponential PCA), 2) Semi-Parametric exponential family Principal Component Analysis (SP-PCA), and 3) Bregman soft clustering. This framework is equivalent to a mixed data-type hierarchical Bayes graphical model assumption with latent variables constrained to a low-dimensional parameter subspace. We show that within this framework exponential PCA and SP-PCA are similar to the Bregman soft clustering technique with the addition of a linear constraint in the parameter space. We implement the resulting modifications to SP-PCA and Bregman soft clustering for mixed (continuous and/or discrete) data sets, and add a nonparametric estimation of the point-mass probabilities to exponential PCA. Finally, we compare the relative performance of the three algorithms in a clustering setting for mixed data sets.
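For orientation, here is a minimal NumPy sketch of plain Bregman soft clustering, the third special case named above, in its standard EM form: responsibilities proportional to pi_h * exp(-d_phi(x, mu_h)), followed by weighted-mean centroid updates. It does not implement the paper's mixed-data framework, the linear parameter-space constraint that links it to exponential PCA and SP-PCA, or the nonparametric point-mass estimation; the farthest-point initialization and the two example divergences are choices made only for this sketch.

```python
import numpy as np


def squared_euclidean(X, mu):
    """d_phi for phi(x) = ||x||^2 / 2 (unit-variance Gaussian), each row of X vs. mu."""
    return 0.5 * np.sum((X - mu) ** 2, axis=1)


def generalized_i_divergence(X, mu):
    """d_phi for phi(x) = sum(x log x - x) (Poisson family); assumes X >= 0, mu > 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        x_log_ratio = np.where(X > 0, X * np.log(X / mu), 0.0)
    return np.sum(x_log_ratio - X + mu, axis=1)


def bregman_soft_clustering(X, k, divergence, n_iter=100):
    """EM-style soft clustering with a Bregman divergence; returns (priors, centroids, posteriors)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Farthest-point initialization: start at row 0, then repeatedly add the row
    # with the largest divergence to its nearest centroid so far.
    centroids = [X[0]]
    for _ in range(1, k):
        nearest = np.min(np.stack([divergence(X, m) for m in centroids], axis=1), axis=1)
        centroids.append(X[np.argmax(nearest)])
    mu = np.array(centroids)
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: p(h | x) is proportional to pi_h * exp(-d_phi(x, mu_h)).
        log_post = np.log(pi) - np.stack([divergence(X, m) for m in mu], axis=1)
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: priors are mean responsibilities; centroids are weighted means,
        # which minimize the expected divergence for every Bregman divergence.
        weights = post.sum(axis=0)
        pi = weights / n
        mu = (post.T @ X) / weights[:, None]
    return pi, mu, post
```

Swapping `squared_euclidean` for `generalized_i_divergence` changes the assumed exponential family (Gaussian to Poisson) without touching the EM loop, which is the sense in which one framework covers several models.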
Uwe Mayer, Cécile Levasseur, Brandon Burge, Ken Kreutz-Delgado, Gregory Gancarz
Conference Record of the Thirty-Ninth Asilomar Conference on Signals, Systems and Computers. pp. 545-549. 2005
Abstract
Many important expert system applications depend on the ability to accurately detect or predict the occurrence of key events given a data set of observations. We concentrate on multidimensional data that are highly non-Gaussian (continuous and/or discrete), noisy, and nonlinearly related. We investigate the feasibility of data-pattern discovery and event detection by applying generalized principal component analysis (GPCA) techniques for pattern extraction based on an exponential family probability distribution assumption. We develop theoretical extensions of the GPCA model by exploiting results from the theory of generalized linear models and nonparametric mixture density estimation.
Uwe Mayer, Armand Sarkissian
Proceedings of KDD-2003. pp. 717-722. Washington, DC. 2003
Abstract
Data mining techniques are routinely used by fundraisers to select, from a large pool of candidates, the prospects most likely to make a financial contribution. These techniques often rely on statistical models based on trial performance data, which is typically obtained by soliciting a smaller sample of the possible prospect pool. Collecting this trial data involves a cost; therefore the fundraiser is interested in keeping the trial size small while still collecting enough data to build a reliable statistical model that will be used to evaluate the remainder of the prospects. We describe an experimental design approach to optimally choose the trial prospects from an existing large pool of prospects. Prospects are clustered to render the problem practically tractable. We modify the standard D-optimality algorithm to prevent repeated selection of the same prospect cluster, since each prospect can be solicited at most once. We assess the benefits of this approach on the KDD-98 data set by comparing the performance of the model based on the optimal trial data set with that of a model based on a randomly selected trial data set of equal size.
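A minimal sketch of the selection idea, under assumptions the abstract does not state: each prospect cluster is summarized by one hypothetical feature vector (for example, its centroid in the model's predictor space), and clusters are added by greedy forward selection rather than by whatever exchange algorithm the paper modified. Each step picks the unused cluster that most increases det(X^T X), using the matrix determinant lemma det(M + x x^T) = det(M) (1 + x^T M^-1 x); masking clusters that were already chosen enforces the solicit-at-most-once constraint.

```python
import numpy as np


def greedy_d_optimal(candidates, n_select, ridge=1e-6):
    """Greedily pick rows of `candidates` to maximize det(X^T X), each row at most once.

    `candidates` holds one hypothetical feature vector per prospect cluster.
    """
    n, p = candidates.shape
    if n_select > n:
        raise ValueError("cannot select more clusters than are available")
    info = ridge * np.eye(p)  # small ridge keeps the information matrix invertible
    available = np.ones(n, dtype=bool)
    chosen = []
    for _ in range(n_select):
        inv = np.linalg.inv(info)
        # Determinant gain of adding row x is proportional to 1 + x^T M^-1 x,
        # so it suffices to maximize the quadratic form.
        gains = np.einsum("ij,jk,ik->i", candidates, inv, candidates)
        gains[~available] = -np.inf  # a cluster can be solicited only once
        best = int(np.argmax(gains))
        chosen.append(best)
        available[best] = False
        info += np.outer(candidates[best], candidates[best])
    return chosen


clusters = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.5, 0.5]])
trial = greedy_d_optimal(clusters, 2)
```

Here `trial` is `[1, 2]`: after the longest vector is taken, the nearly orthogonal cluster adds the most information, whereas the near-duplicate `[1.0, 0.0]` adds little. Without the mask, a standard D-optimal procedure would be free to pick the same design point again.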
Uwe Mayer, Gieri Simonett
Interfaces and Free Boundaries, 4, no. 1, pp. 89-109. 2002
Abstract
We present a numerical scheme for radially symmetric solutions to curvature driven moving boundary problems governed by a local law of motion, e.g. the mean curvature flow, the surface diffusion flow, and the Willmore flow. We then present several numerical experiments for the Willmore flow. In particular, we provide numerical evidence that the Willmore flow can develop singularities in finite time.
Uwe Mayer
Experimental Mathematics, 10, no. 2, pp. 103-107. 2001
Abstract
We present an example of an embedded curve that, under numerical simulation of the averaged mean curvature flow, first loses embeddedness and then develops a singularity where the curvature becomes infinite, all in finite time. This leads to the conjecture that not all smooth embedded curves persist for all time under the averaged mean curvature flow.
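For reference, the averaged mean curvature flow of a closed plane curve Γ(t), with curvature κ, arc length s, length L, and enclosed area A, is commonly written as below; the sign convention (normal velocity V measured inward, κ > 0 on convex curves) is an assumption of this note. The short computation shows that the flow preserves the enclosed area and does not increase the length.

```latex
V = \kappa - \bar{\kappa}, \qquad
\bar{\kappa} = \frac{1}{L}\int_{\Gamma} \kappa \, ds,

\frac{dA}{dt} = -\int_{\Gamma} V \, ds
  = -\Big( \int_{\Gamma} \kappa \, ds - \bar{\kappa} L \Big) = 0,

\frac{dL}{dt} = -\int_{\Gamma} \kappa V \, ds
  = -\int_{\Gamma} (\kappa - \bar{\kappa})^{2} \, ds \le 0,
```

where the last step uses \int_{\Gamma} \bar{\kappa}(\kappa - \bar{\kappa})\, ds = 0.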
Uwe Mayer, Joachim Escher
Archiv der Mathematik, 77, issue 5, pp. 434-448. 2001
Abstract
The modified (two-sided) Mullins-Sekerka model is a nonlocal evolution model for closed hypersurfaces that arises as a singular limit of a modified Cahn-Hilliard equation describing microphase separation of diblock copolymers. Under this evolution the propagating interfaces preserve the enclosed volumes of the two phases. We show by means of an example that this model does not preserve convexity in two space dimensions.
Patents
Automated Entity Identification for Efficient Profiling in an Event Probability Prediction System
A. Vaiciulis, L. Peranich, U. Mayer, S. Zoldi and S. De Zilwa
United States Patent 8121962 (Granted 2012/02/21, application 12/110,261 filed 2008/04/25)
Comprehensive Identity Theft Protection System
T. Crooks, U. Mayer and M. Lazarus
United States Patent 8296250 (Granted 2012/10/23, continuation application 13/195,328 filed 2011/08/01)
United States Patent 7849029 (Granted 2010/12/07, continuation application 11/421,896 filed 2006/06/02)
United States Patent 7991716 (Granted 2011/08/02, application 12/961,478 filed 2010/12/06)
Fast Accurate Fuzzy Matching
U. Mayer, V. Narayanan and M. Blume
United States Patent 7870151 (Granted 2011/01/11, application 20080189279 filed 2007/06/13)
Adaptive Analytics
S. Zoldi, L. Peranich, U. Mayer, J. Athwal, and Sajama
United States Patent Application 20090222243 (Filed 2008/02/29)
Fraud Detection System for the Faster Payments System
S. Zoldi and U. Mayer
United States Patent Application 20090222369 (Filed 2008/05/13)