Data mining techniques are routinely used by fundraisers to select those prospects from a large pool of candidates who are most likely to make a financial contribution. These techniques often rely on statistical models based on trial performance data.
This trial performance data is typically obtained by soliciting a smaller sample of the possible prospect pool. Collecting this trial data involves a cost; therefore the fundraiser is interested in keeping the trial size small while still collecting enough data to build a reliable statistical model that will be used to evaluate the remain-der of the prospects.
We describe an experimental design approach to optimally choose the trial prospects from an existing large pool of prospects. Pros-pects are clustered to render the problem practically tractable. We modify the standard D-optimality algorithm to prevent repeated selection of the same prospect cluster, since each prospect can only be solicited at most once. We assess the benefits of this approach on the KDD-98 data set by comparing the performance of the model based on the optimal trial data set with that of a model based on a randomly selected trial data set of equal size.