TY - GEN
T1 - Variable selection for ad prediction
AU - Bhat, Suma
AU - Church, Kenneth
PY - 2008
Y1 - 2008
AB - We consider the problem of predicting the probability of a click on an advertisement when the click or no-click outcome is described by a large set of variables. Many, if not most, of these variables are only weakly related to whether the ad is clicked. A traditional approach that treats every variable on an equal and blind footing therefore obscures the process underlying the outcome; it would also be computationally expensive and may generalize poorly. We investigate the forward selection method for variable subset selection in the domain of advertisement click-through-rate prediction. Forward selection proceeds sequentially, rewarding a set of variables for the information it provides about the outcome while penalizing it for the number of variables it contains. Concretely, we propose a logistic regression model for estimating the conditional expectation of the outcome given the ensemble of variables. The resulting model compares favorably with the one obtained via an exhaustive search through the model space. We also observe that the variables selected by the forward selection procedure have better predictive power than those selected on the basis of their individual statistical significance. We thus show that forward selection for subset selection produces a good model for predicting ad click-through rates.
KW - Click-through-rate
KW - Model selection
KW - Variable selection
KW - Web advertising
UR - http://www.scopus.com/inward/record.url?scp=70349152916&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70349152916&partnerID=8YFLogxK
U2 - 10.1145/1517472.1517478
DO - 10.1145/1517472.1517478
M3 - Conference contribution
AN - SCOPUS:70349152916
SN - 9781605582771
T3 - Proceedings of the 2nd International Workshop on Data Mining and Audience Intelligence for Advertising, ADKDD'08
SP - 45
EP - 49
BT - Proceedings of the 2nd International Workshop on Data Mining and Audience Intelligence for Advertising, ADKDD'08
T2 - 2nd International Workshop on Data Mining and Audience Intelligence for Advertising, ADKDD'08
Y2 - 24 August 2008
ER -