Abstract
In this article, we study the contextual dynamic pricing problem where the market value of a product is linear in its observed features plus some market noise. Products are sold one at a time, and only a binary response indicating success or failure of a sale is observed. Our model setting is similar to the work by? except that we expand the demand curve to a semiparametric model and dynamically learn both the parametric and nonparametric components. We propose a dynamic statistical learning and decision-making policy that minimizes regret (maximizes revenue) by combining semiparametric estimation for a generalized linear model with unknown link and online decision-making. Under mild conditions, for a market noise cdf (Formula presented.) with mth-order derivative ((Formula presented.)), our policy achieves a regret upper bound of (Formula presented.), where T is the time horizon and (Formula presented.) is the order hiding logarithmic terms and the feature dimension d. The upper bound is further reduced to (Formula presented.) if F is super smooth. These upper bounds are close to (Formula presented.), the lower bound where F belongs to a parametric class. We further generalize these results to the case with dynamic dependent product features under the strong mixing condition. Supplementary materials for this article are available online.
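The setting described in the abstract can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the policy proposed in the article: the logistic noise distribution, the parameter vector `theta`, the uniform features, the price grid, and the naive "price at the mean value" rule are all hypothetical choices made for illustration. It shows the linear market value, the binary sale feedback, and how regret against an oracle that knows both the parameter and the noise cdf could be computed.

```python
import numpy as np

def logistic_cdf(z, scale):
    # Assumed noise cdf F; the article treats F as unknown.
    return 1.0 / (1.0 + np.exp(-z / scale))

def expected_revenue(price, mean_value, scale):
    # P(sale) = P(value >= price) = 1 - F(price - mean_value)
    return price * (1.0 - logistic_cdf(price - mean_value, scale))

rng = np.random.default_rng(0)
T, d, scale = 200, 3, 0.1                 # horizon, feature dimension, noise scale (all hypothetical)
theta = np.array([1.0, 0.5, 0.25])        # hypothetical true parameter
features = rng.uniform(0.0, 1.0, size=(T, d))

# Market value: linear in the observed features plus market noise.
mean_values = features @ theta
values = mean_values + rng.logistic(0.0, scale, size=T)

# Naive stand-in policy (not the article's): post the mean value as the price.
prices = mean_values.copy()
sold = prices <= values                   # the only feedback the seller observes
revenue = prices * sold

# Oracle benchmark: best price on a grid, knowing theta and F.
grid = np.linspace(0.0, 3.0, 3001)
oracle = expected_revenue(grid[None, :], mean_values[:, None], scale).max(axis=1)

# Cumulative expected regret of the naive policy relative to the oracle.
regret = float(np.sum(oracle - expected_revenue(prices, mean_values, scale)))
print(f"sales: {int(sold.sum())}/{T}, revenue: {revenue.sum():.2f}, regret: {regret:.2f}")
```

In this sketch the seller never sees `values`, only `sold`, which is what makes learning both the parameter and the noise distribution from pricing data difficult.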
Original language | English (US) |
---|---|
Pages (from-to) | 552-564 |
Number of pages | 13 |
Journal | Journal of the American Statistical Association |
Volume | 119 |
Issue number | 545 |
State | Published - 2024 |
Externally published | Yes |
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Statistics, Probability and Uncertainty
Keywords
- Contextual dynamic pricing
- Generalized linear model with unknown link
- Nonparametric statistics
- Policy optimization