Abstract
Stable matching, a classical model for two-sided markets, has long been studied assuming known preferences. In reality agents often have to learn about their preferences through exploration. With the advent of massive online markets powered by data-driven matching platforms, it has become necessary to better understand the interplay between learning and market objectives. We propose a statistical learning model in which one side of the market does not have a priori knowledge about its preferences for the other side and is required to learn these from stochastic rewards. Our model extends the standard multi-armed bandits framework to multiple players, with the added feature that arms have preferences over players. We study both centralized and decentralized approaches to this problem and show surprising exploration-exploitation trade-offs compared to the single player multi-armed bandits setting.
Original language | English (US) |
---|---|
Pages (from-to) | 1618-1628 |
Number of pages | 11 |
Journal | Proceedings of Machine Learning Research |
Volume | 108 |
State | Published - 2020 |
Externally published | Yes |
Event | 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020 - Virtual, Online Duration: Aug 26 2020 → Aug 28 2020 |
All Science Journal Classification (ASJC) codes
- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability