| Field | Value |
|---|---|
| title | Dueling Bandits with Weak Regret |
| booktitle | Proceedings of the 34th International Conference on Machine Learning |
| year | 2017 |
| volume | 70 |
| series | Proceedings of Machine Learning Research |
| address | |
| month | 0 |
| publisher | PMLR |
| url | |
| abstract | We consider online content recommendation with implicit feedback through pairwise comparisons, formalized as the so-called dueling bandit problem. We study the dueling bandit problem in the Condorcet winner setting, and consider two notions of regret: the more well-studied strong regret, which is 0 only when both arms pulled are the Condorcet winner; and the less well-studied weak regret, which is 0 if either arm pulled is the Condorcet winner. We propose a new algorithm for this problem, Winner Stays (WS), with variations for each kind of regret: WS for weak regret (WS-W) has expected cumulative weak regret that is |
| layout | inproceedings |
| id | chen17c |
| tex_title | Dueling Bandits with Weak Regret |
| bibtex_author | Bangrui Chen and Peter I. Frazier |
| firstpage | 731 |
| lastpage | 739 |
| page | 731-739 |
| order | 731 |
| cycles | false |
| editor | |
| author | |
| date | 2017-07-17 |
| container-title | Proceedings of the 34th International Conference on Machine Learning |
| genre | inproceedings |
| issued | |
| extras | |
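For reference, the two regret notions described in the abstract can be written out explicitly. This is a sketch in assumed notation, not the paper's own definitions: let $a^*$ denote the Condorcet winner, let $(i_t, j_t)$ be the pair of arms pulled at time $t$, and let $\Delta(i) = p(a^*, i) - \tfrac{1}{2}$ be the preference margin of $a^*$ over arm $i$ (so $\Delta(a^*) = 0$). One formalization consistent with the abstract's description is

$$ r^{\text{weak}}_t = \min\{\Delta(i_t),\, \Delta(j_t)\}, \qquad r^{\text{strong}}_t = \max\{\Delta(i_t),\, \Delta(j_t)\}, $$

so that $r^{\text{weak}}_t = 0$ whenever either pulled arm is the Condorcet winner, while $r^{\text{strong}}_t = 0$ only when both are.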
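The fields above map directly onto a citation entry. Below is a minimal BibTeX rendering assembled only from this record; the empty address, url, and editor fields are omitted, and the citation key reuses the record's id:

```bibtex
@inproceedings{chen17c,
  title     = {Dueling Bandits with Weak Regret},
  author    = {Bangrui Chen and Peter I. Frazier},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  series    = {Proceedings of Machine Learning Research},
  volume    = {70},
  pages     = {731--739},
  year      = {2017},
  publisher = {PMLR}
}
```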