Thompson Sampling: An Increasingly Significant Solution to the Multi-armed Bandit Problem
DOI: 10.23977/csic2022.014
Author(s)
Ziqi Shi, Linjie Zhu, Yiwei Zhu
Corresponding Author
Linjie Zhu
ABSTRACT
This paper discusses ways to improve the accuracy and reliability of solutions to the multi-armed bandit (MAB) problem, focusing on the setting where rewards are binary. We derive the ε-Greedy and Upper Confidence Bound algorithms for solving MAB problems and highlight the derivation and advantages of a more recent solution, the Thompson Sampling algorithm. Prior research has shown that Thompson Sampling is among the most effective approaches to the MAB problem. We survey several applications of Thompson Sampling across a variety of research fields to show how each problem is formulated and modeled. This paper helps readers understand, review, and build on the basic applications and algorithms of Thompson Sampling and related solutions.
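For the binary-reward setting the abstract describes, Thompson Sampling is commonly implemented with a Beta-Bernoulli model: each arm keeps a Beta posterior over its success probability, a sample is drawn from each posterior, and the arm with the highest sample is pulled. The sketch below is an illustrative implementation of this standard scheme, not code from the paper itself; the arm probabilities and the `thompson_sampling` function name are hypothetical.

```python
import random

def thompson_sampling(true_probs, horizon=5000, seed=0):
    """Beta-Bernoulli Thompson Sampling for a K-armed bandit with 0/1 rewards.

    true_probs: hypothetical per-arm success probabilities (unknown to the agent).
    Returns total reward and the per-arm (successes, failures) posterior counts.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [1] * k  # Beta(1, 1) uniform prior on each arm
    failures = [1] * k
    total_reward = 0
    for _ in range(horizon):
        # Draw one posterior sample per arm and play the arm with the largest sample
        samples = [rng.betavariate(successes[i], failures[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Observe a Bernoulli reward and update that arm's posterior counts
        reward = 1 if rng.random() < true_probs[arm] else 0
        total_reward += reward
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return total_reward, successes, failures
```

Because sampling from the posterior naturally balances exploration and exploitation, the arm with the highest true success probability accumulates the vast majority of pulls as the horizon grows.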
KEYWORDS
multi-armed bandit, Thompson sampling, ε-greedy, upper confidence bounds