Thompson Sampling: An Increasingly Significant Solution to the Multi-armed Bandit Problem
DOI: 10.23977/csic2022.014
Author(s)
Ziqi Shi, Linjie Zhu, Yiwei Zhu
Corresponding Author
Linjie Zhu
ABSTRACT
This paper discusses ways to improve the accuracy and reliability of solutions to the multi-armed bandit (MAB) problem, focusing on the setting where rewards are binary. We derive the ε-Greedy and Upper Confidence Bound algorithms for solving MAB problems and highlight the derivation and advantages of a more recent solution, the Thompson Sampling algorithm. Prior research has shown that Thompson Sampling is among the most effective approaches to the MAB problem. We survey several applications of Thompson Sampling across a variety of research fields to show how each problem is formulated and modeled. This paper helps readers understand, review, and build on the basic applications and algorithms of Thompson Sampling and related solutions.
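For the binary-reward setting the abstract describes, Thompson Sampling is commonly implemented with a Beta-Bernoulli model: each arm keeps a Beta posterior over its success probability, a sample is drawn from each posterior, and the arm with the highest sample is pulled. The sketch below is an illustrative implementation of this standard scheme, not code from the paper itself; the arm probabilities and the `thompson_sampling` function name are hypothetical.

```python
import random

def thompson_sampling(true_probs, horizon=5000, seed=0):
    """Beta-Bernoulli Thompson Sampling for a K-armed bandit with 0/1 rewards.

    true_probs: hypothetical per-arm success probabilities (unknown to the agent).
    Returns total reward and the per-arm (successes, failures) posterior counts.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [1] * k  # Beta(1, 1) uniform prior on each arm
    failures = [1] * k
    total_reward = 0
    for _ in range(horizon):
        # Draw one posterior sample per arm and play the arm with the largest sample
        samples = [rng.betavariate(successes[i], failures[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Observe a Bernoulli reward and update that arm's posterior counts
        reward = 1 if rng.random() < true_probs[arm] else 0
        total_reward += reward
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return total_reward, successes, failures
```

Because sampling from the posterior naturally balances exploration and exploitation, the arm with the highest true success probability accumulates the vast majority of pulls as the horizon grows.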
KEYWORDS
multi-armed bandit, Thompson sampling, ε-greedy, upper confidence bounds