Analysis of Regret Function Trends of Algorithms for Multi-armed Bandit Problem
DOI: 10.23977/csic2022.013
Corresponding Author
Bingjiang Xia
ABSTRACT
The multi-armed bandit (MAB) problem has attracted interest in reinforcement learning and other areas due to its wide range of applications, and many algorithms for it have been proposed. This paper discusses the three most common algorithms for the MAB problem: the ε-greedy algorithm, the upper confidence bound (UCB) algorithm, and the Thompson sampling algorithm. Since the performance of these algorithms, in terms of expected returns, is critical when deciding which one to use, this paper analyzes the trends and properties of their regret functions through simulations in order to compare their performance. The results suggest that the Thompson sampling algorithm usually performs best among the three algorithms in the simulations conducted.
KEYWORDS
multi-armed bandit problem, Thompson sampling, regret function
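The kind of simulation comparison described in the abstract could be sketched as follows. This is not the paper's actual code: the Bernoulli arm means, the horizon, ε = 0.1, the UCB1 exploration bonus, and Beta(1, 1) priors for Thompson sampling are all illustrative assumptions. Each episode tracks cumulative expected regret, i.e., the accumulated gap between the best arm's mean and the mean of the arm actually pulled.

```python
# Illustrative sketch: comparing cumulative regret of epsilon-greedy,
# UCB1, and Thompson sampling on a Bernoulli bandit (assumed setup).
import math
import random


def simulate(policy, means, horizon, seed=0, epsilon=0.1):
    """Run one bandit episode; return the final cumulative expected regret."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k            # pulls per arm
    sums = [0.0] * k            # total reward per arm
    alpha = [1.0] * k           # Beta(1, 1) priors for Thompson sampling
    beta = [1.0] * k
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if policy == "epsilon-greedy":
            # Explore with probability epsilon (or until every arm is tried),
            # otherwise exploit the arm with the highest empirical mean.
            if rng.random() < epsilon or 0 in counts:
                arm = rng.randrange(k)
            else:
                arm = max(range(k), key=lambda i: sums[i] / counts[i])
        elif policy == "ucb":
            # UCB1: empirical mean plus a confidence-width exploration bonus.
            if 0 in counts:
                arm = counts.index(0)  # play each arm once first
            else:
                arm = max(range(k), key=lambda i: sums[i] / counts[i]
                          + math.sqrt(2.0 * math.log(t) / counts[i]))
        else:
            # Thompson sampling: pick the arm whose posterior sample is largest.
            arm = max(range(k), key=lambda i: rng.betavariate(alpha[i], beta[i]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        alpha[arm] += reward
        beta[arm] += 1.0 - reward
        regret += best - means[arm]  # expected regret of this pull
    return regret


if __name__ == "__main__":
    means = [0.3, 0.5, 0.7]  # hypothetical Bernoulli arm means
    for p in ("epsilon-greedy", "ucb", "thompson"):
        print(p, round(simulate(p, means, horizon=5000), 1))
```

Plotting the regret at every step (rather than only its final value) over many independent runs would reproduce the kind of regret-trend curves the paper compares.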