
A Review of Hyperparameter Tuning Methods for Reinforcement Learning: Taking DQN, PPO, and A3C Algorithms as Examples


DOI: 10.23977/cpcs.2026.100101

Author(s)

Hongyuan Liu 1

Affiliation(s)

1 School of Electronic and Information Engineering, University of Science and Technology Liaoning, Anshan, China

Corresponding Author

Hongyuan Liu

ABSTRACT

The performance of reinforcement learning algorithms depends heavily on hyperparameter configuration: poorly chosen settings readily lead to non-convergence, slow convergence, or weak final policies. This paper takes three mainstream deep reinforcement learning algorithms, DQN (Deep Q-Network), PPO (Proximal Policy Optimization), and A3C (Asynchronous Advantage Actor-Critic), as its research objects. It systematically surveys each algorithm's key hyperparameters and the mechanisms through which they act, and distills general tuning rules and algorithm-specific strategies from the literature. To verify the effectiveness of these tuning methods, small-scale comparative experiments are designed to test the convergence behavior and final policy quality of different hyperparameter configurations in classic Gym environments. Based on the experimental results and literature analysis, a practical parameter-setting guide for beginners is extracted to lower the barrier to applying reinforcement learning algorithms. The research shows that sound hyperparameter tuning can improve convergence speed by more than 30% and final performance by about 15%, with the learning rate, exploration-related parameters, and regularization coefficients having the most significant impact on algorithm performance.
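As an illustration of the kind of tuning the abstract describes (not code from the paper itself), the sketch below shows two of the hyperparameters it flags as most influential: a linearly annealed epsilon-greedy exploration schedule, as commonly used with DQN, and a small grid over learning rates and decay horizons. All names and values here are illustrative assumptions.

```python
def epsilon_at(step, eps_start=1.0, eps_end=0.05, decay_steps=10_000):
    """Linearly annealed exploration rate for epsilon-greedy action selection.

    Starts at eps_start, decays to eps_end over decay_steps environment
    steps, then stays at eps_end. The decay horizon itself is one of the
    exploration-related hyperparameters a tuning study would sweep.
    """
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)


# A minimal grid over two hyperparameters highlighted in the abstract:
# the learning rate and the exploration decay horizon. Each (lr, decay)
# pair would be one training run in a comparative experiment.
grid = [(lr, decay)
        for lr in (1e-4, 3e-4, 1e-3)
        for decay in (5_000, 10_000)]
```

In practice each configuration in `grid` would be trained for a fixed budget in a Gym environment and compared on convergence speed and final return; the schedule above is only one common choice, and exponential decay is an equally standard alternative.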

KEYWORDS

Reinforcement Learning; Hyperparameter Tuning; DQN; PPO; A3C; Parameter Setting Guide

CITE THIS PAPER

Hongyuan Liu. A Review of Hyperparameter Tuning Methods for Reinforcement Learning: Taking DQN, PPO, and A3C Algorithms as Examples. Computing, Performance and Communication Systems (2026) Vol. 10: 1-8. DOI: http://dx.doi.org/10.23977/cpcs.2026.100101.


All published work is licensed under a Creative Commons Attribution 4.0 International License.
