A Graph-Based Soft Actor Critic Approach in Multi-Agent Reinforcement Learning
Keywords: Multi-Agent Systems, deep reinforcement learning (DRL), graph convolutional neural network
Multi-Agent Reinforcement Learning (MARL) is widely used to solve various real-world problems. In MARL, the environment contains multiple agents, and a good grasp of the environment can guide agents toward cooperative strategies. In the Centralized Training Decentralized Execution (CTDE) paradigm, a centralized critic is used to guide the learning of cooperative strategies. However, the presence of multiple agents leads to the curse of dimensionality and to interference from other agents' strategies, making it difficult for a centralized critic to learn good cooperative strategies. We propose a graph-based approach to overcome these problems. It employs a graph neural network that takes agents' partial observations as input and aggregates information across agents through graph operations to extract a representation of the whole environment. In this way, agents improve their understanding of the overall state of the environment and of the other agents in it, while avoiding dimensional explosion. We then combine a dual-critic dynamic decomposition method with soft actor-critic to train the policy: the former learns from individual and global rewards, reducing the influence of other agents' strategies, while the latter helps to learn a near-optimal policy. We call this approach Multi-Agent Graph-based soft Actor-Critic (MAGAC). We compare the proposed method with several classical MARL algorithms in the Multi-agent Particle Environment (MPE). The experimental results show that our method achieves faster learning while producing better policies.
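The abstract describes aggregating agents' partial observations through graph operations before they reach the centralized critic. The following is a minimal sketch of that general idea using a mean-neighbor aggregation step (GraphSAGE-style); it is not the paper's exact architecture, and all names, weight shapes, and the choice of mean aggregation are illustrative assumptions.

```python
import numpy as np

def aggregate_observations(obs, adjacency, w_self, w_neigh):
    """One round of mean-neighbor graph aggregation (illustrative sketch).

    obs:       (n_agents, obs_dim) array of partial observations.
    adjacency: (n_agents, n_agents) binary matrix; entry (i, j) = 1 if
               agent i receives information from agent j.
    w_self, w_neigh: (obs_dim, hidden_dim) weight matrices, here assumed
               shared across agents.
    Returns a (n_agents, hidden_dim) embedding that mixes each agent's
    own observation with the mean of its neighbors' observations.
    """
    deg = adjacency.sum(axis=1, keepdims=True)          # neighbor counts
    neigh_mean = adjacency @ obs / np.maximum(deg, 1)   # mean over neighbors
    h = obs @ w_self + neigh_mean @ w_neigh             # combine self + neighbors
    return np.maximum(h, 0.0)                           # ReLU nonlinearity
```

Because the embedding dimension is fixed regardless of the number of agents, a critic built on top of such aggregated features sidesteps the input-size blow-up that comes from naively concatenating all agents' observations.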
Copyright (c) 2023 Wei Pan, Cheng Liu
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
ONLINE OPEN ACCESS: Access to the full text of each article and each issue is allowed for free, in respect of the Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
You are free to:
-Share: copy and redistribute the material in any medium or format;
-Adapt: remix, transform, and build upon the material.
The licensor cannot revoke these freedoms as long as you follow the license terms.
DISCLAIMER: The author(s) of each article appearing in International Journal of Computers Communications & Control is/are solely responsible for the content thereof; the publication of an article shall not constitute or be deemed to constitute any representation by the Editors or Agora University Press that the data presented therein are original, correct or sufficient to support the conclusions reached or that the experiment design or methodology is adequate.