A Graph-Based Soft Actor Critic Approach in Multi-Agent Reinforcement Learning
Keywords: Multi-Agent Systems, deep reinforcement learning (DRL), graph convolutional neural network
Multi-Agent Reinforcement Learning (MARL) is widely used to solve various real-world problems. In MARL, the environment contains multiple agents, and a good grasp of the environment can guide agents toward cooperative strategies. In the Centralized Training Decentralized Execution (CTDE) paradigm, a centralized critic is used to guide the learning of cooperative strategies. However, the presence of multiple agents leads to the curse of dimensionality and to interference from other agents' strategies, making it difficult for a centralized critic to learn good cooperative strategies. We propose a graph-based approach to overcome these problems. It employs a graph neural network that takes agents' partial observations as input and aggregates information across agents through graph operations to extract a representation of the whole environment. In this way, agents improve their understanding of the overall state of the environment and of the other agents in it, while avoiding dimensional explosion. We then combine a dual-critic dynamic decomposition method with soft actor-critic to train the policy: the former learns from individual and global rewards, reducing the influence of other agents' strategies, while the latter helps to learn a near-optimal policy. We call this approach Multi-Agent Graph-based soft Actor-Critic (MAGAC). We compare the proposed method with several classical MARL algorithms in the Multi-agent Particle Environment (MPE). The experimental results show that our method achieves faster learning while producing better policies.
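The abstract describes aggregating agents' partial observations through graph operations before they reach the centralized critic. The following is a minimal sketch of that general idea using a mean-neighbor aggregation step (GraphSAGE-style); it is not the paper's exact architecture, and all names, weight shapes, and the choice of mean aggregation are illustrative assumptions.

```python
import numpy as np

def aggregate_observations(obs, adjacency, w_self, w_neigh):
    """One round of mean-neighbor graph aggregation (illustrative sketch).

    obs:       (n_agents, obs_dim) array of partial observations.
    adjacency: (n_agents, n_agents) binary matrix; entry (i, j) = 1 if
               agent i receives information from agent j.
    w_self, w_neigh: (obs_dim, hidden_dim) weight matrices, here assumed
               shared across agents.
    Returns a (n_agents, hidden_dim) embedding that mixes each agent's
    own observation with the mean of its neighbors' observations.
    """
    deg = adjacency.sum(axis=1, keepdims=True)          # neighbor counts
    neigh_mean = adjacency @ obs / np.maximum(deg, 1)   # mean over neighbors
    h = obs @ w_self + neigh_mean @ w_neigh             # combine self + neighbors
    return np.maximum(h, 0.0)                           # ReLU nonlinearity
```

Because the embedding dimension is fixed regardless of the number of agents, a critic built on top of such aggregated features sidesteps the input-size blow-up that comes from naively concatenating all agents' observations.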
Copyright (c) 2023 Wei Pan, Cheng Liu
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
ONLINE OPEN ACCESS: Access to the full text of each article and each issue is allowed for free, in respect of the Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
You are free to:
-Share: copy and redistribute the material in any medium or format;
-Adapt: remix, transform, and build upon the material.
The licensor cannot revoke these freedoms as long as you follow the license terms.
DISCLAIMER: The author(s) of each article appearing in International Journal of Computers Communications & Control is/are solely responsible for the content thereof; the publication of an article shall not constitute or be deemed to constitute any representation by the Editors or Agora University Press that the data presented therein are original, correct or sufficient to support the conclusions reached or that the experiment design or methodology is adequate.