A Resource Allocation Algorithm for Ultra-Dense Networks Based on Deep Reinforcement Learning


  • Huashuai Zhang
  • Tingmei Wang Beijing Union University
  • Haiwei Shen


ultra-dense networks (UDNs), deep reinforcement learning (DRL), resource allocation, throughput, energy efficiency


Resource optimization in ultra-dense networks (UDNs) is critical to meeting users' huge demand for wireless data traffic. However, mainstream optimization algorithms suffer from problems such as poor optimization performance and high computational load. This paper proposes a wireless resource allocation algorithm based on deep reinforcement learning (DRL), which aims to maximize the total throughput of the entire network and transforms the resource allocation problem into a deep Q-learning process. To improve the allocation efficiency of wireless resources in UDNs, the authors adopt a deep Q-network (DQN) resource allocation strategy and employ experience replay and a target network to overcome the instability and divergence of the results caused by the previous network state, and to address the overestimation of the Q value. Simulation results show that the proposed algorithm can maximize the total network throughput while making the network more energy-efficient and stable. It is therefore meaningful to introduce DRL into research on UDN resource allocation.
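
The two DQN mechanisms named in the abstract, experience replay and a target network, can be illustrated with a minimal sketch. This is not the authors' implementation: the linear Q-function, the single-link toy reward (a Shannon-style rate minus a power cost), the three power levels, and all hyperparameters below are hypothetical choices made only to keep the example small and runnable. A real DQN would use a neural network in place of the linear approximator.

```python
import math
import random
from collections import deque

# All constants below are assumptions for illustration, not values from the paper.
GAMMA = 0.9               # discount factor
LR = 0.05                 # learning rate
EPSILON = 0.2             # exploration rate
SYNC_EVERY = 20           # steps between target-network syncs
POWERS = [0.1, 0.5, 1.0]  # hypothetical discrete transmit-power levels
N_ACTIONS = len(POWERS)


def features(state):
    # The state is a scalar channel gain; use a bias term plus the gain.
    return [1.0, state]


def q_values(weights, state):
    # Linear Q-function: one weight vector per action.
    phi = features(state)
    return [sum(w * x for w, x in zip(weights[a], phi)) for a in range(N_ACTIONS)]


def toy_reward(state, action):
    # Shannon-style rate with fixed noise-plus-interference, minus a power cost.
    sinr = state * POWERS[action] / 0.1
    return math.log2(1.0 + sinr) - 0.5 * POWERS[action]


def train(steps=500, batch_size=16, seed=0):
    rng = random.Random(seed)
    online = [[0.0, 0.0] for _ in range(N_ACTIONS)]
    target = [row[:] for row in online]   # frozen copy used for bootstrap targets
    replay = deque(maxlen=200)            # experience replay buffer
    state = rng.uniform(0.1, 1.0)
    for step in range(1, steps + 1):
        # Epsilon-greedy action selection with the online weights.
        if rng.random() < EPSILON:
            action = rng.randrange(N_ACTIONS)
        else:
            qs = q_values(online, state)
            action = qs.index(max(qs))
        reward = toy_reward(state, action)
        next_state = rng.uniform(0.1, 1.0)
        replay.append((state, action, reward, next_state))
        state = next_state
        if len(replay) >= batch_size:
            # Sampling past transitions at random breaks their temporal correlation.
            for s, a, r, s2 in rng.sample(list(replay), batch_size):
                y = r + GAMMA * max(q_values(target, s2))  # target from frozen copy
                td_error = y - q_values(online, s)[a]
                phi = features(s)
                for i in range(len(phi)):
                    online[a][i] += LR * td_error * phi[i]
        if step % SYNC_EVERY == 0:
            target = [row[:] for row in online]  # periodic target sync
    return online, target, replay
```

Calling `train()` returns the online weights, the target weights, and the replay buffer. Because the bootstrap target `y` is computed from the periodically synced copy rather than from the weights being updated, the regression target stays fixed between syncs, which is the stabilizing effect the abstract refers to.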

