TY - JOUR
T1 - Distributed Learning for Optimal Spectrum Access in Dense Device-To-Device Ad-Hoc Networks
AU - Boyarski, Tomer
AU - Wang, Wenbo
AU - Leshem, Amir
N1 - Publisher Copyright: © 1991-2012 IEEE.
PY - 2023
Y1 - 2023
AB - In 5G networks, device-to-device (D2D) communications aim to provide dense coverage without relying on the cellular network infrastructure. To achieve this goal, D2D links are expected to self-organize and allocate finite, interfering resources with limited inter-link coordination. We consider a dense ad-hoc D2D network and propose a decentralized time-frequency allocation mechanism that achieves sub-linear social regret toward optimal spectrum efficiency. The proposed mechanism is constructed in the multi-agent multi-armed bandit framework and employs a carrier-sensing-based distributed auction to learn, from scratch, the optimal allocation of time-frequency blocks with different channel state dynamics. Our theoretical analysis shows that the proposed fully distributed mechanism achieves a logarithmic regret bound by adopting an epoch-based strategy-learning scheme in which the length of the strategy-exploitation window grows exponentially. We further propose an implementation-friendly protocol featuring a fixed exploitation window, which guarantees a good tradeoff between performance optimality and protocol efficiency. Numerical simulations demonstrate that the proposed protocol achieves higher efficiency than the prevalent reference algorithms in both static and dynamic wireless environments.
KW - D2D networks
KW - Multi-agent multi-armed bandit
KW - distributed network management
KW - resource allocation
UR - https://www.scopus.com/pages/publications/85166771345
U2 - 10.1109/tsp.2023.3300630
DO - 10.1109/tsp.2023.3300630
M3 - Article
AN - SCOPUS:85166771345
SN - 1053-587X
VL - 71
SP - 3149
EP - 3163
JO - IEEE Transactions on Signal Processing
JF - IEEE Transactions on Signal Processing
ER -