
Deep Reinforcement Learning for Wireless Communications and Networking
Description
Comprehensive guide to Deep Reinforcement Learning (DRL) as applied to wireless communication systems
Deep Reinforcement Learning for Wireless Communications and Networking presents an overview of the development of DRL and provides fundamental knowledge of the theories, formulation, design, learning models, algorithms, and implementation of DRL, together with a detailed case study for practice. The book also covers diverse applications of DRL to problems in wireless networks, such as caching, offloading, resource sharing, and security. The authors discuss open issues and introduce advanced DRL approaches for emerging problems in wireless communications and networking.
Covering new advanced models of DRL, e.g., deep dueling architecture and generative adversarial networks, as well as emerging problems considered in wireless networks, e.g., ambient backscatter communication, intelligent reflecting surfaces and edge intelligence, this is the first comprehensive book studying applications of DRL for wireless networks that presents the state-of-the-art research in architecture, protocol, and application design.
Deep Reinforcement Learning for Wireless Communications and Networking covers specific topics such as:
* Deep reinforcement learning models, covering deep learning, deep reinforcement learning, and models of deep reinforcement learning
* Physical layer applications, covering signal detection, decoding, and beamforming; power and rate control; and physical-layer security
* Medium access control (MAC) layer applications, covering resource allocation, channel access, and user/cell association
* Network layer applications, covering traffic routing, network classification, and network slicing
With comprehensive coverage of an exciting and noteworthy new technology, Deep Reinforcement Learning for Wireless Communications and Networking is an essential learning resource for researchers and communications engineers, along with developers and entrepreneurs in autonomous systems, who wish to harness this technology in practical applications.

Persons
Dinh Thai Hoang, Ph.D., is a faculty member at the University of Technology Sydney, Australia. He is also an Associate Editor of IEEE Communications Surveys & Tutorials and an Editor of IEEE Transactions on Wireless Communications, IEEE Transactions on Cognitive Communications and Networking, and IEEE Transactions on Vehicular Technology.
Nguyen Van Huynh, Ph.D., obtained his Ph.D. from the University of Technology Sydney in 2022. He is currently a Research Associate in the Department of Electrical and Electronic Engineering, Imperial College London, UK.
Diep N. Nguyen, Ph.D., is Director of Agile Communications and Computing Group and a member of the Faculty of Engineering and Information Technology at the University of Technology Sydney, Australia.
Ekram Hossain, Ph.D., is a Professor in the Department of Electrical and Computer Engineering at the University of Manitoba, Canada, and a Fellow of the IEEE. He co-authored the Wiley title Radio Resource Management in Multi-Tier Cellular Wireless Networks (2013).
Dusit Niyato, Ph.D., is a Professor in the School of Computer Science and Engineering at Nanyang Technological University, Singapore. He co-authored the Wiley title Radio Resource Management in Multi-Tier Cellular Wireless Networks (2013).
Content
Notes on Contributors
Foreword
Preface
Acknowledgments
Acronyms
Introduction
Part I Fundamentals of Deep Reinforcement Learning
1 Deep Reinforcement Learning and Its Applications
1.1 Wireless Networks and Emerging Challenges
1.2 Machine Learning Techniques and Development of DRL
1.2.1 Machine Learning
1.2.2 Artificial Neural Network
1.2.3 Convolutional Neural Network
1.2.4 Recurrent Neural Network
1.2.5 Development of Deep Reinforcement Learning
1.3 Potentials and Applications of DRL
1.3.1 Benefits of DRL in Human Lives
1.3.2 Features and Advantages of DRL Techniques
1.3.3 Academic Research Activities
1.3.4 Applications of DRL Techniques
1.3.5 Applications of DRL Techniques in Wireless Networks
1.4 Structure of this Book and Target Readership
1.4.1 Motivations and Structure of this Book
1.4.2 Target Readership
1.5 Chapter Summary
References
2 Markov Decision Process and Reinforcement Learning
2.1 Markov Decision Process
2.2 Partially Observable Markov Decision Process
2.3 Policy and Value Functions
2.4 Bellman Equations
2.5 Solutions of MDP Problems
2.5.1 Dynamic Programming
2.5.1.1 Policy Evaluation
2.5.1.2 Policy Improvement
2.5.1.3 Policy Iteration
2.5.2 Monte Carlo Sampling
2.6 Reinforcement Learning
2.7 Chapter Summary
References
3 Deep Reinforcement Learning Models and Techniques
3.1 Value-Based DRL Methods
3.1.1 Deep Q-Network
3.1.2 Double DQN
3.1.3 Prioritized Experience Replay
3.1.4 Dueling Network
3.2 Policy-Gradient Methods
3.2.1 REINFORCE Algorithm
3.2.1.1 Policy Gradient Estimation
3.2.1.2 Reducing the Variance
3.2.1.3 Policy Gradient Theorem
3.2.2 Actor-Critic Methods
3.2.3 Advantage of Actor-Critic Methods
3.2.3.1 Advantage of Actor-Critic (A2C)
3.2.3.2 Asynchronous Advantage Actor-Critic (A3C)
3.2.3.3 Generalized Advantage Estimate (GAE)
3.3 Deterministic Policy Gradient (DPG)
3.3.1 Deterministic Policy Gradient Theorem
3.3.2 Deep Deterministic Policy Gradient (DDPG)
3.3.3 Distributed Distributional DDPG (D4PG)
3.4 Natural Gradients
3.4.1 Principle of Natural Gradients
3.4.2 Trust Region Policy Optimization (TRPO)
3.4.2.1 Trust Region
3.4.2.2 Sample-Based Formulation
3.4.2.3 Practical Implementation
3.4.3 Proximal Policy Optimization (PPO)
3.5 Model-Based RL
3.5.1 Vanilla Model-Based RL
3.5.2 Robust Model-Based RL: Model-Ensemble TRPO (ME-TRPO)
3.5.3 Adaptive Model-Based RL: Model-Based Meta-Policy Optimization (MB-MPO)
3.6 Chapter Summary
References
4 A Case Study and Detailed Implementation
4.1 System Model and Problem Formulation
4.1.1 System Model and Assumptions
4.1.1.1 Jamming Model
4.1.1.2 System Operation
4.1.2 Problem Formulation
4.1.2.1 State Space
4.1.2.2 Action Space
4.1.2.3 Immediate Reward
4.1.2.4 Optimization Formulation
4.2 Implementation and Environment Settings
4.2.1 Install TensorFlow with Anaconda
4.2.2 Q-Learning
4.2.2.1 Codes for the Environment
4.2.2.2 Codes for the Agent
4.2.3 Deep Q-Learning
4.3 Simulation Results and Performance Analysis
4.4 Chapter Summary
References
Part II Applications of DRL in Wireless Communications and Networking
5 DRL at the Physical Layer
5.1 Beamforming, Signal Detection, and Decoding
5.1.1 Beamforming
5.1.1.1 Beamforming Optimization Problem
5.1.1.2 DRL-Based Beamforming
5.1.2 Signal Detection and Channel Estimation
5.1.2.1 Signal Detection and Channel Estimation Problem
5.1.2.2 RL-Based Approaches
5.1.3 Channel Decoding
5.2 Power and Rate Control
5.2.1 Power and Rate Control Problem
5.2.2 DRL-Based Power and Rate Control
5.3 Physical-Layer Security
5.4 Chapter Summary
References
6 DRL at the MAC Layer
6.1 Resource Management and Optimization
6.2 Channel Access Control
6.2.1 DRL in the IEEE 802.11 MAC
6.2.2 MAC for Massive Access in IoT
6.2.3 MAC for 5G and B5G Cellular Systems
6.3 Heterogeneous MAC Protocols
6.4 Chapter Summary
References
7 DRL at the Network Layer
7.1 Traffic Routing
7.2 Network Slicing
7.2.1 Network Slicing-Based Architecture
7.2.2 Applications of DRL in Network Slicing
7.3 Network Intrusion Detection
7.3.1 Host-Based IDS
7.3.2 Network-Based IDS
7.4 Chapter Summary
References
8 DRL at the Application and Service Layer
8.1 Content Caching
8.1.1 QoS-Aware Caching
8.1.2 Joint Caching and Transmission Control
8.1.3 Joint Caching, Networking, and Computation
8.2 Data and Computation Offloading
8.3 Data Processing and Analytics
8.3.1 Data Organization
8.3.1.1 Data Partitioning
8.3.1.2 Data Compression
8.3.2 Data Scheduling
8.3.3 Tuning of Data Processing Systems
8.3.4 Data Indexing
8.3.4.1 Database Index Selection
8.3.4.2 Index Structure Construction
8.3.5 Query Optimization
8.4 Chapter Summary
References
Part III Challenges, Approaches, Open Issues, and Emerging Research Topics
9 DRL Challenges in Wireless Networks
9.1 Adversarial Attacks on DRL
9.1.1 Attacks Perturbing the State Space
9.1.1.1 Manipulation of Observations
9.1.1.2 Manipulation of Training Data
9.1.2 Attacks Perturbing the Reward Function
9.1.3 Attacks Perturbing the Action Space
9.2 Multiagent DRL in Dynamic Environments
9.2.1 Motivations
9.2.2 Multiagent Reinforcement Learning Models
9.2.2.1 Markov/Stochastic Games
9.2.2.2 Decentralized Partially Observable Markov Decision Process (DPOMDP)
9.2.3 Applications of Multiagent DRL in Wireless Networks
9.2.4 Challenges of Using Multiagent DRL in Wireless Networks
9.2.4.1 Nonstationarity Issue
9.2.4.2 Partial Observability Issue
9.3 Other Challenges
9.3.1 Inherent Problems of Using RL in Real-World Systems
9.3.1.1 Limited Learning Samples
9.3.1.2 System Delays
9.3.1.3 High-Dimensional State and Action Spaces
9.3.1.4 System and Environment Constraints
9.3.1.5 Partial Observability and Nonstationarity
9.3.1.6 Multiobjective Reward Functions
9.3.2 Inherent Problems of DL and Beyond
9.3.2.1 Inherent Problems of DL
9.3.2.2 Challenges of DRL Beyond Deep Learning
9.3.3 Implementation of DL Models in Wireless Devices
9.4 Chapter Summary
References
10 DRL and Emerging Topics in Wireless Networks
10.1 DRL for Emerging Problems in Future Wireless Networks
10.1.1 Joint Radar and Data Communications
10.1.2 Ambient Backscatter Communications
10.1.3 Reconfigurable Intelligent Surface-Aided Communications
10.1.4 Rate Splitting Communications
10.2 Advanced DRL Models
10.2.1 Deep Reinforcement Transfer Learning
10.2.1.1 Reward Shaping
10.2.1.2 Intertask Mapping
10.2.1.3 Learning from Demonstrations
10.2.1.4 Policy Transfer
10.2.1.5 Reusing Representations
10.2.2 Generative Adversarial Network (GAN) for DRL
10.2.3 Meta Reinforcement Learning
10.3 Chapter Summary
References
Index
1 Deep Reinforcement Learning and Its Applications
1.1 Wireless Networks and Emerging Challenges
Over the past few years, communication technologies have developed rapidly to support many aspects of our daily lives, from smart cities and healthcare to logistics and transportation. They will form the backbone of the future data-centric society. Nevertheless, these new applications generate a tremendous workload and require high-reliability, ultrahigh-capacity wireless communications. In its latest report [1], Cisco projected that the number of connected devices would reach about 29.3 billion by 2023, with more than 45% equipped with mobile connections. The fastest-growing mobile connection type is likely machine-to-machine (M2M), as Internet-of-Things (IoT) services play a significant role in consumer and business environments. This poses several challenges for future wireless communication systems:
- Emerging services (e.g. augmented reality [AR] and virtual reality [VR]) require high-reliability and ultrahigh-capacity wireless communications. However, existing communication systems, designed and optimized based on conventional communication theories, significantly limit further performance improvements for these services.
- Wireless networks are becoming increasingly ad hoc and decentralized: mobile devices and sensors must make independent decisions, such as channel selection and base station association, to meet system requirements, e.g. energy efficiency and throughput maximization. Nonetheless, the dynamics and uncertainty of these systems prevent the devices from making optimal decisions.
- Another crucial component of future network systems is network traffic control. Network control can dramatically improve resource usage and the efficiency of information transmission through monitoring, checking, and controlling data flows. Unfortunately, the proliferation of smart IoT devices and ultradense radio networks has greatly expanded network size and produced extremely dynamic topologies. In addition, explosively growing data traffic imposes considerable pressure on Internet management. As a result, existing network control approaches may not effectively handle these complex and dynamic networks.
- Mobile edge computing (MEC) has recently been proposed to provide computing and caching capabilities at the edge of cellular networks. In this way, popular content can be cached at the network edge, such as at base stations, end-user devices, and gateways, to avoid duplicate transmissions of the same content, resulting in better energy and spectrum usage [2, 3]. One major challenge in future communication systems is the straggling problem at both edge nodes and wireless links, which can significantly increase the computation delay of the system. Additionally, the huge data demands of mobile users and the limited storage and processing capacities are critical issues that need to be addressed.
Conventional approaches to addressing the new challenges and demands of modern communication systems have several limitations. First, the rapid growth in the number of devices, the expansion of network scale, and the diversity of services in the new era of communications are expected to significantly increase the amount of data generated by applications, users, and networks [1]. However, traditional solutions may be unable to process and utilize this data effectively to improve system performance. Second, existing algorithms are not well-suited to handle the dynamic and uncertain nature of network environments, resulting in poor performance [4]. Finally, traditional optimization solutions often require complete information about the system to be effective, but this information may not be readily available in practice, limiting the applicability of these approaches. Deep reinforcement learning (DRL) has the potential to overcome these limitations and provide promising solutions to these challenges.
DRL leverages the benefits of deep neural networks (DNNs), which have proven effective in tackling complex, large-scale problems in areas such as speech recognition, medical diagnosis, and computer vision. This makes DRL well suited to managing the increasing complexity and scale of future communication networks. Additionally, DRL's online deployment allows it to handle the dynamic and unpredictable nature of wireless communication environments effectively.
1.2 Machine Learning Techniques and Development of DRL
1.2.1 Machine Learning
Machine learning (ML) is a problem-solving paradigm in which a machine learns a particular task (e.g. image classification, document text classification, speech recognition, medical diagnosis, robot control, and resource allocation in communication networks), measured by a performance metric (e.g. classification accuracy and performance loss), using experience or data [5]. The task generally involves a function that maps well-defined inputs to well-defined outputs. The essence of data-driven ML is that there is a pattern relating the task inputs to the outcome that cannot be pinned down mathematically. Thus, the solution to the task, which may involve making a decision or predicting an output, cannot be programmed explicitly. If the set of rules connecting the task inputs and output(s) were known, a program could be written based on those rules (e.g. if-then-else code) to solve the problem. Instead, an ML algorithm learns from the input data set, which specifies the correct output for a given input; that is, an ML method results in a program that uses the data samples to solve the problem. A data-driven ML architecture for the classification problem is shown in Figure 1.1. The training module is responsible for optimizing the classifier from the training data samples and providing the classification module with a trained classifier. The classification module determines the output based on the input data. The training and classification modules can work independently. The training procedure generally takes a long time, but the training module is activated only periodically. The training procedure can also be performed in the background while the classification module operates as usual.
Figure 1.1 A data-driven ML architecture.
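The separation between the training module and the classification module can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the text: the classifier is a simple nearest-centroid rule, and the function names `train_classifier` and `classify` as well as the toy data are hypothetical.

```python
from collections import defaultdict
from math import dist  # available in Python 3.8+


def train_classifier(samples):
    """Training module: build a classifier (one centroid per label) from labeled samples."""
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: tuple(s / counts[label] for s in total)
            for label, total in sums.items()}


def classify(classifier, features):
    """Classification module: return the label whose centroid is nearest to the input."""
    return min(classifier, key=lambda label: dist(classifier[label], features))


# Hypothetical toy data: (feature vector, label) pairs.
training_data = [
    ((0.0, 0.1), "low"), ((0.2, 0.0), "low"),
    ((1.0, 0.9), "high"), ((0.8, 1.0), "high"),
]

# The training module runs only occasionally and hands over a trained classifier.
classifier = train_classifier(training_data)

# The classification module needs only the trained classifier, not the training data.
print(classify(classifier, (0.1, 0.2)))
print(classify(classifier, (0.9, 0.7)))
```

Because `classify` depends only on the object returned by `train_classifier`, retraining could run in the background and swap in a new classifier without interrupting classification.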
There are three categories of ML techniques, including supervised, unsupervised, and reinforcement learning.
- Supervised learning: Given a data set $\mathcal{D} = \{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_n, y_n)\} \subseteq \mathbb{R}^d \times \mathcal{C}$, a supervised learning algorithm learns a function $h$ that generalizes the input-output mapping in $\mathcal{D}$ to inputs outside $\mathcal{D}$. Here, $\mathbb{R}^d$ is the $d$-dimensional feature space, $\mathbf{x}_i$ is the input vector of the $i$th sample, $y_i$ is the label of the $i$th sample, and $\mathcal{C}$ is the label space. For binary classification problems (e.g. spam filtering), $\mathcal{C} = \{0, 1\}$ or $\mathcal{C} = \{-1, +1\}$. For multiclass classification (e.g. face classification), $\mathcal{C} = \{1, 2, \ldots, K\}$ with $K \geq 2$. On the other hand, for regression problems (e.g. predicting temperature), $\mathcal{C} = \mathbb{R}$. The data points $(\mathbf{x}_i, y_i)$ are drawn from an (unknown) distribution $\mathcal{P}(X, Y)$. The learning process involves learning a function $h$ such that for a new pair $(\mathbf{x}, y) \sim \mathcal{P}$, we have $h(\mathbf{x}) = y$ with high probability (or $h(\mathbf{x}) \approx y$). A loss function (or risk function), such as the mean squared error function, evaluates the error between the predicted probabilities/values returned by the function $h$ and the labels $y_i$ on the training data.
For supervised learning, the data set $\mathcal{D}$ is usually split into three subsets: $\mathcal{D}_{TR}$ as the training data, $\mathcal{D}_{VA}$ as the validation data, and $\mathcal{D}_{TE}$ as the test data. The function $h$ is validated on $\mathcal{D}_{VA}$: if the loss is too large, $h$ is revised based on $\mathcal{D}_{TR}$ and validated again on $\mathcal{D}_{VA}$. This process goes back and forth until $h$ gives a low loss on $\mathcal{D}_{VA}$. Standard supervised learning techniques include Bayesian classification, logistic regression, $k$-nearest neighbor (KNN), neural network (NN), support vector machine (SVM), decision tree (DT) classification, and recommender systems. Note that supervised learning techniques require the availability of labeled data sets.
- Unsupervised learning techniques are used to create an internal representation of the input, e.g. to form clusters, extract features, reduce dimensionality, or estimate density. Unlike supervised learning, these techniques can deal with unlabeled data sets.
- Reinforcement learning (RL) techniques do not require a prior data set. With RL, an agent learns from interactions with an external environment. The idea of learning by interacting with a domain imitates humans' natural learning process. For example, when an infant plays, e.g. waves their arms or kicks a ball, their brain has a direct sensorimotor connection with their surroundings. Repeating this process produces essential information about the impact of actions, causes and effects, and what to do to reach a goal.
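The agent-environment interaction described in the last item can be illustrated with a short sketch. It uses tabular Q-learning as one common RL algorithm; the excerpt does not prescribe an algorithm at this point, so this choice is an assumption. The five-state line environment, the function names, and all parameter values are also hypothetical and chosen only to keep the example small.

```python
import random

N_STATES = 5      # states 0..4 on a line; state 4 is the goal (hypothetical toy task)
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Environment: return (next_state, reward, done) for the chosen action."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy action selection with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Agent: learn action values purely from interaction, without a prior data set."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Q-learning update: move the estimate toward reward + discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy action for each non-goal state after training.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

In this toy task the only reward is at the goal, so after training the greedy policy should select action 1 (move right) in every non-goal state. The agent discovers this solely from the rewards it observes while acting.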
Deep learning (DL), a subset of ML, has gained popularity thanks to DNN architectures that overcome limitations of traditional ML. DL models are able to extract the key features of data without relying on the data's structure. The "deep" in deep learning refers to the number of layers in the DNN architecture, with more layers leading to a deeper network. DL has been successfully applied in various fields, including face and voice recognition, text translation, and intelligent driver assistance systems. It has several advantages over traditional algorithms as follows...
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePub works well for novels and non-fiction books, i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our ebook Help page.