
Accelerators for Convolutional Neural Networks
Description
Comprehensive and thorough resource exploring different types of convolutional neural networks and complementary accelerators
Accelerators for Convolutional Neural Networks provides basic deep learning knowledge and instructive content for Internet of things (IoT) and edge computing practitioners who want to build convolutional neural network (CNN) accelerators. The book elucidates compressive coding for CNNs, presents a two-step lossless input feature map compression method, discusses an arithmetic coding-based lossless weight compression method and the design of an associated decoding method, describes contemporary sparse CNNs that consider sparsity in both weights and activation maps, and discusses hardware/software co-design and co-scheduling techniques that can lead to better optimization and utilization of the available hardware resources for CNN acceleration.
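The general idea behind arithmetic coding-based lossless weight compression can be illustrated with a minimal Python sketch. It makes several assumptions. It uses a static frequency model built from the weights themselves. The weights are 5-bit values in 0..31, and the decoder is told how many weights to recover. It also uses exact rational arithmetic (`fractions.Fraction`) instead of the finite-precision range scaling a hardware decoder needs. This is a generic textbook arithmetic coder, not the book's method, and all function names are hypothetical.

```python
import math
from collections import Counter
from fractions import Fraction

def build_model(weights):
    """Static model: give each 5-bit symbol a sub-interval [low, high) of [0, 1)
    whose width equals its relative frequency in `weights`."""
    assert all(0 <= w < 32 for w in weights), "expects 5-bit values 0..31"
    counts = Counter(weights)
    total = len(weights)
    model, cum = {}, Fraction(0)
    for sym in sorted(counts):
        p = Fraction(counts[sym], total)
        model[sym] = (cum, cum + p)
        cum += p
    return model

def encode(weights, model):
    """Shrink [low, high) once per symbol; any value in the final interval
    identifies the whole sequence. Returns the interval's lower bound."""
    low, high = Fraction(0), Fraction(1)
    for sym in weights:
        width = high - low
        s_low, s_high = model[sym]
        low, high = low + width * s_low, low + width * s_high
    return low

def decode(code, model, n):
    """Invert encode(): find the symbol whose sub-interval contains the
    rescaled code, emit it, and narrow the interval the same way."""
    out = []
    low, high = Fraction(0), Fraction(1)
    for _ in range(n):
        width = high - low
        target = (code - low) / width
        for sym, (s_low, s_high) in model.items():
            if s_low <= target < s_high:
                out.append(sym)
                low, high = low + width * s_low, low + width * s_high
                break
    return out

def ideal_bits(weights, model):
    """Information content of the sequence under the model, in bits
    (a practical coder needs roughly this many plus a couple of bits)."""
    return sum(-math.log2(model[w][1] - model[w][0]) for w in weights)

# Usage with hypothetical quantized weights dominated by one value.
weights = [3, 3, 3, 17, 3, 0, 3, 17, 3, 3]
model = build_model(weights)
code = encode(weights, model)
assert decode(code, model, len(weights)) == weights  # lossless round trip
print(round(ideal_bits(weights, model), 1), "bits vs", 5 * len(weights), "bits uncompressed")
# prints: 11.6 bits vs 50 bits uncompressed
```

Exact fractions keep the sketch short and guarantee a lossless round trip. The cost is that the interval's numerator and denominator grow with every symbol. Range scaling avoids this by renormalizing the interval and emitting bits once they are fixed, which makes the technique practical in fixed-width hardware.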
The first part of the book provides an overview of CNNs along with the composition and parameters of different contemporary CNN models. Later chapters focus on compressive coding for CNNs and the design of dense CNN accelerators. The book also provides directions for future research and development for CNN accelerators.
Other sample topics covered in Accelerators for Convolutional Neural Networks include:
* How to apply arithmetic coding and decoding with range scaling for lossless compression of 5-bit CNN weights, enabling CNN deployment in extremely resource-constrained systems
* State-of-the-art research surrounding dense CNN accelerators, which are mostly based on systolic arrays or parallel multiply-accumulate (MAC) arrays
* The iMAC dense CNN accelerator, which combines image-to-column (im2col) and general matrix multiplication (GEMM) hardware acceleration
* A multi-threaded, low-cost, log-based processing element (PE) core, instances of which are stacked in a spatial grid to form the NeuroMAX dense accelerator
* Sparse-PE, a multi-threaded and flexible CNN PE core that exploits sparsity in both weights and activation maps, instances of which can be stacked in a spatial grid to build sparse CNN accelerators
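The im2col + GEMM approach can be illustrated in software with a minimal NumPy sketch. The sketch assumes a single (C, H, W) input feature map, K filters of shape (C, R, S), no padding, and a fixed stride. The function names are hypothetical, and this is not the iMAC hardware design. The sketch unfolds each receptive field into one column, which turns the whole CONV layer into a single matrix multiplication. It then checks the result against a direct nested-loop convolution.

```python
import numpy as np

def im2col(x, r, s, stride=1):
    """Unfold a (C, H, W) feature map into a (C*R*S, OH*OW) matrix,
    one column per output pixel."""
    c, h, w = x.shape
    oh = (h - r) // stride + 1
    ow = (w - s) // stride + 1
    cols = np.empty((c * r * s, oh * ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            patch = x[:, i * stride:i * stride + r, j * stride:j * stride + s]
            cols[:, i * ow + j] = patch.reshape(-1)  # (c, r, s) order
    return cols, oh, ow

def conv2d_im2col(x, weights, stride=1):
    """CONV layer as one GEMM: (K, C*R*S) @ (C*R*S, OH*OW) -> (K, OH, OW)."""
    k, c, r, s = weights.shape
    cols, oh, ow = im2col(x, r, s, stride)
    out = weights.reshape(k, -1) @ cols  # same (c, r, s) flattening order
    return out.reshape(k, oh, ow)

def conv2d_direct(x, weights, stride=1):
    """Reference: explicit multiply-accumulate loops (cross-correlation,
    as computed by CNN CONV layers)."""
    k, c, r, s = weights.shape
    _, h, w = x.shape
    oh = (h - r) // stride + 1
    ow = (w - s) // stride + 1
    out = np.zeros((k, oh, ow), dtype=np.result_type(x, weights))
    for f in range(k):
        for i in range(oh):
            for j in range(ow):
                patch = x[:, i * stride:i * stride + r, j * stride:j * stride + s]
                out[f, i, j] = np.sum(patch * weights[f])
    return out

# Usage with random data: 3 input channels, 8x8 map, 4 filters of 3x3.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))
wts = rng.standard_normal((4, 3, 3, 3))
assert np.allclose(conv2d_im2col(x, wts), conv2d_direct(x, wts))
```

The trade-off is memory. With stride 1, each input pixel is copied into up to R × S columns. In exchange, the layer becomes a single large GEMM, which dense MAC arrays execute efficiently.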
For researchers in AI, computer vision, computer architecture, and embedded systems, along with graduate and senior undergraduate students in related programs of study, Accelerators for Convolutional Neural Networks is an essential resource for understanding the many facets of the subject and its applications.


Persons
ARSLAN MUNIR, PhD, is an Associate Professor in the Department of Computer Science of Kansas State University. He is also the Director of the Intelligent Systems, Computer Architecture, Analytics, and Security (ISCAAS) Laboratory at the university.
JOONHO KONG, PhD, is an Associate Professor in the School of Electronics Engineering, College of IT Engineering, at Kyungpook National University, South Korea.
MAHMOOD AZHAR QURESHI, PhD, is a Senior IP Logic Design Engineer at Intel Corporation in Santa Clara, California.
Content
About the Authors
Preface
Part I Overview
1 Introduction
1.1 History and Applications
1.2 Pitfalls of High-Accuracy DNNs/CNNs
1.2.1 Compute and Energy Bottleneck
1.2.2 Sparsity Considerations
1.3 Chapter Summary
2 Overview of Convolutional Neural Networks
2.1 Deep Neural Network Architecture
2.2 Convolutional Neural Network Architecture
2.3 Popular CNN Models
2.4 Popular CNN Datasets
2.5 CNN Processing Hardware
2.6 Chapter Summary
Part II Compressive Coding for CNNs
3 Contemporary Advances in Compressive Coding for CNNs
3.1 Background of Compressive Coding
3.2 Compressive Coding for CNNs
3.3 Lossy Compression for CNNs
3.4 Lossless Compression for CNNs
3.5 Recent Advancements in Compressive Coding for CNNs
3.6 Chapter Summary
4 Lossless Input Feature Map Compression
4.1 Two-Step Input Feature Map Compression Technique
4.2 Evaluation
4.3 Chapter Summary
5 Arithmetic Coding and Decoding for 5-Bit CNN Weights
5.1 Architecture and Design Overview
5.2 Algorithm Overview
5.3 Weight Decoding Algorithm
5.4 Encoding and Decoding Examples
5.5 Evaluation Methodology
5.6 Evaluation Results
5.7 Chapter Summary
Part III Dense CNN Accelerators
6 Contemporary Dense CNN Accelerators
6.1 Background on Dense CNN Accelerators
6.2 Representation of the CNN Weights and Feature Maps in Dense Format
6.3 Popular Architectures for Dense CNN Accelerators
6.4 Recent Advancements in Dense CNN Accelerators
6.5 Chapter Summary
7 iMAC: Image-to-Column and General Matrix Multiplication-Based Dense CNN Accelerator
7.1 Background and Motivation
7.2 Architecture
7.3 Implementation
7.4 Chapter Summary
8 NeuroMAX: A Dense CNN Accelerator
8.1 Related Work
8.2 Log Mapping
8.3 Hardware Architecture
8.4 Data Flow and Processing
8.5 Implementation and Results
8.6 Chapter Summary
Part IV Sparse CNN Accelerators
9 Contemporary Sparse CNN Accelerators
9.1 Background of Sparsity in CNN Models
9.2 Background of Sparse CNN Accelerators
9.3 Recent Advancements in Sparse CNN Accelerators
9.4 Chapter Summary
10 CNN Accelerator for In Situ Decompression and Convolution of Sparse Input Feature Maps
10.1 Overview
10.2 Hardware Design Overview
10.3 Design Optimization Techniques Utilized in the Hardware Accelerator
10.4 FPGA Implementation
10.5 Evaluation Results
10.6 Chapter Summary
11 Sparse-PE: A Sparse CNN Accelerator
11.1 Related Work
11.2 Sparse-PE
11.3 Implementation and Results
11.4 Chapter Summary
12 Phantom: A High-Performance Computational Core for Sparse CNNs
12.1 Related Work
12.2 Phantom
12.3 Phantom-2D
12.4 Experiments and Results
12.5 Chapter Summary
Part V HW/SW Co-Design and Co-Scheduling for CNN Acceleration
13 State-of-the-Art in HW/SW Co-Design and Co-Scheduling for CNN Acceleration
13.1 HW/SW Co-Design
13.2 HW/SW Co-Scheduling
13.3 Chapter Summary
14 Hardware/Software Co-Design for CNN Acceleration
14.1 Background of iMAC Accelerator
14.2 Software Partition for iMAC Accelerator
14.3 Experimental Evaluations
14.4 Chapter Summary
15 CPU-Accelerator Co-Scheduling for CNN Acceleration
15.1 Background and Preliminaries
15.2 CNN Acceleration with CPU-Accelerator Co-Scheduling
15.3 Experimental Results
15.4 Chapter Summary
16 Conclusions
References
Index
1 Introduction
Deep neural networks (DNNs) have enabled the deployment of artificial intelligence (AI) in many modern applications, including autonomous driving [1], image recognition [2], and speech processing [3]. In many applications, DNNs have achieved close to human-level accuracy, and in some, they have exceeded human accuracy [4]. This high accuracy comes from a DNN's unique ability to automatically extract high-level features from a huge quantity of training data using statistical learning and to improve over time. This learning over time provides a DNN with an effective representation of the input space. This differs markedly from earlier approaches, in which domain experts hand-crafted specific features that were then used for feature extraction.
Convolutional neural networks (CNNs) are a type of DNN most commonly used for computer vision tasks. Among the different types of DNNs, such as multilayer perceptrons (MLPs), recurrent neural networks (RNNs), long short-term memory (LSTM) networks, radial basis function networks (RBFNs), generative adversarial networks (GANs), restricted Boltzmann machines (RBMs), deep belief networks (DBNs), and autoencoders, CNNs are the most commonly used. The invention of CNNs has revolutionized the field of computer vision and has enabled many computer vision applications to go mainstream. CNNs have applications in image and video recognition, recommender systems, image classification, image segmentation, medical image analysis, object detection, activity recognition, natural language processing, brain-computer interfaces, and financial time-series prediction.
DNN/CNN processing is usually carried out in two stages, training and inference, each with its own computational needs. Training is the process in which a DNN model is trained using a large application-specific dataset. The training time depends on the model size and the target accuracy requirements. For high-accuracy applications like autonomous driving, training a DNN can take weeks and is usually performed in the cloud. Inference, on the other hand, can be performed either in the cloud or on the edge device (mobile device, Internet of things (IoT) device, autonomous vehicle, etc.). Nowadays, in many applications, it is advantageous to perform inference on the edge devices, as shown in Figure 1.1. For example, in cellphones, it is desirable to perform image and video processing on the device itself rather than sending the data to the cloud for processing. This methodology reduces the communication cost and the latency involved in data transmission and reception. It also eliminates the risk of losing important device features should there be a network disruption or loss of connectivity. Another motivation for on-device inference is the ever-increasing security risk involved in sending personal data, including images and videos, to cloud servers for processing. Autonomous driving systems that require visual data need to perform inference locally to avoid latency and security issues, either of which can result in catastrophe should an undesirable event occur. Performing DNN/CNN inference on the edge presents its own set of challenges, because the embedded platforms in edge devices have stringent cost limitations that constrain their compute capabilities. Running compute- and memory-intensive DNN/CNN inference efficiently on these devices therefore becomes a matter of prime importance.
Figure 1.1 DNN/CNN processing methodology.
Source: (b) Daughter#3 - Cecil/Wikimedia Commons/CC BY-SA 2.0.
1.1 History and Applications
Neural nets have been around since the 1940s; however, the first practically applicable neural network, LeNet [5], was proposed in 1989. This neural network was designed to recognize handwritten numeric digits. It paved the way for neural networks used in various digit recognition applications, such as automated teller machines (ATMs), optical character recognition (OCR), automatic number plate recognition, and traffic sign recognition. The slow growth and little to no adoption of neural networks in the early days were mainly due to the massive computational requirements of their processing, which limited their study to theoretical concepts.
Over the past decade, there has been exponential growth in DNN research, with many new high-accuracy neural networks being deployed for various applications. This has been possible only because of two factors. The first factor is the advancement in the processing power of semiconductor devices along with technological breakthroughs in computer architecture. Computers now have significantly higher computing capability, which makes it possible to process a neural network within a reasonable time frame, something that was not achievable in the early days. The second factor is the availability of large training datasets. As neural networks learn over time, providing huge amounts of training data enables better accuracy. For example, Meta (parent company of Facebook) receives close to a billion user images per day, whereas YouTube has 300 hours of video uploaded every minute [6]. This enables the service providers to train their neural networks for targeted advertising campaigns that bring in billions of dollars of advertising revenue. Apart from their use in social media platforms, DNNs are making a huge impact in many other domains. Some of these areas include:
- Speech Processing: Speech processing algorithms have improved significantly in the past few years. Nowadays, many applications use DNNs to perform real-time speech recognition with unprecedented levels of accuracy [3,7-9]. Many technology companies also use DNNs for language translation in a wide variety of applications. Google, for example, uses its neural machine translation system (GNMT) [10], which is built on an LSTM-based seq2seq model, for its language translation applications.
- Autonomous Driving: Autonomous driving has been one of the biggest technological breakthroughs in the auto industry since the invention of the internal combustion engine. It is no coincidence that the self-driving boom came at the same time that high-accuracy CNNs became increasingly popular. Companies like Tesla [11] and Waymo [12] use various types of self-driving technology, including visual feeds and LiDAR, in their self-driving solutions. What all these solutions have in common is the use of CNNs for visual perception of road conditions, which is the main back-end technology in advanced driver assistance systems (ADAS).
- Medical AI: Another crucial area where DNNs/CNNs have become increasingly useful is medicine. Nowadays, doctors can use AI-assisted medical imaging to perform various surgeries. AI systems use DNNs in genomics to gather insights into genetic disorders like autism [13, 14]. DNNs/CNNs are also useful in detecting various types of cancer, such as skin and brain cancer [15, 16].
- Security: The advent of AI has challenged many traditional security approaches that were previously deemed sufficient. The rollout of 5G technology has caused a massive surge in IoT deployments that traditional security approaches cannot keep up with. Physical unclonability approaches [17-21] were introduced to protect these massive IoT deployments against security attacks with minimal cost overhead. These approaches, however, were also unsuccessful in preventing AI-assisted attacks using DNNs [22, 23]. Researchers have now been forced to upgrade security threat models to incorporate AI-based attacks [24, 25]. Because of a massive increase in AI-assisted cyber-attacks on clouds and datacenters, companies have realized that the best way to defeat offensive AI attacks is to incorporate AI-based counterattacks [26, 27].
Overall, the use of DNNs, in particular CNNs, in various applications has grown exponentially over the past decade, and the trend is still on the rise. The massive increase in CNN deployments on edge devices requires efficient processing architectures that can keep up with the computational requirements of CNN inference.
1.2 Pitfalls of High-Accuracy DNNs/CNNs
This section discusses some of the pitfalls of high-accuracy DNN/CNN models focusing on compute and energy bottlenecks, and the effect of sparsity of high-accuracy models on throughput and hardware utilization.
1.2.1 Compute and Energy Bottleneck
CNNs are composed of multiple convolution (CONV) layers, which help extract low-, mid-, and high-level input features for better accuracy. Although CNNs are primarily used in applications related to image and video processing, they are also used in speech processing [3, 7], gameplay [28], and robotics [29] applications. We will further discuss the basics of CNNs in Chapter 2. In this section, we explore some of the bottlenecks...