1 Introduction to Parallel Programming
Welcome to the world of graphics processing unit (GPU) programming!
Before we talk about programming GPUs, we must understand what parallel programming is and how it can benefit our applications. As with everything in life, it comes with its own challenges. In this chapter, we'll explore both the benefits and drawbacks of parallel programming, laying the groundwork for our deep dive into GPU programming. We'll discuss a variety of topics without developing any code yet; in doing so, we'll establish the foundations on which to build throughout our journey.
Apart from being useful, the information provided in this chapter is fundamental to understanding what happens inside a GPU, as we'll discuss shortly. By the end of the chapter, you'll understand why parallelism is important and when it makes sense to use it in your applications.
In this chapter, we're going to cover the following main topics:
- What parallelism is in software, and why it's important
- Different types of parallelism
- An overview of GPU architecture
- Comparing central processing units (CPUs) and GPUs
- Advantages and challenges of GPU programming
Technical requirements
For this chapter, the only technical requirement that we have is the goodwill to keep reading!
What is parallelism in software?
Parallel programming is a way of making a computer do many things at once. But wait - isn't that what already happens every day? Yes and no. Most processors today are capable of executing more than one task at the same time - and we mean literally at the same time, on separate cores. However, this is only the first requirement for parallel software. The second is to make at least some of those cores work on the same problem in a coordinated way. Let's consider an example.
Imagine that you're taking on a big task, such as sorting a huge pile of books. Instead of doing it alone, you ask a group of friends to help. Each friend takes a small part of the pile and sorts it. You all work at the same time, and the job gets done much faster. This is similar to how parallel programming works: it breaks a big problem into smaller pieces and solves them at the same time using multiple cores.
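The book-sorting analogy can be sketched in a few lines of Python. This is only an illustration of the decompose-then-combine pattern, not how we'll program GPUs later in the book; it also uses threads for simplicity, so a CPU-bound task like sorting won't actually speed up here (Python's global interpreter lock prevents that), but the structure - split, work in parallel, merge - is exactly the one we'll keep reusing:

```python
from concurrent.futures import ThreadPoolExecutor
import heapq

def sort_chunk(chunk):
    # Each "friend" (worker) sorts its own small pile.
    return sorted(chunk)

def parallel_sort(items, workers=4):
    # Break the big pile into one chunk per worker.
    size = max(1, (len(items) + workers - 1) // workers)
    chunks = [items[i:i + size] for i in range(0, len(items), size)]
    # All chunks are sorted at the same time...
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_chunks = list(pool.map(sort_chunk, chunks))
    # ...but merging the sorted piles back together is a sequential step.
    return list(heapq.merge(*sorted_chunks))
```

Note that even in this tiny sketch, a sequential part sneaks in at the end: someone has to merge the individually sorted piles back into one.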
Of course, this example was chosen because it has a special characteristic: it's easily parallelizable, in the sense that we can readily see how to break the big task into smaller ones. Not all problems can be broken down so easily. One of our first challenges is finding ways to decompose problems into smaller tasks that can be executed simultaneously. Sometimes, parts of our algorithm must run on a single core while all the others sit idle before the parallel tasks can be separated. Such a part is usually called a sequential part. It's time for a different example.
Let's suppose you're having a movie and games night with your friends. You all decide to prepare some food and for that, you go to the supermarket. To make things faster, your friends come along so that, once there, everyone can select multiple ingredients at the same time - this is the parallel part. However, since you're all going in a single car, only one person can drive at a given time, no matter how many licensed drivers there are in the vehicle. You can always argue that they could take turns driving a part of the way, but in this scenario, it would only take longer to get to the supermarket.
Upon arriving, each person heads to a different aisle to gather the pre-defined ingredients. Once everything is collected, another crucial decision arises: should each person go to a separate checkout line to pay with their credit card, or should they all queue together if they only have one card? Opting for the parallel payment method reveals another interesting aspect of parallel processing.
Even when tasks are processed in parallel (each person is on a different checkout line), the execution times can vary unpredictably. This means that at any given moment, different lines move at different speeds, and those who have already paid for their ingredients may end up waiting for their friends (processors) to finish their payments.
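This observation is easy to quantify. In the sketch below, the checkout times are hypothetical numbers chosen only for illustration: when work runs in parallel, the total elapsed time is set by the slowest worker, whereas doing everything through one queue costs the sum of all the times:

```python
# Hypothetical seconds each checkout line takes (illustration only).
checkout_times = [12.0, 7.5, 9.2, 15.3]

# Parallel: everyone pays at once, but the group leaves only when
# the slowest line finishes - the others wait.
parallel_time = max(checkout_times)

# Sequential: one shared card, one queue, one payment after another.
sequential_time = sum(checkout_times)
```

Here the parallel approach takes 15.3 "seconds" against 44.0 sequentially - a clear win, yet three of the four friends still spend part of that time idle, waiting for the slowest line.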
Once all the payments are complete, a new sequential part follows: driving back home. This time, a different driver might execute the task while the other people - I mean, processors - sit idle, waiting for the next task. Some algorithms have sequential parts in order to synchronize data or share intermediate results, and during these, only one processor is working. Here, we're collecting the data that each processor - I mean, friend - gathered from the supermarket, and we have to move it from one location to another. There's no use for parallelism in this small part.
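The cost of these sequential parts has a classical formalization (not introduced in this chapter's narrative, but worth sketching here): Amdahl's law. If a fraction p of a program can be parallelized across n processors, the best possible speedup is bounded by the remaining sequential fraction:

```python
def amdahl_speedup(p, n):
    # Sequential part costs (1 - p); the parallel part, split
    # across n processors, costs p / n.
    return 1.0 / ((1.0 - p) + p / n)
```

With 90% of the work parallelizable, four processors give a speedup of about 3.08x, but even a thousand processors cannot push past roughly 10x: the 10% sequential part - the drive to the supermarket - caps the gain no matter how many friends come along.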
Why is parallelism important?
There are many situations in which the size of the problems we want to solve increases dramatically. And this is the moment when we have to start talking about more 'serious' real-world applications, such as weather forecasting, scientific research, and artificial intelligence.
Remember when we were driving to the supermarket and mentioned that we could switch drivers along the way? Wouldn't that only take us more time? Yes, because of context switching: we would have to find a place to park, switch drivers, and then drive on to the next stop. But why bring this up again? Because most of the time, we need a 'serious' real-world application for the overhead and effort of parallel programming to pay off.
One exception could be using parallel programming to accelerate graphics and physics processing in video games; although these applications may not be critical for human life, they're pretty serious. We could always classify video games within the 'serious' simulation category. Let's understand some of the benefits we get by using parallelism in our software.
Speeding up tasks
Splitting tasks into smaller parts that can be done simultaneously dramatically speeds up the overall process. We now have multiple processors...