LLVM Code Generation


A deep dive into compiler backend development

Quentin Colombet, Kristof Beyls (Author)

Packt Publishing Limited

1st Edition

Published on 23 May 2025

608 pages

E-Book (ePUB with Adobe-DRM)

E-Book (ePUB without DRM)

978-1-83546-257-7 (ISBN)

from €29.99

Available for download


Description


Explore the world of code generation with the LLVM infrastructure, and learn how to extend existing backends or develop your own.

Key Features

- Understand the steps involved in generating assembly code from LLVM IR
- Learn the key constructs needed to leverage LLVM for your hardware or backend
- Strengthen your understanding with targeted exercises and practical examples in every chapter
- Purchase of the print or Kindle book includes a free PDF eBook

Book Description

The LLVM infrastructure is a popular compiler ecosystem widely used in the tech industry and academia. This technology is crucial for both experienced and aspiring compiler developers looking to make an impact in the field. Written by Quentin Colombet, a veteran LLVM contributor and architect of the GlobalISel framework, this book provides a primer on the main aspects of LLVM, with an emphasis on its backend infrastructure; that is, everything needed to transform the intermediate representation (IR) produced by frontends like Clang into assembly code and object files. You'll learn how to write an optimizing code generator for a toy backend in LLVM. The chapters will guide you step by step through building this backend while exploring key concepts, such as the ABI, cost model, and register allocation. You'll also find out how to express these concepts using LLVM's existing infrastructure and how established backends address these challenges. Furthermore, the book features code snippets that demonstrate the actual APIs. By the end of this book, you'll have gained a deeper understanding of LLVM. The concepts presented are expected to remain stable across different LLVM versions, making this book a reliable quick reference guide for understanding LLVM.

What you will learn

- Understand essential compiler concepts, such as SSA, dominance, and ABI
- Build and extend LLVM backends for creating custom compiler features
- Optimize code by manipulating LLVM's Intermediate Representation
- Contribute effectively to LLVM open-source projects and development
- Develop debugging skills for LLVM optimizations and passes
- Grasp how encoding and (dis)assembling work in the context of compilers
- Utilize LLVM's TableGen DSL for creating custom compiler models

Who this book is for

This book is for both beginners to LLVM and experienced LLVM developers. If you're new to LLVM, it offers a clear, approachable guide to compiler backends, starting with foundational concepts. For seasoned LLVM developers, it dives into less-documented areas such as TableGen, MachineIR, and MC, enabling you to solve complex problems and expand your expertise. Whether you're starting out or looking to deepen your knowledge, this book has something for you.


Content

Intro
Preface
Part 1: Getting Started with LLVM
Chapter 1: Building LLVM and Understanding the Directory Structure
Getting the most out of this book - get to know your free benefits
Technical requirements
Getting ready for LLVM's world
Prerequisites
Identifying the right version of the tools
Installing the right tools
Building a compiler
What is a compiler?
Opening Clang's hood
Building Clang
Experimenting with Clang
Building LLVM
Configuring the build system
Crash course on Ninja
Building the core LLVM project
Testing a compiler
Crash course on the Google test infrastructure
Crash course on the LLVM Integrated Tester
Testing in lit
Directives
Describing the RUN command
The lit driver - llvm-lit
Crash course on FileCheck
FileCheck by example
LLVM unit tests
Finding the source of a test
Running unit tests manually
The unit tests pass, what now?
LLVM functional tests
The LLVM test suite
The functional tests fail - what do you do?
Understanding the directory structure
High-level directory structure
Focusing on the core LLVM project
A word on the include files
Private headers
What is the deal with <project>/include/<project>?
What is include/<project>-c?
Overview of some of the LLVM components
Generic LLVM goodness
Working with the LLVM IR
Generic backend infrastructure
Target-specific constructs
Summary
Quiz time
Chapter 2: Contributing to LLVM
Reporting an issue
Engaging with the community
Reviewing patches
Contributing patches
Understanding patch contribution in a nutshell
Following up with your contribution
A word on adding tests
Summary
Quiz time
Chapter 3: Compiler Basics and How They Map to LLVM APIs
Technical requirements
A word on APIs
Understanding compiler jargon
Target
Host
Lowering
Canonical form
Build time, compile time, and runtime
Backend and middle-end
Application binary interface
Encoding
Working with basic structures
Module
A module at the LLVM IR level
A module at the Machine IR level
Function
A function in the LLVM IR
A function in the Machine IR
Basic block
A basic block in the LLVM IR
A basic block in the Machine IR
Instruction
An instruction in the LLVM IR
An instruction in the Machine IR
Control flow graph
Reverse post-order traversal
Backedge
Critical edge
Irreducible graph
Building your first IRs
Building your first LLVM IR
A walk over the required APIs
Your turn
Building your first Machine IR
A walk over the required APIs
Your turn
Summary
Quiz time
Chapter 4: Writing Your First Optimization
Technical requirements
The concept of value
SSA
Constructing the SSA form
Dominance
Def-use and use-def chains
Def-use and use-def chains in the LLVM IR
Def-use and use-def chains in the Machine IR
Tackling optimizations
Legality
Integer overflow/underflow
Fast-math flags
Side effects
Profitability
Instruction lowering - TargetTransformInfo and TargetLowering
Library support - TargetLibraryInfo
Datatype properties - DataLayout
Register pressure
Basic block frequency
More precise instruction properties - scheduling model and instruction description
Transformation jargon
Instcombine
Fixed point
Liveness
Hoisting
Sinking
Folding
Loops
Terminology
Preheader
Header
Exiting block
Latch
Exit block
Where to get loop information
Writing a simple constant propagation optimization
The optimization
Simplifying assumptions
Missing APIs
The Constant class
The APInt class
Creating a constant
Replacing a value
Your turn
Going further
Legality
Profitability
Propagating constants across types
Summary
Quiz time
Chapter 5: Dealing with Pass Managers
Technical requirements
What is a pass?
What is a pass manager?
The legacy and new pass manager
Pass managers' capabilities
Populating a pass manager
Inner workings of pass managers
Creating a pass
Writing a pass for the legacy pass manager
Using the proper base class
Expressing the dependencies of a pass
Preserving analyses
Specificities of the Pass class
Writing a pass for the new pass manager
Implementing the right method
Registering an analysis
Describing the effects of your pass
Inspecting the pass pipeline
Available developer tools
Plumbing up the information you need
Interpreting the logs of pass managers
The pass pipeline structure
Time profile
Your turn
Writing your own pass
Writing your own pass pipeline
Summary
Further reading
Quiz time
Chapter 6: TableGen - LLVM Swiss Army Knife for Modeling
Technical requirements
Getting started with TableGen
The TableGen programming language
Types
Programming with TableGen
Defining multiple records at once
Assigning fields
Discovering a TableGen backend
General information on TableGen backends for LLVM
Discovering a TableGen backend
The implementation of intrinsics
The content of a generated file
The source of a TableGen backend
Debugging the TableGen framework
Identifying the failing component
Cracking open a TableGen backend
Summary
Further reading
Quiz time
Part 2: Middle-End: LLVM IR to LLVM IR
Chapter 7: Understanding LLVM IR
Technical requirements
Understanding the need for an IR
What an IR is
Why use an IR?
Introducing LLVM IR
Identifiers
Functions
Basic blocks
Instructions
Types
Single-value types
The label type
Aggregate types
Types in the LLVM IR API
Walking through an example
Target-specific elements in LLVM IR
Intrinsic functions
Triple
Function attributes
Data layout
Application binary interface
Textual versus binary format
LLVM IR API - cheat sheet
Summary
Further reading
Quiz time
Chapter 8: Survey of the Existing Passes
Technical requirements
How to find the unknown
Leveraging opt
Using the LLVM code base
Starting from the implementation
Survey of the helper passes
The verifier
The printer
Analysis passes
Target transformation information
Loop information
Alias analysis
Block frequency info
Dominator tree information
Value tracking
Canonicalization passes
The instruction combiner
An example of a canonical rewrite
An example of an optimization
How to use instcombine
The memory to register rewriter
The converter to loop-closed-SSA form
Optimization passes
Interprocedural optimizations
Scalar optimizations
Vectorization
Summary
Further reading
Quiz time
Chapter 9: Introducing Target-Specific Constructs
Technical requirements
Adding a new backend in LLVM
Connecting your target to the build system
Registering your target with Clang
Adding a new architecture to the Triple class
Populating the Target instance
Plumbing your Target through Clang
Creating your own intrinsics
The pros and cons of intrinsics
Creating an intrinsic in the backend
Defining our intrinsics
Hooking up the TableGen backend
Teaching LLVM IR about our intrinsics
Connecting an intrinsic to Clang
Writing the .def file by hand
Using the TableGen capabilities
Hooking up the built-in information
Establishing the code generation link
Adding a target-specific TargetTransformInfo implementation
Establishing a connection to your target-specific information
Introducing target-specific costs
Customizing the default middle-end pipeline
Using the new pass manager
Using the legacy pass manager
A one-time setup - assembling a codegen pipeline
Faking the instruction selector
Faking the lowering of the object file
Creating a skeleton for the assembly information
Using the right abstraction
Summary
Further reading
Quiz time
Chapter 10: Hands-On Debugging LLVM IR Passes
Technical requirements
The logging capabilities in LLVM
Printing the IR between passes
Printing the debug log
Printing high-level information about what happened
Reducing the input IR size
Extracting a subset of the input IR
Shrinking the IR automatically
Using sanitizers
A crash course on LLDB
Starting a debugging session
Controlling the execution
Stopping the program
Command resolution
Resuming the execution
Inspecting the state of a program
The LLVM code base through a debugger
Summary
Further reading
Quiz time
Part 3: Introduction to the Backend
Chapter 11: Getting Started with the Backend
Technical requirements
Introducing the Machine IR
Here comes the Machine IR
The Machine IR textual representation
The .mir file format
A primer on the YAML syntax
The semantics of the different fields
Mapping the content of a .mir file to the C++ API
A deep dive into the body of a MachineFunction instance
Working with a .mir file
Generating a .mir file
Running passes
Shrinking a .mir file
The anatomy of a MachineInstr instance
Introducing the MC layer
Working with MachineOperand instances
Unboxing a MachineOperand instance
Dealing with explicit and implicit operands
Understanding the constraints of an operand
Working with registers
The concept of the register class
The concept of sub-registers
The concept of register tuples
The concept of register units
The registers and SSA and non-SSA forms
Interacting with registers in the debugger
Creating MachineInstr objects
Describing registers
Writing the target description
Describing instructions
Summary
Further reading
Quiz time
Chapter 12: Getting Started with the Machine Code Layer
Technical requirements
The use of the MC layer
Connecting the MC layer
What instructions to describe
Augmenting the target description with MC information
Defining the MC layer for the registers
Defining the MC layer for the instructions
Enabling MC-based tools
Leveraging TableGen
Implementing the missing pieces
Implementing your own MCInstPrinter class
Implementing your own MCCodeEmitter class
Implementing your own XXXAsmParser class
Summary
Quiz time
Chapter 13: The Machine Pass Pipeline
Technical requirements
The Machine pass pipeline at a glance
Injecting passes
Using the generic Machine optimizations
Generic passes worth mentioning
The CodeGenPrepare pass
The PeepholeOptimizer pass
The MachineCombiner pass
Summary
Further reading
Quiz time
Part 4: LLVM IR to Machine IR
Chapter 14: Getting Started with Instruction Selection
Technical requirements
Overview of the instruction selection frameworks
How does instruction selection work?
Framework complementarity
Overall differences between the selectors
Compile time
Modularity and testability
Scope
Which selector to use?
FastISel
SDISel
GlobalISel
Selectors' inner workings
Understanding the DAG representation
Textual representation of the SelectionDAG class
Manipulating a DAG
Understanding the generic Machine IR
Textual representation of generic attributes
Lowering constraints of the generic Machine IR
APIs to work with the generic Machine IR
Groundwork to connect the codegen pipeline
Instantiating the codegen pass pipeline
Providing the key target APIs to the codegen pipeline
Connecting SDISel to the codegen pipeline
Connecting FastISel to the codegen pipeline
Connecting GlobalISel to the codegen pipeline
Choosing between different selectors
Summary
Further reading
Quiz time
Chapter 15: Instruction Selection: The IR Building Phase
Technical requirements
Overview of the IR building
Describing the calling convention
Writing your target description of the calling convention
Connecting the gen-callingconv TableGen backend
Anatomy of the CCValAssign class
Lowering the ABI with SDISel
Implementing the lowering of formal arguments
Providing custom description for the SDNode class
Handling of stack locations
Lowering the ABI with FastISel
Lowering the ABI with GlobalISel
Summary
Further reading
Quiz time
Chapter 16: Instruction Selection: The Legalization Phase
Technical requirements
Legalization overview
Legalization actions
Legalization in SDISel
Describing your legal types
Describing your legalization actions
Implementing a custom legalization action
Legalization in GlobalISel
Describing your legalization actions with the LegalizeRuleSet class
Custom legalization in GlobalISel
Summary
Quiz time
Chapter 17: Instruction Selection: The Selection Phase and Beyond
Technical requirements
Register bank selection
The goal of the register bank selection
Describing the register banks
Implementing your RegisterBankInfo class
Instruction selection
Expressing your selection patterns
Introduction to the selection patterns
Advanced selection patterns
Selection in SDISel
Selection in FastISel
Selection in GlobalISel
Setting up the InstructionSelector class
Importing the selection patterns
Going beyond patterns
Finalizing the selection pipeline
Using custom inserters
Customizing the TargetLowering::finalizeLowering method
Optimizations
Using the DAGCombiner framework
Leveraging the combiner framework
Debugging the selectors
Debugging SDISel
Debugging the GlobalISel match table
Summary
Quiz time
Part 5: Final Lowering and Optimizations
Chapter 18: Instruction Scheduling
Technical requirements
Overview of the instruction scheduling framework
The ScheduleDAGInstrs class
Changing the scheduling algorithm
The scheduling model
The scheduling events
The processing units
The scheduling bindings
Gluing everything together
Implementing your scheduling model
Connecting your scheduling model
Describing a processor model
Instantiating your subtarget
Guidelines to get started with your scheduling model
Summary
Quiz time
Chapter 19: Register Allocation
Technical requirements
Overview of register allocation in LLVM
Enabling the register allocation infrastructure
Introducing the slot indexes
Introducing the live intervals
Maintaining the live intervals
Summary
Further reading
Quiz time
Chapter 20: Lowering of the Stack Layout
Technical requirements
Overview of stack lowering
Handling of stack slots
From frame index to stack slot
The lowering of the stack frame
Introducing the reserved call frame
Implementing the frame-lowering target hooks
The expansion of the frame indices
Introducing register scavenging
Provisioning an emergency spill slot
Expanding the frame indices
Summary
Quiz time
Chapter 21: Getting Started with the Assembler
Technical requirements
Overview of the lowering of a textual assembly file
Assembling with the LLVM infrastructure
Implementing an assembler
Providing the MCCodeEmitter class
Handling the fixups with the MCAsmBackend class
Recording the relocations with the MCObjectTargetWriter class
Summary
Further reading
Quiz time
Chapter 22: Unlock Your Book's Exclusive Benefits
How to unlock these benefits in three easy steps
Other Books You May Enjoy
Index

1 Building LLVM and Understanding the Directory Structure

The LLVM infrastructure provides a set of libraries that can be assembled to create different tools and compilers.

LLVM originally stood for Low-Level Virtual Machine. Nowadays, it is much more than that, as you will shortly learn, and people just use LLVM as a name.

Given the sheer volume of code that makes up the LLVM repository, it can be daunting to even know where to start.

In this chapter, we will give you the keys to approach and use this code base confidently. Using this knowledge, you will be able to do the following:

Understand the different components that make up a compiler
Build and test the LLVM project
Navigate LLVM's directory structure and locate the implementation of different components
Contribute to the LLVM project

This chapter covers the basics needed to get started with LLVM. If you are already familiar with the LLVM infrastructure or followed the tutorial from the official LLVM website (https://llvm.org/docs/GettingStarted.html), you can skip it. You can, however, check the Quiz time section at the end of the chapter to see whether there is anything you may have missed.

Technical requirements

To work with the LLVM code base, you need specific tools on your system. In this section, we list the required versions of these tools for the latest major LLVM release: 20.1.0.

Later, in Identifying the right version of the tools, you will learn how to find the version of the tools required to build a specific version of LLVM, including older and newer releases and the LLVM top-of-tree (that is, the actively developed repository). Additionally, you will learn how to install them.

Without further ado, here are the versions of the tools required for LLVM 20.1.0:

Tool            | Required version
Git             | None specified
C/C++ toolchain | >=Clang 5.0, >=Apple Clang 10.0, >=GCC 7.4, >=Visual Studio 2019 16.8
CMake           | >=3.20.0
Ninja           | None specified
Python          | >=3.8

Table 1.1: Tools required for LLVM 20.1.0
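
As an illustration of how the minimums in Table 1.1 could be checked programmatically, here is a minimal Python sketch. The tool keys (such as "apple-clang") and the function names are assumptions made for this example and are not part of LLVM or of the book's repository. Visual Studio is left out because its version scheme would need separate handling.

```python
import re

# Minimum versions copied from Table 1.1 (LLVM 20.1.0).
# The dictionary keys are illustrative labels, not official names.
MINIMUMS = {
    "clang": "5.0",
    "apple-clang": "10.0",
    "gcc": "7.4",
    "cmake": "3.20.0",
    "python": "3.8",
}


def parse_version(text):
    """Turn '3.20.0' into (3, 20, 0); any suffix such as '-rc1' is ignored."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(tool, installed):
    """Return True if the installed version is at least the table minimum."""
    have = parse_version(installed)
    need = parse_version(MINIMUMS[tool])
    # Pad the shorter tuple with zeros so that 3.8 compares equal to 3.8.0.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need
```

Comparing tuples of integers rather than raw strings matters here: as strings, "3.10" sorts before "3.8", whereas as version numbers 3.10 is the newer release.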

Furthermore, this book comes with scripts, examples, and more that will ease your journey with learning the LLVM infrastructure. We will specifically list the relevant content in the related sections, but remember that the repository lives at https://github.com/PacktPublishing/LLVM-Code-Generation.

Getting ready for LLVM's world

In the Technical requirements section, we listed which versions of the tools you need to work with LLVM 20.1.0. However, LLVM is a lively project, and what is required today may differ from what is required tomorrow. Also, stepping back a bit, you may not know why you need these tools to begin with or how to get them.

This section addresses these questions, and you will learn the following in the process:

The purpose of each required tool
How to check that your environment has the proper tools
How to install the proper tools

Depending on how familiar you are with development on Linux/macOS, this setup can be tedious or a walk in the park.

Ultimately, this section aims to teach you how to go beyond a fixed release of LLVM by giving you the knowledge required to find the information you need.

If you are familiar with package managers (e.g., the apt-get command-line tool on Linux or Homebrew (https://brew.sh) on macOS), you can skip this part and install Git, Clang, CMake, Ninja, and Python directly through them. On Windows, if you do not have a package manager, the steps provided here are all manual: if you pick the Windows binary distribution of each tool, it should just work. Alternatively, on Windows, you may be better off installing these tools through Visual Studio Code (VS Code) (https://code.visualstudio.com) via its extensions.

In any case, you might want to double-check which version of these tools you need by going through the Identifying the right version of the tools section.

Prerequisites

As mentioned previously, you need a set of specific tools to build the LLVM code base. This section summarizes what each of these tools does and how they work together to build the LLVM project.

The list of tools is as follows:

Git: The software used for version control of LLVM
A C/C++ toolchain: The LLVM code base is in C/C++, and as such, we will need a toolchain to build that type of code
CMake: The software used to configure the build system
Ninja: The software used to drive the build system
Python: The scripting language and execution environment used for testing

Figure 1.1 illustrates how the different tools work together to build an LLVM compiler:

Figure 1.1: The essential command-line tools to build an LLVM compiler

Breaking this figure down, the build involves the following steps:

Git retrieves the source code.
CMake generates the build system for a particular driver, such as Ninja, and a particular C/C++ toolchain.
Ninja drives the build process.
The C/C++ toolchain builds the compiler.
Python drives the execution of the tests.
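
The steps above can be sketched as a short Python script that assembles the corresponding command lines. This is only an illustration under stated assumptions: the directory names llvm-project and build, the Release build type, and the helper names are choices made for this example, not prescriptions from the book. By default the script only prints the commands (a dry run) rather than executing them.

```python
import subprocess

# Public location of the LLVM monorepo.
LLVM_REPO = "https://github.com/llvm/llvm-project.git"


def build_steps(src="llvm-project", build="build"):
    """Return the command lines for each stage; directory names are assumptions."""
    return [
        # Git retrieves the source code.
        ["git", "clone", LLVM_REPO, src],
        # CMake generates a Ninja build system for the core LLVM project.
        ["cmake", "-G", "Ninja", "-S", f"{src}/llvm", "-B", build,
         "-DCMAKE_BUILD_TYPE=Release"],
        # Ninja drives the build and invokes the C/C++ toolchain.
        ["ninja", "-C", build],
        # The check-llvm target runs the tests, which are driven by Python.
        ["ninja", "-C", build, "check-llvm"],
    ]


def run_all(steps, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for argv in steps:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(build_steps())
```

Note that the five conceptual steps map to four commands, because Ninja calls the C/C++ toolchain on your behalf instead of you invoking the compiler directly.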

Identifying the right version of the tools

The required version of these tools depends on the version of LLVM you are building. For instance, see the Technical requirements section for the latest major release of LLVM, 20.1.0.

To check the required version for a specific release, check out the Getting Started page of the documentation for this release. To get there, perform the following steps:

Go to https://releases.llvm.org/.
Scroll down to the Download section.
In the documentation column, click on the link named llvm or docs for the release you are interested in. For instance, release 20.1.0 should bring you to a URL such as https://releases.llvm.org/20.1.0/docs/index.html.
Scroll down to the Documentation section.
Click on Getting Started/Tutorials.
Find the Software and the Host C++ Toolchain[...] sections. For instance, for release 20.1.0, the Software section lives at https://releases.llvm.org/20.1.0/docs/GettingStarted.html#software.

To find the requirements for LLVM top-of-tree, simply follow the same steps but with the release named Git. This release should have a release date of Current.
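
If you look up these pages often, the URL pattern shown for release 20.1.0 can be captured in a small helper. The following Python sketch assumes that other releases follow the same URL layout as 20.1.0 and that https://llvm.org/docs/GettingStarted.html reflects the top-of-tree documentation; neither assumption is guaranteed for every release, and the function name is hypothetical.

```python
RELEASES_ROOT = "https://releases.llvm.org"
# Assumed to track the top-of-tree documentation.
TOP_OF_TREE_DOCS = "https://llvm.org/docs/GettingStarted.html"


def getting_started_url(release=None, anchor="software"):
    """Build the Getting Started URL for a release, or for top-of-tree if None."""
    if release is None:
        base = TOP_OF_TREE_DOCS
    else:
        # Pattern taken from the 20.1.0 example; other releases may differ.
        base = f"{RELEASES_ROOT}/{release}/docs/GettingStarted.html"
    return f"{base}#{anchor}" if anchor else base


if __name__ == "__main__":
    print(getting_started_url("20.1.0"))
    print(getting_started_url())
```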

You have learned how to identify which versions of the tools you need to work with LLVM. Now, let's see how to install these versions.

Note

Ninja is the preferred driver of the build system of LLVM. However, LLVM also supports other drivers such as Makefile (the default), Xcode, and, to some extent, Bazel. Feel free to choose what works best for you.

Installing the right tools

Depending on your operating system (OS), you may already have all the necessary tools installed. You can use the following commands to check which versions of the tools are installed and whether they meet the minimum requirements that we described in the previous section:

Tool                   | Checking the availability
Git                    | git --version
C/C++ toolchain (LLVM) | clang --version
CMake                  | ...
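
The availability checks in the table above can also be automated. The following Python sketch looks each tool up on the PATH, runs it with a version flag, and extracts the first version-like number from the output. Two assumptions to note: the helper names are invented for this example, and the use of the same --version flag for cmake, ninja, and python3 is this sketch's own choice, since the table in this excerpt is cut off after the first two tools.

```python
import re
import shutil
import subprocess


def extract_version(output):
    """Return the first dotted number (e.g. '2.39.2') found in output, or None."""
    match = re.search(r"\d+(?:\.\d+)+", output)
    return match.group(0) if match else None


def probe(tool):
    """Return the version reported by a tool, or None if it is not on the PATH."""
    if shutil.which(tool) is None:
        return None
    result = subprocess.run(
        [tool, "--version"], capture_output=True, text=True, check=False
    )
    # Some tools print their version on stderr, so search both streams.
    return extract_version(result.stdout + result.stderr)


if __name__ == "__main__":
    # The tool list is an assumption; adjust the names to your platform.
    for tool in ("git", "clang", "cmake", "ninja", "python3"):
        print(f"{tool}: {probe(tool) or 'not found'}")
```

The reported versions can then be compared against the minimum requirements for the LLVM release you are building.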


Schweitzer Fachinformationen
