
LLVM Code Generation
Description
- Learn the key constructs needed to leverage LLVM for your hardware or backend
- Strengthen your understanding with targeted exercises and practical examples in every chapter
- Purchase of the print or Kindle book includes a free PDF eBook
Book Description
The LLVM infrastructure is a popular compiler ecosystem widely used in the tech industry and academia. This technology is crucial for both experienced and aspiring compiler developers looking to make an impact in the field. Written by Quentin Colombet, a veteran LLVM contributor and architect of the GlobalISel framework, this book provides a primer on the main aspects of LLVM, with an emphasis on its backend infrastructure; that is, everything needed to transform the intermediate representation (IR) produced by frontends like Clang into assembly code and object files.
You'll learn how to write an optimizing code generator for a toy backend in LLVM. The chapters will guide you step by step through building this backend while exploring key concepts, such as the ABI, cost model, and register allocation. You'll also find out how to express these concepts using LLVM's existing infrastructure and how established backends address these challenges. Furthermore, the book features code snippets that demonstrate the actual APIs. By the end of this book, you'll have gained a deeper understanding of LLVM. The concepts presented are expected to remain stable across different LLVM versions, making this book a reliable quick reference guide for understanding LLVM.
What you will learn
- Understand essential compiler concepts, such as SSA, dominance, and ABI
- Build and extend LLVM backends for creating custom compiler features
- Optimize code by manipulating LLVM's Intermediate Representation
- Contribute effectively to LLVM open-source projects and development
- Develop debugging skills for LLVM optimizations and passes
- Grasp how encoding and (dis)assembling work in the context of compilers
- Utilize LLVM's TableGen DSL for creating custom compiler models
Who this book is for
This book is for both beginners to LLVM and experienced LLVM developers. If you're new to LLVM, it offers a clear, approachable guide to compiler backends, starting with foundational concepts. For seasoned LLVM developers, it dives into less-documented areas such as TableGen, MachineIR, and MC, enabling you to solve complex problems and expand your expertise. Whether you're starting out or looking to deepen your knowledge, this book has something for you.
Content
- Intro
- Preface
- Part 1: Getting Started with LLVM
- Chapter 1: Building LLVM and Understanding the Directory Structure
- Getting the most out of this book - get to know your free benefits
- Technical requirements
- Getting ready for LLVM's world
- Prerequisites
- Identifying the right version of the tools
- Installing the right tools
- Building a compiler
- What is a compiler?
- Opening Clang's hood
- Building Clang
- Experimenting with Clang
- Building LLVM
- Configuring the build system
- Crash course on Ninja
- Building the core LLVM project
- Testing a compiler
- Crash course on the Google test infrastructure
- Crash course on the LLVM Integrated Tester
- Testing in lit
- Directives
- Describing the RUN command
- The lit driver - llvm-lit
- Crash course on FileCheck
- FileCheck by example
- LLVM unit tests
- Finding the source of a test
- Running unit tests manually
- The unit tests pass, what now?
- LLVM functional tests
- The LLVM test suite
- The functional tests fail - what do you do?
- Understanding the directory structure
- High-level directory structure
- Focusing on the core LLVM project
- A word on the include files
- Private headers
- What is the deal with <project>/include/<project>?
- What is include/<project>-c?
- Overview of some of the LLVM components
- Generic LLVM goodness
- Working with the LLVM IR
- Generic backend infrastructure
- Target-specific constructs
- Summary
- Quiz time
- Chapter 2: Contributing to LLVM
- Reporting an issue
- Engaging with the community
- Reviewing patches
- Contributing patches
- Understanding patch contribution in a nutshell
- Following up with your contribution
- A word on adding tests
- Summary
- Quiz time
- Chapter 3: Compiler Basics and How They Map to LLVM APIs
- Technical requirements
- A word on APIs
- Understanding compiler jargon
- Target
- Host
- Lowering
- Canonical form
- Build time, compile time, and runtime
- Backend and middle-end
- Application binary interface
- Encoding
- Working with basic structures
- Module
- A module at the LLVM IR level
- A module at the Machine IR level
- Function
- A function in the LLVM IR
- A function in the Machine IR
- Basic block
- A basic block in the LLVM IR
- A basic block in the Machine IR
- Instruction
- An instruction in the LLVM IR
- An instruction in the Machine IR
- Control flow graph
- Reverse post-order traversal
- Backedge
- Critical edge
- Irreducible graph
- Building your first IRs
- Building your first LLVM IR
- A walk over the required APIs
- Your turn
- Building your first Machine IR
- A walk over the required APIs
- Your turn
- Summary
- Quiz time
- Chapter 4: Writing Your First Optimization
- Technical requirements
- The concept of value
- SSA
- Constructing the SSA form
- Dominance
- Def-use and use-def chains
- Def-use and use-def chains in the LLVM IR
- Def-use and use-def chains in the Machine IR
- Tackling optimizations
- Legality
- Integer overflow/underflow
- Fast-math flags
- Side effects
- Profitability
- Instruction lowering - TargetTransformInfo and TargetLowering
- Library support - TargetLibraryInfo
- Datatype properties - DataLayout
- Register pressure
- Basic block frequency
- More precise instruction properties - scheduling model and instruction description
- Transformation jargon
- Instcombine
- Fixed point
- Liveness
- Hoisting
- Sinking
- Folding
- Loops
- Terminology
- Preheader
- Header
- Exiting block
- Latch
- Exit block
- Where to get loop information
- Writing a simple constant propagation optimization
- The optimization
- Simplifying assumptions
- Missing APIs
- The Constant class
- The APInt class
- Creating a constant
- Replacing a value
- Your turn
- Going further
- Legality
- Profitability
- Propagating constants across types
- Summary
- Quiz time
- Chapter 5: Dealing with Pass Managers
- Technical requirements
- What is a pass?
- What is a pass manager?
- The legacy and new pass manager
- Pass managers' capabilities
- Populating a pass manager
- Inner workings of pass managers
- Creating a pass
- Writing a pass for the legacy pass manager
- Using the proper base class
- Expressing the dependencies of a pass
- Preserving analyses
- Specificities of the Pass class
- Writing a pass for the new pass manager
- Implementing the right method
- Registering an analysis
- Describing the effects of your pass
- Inspecting the pass pipeline
- Available developer tools
- Plumbing up the information you need
- Interpreting the logs of pass managers
- The pass pipeline structure
- Time profile
- Your turn
- Writing your own pass
- Writing your own pass pipeline
- Summary
- Further reading
- Quiz time
- Chapter 6: TableGen - LLVM Swiss Army Knife for Modeling
- Technical requirements
- Getting started with TableGen
- The TableGen programming language
- Types
- Programming with TableGen
- Defining multiple records at once
- Assigning fields
- Discovering a TableGen backend
- General information on TableGen backends for LLVM
- Discovering a TableGen backend
- The implementation of intrinsics
- The content of a generated file
- The source of a TableGen backend
- Debugging the TableGen framework
- Identifying the failing component
- Cracking open a TableGen backend
- Summary
- Further reading
- Quiz time
- Part 2: Middle-End: LLVM IR to LLVM IR
- Chapter 7: Understanding LLVM IR
- Technical requirements
- Understanding the need for an IR
- What an IR is
- Why use an IR?
- Introducing LLVM IR
- Identifiers
- Functions
- Basic blocks
- Instructions
- Types
- Single-value types
- The label type
- Aggregate types
- Types in the LLVM IR API
- Walking through an example
- Target-specific elements in LLVM IR
- Intrinsic functions
- Triple
- Function attributes
- Data layout
- Application binary interface
- Textual versus binary format
- LLVM IR API - cheat sheet
- Summary
- Further reading
- Quiz time
- Chapter 8: Survey of the Existing Passes
- Technical requirements
- How to find the unknown
- Leveraging opt
- Using the LLVM code base
- Starting from the implementation
- Survey of the helper passes
- The verifier
- The printer
- Analysis passes
- Target transformation information
- Loop information
- Alias analysis
- Block frequency info
- Dominator tree information
- Value tracking
- Canonicalization passes
- The instruction combiner
- An example of a canonical rewrite
- An example of an optimization
- How to use instcombine
- The memory to register rewriter
- The converter to loop-closed-SSA form
- Optimization passes
- Interprocedural optimizations
- Scalar optimizations
- Vectorization
- Summary
- Further reading
- Quiz time
- Chapter 9: Introducing Target-Specific Constructs
- Technical requirements
- Adding a new backend in LLVM
- Connecting your target to the build system
- Registering your target with Clang
- Adding a new architecture to the Triple class
- Populating the Target instance
- Plumbing your Target through Clang
- Creating your own intrinsics
- The pros and cons of intrinsics
- Creating an intrinsic in the backend
- Defining our intrinsics
- Hooking up the TableGen backend
- Teaching LLVM IR about our intrinsics
- Connecting an intrinsic to Clang
- Writing the .def file by hand
- Using the TableGen capabilities
- Hooking up the built-in information
- Establishing the code generation link
- Adding a target-specific TargetTransformInfo implementation
- Establishing a connection to your target-specific information
- Introducing target-specific costs
- Customizing the default middle-end pipeline
- Using the new pass manager
- Using the legacy pass manager
- A one-time setup - assembling a codegen pipeline
- Faking the instruction selector
- Faking the lowering of the object file
- Creating a skeleton for the assembly information
- Using the right abstraction
- Summary
- Further reading
- Quiz time
- Chapter 10: Hands-On Debugging LLVM IR Passes
- Technical requirements
- The logging capabilities in LLVM
- Printing the IR between passes
- Printing the debug log
- Printing high-level information about what happened
- Reducing the input IR size
- Extracting a subset of the input IR
- Shrinking the IR automatically
- Using sanitizers
- A crash course on LLDB
- Starting a debugging session
- Controlling the execution
- Stopping the program
- Command resolution
- Resuming the execution
- Inspecting the state of a program
- The LLVM code base through a debugger
- Summary
- Further reading
- Quiz time
- Part 3: Introduction to the Backend
- Chapter 11: Getting Started with the Backend
- Technical requirements
- Introducing the Machine IR
- Here comes the Machine IR
- The Machine IR textual representation
- The .mir file format
- A primer on the YAML syntax
- The semantics of the different fields
- Mapping the content of a .mir file to the C++ API
- A deep dive into the body of a MachineFunction instance
- Working with a .mir file
- Generating a .mir file
- Running passes
- Shrinking a .mir file
- The anatomy of a MachineInstr instance
- Introducing the MC layer
- Working with MachineOperand instances
- Unboxing a MachineOperand instance
- Dealing with explicit and implicit operands
- Understanding the constraints of an operand
- Working with registers
- The concept of the register class
- The concept of sub-registers
- The concept of register tuples
- The concept of register units
- The registers and SSA and non-SSA forms
- Interacting with registers in the debugger
- Creating MachineInstr objects
- Describing registers
- Writing the target description
- Describing instructions
- Summary
- Further reading
- Quiz time
- Chapter 12: Getting Started with the Machine Code Layer
- Technical requirements
- The use of the MC layer
- Connecting the MC layer
- What instructions to describe
- Augmenting the target description with MC information
- Defining the MC layer for the registers
- Defining the MC layer for the instructions
- Enabling MC-based tools
- Leveraging TableGen
- Implementing the missing pieces
- Implementing your own MCInstPrinter class
- Implementing your own MCCodeEmitter class
- Implementing your own XXXAsmParser class
- Summary
- Quiz time
- Chapter 13: The Machine Pass Pipeline
- Technical requirements
- The Machine pass pipeline at a glance
- Injecting passes
- Using the generic Machine optimizations
- Generic passes worth mentioning
- The CodeGenPrepare pass
- The PeepholeOptimizer pass
- The MachineCombiner pass
- Summary
- Further reading
- Quiz time
- Part 4: LLVM IR to Machine IR
- Chapter 14: Getting Started with Instruction Selection
- Technical requirements
- Overview of the instruction selection frameworks
- How does instruction selection work?
- Framework complementarity
- Overall differences between the selectors
- Compile time
- Modularity and testability
- Scope
- Which selector to use?
- FastISel
- SDISel
- GlobalISel
- Selectors' inner workings
- Understanding the DAG representation
- Textual representation of the SelectionDAG class
- Manipulating a DAG
- Understanding the generic Machine IR
- Textual representation of generic attributes
- Lowering constraints of the generic Machine IR
- APIs to work with the generic Machine IR
- Groundwork to connect the codegen pipeline
- Instantiating the codegen pass pipeline
- Providing the key target APIs to the codegen pipeline
- Connecting SDISel to the codegen pipeline
- Connecting FastISel to the codegen pipeline
- Connecting GlobalISel to the codegen pipeline
- Choosing between different selectors
- Summary
- Further reading
- Quiz time
- Chapter 15: Instruction Selection: The IR Building Phase
- Technical requirements
- Overview of the IR building
- Describing the calling convention
- Writing your target description of the calling convention
- Connecting the gen-callingconv TableGen backend
- Anatomy of the CCValAssign class
- Lowering the ABI with SDISel
- Implementing the lowering of formal arguments
- Providing custom description for the SDNode class
- Handling of stack locations
- Lowering the ABI with FastISel
- Lowering the ABI with GlobalISel
- Summary
- Further reading
- Quiz time
- Chapter 16: Instruction Selection: The Legalization Phase
- Technical requirements
- Legalization overview
- Legalization actions
- Legalization in SDISel
- Describing your legal types
- Describing your legalization actions
- Implementing a custom legalization action
- Legalization in GlobalISel
- Describing your legalization actions with the LegalizeRuleSet class
- Custom legalization in GlobalISel
- Summary
- Quiz time
- Chapter 17: Instruction Selection: The Selection Phase and Beyond
- Technical requirements
- Register bank selection
- The goal of the register bank selection
- Describing the register banks
- Implementing your RegisterBankInfo class
- Instruction selection
- Expressing your selection patterns
- Introduction to the selection patterns
- Advanced selection patterns
- Selection in SDISel
- Selection in FastISel
- Selection in GlobalISel
- Setting up the InstructionSelector class
- Importing the selection patterns
- Going beyond patterns
- Finalizing the selection pipeline
- Using custom inserters
- Customizing the TargetLowering::finalizeLowering method
- Optimizations
- Using the DAGCombiner framework
- Leveraging the combiner framework
- Debugging the selectors
- Debugging SDISel
- Debugging the GlobalISel match table
- Summary
- Quiz time
- Part 5: Final Lowering and Optimizations
- Chapter 18: Instruction Scheduling
- Technical requirements
- Overview of the instruction scheduling framework
- The ScheduleDAGInstrs class
- Changing the scheduling algorithm
- The scheduling model
- The scheduling events
- The processing units
- The scheduling bindings
- Gluing everything together
- Implementing your scheduling model
- Connecting your scheduling model
- Describing a processor model
- Instantiating your subtarget
- Guidelines to get started with your scheduling model
- Summary
- Quiz time
- Chapter 19: Register Allocation
- Technical requirements
- Overview of register allocation in LLVM
- Enabling the register allocation infrastructure
- Introducing the slot indexes
- Introducing the live intervals
- Maintaining the live intervals
- Summary
- Further reading
- Quiz time
- Chapter 20: Lowering of the Stack Layout
- Technical requirements
- Overview of stack lowering
- Handling of stack slots
- From frame index to stack slot
- The lowering of the stack frame
- Introducing the reserved call frame
- Implementing the frame-lowering target hooks
- The expansion of the frame indices
- Introducing register scavenging
- Provisioning an emergency spill slot
- Expanding the frame indices
- Summary
- Quiz time
- Chapter 21: Getting Started with the Assembler
- Technical requirements
- Overview of the lowering of a textual assembly file
- Assembling with the LLVM infrastructure
- Implementing an assembler
- Providing the MCCodeEmitter class
- Handling the fixups with the MCAsmBackend class
- Recording the relocations with the MCObjectTargetWriter class
- Summary
- Further reading
- Quiz time
- Chapter 22: Unlock Your Book's Exclusive Benefits
- How to unlock these benefits in three easy steps
- Other Books You May Enjoy
- Index
1
Building LLVM and Understanding the Directory Structure
The LLVM infrastructure provides a set of libraries that can be assembled to create different tools and compilers.
LLVM originally stood for Low-Level Virtual Machine. Nowadays, it is much more than that, as you will shortly learn, and people just use LLVM as a name.
Given the sheer volume of code that makes up the LLVM repository, it can be daunting to even know where to start.
In this chapter, we will give you the keys to approach and use this code base confidently. Using this knowledge, you will be able to do the following:
- Understand the different components that make up a compiler
- Build and test the LLVM project
- Navigate LLVM's directory structure and locate the implementation of different components
- Contribute to the LLVM project
This chapter covers the basics needed to get started with LLVM. If you are already familiar with the LLVM infrastructure or followed the tutorial from the official LLVM website (https://llvm.org/docs/GettingStarted.html), you can skip it. You can, however, check the Quiz time section at the end of the chapter to see whether there is anything you may have missed.
Technical requirements
To work with the LLVM code base, you need specific tools on your system. In this section, we list the required versions of these tools for the latest major LLVM release: 20.1.0.
Later, in Identifying the right version of the tools, you will learn how to find the version of the tools required to build a specific version of LLVM, including older and newer releases and the LLVM top-of-tree (that is, the actively developed repository). Additionally, you will learn how to install them.
Without further ado, here are the versions of the tools required for LLVM 20.1.0:
| Tool | Required version |
| --- | --- |
| Git | None specified |
| C/C++ toolchain | >=Clang 5.0, >=Apple Clang 10.0, >=GCC 7.4, >=Visual Studio 2019 16.8 |
| CMake | >=3.20.0 |
| Ninja | None specified |
| Python | >=3.8 |

Table 1.1: Tools required for LLVM 20.1.0
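As a rough illustration of how the minimums in Table 1.1 could be checked, here is a minimal Python sketch. The helper names (`parse_version`, `meets_minimum`) and the dictionary layout are assumptions made for this example and are not part of the book's scripts. It also assumes that the first dotted number in a tool's version banner is its version, which may not hold for every distribution.

```python
import re

# Minimum versions copied from Table 1.1 (LLVM 20.1.0).
# Git and Ninja have no stated minimum, so they are omitted.
# Apple Clang and Visual Studio use their own numbering and are not handled here.
MINIMUMS = {
    "clang": (5, 0),
    "gcc": (7, 4),
    "cmake": (3, 20, 0),
    "python": (3, 8),
}


def parse_version(text):
    """Return the first dotted number in text as a tuple of ints, or None."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(tool, version_output):
    """Compare a tool's version banner against the minimum from the table."""
    found = parse_version(version_output)
    if found is None:
        return False
    minimum = MINIMUMS[tool]
    # Pad with zeros so that "3.20" and "3.20.0" compare as equal.
    width = max(len(found), len(minimum))
    found += (0,) * (width - len(found))
    minimum += (0,) * (width - len(minimum))
    return found >= minimum
```

You would pass the text printed by a tool's version command, for example `meets_minimum("cmake", "cmake version 3.28.1")`. The banner string here is an illustrative sample, not output captured from a real system.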
Furthermore, this book comes with scripts, examples, and more that will ease your journey with learning the LLVM infrastructure. We will specifically list the relevant content in the related sections, but remember that the repository lives at https://github.com/PacktPublishing/LLVM-Code-Generation.
Getting ready for LLVM's world
In the Technical requirements section, we listed the versions of the tools you need to work with LLVM 20.1.0. However, LLVM is a lively project, and what is required today may differ from what is required tomorrow. Also, stepping back a bit, you may not know why you need these tools to begin with or how to get them.
This section addresses these questions, and you will learn the following in the process:
- The purpose of each required tool
- How to check that your environment has the proper tools
- How to install the proper tools
Depending on how familiar you are with development on Linux/macOS, this setup can be tedious or a walk in the park.
Ultimately, this section aims to teach you how to go beyond a fixed release of LLVM by giving you the knowledge required to find the information you need.
If you are familiar with package managers (e.g., the apt-get command-line tool on Linux and Homebrew (https://brew.sh) on macOS), you can skip this part and install Git, Clang, CMake, Ninja, and Python directly through them. On Windows, if you do not have a package manager, the steps provided here are all manual: pick the Windows binary distribution of each tool and it should just work. Alternatively, on Windows you may be better off installing these tools through Visual Studio Code (VS Code) (https://code.visualstudio.com) via its extensions.
In any case, you might want to double-check which version of these tools you need by going through the Identifying the right version of the tools section.
Prerequisites
As mentioned previously, you need a set of specific tools to build the LLVM code base. This section summarizes what each of these tools does and how they work together to build the LLVM project.
This list of tools is as follows:
- Git: The software used for the version control of LLVM
- A C/C++ toolchain: The LLVM code base is in C/C++, and as such, we will need a toolchain to build that type of code
- CMake: The software used to configure the build system
- Ninja: The software used to drive the build system
- Python: The scripting language and execution environment used for testing
Figure 1.1 illustrates how the different tools work together to build an LLVM compiler:
Figure 1.1: The essential command-line tools to build an LLVM compiler
Breaking this figure down, here are the steps it takes:
- Git retrieves the source code.
- CMake generates the build system for a particular driver, such as Ninja, and a particular C/C++ toolchain.
- Ninja drives the build process.
- The C/C++ toolchain builds the compiler.
- Python drives the execution of the tests.
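The steps above can be sketched as a sequence of command lines. The following Python snippet only assembles and prints the commands; it does not run them. The directory names `llvm-project` and `build`, the `Release` build type, and the helper name `build_steps` are assumptions chosen for illustration. The chapter's own instructions may use different options.

```python
def build_steps(source_dir="llvm-project", build_dir="build"):
    """Return the command line of each step, in order, without running it."""
    return [
        # Git retrieves the source code.
        ["git", "clone", "https://github.com/llvm/llvm-project.git", source_dir],
        # CMake generates a Ninja build system for the core LLVM project.
        ["cmake", "-G", "Ninja", "-S", f"{source_dir}/llvm", "-B", build_dir,
         "-DCMAKE_BUILD_TYPE=Release"],
        # Ninja drives the build and invokes the C/C++ toolchain.
        ["ninja", "-C", build_dir],
        # The check-llvm target runs the regression tests, driven by Python.
        ["ninja", "-C", build_dir, "check-llvm"],
    ]


for step in build_steps():
    print(" ".join(step))
```

Keeping the commands as lists of arguments means they could later be handed to `subprocess.run` unchanged, should you want to automate the sequence.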
Identifying the right version of the tools
The required version of these tools depends on the version of LLVM you are building. For instance, see the Technical requirements section for the latest major release of LLVM, 20.1.0.
To check the required version for a specific release, check out the Getting Started page of the documentation for this release. To get there, perform the following steps:
- Go to https://releases.llvm.org/.
- Scroll down to the Download section.
- In the documentation column, click on the link named llvm or docs for the release you are interested in. For instance, release 20.1.0 should bring you to a URL such as https://releases.llvm.org/20.1.0/docs/index.html.
- Scroll down to the Documentation section.
- Click on Getting Started/Tutorials.
- Find the Software and the Host C++ Toolchain[...] sections. For instance, for release 20.1.0, the Software section lives at https://releases.llvm.org/20.1.0/docs/GettingStarted.html#software.
To find the requirements for LLVM top-of-tree, simply follow the same steps but with the release named Git. This release should have a release date of Current.
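If you look up several releases, the navigation above can be shortened by building the URL directly. The sketch below assumes that other releases follow the same URL pattern as the 20.1.0 example and that the top-of-tree documentation lives under https://llvm.org/docs. Neither is guaranteed for every release, and the function name `getting_started_url` is made up for this example.

```python
def getting_started_url(release=None):
    """Build the URL of the Software section of the Getting Started page.

    Pass a release string such as "20.1.0", or None for LLVM top-of-tree.
    """
    if release is None:
        base = "https://llvm.org/docs"
    else:
        base = f"https://releases.llvm.org/{release}/docs"
    return f"{base}/GettingStarted.html#software"


print(getting_started_url("20.1.0"))
```

If a generated URL does not resolve, fall back to the manual steps starting from https://releases.llvm.org/.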
You learned how to identify which version of the tools you need to have to be able to work with LLVM. Now, let's see how to install these versions.
Note
Ninja is the preferred driver of the build system of LLVM. However, LLVM also supports other drivers such as Makefile (the default), Xcode, and, to some extent, Bazel. Feel free to choose what works best for you.
Installing the right tools
Depending on your operating system (OS), you may already have all the necessary tools installed. You can use the following commands to check which versions of the tools are installed and whether they meet the minimum requirements that we described in the previous section:
| Tool | Checking the availability |
| --- | --- |
| Git | git --version |
| C/C++ toolchain (LLVM) | clang --version |
| CMake | ... |

System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePUB works well for novels and non-fiction books – i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our eBook Help page.
File format: ePUB
Copy protection: without DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use a reader that can handle the file format ePUB, such as Adobe Digital Editions or FBReader – both free (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePUB works well for novels and non-fiction books – i.e., 'flowing' text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook does not use copy protection or Digital Rights Management
For more information, see our eBook Help page.