LLVM Cookbook

Name: LLVM Cookbook | Over 80 engaging recipes that will help you build a compiler frontend, optimizer, and code generator using LLVM
Brand: Packt Publishing
Availability: OnlineOnly

Over 80 engaging recipes that will help you build a compiler frontend, optimizer, and code generator using LLVM

Mayur Pandey Suyog Sarda(Author)

Packt Publishing

Published on 7. October 2025

296 pages

E-Book

ePUB with Adobe-DRM

System requirements

E-Book

PDF with Adobe-DRM

System requirements

978-1-78528-640-7 (ISBN)

from €35.49

Available for download

Watchlist: see prices

Description

All prices

More details

Other editions

Persons

Content

Table of Contents

Getting started with llvm
Steps in Writing a Frontend
Adding JIT Support and Writing Front-Ends for languages
Preparing Optimization
Implementing optimization
Target Independent Code Generator
Optimizing generated code and Register Allocation
Writing an LLVM Backend
Using LLVM for various useful projects

Vectorizing IR

Vectorization is an important optimization for compilers where we can vectorize code to execute an instruction on multiple datasets in one go. If the backend architecture supports vector registers, a broad range of data can be loaded into those vector registers, and special vector instructions can be executed on the registers.

There are two types of vectorization in LLVM-Superword-Level Parallelism (SLP) and loop vectorization. Loop vectorization deals with vectorization opportunities in a loop, while SLP vectorization deals with vectorizing straight-line code in a basic block. In this recipe, we will see how straight-line code is vectorized.

Getting ready

SLP vectorization constructs a bottom-up tree of the IR expression, and broadly compares the nodes of the tree to see whether they are similar and hence can be combined to form vectors. The file to be modified is lib/Transform/Vectorize/SLPVectorizer.cpp.

We will try to vectorize a piece of straight-line code, such as return a[0] + a[1] + a[2] + a[3].

The expression tree for the preceding type of code will be a somewhat one-sided tree. We will run a DFS to store the operands and the operators.

The IR for the preceding kind of expression will look like this:

define i32 @hadd(i32* %a) { entry: %0 = load i32* %a, align 4 %arrayidx1 = getelementptr inbounds i32* %a, i32 1 %1 = load i32* %arrayidx1, align 4 %add = add nsw i32 %0, %1 %arrayidx2 = getelementptr inbounds i32* %a, i32 2 %2 = load i32* %arrayidx2, align 4 %add3 = add nsw i32 %add, %2 %arrayidx4 = getelementptr inbounds i32* %a, i32 3 %3 = load i32* %arrayidx4, align 4 %add5 = add nsw i32 %add3, %3 ret i32 %add5 }

The vectorization model follows three steps:

Checking whether it's legal to vectorize.
Calculating the profitability of the vectorized code over the scalarized code.
Vectorizing the code if these two conditions are satisfied.

How to do it...

Open the SLPVectorizer.cpp file. A new function needs to be implemented for DFS traversal of the expression tree for the IR shown in the Getting ready section: bool matchFlatReduction(PHINode *Phi, BinaryOperator *B, const DataLayout *DL) { if (!B) return false; if (B->getType()->isVectorTy() || !B->getType()->isIntegerTy()) return false; ReductionOpcode = B->getOpcode(); ReducedValueOpcode = 0; ReduxWidth = MinVecRegSize / DL->getTypeAllocSizeInBits(B->getType()); ReductionRoot = B; ReductionPHI = Phi; if (ReduxWidth < 4) return false; if (ReductionOpcode != Instruction::Add) return false; SmallVector<BinaryOperator *, 32> Stack; ReductionOps.push_back(B); ReductionOpcode = B->getOpcode(); Stack.push_back(B); // Traversal of the tree. while (!Stack.empty()) { BinaryOperator *Bin = Stack.back(); if (Bin->getParent() != B->getParent()) return false; Value *Op0 = Bin->getOperand(0); Value *Op1 = Bin->getOperand(1); if (!Op0->hasOneUse() || !Op1->hasOneUse()) return false; BinaryOperator *Op0Bin = dyn_cast<BinaryOperator>(Op0); BinaryOperator *Op1Bin = dyn_cast<BinaryOperator>(Op1); Stack.pop_back(); // Do not handle case where both the operands are binary //operators if (Op0Bin && Op1Bin) return false; // Both the operands are not binary operator. if (!Op0Bin && !Op1Bin) { ReducedVals.push_back(Op1); ReducedVals.push_back(Op0); ReductionOps.push_back(Bin); continue; } // One of the Operand is binary operand, push that into stack // for further processing. Push the other non-binary operand //into ReducedVals. if (Op0Bin) { if (Op0Bin->getOpcode() != ReductionOpcode) return false; Stack.push_back(Op0Bin); ReducedVals.push_back(Op1); ReductionOps.push_back(Op0Bin); } if (Op1Bin) { if (Op1Bin->getOpcode() != ReductionOpcode) return false; Stack.push_back(Op1Bin); ReducedVals.push_back(Op0); ReductionOps.push_back(Op1Bin); } } SmallVector<Value *, 16> Temp; // Reverse the loads from a[3], a[2], a[1], a[0] // to a[0], a[1], a[2], a[3] for checking incremental // consecutiveness further ahead. while (!ReducedVals.empty()) Temp.push_back(ReducedVals.pop_back_val()); ReducedVals.clear(); for (unsigned i = 0, e = Temp.size(); i < e; ++i) ReducedVals.push_back(Temp[i]); return true; }
Calculate the cost of the resultant vectorized IR and conclude whether it is profitable to vectorize. In the SLPVectorizer.cpp file, add the following lines to the getReductionCost() function: int HAddCost = INT_MAX; // If horizontal addition pattern is identified, calculate cost. // Such horizontal additions can be modeled into combination of // shuffle sub-vectors and vector adds and one single extract element // from last resultant vector. // e.g. a[0]+a[1]+a[2]+a[3] can be modeled as // %1 = load <4 x> %0 // %2 = shuffle %1 <2, 3, undef, undef> // %3 = add <4 x> %1, %2 // %4 = shuffle %3 <1, undef, undef, undef> // %5 = add <4 x> %3, %4 // %6 = extractelement %5 <0> if (IsHAdd) { unsigned VecElem = VecTy->getVectorNumElements(); unsigned NumRedxLevel = Log2_32(VecElem); HAddCost = NumRedxLevel * (TTI->getArithmeticInstrCost(ReductionOpcode, VecTy) + TTI->getShuffleCost(TargetTransformInfo::SK_ExtractSubvector, VecTy, VecElem / 2, VecTy)) + TTI->getVectorInstrCost(Instruction::ExtractElement, VecTy, 0); }
In the same function, after calculating PairwiseRdxCost and SplittingRdxCost, compare them with HAddCost: VecReduxCost = HAddCost < VecReduxCost ? HAddCost : VecReduxCost;
In the vectorizeChainsInBlock() function, call the matchFlatReduction() function you just defined: // Try to vectorize horizontal reductions feeding into a return. if (ReturnInst *RI = dyn_cast<ReturnInst>(it)) if (RI->getNumOperands() != 0) if (BinaryOperator *BinOp = dyn_cast<BinaryOperator>(RI->getOperand(0))) { DEBUG(dbgs() << "SLP: Found a return to vectorize.\n"); HorizontalReduction HorRdx; IsReturn = true; if ((HorRdx.matchFlatReduction(nullptr, BinOp, DL) && HorRdx.tryToReduce(R, TTI)) || tryToVectorizePair(BinOp->getOperand(0), BinOp->getOperand(1), R)) { Changed = true; it = BB->begin(); e = BB->end(); continue; } }
Define two global flags to keep a track of horizontal reduction, which feeds into a return: static bool IsReturn = false; static bool IsHAdd = false;
Allow the vectorization of small trees if they feed into a return. Add the following line to the isFullyVectorizableTinyTree() function: if (VectorizableTree.size() == 1 && IsReturn && IsHAdd)return true;

How it works.

Compile the LLVM project after saving the file containing the preceding code, and run the opt tool on the example IR, as follows:

Open the example.ll file and paste the following IR in it: define i32 @hadd(i32* %a) { entry: %0 = load i32* %a, align 4 %arrayidx1 = getelementptr inbounds i32* %a, i32 1 %1 = load i32* %arrayidx1, align 4 %add = add nsw i32 %0, %1 %arrayidx2 = getelementptr inbounds i32* %a, i32 2 %2 = load i32* %arrayidx2, align 4 %add3 = add nsw i32 %add, %2 %arrayidx4 = getelementptr inbounds i32* %a, i32 3 %3 = load i32* %arrayidx4, align 4 %add5 = add nsw i32 %add3, %3 ret i32 %add5 }
Run the opt tool on example.ll: $ opt -basicaa -slp-vectorizer -mtriple=aarch64-unknown-linux-gnu -mcpu=cortex-a57
The output will be vectorized code, like the following:
define i32 @hadd(i32* %a) { entry: %0 = bitcast i32* %a to <4 x i32>* %1 = load <4 x i32>* %0, align 4 %rdx.shuf = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef> %bin.rdx = add <4 x i32> %1, %rdx.shuf %rdx.shuf1 = shufflevector <4 x i32> %bin.rdx, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef> %bin.rdx2 = add <4 x i32> %bin.rdx, %rdx.shuf1 %2 = extractelement <4 x i32> %bin.rdx2, i32 0 ret i32 %2 }

As observed, the code gets vectorized. The matchFlatReduction() function performs a DFS traversal of the expression and stores all the...

System requirements

File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)

System requirements:

Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).

The file format ePub works well for novels and non-fiction books – i.e., „flowing” text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a „hard” copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.

Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.

For more information, see our ebook Help page.

File format: PDF
Copy-Protection: Adobe-DRM (Digital Rights Management)

System requirements:

Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (only limited: Kindle).

The file format PDF always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). Unfortunately, on the small screens of e-readers or smartphones, PDFs are rather annoying, requiring too much scrolling.
This eBook uses Adobe-DRM, a „hard” copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.

Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.

For more information, see our eBook Help page.

Save as PDF Copy link into clipboard

Schweitzer Fachinformationen

LLVM Cookbook

Description

All prices

More details

Other editions

Additional editions

Persons

Content

Vectorizing IR

Getting ready

How to do it...

How it works.

System requirements