




An overview of the LLVM Compiler Infrastructure and its reusable components for building compilers. It explains how LLVM reduces the time and cost of building a new compiler and supports different kinds of compilers. It also covers the LLVM Compiler Framework and its end-to-end compilers, and the LLVM optimizer and its series of passes. Source code examples and visualizations of the LLVM compiler system are included. Useful for students studying compilers and programming languages.





Carnegie Mellon
Dominic Chen
Thanks to:
Vikram Adve, Jonathan Burket, Deby Katz,
David Koes, Chris Lattner, Gennady Pekhimenko,
and Olatunji Ruwase, for their slides
Provides reusable components for building compilers
Reduce the time/cost to build a new compiler
Build different kinds of compilers
Our homework assignments focus on static compilers
There are also JITs, trace-based optimizers, etc.
End-to-end compilers using the LLVM infrastructure
Support for C and C++ is robust and aggressive
Java, Scheme and others are in development
Emit C code or native code for x86, SPARC, PowerPC
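[Figure: three-phase compiler design. Source code (for example C through the Clang front end, or Java) enters a front end, which emits the intermediate form (LLVM IR); the optimizer transforms the IR; a back end emits object code for a target such as Sparc.]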
The optimizer works on LLVM IR, so the same passes run for all source languages and all targets.
[Figure: the LLVM compilation pipeline, ordered from more language-specific to more architecture-specific. Representations: C Source Code, Clang AST, LLVM IR, SelectionDAG, MachineInst, MCInst / Assembly. Tools: Clang Frontend (clang), Optimizer (opt), Target-Independent Code Generator (llc).]
Read “Life of an instruction in LLVM”:
http://eli.thegreenplace.net/2012/11/24/life-of-an-instruction-in-llvm
TranslationUnitDecl 0xd8185a0 <
|-TypedefDecl 0xd818870 <
`-FunctionDecl 0xd8188e0 <example.c:1:1, line:5:1> line:1:5 main 'int ()'
  `-CompoundStmt 0xd818a90 <col:12, line:5:1>
    |-DeclStmt 0xd818998 <line:2:5, col:14>
    | `-VarDecl 0xd818950 <col:5, col:13> col:9 used a 'int' cinit
    |   `-IntegerLiteral 0xd818980 <col:13> 'int' 5
    |-DeclStmt 0xd818a08 <line:3:5, col:14>
    | `-VarDecl 0xd8189c0 <col:5, col:13> col:9 used b 'int' cinit
    |   `-IntegerLiteral 0xd8189f0 <col:13> 'int' 3
    `-ReturnStmt 0xd818a80 <line:4:5, col:16>
      `-BinaryOperator 0xd818a68 <col:12, col:16> 'int' '-'
        |-ImplicitCastExpr 0xd818a48 <col:12> 'int'
        | `-DeclRefExpr 0xd818a18 <col:12> 'int' lvalue Var 0xd818950 'a' 'int'
        `-ImplicitCastExpr 0xd818a58 <col:16> 'int'
          `-DeclRefExpr 0xd818a30 <col:16> 'int' lvalue Var 0xd8189c0 'b' 'int'
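[Figure: the same AST drawn as a tree. CompoundStmt has three children: DeclStmt, DeclStmt, and ReturnStmt. Each DeclStmt holds an IntegerLiteral. ReturnStmt holds a BinaryOperator whose two operands are ImplicitCastExpr nodes, each wrapping a DeclRefExpr.]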
LLVM IR exists in three equivalent forms:
In-Memory Data Structure
Bitcode (.bc files)
Text Format (.ll files)
Text format example (.ll), truncated:
define i32 @main() #0 {
entry:
  %retval = alloca i32, align 4
  %a = alloca i32, align 4
  …
Bitcode example (.bc), first bytes in hex:
42 43 C0 DE 21 0C 00 00
06 10 32 39 92 01 84 0C
0A 32 44 24 48 0A 90 21
18 00 00 00 98 00 00 00
E6 C6 21 1D E6 A1 1C DA
…
llvm-dis converts bitcode to the text format; llvm-as converts the text format to bitcode.
Bitcode files and LLVM IR text files are lossless serialization formats!
We can pause optimization and come back later.
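The hex dump above starts with the bytes 42 43 C0 DE, which is the magic number of a raw LLVM bitcode file ('B', 'C', 0xC0, 0xDE). As a minimal sketch, a tool could use it to tell a .bc file from a .ll file. The function name `looks_like_bitcode` is made up for this illustration, and the check ignores the optional wrapper header that some platforms put in front of the bitcode.

```python
# Raw LLVM bitcode files begin with the magic bytes 'B', 'C', 0xC0, 0xDE.
BITCODE_MAGIC = bytes([0x42, 0x43, 0xC0, 0xDE])


def looks_like_bitcode(data: bytes) -> bool:
    """Return True if `data` starts with the raw bitcode magic number.

    Hypothetical helper for illustration; it does not validate the rest
    of the file and does not recognise wrapped bitcode.
    """
    return data[:4] == BITCODE_MAGIC


# First row of the hex dump shown in the slides.
slide_bytes = bytes.fromhex("42 43 C0 DE 21 0C 00 00")

# First line of the textual IR shown in the slides.
slide_text = b"define i32 @main() #0 {"
```

The textual form has no magic number; it is ordinary text that llvm-as parses.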
[Figure: linking. Several .o files are combined into either a bitcode file for the JIT or a native executable. The linker performs link-time optimizations.]
Low-level and target-independent semantics
RISC-like three address code
Infinite virtual register set in SSA form
Simple, low-level control flow constructs
Load/store instructions with typed-pointers
for (i = 0; i < N; i++)
  Sum(&A[i], &P);
loop: ; preds = %bb0, %loop
  %i.1 = phi i32 [ 0, %bb0 ], [ %i.2, %loop ]
  %AiAddr = getelementptr float* %A, i32 %i.1
  call void @Sum(float* %AiAddr, %pair* %P)
  %i.2 = add i32 %i.1, 1
  %exitcond = icmp eq i32 %i.2, %N
  br i1 %exitcond, label %outloop, label %loop
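To make the role of the phi node concrete, here is a small simulation of the `loop` block, written as a sketch rather than as anything LLVM provides. It models the phi as a choice that depends on which predecessor block control arrived from. Two assumptions are worth stating: the exit test compares the incremented counter with N, which is what makes the body run exactly N times, and N is at least 1 because the block always executes once when entered. The function name `run_loop` is hypothetical, and 32-bit wraparound is ignored.

```python
def run_loop(n):
    """Simulate the 'loop' basic block; return the indices passed to Sum."""
    if n < 1:
        raise ValueError("the loop block assumes N >= 1")
    calls = []
    pred = "bb0"  # the block control came from
    i2 = None     # %i.2 has no value on first entry
    while True:
        # %i.1 = phi i32 [ 0, %bb0 ], [ %i.2, %loop ]
        i1 = 0 if pred == "bb0" else i2
        # %AiAddr = getelementptr ...; call void @Sum(...)
        calls.append(i1)
        # %i.2 = add i32 %i.1, 1
        i2 = i1 + 1
        # %exitcond = icmp eq i32 %i.2, %N
        exitcond = (i2 == n)
        # br i1 %exitcond, label %outloop, label %loop
        if exitcond:
            return calls
        pred = "loop"
```

Each name is assigned once per trip through the block, which mirrors SSA form: `%i.1` and `%i.2` are different values, and the phi is the only place where they merge.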
High-level information exposed in the code
Explicit dataflow through SSA form (more later in the class)
Explicit control-flow graph (even for exceptions)
Explicit language-independent type-information
Explicit typed pointer arithmetic
Preserves array subscript and structure indexing
Nice syntax for calls is preserved, as in the call to @Sum in the loop example above
Source language types are lowered:
Rich type systems expanded to simple types
Implicit & abstract types are made explicit & concrete
Examples of lowering:
References turn into pointers: T& -> T*
Complex numbers: complex float -> {float, float}
Bitfields: struct X { int Y:4; int Z:2; } -> { i32 }
The entire type system consists of:
Primitives: label, void, float, integer, …
Arbitrary bitwidth integers (i1, i32, i64, i1942652)
Derived: pointer, array, structure, function (unions get turned into casts)
No high-level types
Type system allows arbitrary casts
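The bitfield example above lowers `struct X { int Y:4; int Z:2; }` to a single i32, so every field access becomes shifts and masks on that word. The sketch below shows the idea under loudly stated assumptions: Y occupies the lowest 4 bits and Z the next 2 bits, and both fields are treated as unsigned. The real layout and signedness depend on the target ABI and the front end, and the helper names are made up for this illustration.

```python
# Assumed layout (not specified by the slides): Y in bits 0-3, Z in bits 4-5.
Y_BITS, Z_BITS = 4, 2
Y_SHIFT, Z_SHIFT = 0, Y_BITS


def mask(bits):
    """Return an integer with the low `bits` bits set."""
    return (1 << bits) - 1


def store_field(word, value, shift, bits):
    """Return the 32-bit word with one field replaced by `value`."""
    field_mask = mask(bits) << shift
    cleared = word & ~field_mask & 0xFFFFFFFF
    return cleared | ((value & mask(bits)) << shift)


def load_field(word, shift, bits):
    """Extract one field from the 32-bit word as an unsigned value."""
    return (word >> shift) & mask(bits)


# x.Y = 5; x.Z = 3;
x = store_field(0, 5, Y_SHIFT, Y_BITS)
x = store_field(x, 3, Z_SHIFT, Z_BITS)
```

After lowering, nothing in the IR type `{ i32 }` says that two fields exist; that knowledge lives only in the shift and mask constants the front end emits.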
[Figure: a C function and the LLVM IR that clang emits for it, annotated to highlight explicit stack allocation, explicit loads and stores, and explicit types.]
mem2reg: promotes stack allocations (alloca) to SSA registers, removing the explicit loads and stores
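As a rough illustration of what mem2reg does, the sketch below promotes stack slots inside a single basic block: each load is replaced by the value most recently stored to the same slot, and the alloca, store, and load instructions disappear. This is a deliberate simplification. The real pass works across basic blocks and inserts phi nodes where values merge, and it only promotes slots whose address does not escape. The tuple encoding of instructions and the name `promote_single_block` are invented for this example.

```python
def promote_single_block(instrs):
    """Promote allocas in one straight-line block (toy model).

    Instruction encoding (hypothetical):
      ("alloca", slot)
      ("store", value, slot)
      ("load", result, slot)
      (opcode, name_or_operand, ...) for everything else
    """
    slots = {ins[1] for ins in instrs if ins[0] == "alloca"}
    current = {}    # slot -> value most recently stored there
    forwarded = {}  # name defined by a removed load -> value it stands for
    out = []
    for ins in instrs:
        op = ins[0]
        if op == "alloca":
            continue
        if op == "store" and ins[2] in slots:
            value = ins[1]
            current[ins[2]] = forwarded.get(value, value)
            continue
        if op == "load" and ins[2] in slots:
            # A load before any store reads an undefined value.
            forwarded[ins[1]] = current.get(ins[2], "undef")
            continue
        # Rewrite operands that referred to removed loads.
        out.append((op,) + tuple(forwarded.get(x, x) for x in ins[1:]))
    return out


# int a = 5; int b = 3; return a - b;
before = [
    ("alloca", "%a"),
    ("alloca", "%b"),
    ("store", "5", "%a"),
    ("store", "3", "%b"),
    ("load", "%0", "%a"),
    ("load", "%1", "%b"),
    ("sub", "%sub", "%0", "%1"),
    ("ret", "%sub"),
]
after = promote_single_block(before)
```

Here `after` keeps only the subtraction and the return, with the constants 5 and 3 as direct operands.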
Module contains Functions and GlobalVariables
Module is a unit of analysis, compilation, and optimization
Function contains BasicBlocks and Arguments
Functions roughly correspond to functions in C
BasicBlock contains a list of Instructions
Each block ends in a control flow instruction
Instruction is an opcode + vector of operands
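The containment hierarchy above can be mirrored with a few plain classes. This is a toy model for building intuition, not LLVM's C++ API: the class names follow the slides, but the fields, the `TERMINATORS` subset, and `opcode_histogram` are assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass, field

# A subset of control flow opcodes that may end a block (assumed for the toy).
TERMINATORS = {"br", "ret", "switch", "unreachable"}


@dataclass
class Instruction:
    opcode: str
    operands: list = field(default_factory=list)


@dataclass
class BasicBlock:
    name: str
    instructions: list = field(default_factory=list)

    def ends_in_terminator(self) -> bool:
        """Check that the last instruction is a control flow instruction."""
        return bool(self.instructions) and self.instructions[-1].opcode in TERMINATORS


@dataclass
class Function:
    name: str
    arguments: list = field(default_factory=list)
    blocks: list = field(default_factory=list)


@dataclass
class Module:
    functions: list = field(default_factory=list)
    global_variables: list = field(default_factory=list)


def opcode_histogram(module):
    """Walk Module -> Function -> BasicBlock -> Instruction and count opcodes."""
    counts = Counter()
    for function in module.functions:
        for block in function.blocks:
            for inst in block.instructions:
                counts[inst.opcode] += 1
    return counts


loop = BasicBlock("loop", [
    Instruction("phi"), Instruction("getelementptr"), Instruction("call"),
    Instruction("add"), Instruction("icmp"), Instruction("br"),
])
example = Module(functions=[Function("f", ["%A", "%P", "%N"], [loop])])
```

The three nested loops in `opcode_histogram` are the shape most analyses take: a pass picks the level it cares about and iterates over the containers below it.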
[Figure: containment hierarchy. A Module contains Functions; a Function contains Basic Blocks; a Basic Block contains Instructions.]
Traversal of the LLVM IR data structure usually occurs through doubly-linked lists
LLVM also supports the Visitor Pattern (more next time)
Compiler is organized as a series of “passes”:
Each pass is an analysis or transformation
Each pass can depend on results from previous passes
Six useful types of passes:
BasicBlockPass: iterate over basic blocks, in no particular order
CallGraphSCCPass: iterate over SCCs (strongly connected components), in bottom-up call graph order
FunctionPass: iterate over functions, in no particular order
LoopPass: iterate over loops, in reverse nested order
ModulePass: general interprocedural pass over a program
RegionPass: iterate over single-entry/exit regions, in reverse nested order
Passes have different constraints (e.g. FunctionPass):
FunctionPass can only look at the “current function”
Cannot maintain state across functions
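The pass organization described above can be sketched as a tiny pass manager that runs each pass over every function in turn. This is a toy model, not LLVM's PassManager: a module is just a dict from function name to a list of opcode strings, the "nop" opcode is invented, and the convention that transformations report a changed flag while analyses report their result is an assumption of this sketch.

```python
class FunctionPass:
    """Base class: a pass that sees one function body at a time."""
    name = "function-pass"

    def run_on_function(self, body):
        raise NotImplementedError


class StripNops(FunctionPass):
    """Transformation: delete (hypothetical) 'nop' instructions in place."""
    name = "strip-nops"

    def run_on_function(self, body):
        kept = [op for op in body if op != "nop"]
        changed = len(kept) != len(body)
        body[:] = kept
        return changed


class CountInstructions(FunctionPass):
    """Analysis: report the number of instructions, modify nothing."""
    name = "count"

    def run_on_function(self, body):
        return len(body)


class PassManager:
    def __init__(self):
        self.passes = []

    def add(self, function_pass):
        self.passes.append(function_pass)

    def run(self, module):
        """Run passes in order; collect what each pass reports per function."""
        results = {}
        for function_pass in self.passes:
            for fn_name, body in module.items():
                results[(function_pass.name, fn_name)] = function_pass.run_on_function(body)
        return results


module = {"main": ["alloca", "nop", "ret"], "f": ["nop", "nop", "ret"]}
manager = PassManager()
manager.add(StripNops())
manager.add(CountInstructions())
results = manager.run(module)
```

Because the counting pass runs after the stripping pass, it sees the already transformed bodies. The per-function results live in the manager rather than in the pass object, which is one way to respect the rule that a FunctionPass keeps no state across functions.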
Basic LLVM Tools
llvm-dis: Convert from .bc (IR binary) to .ll (human-readable IR text)
llvm-as: Convert from .ll (human-readable IR text) to .bc (IR binary)
opt: LLVM optimizer
llc: LLVM static compiler
lli: LLVM bitcode interpreter
llvm-link: LLVM bitcode linker
llvm-ar: LLVM archiver
Some Additional Tools
bugpoint - automatic test case reduction tool
llvm-extract - extract a function from an LLVM module
llvm-bcanalyzer - LLVM bitcode analyzer
FileCheck - Flexible pattern matching file verifier
tblgen - Target Description To C++ Code Generator
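One plausible way to chain the basic tools is sketched below as a function that builds the command lines without running them. The file naming scheme and the function name `pipeline_commands` are assumptions for this example. The `-mem2reg` flag uses the legacy per-pass option style that these slides use for opt; newer LLVM releases expect `-passes=mem2reg` instead.

```python
import shlex


def pipeline_commands(stem):
    """Return argv lists that take <stem>.c from C source to assembly."""
    return [
        # C source -> LLVM bitcode
        ["clang", "-O0", "-emit-llvm", "-c", f"{stem}.c", "-o", f"{stem}.bc"],
        # bitcode -> human-readable IR text, for inspection
        ["llvm-dis", f"{stem}.bc", "-o", f"{stem}.ll"],
        # run an optimization pass over the bitcode (legacy flag style)
        ["opt", "-mem2reg", f"{stem}.bc", "-o", f"{stem}.opt.bc"],
        # optimized bitcode -> target assembly
        ["llc", f"{stem}.opt.bc", "-o", f"{stem}.s"],
    ]


if __name__ == "__main__":
    for argv in pipeline_commands("example"):
        print(shlex.join(argv))
```

Each list can be handed to `subprocess.run` on a machine where the LLVM tools are installed.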
Invoke an arbitrary sequence of passes:
Completely control PassManager from command line
Supports loading passes as plugins from *.so files
opt -load foo.so -pass1 -pass2 -pass3 x.bc -o y.bc
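The command line above can be read as: load a plugin, then run three passes in the order given, reading x.bc and writing y.bc. The sketch below mimics that behavior with a toy registry that maps command-line names to pass classes. It is not how opt is implemented; `PASS_REGISTRY`, `register_pass`, `parse_opt_args`, and the three empty pass classes are all hypothetical.

```python
PASS_REGISTRY = {}  # command-line name -> (pass class, description)


def register_pass(arg, description):
    """Class decorator that records a pass under its command-line name."""
    def decorator(cls):
        PASS_REGISTRY[arg] = (cls, description)
        return cls
    return decorator


@register_pass("pass1", "hypothetical first pass")
class Pass1:
    pass


@register_pass("pass2", "hypothetical second pass")
class Pass2:
    pass


@register_pass("pass3", "hypothetical third pass")
class Pass3:
    pass


def parse_opt_args(argv):
    """Split an opt-style argument list into plugins, passes, inputs, output."""
    plugins, passes, inputs, output = [], [], [], None
    args = iter(argv)
    for arg in args:
        if arg == "-load":
            plugins.append(next(args))
        elif arg == "-o":
            output = next(args)
        elif arg.startswith("-"):
            name = arg[1:]
            if name not in PASS_REGISTRY:
                raise ValueError(f"unknown pass: {arg}")
            passes.append(name)  # order on the command line is run order
        else:
            inputs.append(arg)
    return {"plugins": plugins, "passes": passes, "inputs": inputs, "output": output}


parsed = parse_opt_args(
    ["-load", "foo.so", "-pass1", "-pass2", "-pass3", "x.bc", "-o", "y.bc"]
)
```

A pass that never registers itself is invisible to the driver, so the parser rejects its flag as unknown.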
Passes “register” themselves:
When you write a pass, you must write the registration
RegisterPass<…> …(…, "15745: Function Information");