Introduction to LLVM Compiler Framework and Infrastructure, Lecture notes of Compiler Design

An overview of the LLVM Compiler Framework and Infrastructure, which provides reusable components for building compilers, reduces the time and cost of building a new compiler, and supports building static compilers, JITs, trace-based optimizers, and similar tools. The notes use argument promotion, which requires interprocedural analysis and alias analysis, as a running example. They also cover the LLVM Virtual Instruction Set, the Pass Manager, and important LLVM tools such as opt, the code generator, the JIT, the test suite, and bugpoint. Useful as study notes or lecture notes for a university course on compilers or programming languages.

Typology: Lecture notes

2021/2022

Uploaded on 05/11/2023

The LLVM Compiler Framework
and Infrastructure
(Part 1)
Presented by Gennady Pekhimenko
Substantial portions courtesy of Olatunji Ruwase,
Chris Lattner, Vikram Adve, and David Koes

LLVM Compiler System

- The LLVM Compiler Infrastructure
  - Provides reusable components for building compilers
  - Reduce the time/cost to build a new compiler
  - Build static compilers, JITs, trace-based optimizers, ...
- The LLVM Compiler Framework
  - End-to-end compilers using the LLVM infrastructure
  - C and C++ are robust and aggressive; Java, Scheme and others are in development
  - Emit C code or native code for X86, Sparc, PowerPC

Tutorial Overview

- Introduction to the running example
- LLVM C/C++ Compiler Overview
  - High-level view of an example LLVM compiler
- The LLVM Virtual Instruction Set
  - IR overview and type-system
- The Pass Manager
- Important LLVM Tools
  - opt, code generator, JIT, test suite, bugpoint
- Assignment Overview

Running example: arg promotion

Consider use of by-reference parameters:

    int callee(const int &X) {
      return X+1;
    }
    int caller() {
      return callee(4);
    }

compiles to

    int callee(const int *X) {
      return *X+1;          // memory load
    }
    int caller() {
      int tmp;              // stack object
      tmp = 4;              // memory store
      return callee(&tmp);
    }

We want:

    int callee(int X) {
      return X+1;
    }
    int caller() {
      return callee(4);
    }

- Eliminated load in callee
- Eliminated store in caller
- Eliminated stack slot for 'tmp'
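The three forms of the running example can be placed side by side in ordinary C++ to confirm that they compute the same result. This is only a minimal sketch: the `_ref`, `_ptr` and `_val` suffixes are names chosen here for illustration and do not appear in the slides.

```cpp
#include <cassert>

// By-reference form, as the programmer wrote it.
int callee_ref(const int &X) { return X + 1; }

// What the by-reference form lowers to: a pointer parameter and a load.
int callee_ptr(const int *X) { return *X + 1; }

// The form argument promotion aims for: the value is passed directly.
int callee_val(int X) { return X + 1; }

int caller_ref() { return callee_ref(4); }

int caller_ptr() {
  int tmp;                  // stack object
  tmp = 4;                  // memory store
  return callee_ptr(&tmp);  // callee performs the memory load
}

int caller_val() { return callee_val(4); }  // no load, store, or stack slot

// Hypothetical helper: true when all three variants agree.
bool variants_agree() {
  return caller_ref() == caller_ptr() && caller_ptr() == caller_val();
}
```

Since the observable behaviour is identical, the optimizer is free to pick the cheapest form once it can see every caller of `callee`.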


The LLVM C/C++ Compiler

- From the high level, it is a standard compiler:
  - Compatible with standard makefiles
  - Uses GCC 4.2 C and C++ parser
- Distinguishing features:
  - Uses LLVM optimizers, not GCC optimizers
  - .o files contain LLVM IR/bytecode, not machine code
  - Executable can be bytecode (JIT'd) or machine code

[Figure: at compile time, a C file goes through "llvmgcc -emit-llvm" and a C++ file through "llvmg++ -emit-llvm", each producing a .o file; at link time, the llvm linker combines the .o files into an executable.]

Looking into events at link-time

[Figure: the .o files enter the LLVM linker, followed by the link-time optimizer. The optimized result can be kept as a .bc file for the LLVM JIT, passed to the native code backend ("llc") to produce a native executable, or passed to the C code backend ("llc -march=c") and then a C compiler ("gcc") to produce a native executable. A callout marks the point where native .o files and libraries are linked in.]

- Link-time optimizer: 20 LLVM analysis & optimization passes
- Optionally "internalizes": marks most functions as internal, to improve IPO
- Perfect place for argument promotion optimization!
- NOTE on the C code backend: produces very ugly C. Officially deprecated, but still works fairly well.

Goals of the compiler design

- Analyze and optimize as early as possible:
  - Compile-time optimizations shorten the modify-rebuild-execute cycle
  - Compile-time optimizations reduce work at link-time (by shrinking the program)
- All IPA/IPO make an open-world assumption
  - Thus, they all work on libraries and at compile-time
  - "Internalize" pass enables "whole program" optimization
- One IR (without lowering) for analysis & optimization
  - Compile-time optimizations can be run at link-time too!
  - The same IR is used as input to the JIT

IR design is the key to these goals!
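The role of the "Internalize" pass can be pictured with a toy model. The sketch below is not the LLVM API: `Function`, `Linkage`, `internalize` and `can_change_signature` are hypothetical names, and the model only captures the idea that a function must be internal before an interprocedural pass may assume it sees every caller.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Linkage { External, Internal };

// Toy stand-in for a function in a module (not an LLVM class).
struct Function {
  std::string name;
  Linkage linkage;
};

// Mark every function internal except the names listed in `keep`
// (for example "main"). Returns how many functions were changed.
int internalize(std::vector<Function> &module,
                const std::vector<std::string> &keep) {
  int changed = 0;
  for (Function &f : module) {
    bool preserved = false;
    for (const std::string &k : keep) {
      if (f.name == k) preserved = true;
    }
    if (!preserved && f.linkage == Linkage::External) {
      f.linkage = Linkage::Internal;
      ++changed;
    }
  }
  return changed;
}

// Under the open-world assumption, an external function may have callers
// outside the module, so only internal functions may have their
// signature rewritten.
bool can_change_signature(const Function &f) {
  return f.linkage == Linkage::Internal;
}
```

In this model, a module containing `main`, `callee` and `caller` ends up with only `main` external after `internalize`, which is what makes `callee` a candidate for a signature-changing optimization.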

Goals of LLVM IR

- Easy to produce, understand, and define!
- Language- and Target-Independent
  - AST-level IR (e.g. ANDF, UNCOL) is not very feasible
    - Every analysis/transformation must know about 'all' languages
- One IR for analysis and optimization
  - IR must be able to support aggressive IPO, loop opts, scalar opts, … high- and low-level optimization!
- Optimize as much as early as possible
  - Can't postpone everything until link or runtime
  - No lowering in the IR!

LLVM Instruction Set Overview

- Low-level and target-independent semantics
  - RISC-like three address code
  - Infinite virtual register set in SSA form
  - Simple, low-level control flow constructs
  - Load/store instructions with typed-pointers
- IR has text, binary, and in-memory forms

    for (i = 0; i < N; ++i)
      Sum(&A[i], &P);

    loop:                                  ; preds = %bb0, %loop
      %i.1 = phi i32 [ 0, %bb0 ], [ %i.2, %loop ]
      %AiAddr = getelementptr float* %A, i32 %i.1
      call void @Sum(float* %AiAddr, %pair* %P)
      %i.2 = add i32 %i.1, 1
      %exitcond = icmp eq i32 %i.2, %N
      br i1 %exitcond, label %outloop, label %loop
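To make the SSA form of the loop easier to follow, the block can be mirrored instruction by instruction in C++. This is a sketch under several assumptions: `Sum` is given a hypothetical accumulating body, `P` is simplified to a single `float` rather than a pair, the trip count is at least one (as a guard in `%bb0` would ensure), and the exit test compares the incremented counter against `N` so that the body runs exactly `N` times, matching the C loop.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for Sum(&A[i], &P): accumulate *a into *p.
void Sum(const float *a, float *p) { *p += *a; }

// Mirror of the "loop" basic block. Each statement corresponds to one
// IR instruction; i1 and i2 play the roles of %i.1 and %i.2.
float run_loop(const std::vector<float> &A) {
  const int N = static_cast<int>(A.size());
  assert(N >= 1);  // assumption: the loop is entered only when N >= 1
  float P = 0.0f;
  int i1 = 0;      // phi: value 0 when arriving from %bb0
  int i2 = 0;
  bool exitcond = false;
  do {
    const float *AiAddr = &A[i1];  // getelementptr
    Sum(AiAddr, &P);               // call
    i2 = i1 + 1;                   // add
    exitcond = (i2 == N);          // icmp eq
    i1 = i2;                       // phi: value %i.2 on the back edge
  } while (!exitcond);             // br
  return P;
}
```

The two assignments to `i1` are what the `phi` instruction expresses in SSA form: one value per incoming edge, with no variable ever reassigned in the IR itself.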

LLVM Type System Details

- The entire type system consists of:
  - Primitives: label, void, float, integer, …
    - Arbitrary bitwidth integers (i1, i32, i64)
  - Derived: pointer, array, structure, function
  - No high-level types: type-system is language neutral!
- Type system allows arbitrary casts:
  - Allows expressing weakly-typed languages, like C
  - Front-ends can implement safe languages
  - Also easy to define a type-safe subset of LLVM

See also: docs/LangRef.html
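Because the type system has no high-level types, a front-end has to express classes, references and methods with structures, pointers and plain functions. The following C++ sketch shows both views next to each other; the `_lowered` names and `T_sum` are hypothetical, and the sketch compares behaviour only, not memory layout.

```cpp
#include <cassert>

// Source-level view: a base class, a derived class with a method,
// and a function taking a reference.
struct S { int base_field; };
struct T : S {
  int X;
  int sum() const { return base_field + X; }
};
int add_one(const int &v) { return v + 1; }

// Lowered view, using only structures, pointers and free functions.
struct T_lowered {
  S base;  // inheritance becomes an embedded structure
  int X;
};

// The method becomes a function with an explicit object pointer.
int T_sum(const T_lowered *self) {
  return self->base.base_field + self->X;
}

// The reference parameter becomes a pointer parameter.
int add_one_lowered(const int *v) { return *v + 1; }
```

Both views give the same results for the same field values, which is the property a front-end relies on when it makes implicit language features explicit.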

Lowering source-level types to LLVM

- Source language types are lowered:
  - Rich type systems expanded to simple type system
  - Implicit & abstract types are made explicit & concrete
- Examples of lowering:
  - References turn into pointers: T& -> T*
  - Complex numbers: complex float -> { float, float }
  - Bitfields: struct X { int Y:4; int Z:2; } -> { i32 }
  - Inheritance: class T : S { int X; } -> { S, i32 }
  - Methods: class T { void foo(); } -> void foo(T*)
- Same idea as lowering to machine code

Our example, compiled to LLVM

    int callee(const int *X) {
      return *X+1;          // load
    }
    int caller() {
      int T;                // on stack
      T = 4;                // store
      return callee(&T);
    }

    internal int %callee(int* %X) {
      %tmp.1 = load int* %X
      %tmp.2 = add int %tmp.1, 1
      ret int %tmp.2
    }
    int %caller() {
      %T = alloca int
      store int 4, int* %T
      %tmp.3 = call int %callee(int* %T)
      ret int %tmp.3
    }

- Stack allocation is explicit in LLVM
- All loads/stores are explicit in the LLVM representation
- Linker "internalizes" most functions in most cases

Our example, desired transformation

Before:

    internal int %callee(int* %X) {
      %tmp.1 = load int* %X
      %tmp.2 = add int %tmp.1, 1
      ret int %tmp.2
    }
    int %caller() {
      %T = alloca int
      store int 4, int* %T
      %tmp.3 = call int %callee(int* %T)
      ret int %tmp.3
    }

After argument promotion:

    internal int %callee(int %X.val) {
      %tmp.2 = add int %X.val, 1
      ret int %tmp.2
    }
    int %caller() {
      %T = alloca int
      store int 4, int* %T
      %tmp.1 = load int* %T
      %tmp.3 = call int %callee(int %tmp.1)
      ret int %tmp.3
    }

- Change the prototype for the function
- Insert load instructions into all callers
- Update all call sites of 'callee'
- Other transformation (-mem2reg) cleans up the rest

After -mem2reg:

    int %caller() {
      %tmp.3 = call int %callee(int 4)
      ret int %tmp.3
    }
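Before the prototype can be changed, the pass has to decide whether promoting the argument is safe. The sketch below is a deliberately simplified toy model, not LLVM's implementation: the `Use` classification, `can_promote` and `call_site_after` are hypothetical names, and a real pass would also need alias analysis to show that no store changes the pointed-to memory between function entry and the load.

```cpp
#include <cassert>
#include <vector>

// How a callee body uses one pointer argument (toy classification).
enum class Use { Load, Store, Escape };

// Toy legality test: the pointer argument may be replaced by its value
// only if the callee does nothing with it except load through it.
bool can_promote(const std::vector<Use> &uses) {
  if (uses.empty()) return false;  // unused argument: a different cleanup
  for (Use u : uses) {
    if (u != Use::Load) return false;
  }
  return true;
}

// Callee before promotion: takes the address and loads from it.
int callee_before(const int *X) { return *X + 1; }

// Callee after promotion: takes the loaded value.
int callee_after(int X_val) { return X_val + 1; }

// Rewritten call site: the load has moved from the callee into the caller.
int call_site_after(const int *arg) {
  int loaded = *arg;  // load inserted in the caller
  return callee_after(loaded);
}
```

In this model, `callee` from the slides has a single `Load` use of `%X`, so it qualifies; a callee that stored through the pointer or passed it on to another function would not.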