Understanding Object Files, ELF Format, and Libraries: Linking & Loading, Study notes of Computer Science

An in-depth look at the process of linking and loading executables, focusing on object files, the Executable and Linkable Format (ELF), and static and shared libraries. It covers topics such as relocatable object files, relocation information, and the role of the linker in resolving external references.

Uploaded on 08/19/2009

15-213, F02
CS 201
Linking
Gerson Robboy
Portland State University

A Simplistic Program Translation Scheme

[Figure: m.c (ASCII source file) → Translator → p (binary executable object file, the memory image on disk)]

Problems:

  • Efficiency: a small change requires complete recompilation
  • Modularity: hard to share common functions (e.g., printf)

Solution:

  • Linker

Translating the Example Program

A compiler driver coordinates all steps in the translation and linking process.

  • Included with each compilation system (cc or gcc)
  • Invokes the preprocessor (cpp), compiler (cc1), assembler (as), and linker (ld)
  • Passes command-line arguments to the appropriate phases

Example: create executable p from m.c and a.c:

bass> gcc -O2 -v -o p m.c a.c
cpp [args] m.c /tmp/cca07630.i
cc1 /tmp/cca07630.i m.c -O2 [args] -o /tmp/cca07630.s
as [args] -o /tmp/cca076301.o /tmp/cca07630.s
ld -o p [system obj files] /tmp/cca076301.o /tmp/cca076302.o
bass>

(Similar cpp, cc1, and as steps for a.c, not shown, produce /tmp/cca076302.o.)

A picture of the tool set

[Figure: m.c → C compiler → m.s → assembler → m.o, and a.c → C compiler → a.s → assembler → a.o; the linker (ld) combines m.o, a.o, and libraries such as libc.a into p, the executable program]

Why Linkers?

Modularity

  • A program can be written as a collection of smaller source files, rather than one monolithic mass.
  • Can build libraries of common functions (more on this later), e.g., the math library and the standard C library.

Efficiency

  • Time: change one source file, compile it, and then relink. There is no need to recompile the other source files.
  • Space: libraries of common functions can be aggregated into a single file, yet executable files and running memory images contain only code for the functions they actually use.

Questions for you

  • When a linker combines relocatable object files into an executable file, why does the linker have to modify instructions in the actual code?
  • How does the linker know what values to put into the code?
  • How does the linker know exactly where to insert those values?

ELF Object File Format

  • ELF header
      Magic number, type (.o, exec, .so), machine, byte ordering, etc.
  • Program header table
      Page size, virtual addresses of memory segments (sections), segment sizes
  • .text section
      Code
  • .data section
      Initialized (static) data
  • .bss section
      Uninitialized (static) data
      "Block Started by Symbol"
      Has a section header but occupies no space in the disk file

[Figure: layout of an ELF object file, starting at offset 0: ELF header; program header table (required for executables); .text section; .data section; .bss section; .symtab; .rel.text; .rel.data; .debug; section header table (required for relocatables)]
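As an illustration of the header fields listed above, the sketch below checks the magic number and then reads the byte ordering and the object-file type from the first bytes of an ELF header. It is a minimal sketch, not a full parser: the offsets and values (magic in bytes 0-3, `e_ident[EI_DATA]` at byte 5, the 16-bit `e_type` at byte 16) come from the ELF specification, and the function name `elf_type_name` is hypothetical. On Linux the same constants are available from `<elf.h>` as `EI_DATA`, `ELFDATA2LSB`, `ET_REL`, `ET_EXEC`, and `ET_DYN`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets and values from the ELF specification
   (<elf.h> on Linux names them EI_DATA, ELFDATA2LSB, ET_REL, and so on). */
enum {
    ELF_EI_DATA = 5,        /* byte-ordering field inside e_ident */
    ELF_DATA_LSB = 1,       /* little-endian (e.g. x86) */
    ELF_DATA_MSB = 2,       /* big-endian */
    ELF_E_TYPE_OFFSET = 16  /* e_type follows the 16-byte e_ident array */
};

/* Hypothetical helper: returns a short description of the object-file type,
   or NULL if buf does not start with a plausible ELF header. */
const char *elf_type_name(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };
    assert(buf != NULL);
    if (len < ELF_E_TYPE_OFFSET + 2 || memcmp(buf, magic, sizeof magic) != 0)
        return NULL;

    /* e_type is 16 bits wide, stored in the byte order the header declares. */
    const unsigned char *t = buf + ELF_E_TYPE_OFFSET;
    uint16_t type;
    if (buf[ELF_EI_DATA] == ELF_DATA_LSB)
        type = (uint16_t)(t[0] | (t[1] << 8));
    else if (buf[ELF_EI_DATA] == ELF_DATA_MSB)
        type = (uint16_t)((t[0] << 8) | t[1]);
    else
        return NULL;

    switch (type) {
    case 1:  return "relocatable (.o)";    /* ET_REL  */
    case 2:  return "executable";          /* ET_EXEC */
    case 3:  return "shared object (.so)"; /* ET_DYN  */
    default: return "other";
    }
}
```

Reading the first 64 bytes of m.o into a buffer and passing them to `elf_type_name` should report a relocatable file; the linked program p should report an executable.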

ELF Object File Format (cont)

  • .symtab section
      Symbol table
      Procedure and static variable names
      Section names and locations
  • .rel.text section
      Relocation info for the .text section
      Addresses of instructions that will need to be modified in the executable
      Instructions for modifying them
  • .rel.data section
      Relocation info for the .data section
      Addresses of pointer data that will need to be modified in the merged executable
  • .debug section
      Info for symbolic debugging (gcc -g)

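The .rel.text and .rel.data sections are arrays of fixed-size entries. In the 32-bit format used by the m.o and a.o listings later in these notes, each entry (`Elf32_Rel` in the ELF specification) holds an offset and a packed info word: the high 24 bits are an index into .symtab and the low 8 bits are the relocation type. The sketch below is a minimal illustration with hypothetical names (`struct rel32`, `rel_sym`, `rel_type`, `rel_info`) that unpacks that word; the type codes 1 (R_386_32) and 2 (R_386_PC32) are the values defined by the i386 ELF supplement.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the layout of Elf32_Rel from the ELF specification. */
struct rel32 {
    uint32_t r_offset;  /* where to patch: offset within the section being relocated */
    uint32_t r_info;    /* symbol-table index (high 24 bits), relocation type (low 8 bits) */
};

/* i386 relocation types that appear in the listings. */
enum { R386_32 = 1, R386_PC32 = 2 };

/* Same computation as the ELF32_R_SYM and ELF32_R_TYPE macros in <elf.h>. */
static inline uint32_t rel_sym(const struct rel32 *r)  { return r->r_info >> 8; }
static inline uint32_t rel_type(const struct rel32 *r) { return r->r_info & 0xff; }

/* Hypothetical helper that packs r_info, like ELF32_R_INFO. */
static inline uint32_t rel_info(uint32_t sym, uint32_t type)
{
    assert(sym < (1u << 24) && type < 256);
    return (sym << 8) | type;
}
```

In the m.o listing, the line `4: R_386_PC32 a` is how objdump prints one such entry: r_offset 4, type R_386_PC32, and a symbol index that refers to the .symtab entry for a.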

Merging Relocatable Object Files into an Executable Object File

[Figure: the relocatable object files m.o (.text: main(); .data: int e = 7) and a.o (.text: a(); .data: int *ep = &e, int x = 15; .bss: int y) are merged, together with system code and data, into one executable object file: headers; .text: system code, main(), a(), more system code; .data: system data, int e = 7, int *ep = &e, int x = 15; .bss: uninitialized data; .symtab; .debug]
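Merging amounts to concatenating like-named sections from each input file and then giving every input section a run-time address. The sketch below shows only that address-assignment step, under simplifying assumptions: sections are placed back to back from a caller-supplied base address, each is aligned to a power of two, and headers, page alignment, and system code are ignored. The function name `assign_addresses` and the example addresses are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr up to a multiple of align (align must be a power of two). */
static uint32_t align_up(uint32_t addr, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Hypothetical helper: places n input sections (e.g. the .text of m.o,
   then the .text of a.o) one after another, starting at base.  Writes
   each section's start address to out[i] and returns the first address
   past the merged section. */
uint32_t assign_addresses(const uint32_t *sizes, const uint32_t *aligns,
                          size_t n, uint32_t base, uint32_t *out)
{
    uint32_t next = base;
    for (size_t i = 0; i < n; i++) {
        next = align_up(next, aligns[i]);
        out[i] = next;
        next += sizes[i];
    }
    return next;
}
```

Once each input section's start is known, a symbol's final address is that start plus the symbol's offset within its section; for instance, if m.o's .text were placed at a hypothetical 0x08048400, main (offset 0) would end up at 0x08048400.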

Relocating Symbols and Resolving External References

  • Symbols are lexical entities that name functions and variables.
  • Each symbol has a value (typically a memory address).
  • Code consists of symbol definitions and references.
  • References can be either local (to a symbol defined in the same module) or external (to a symbol defined in some other module).

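m.c:

#include <stdlib.h>   /* declares exit */

int a(void);          /* defined in a.c */

int e=7;

int main() {
    int r = a();
    exit(0);
}

a.c:

extern int e;

int *ep=&e;
int x=15;
int y;

int a() {
    return *ep+x+y;
}

Annotations:

  • m.c: int e=7 is the definition of the local symbol e.
  • m.c: the call a() is a reference to the external symbol a.
  • m.c: the call exit(0) is a reference to the external symbol exit (defined in libc.so).
  • a.c: extern int e declares a reference to the external symbol e.
  • a.c: int *ep=&e is the definition of the local symbol ep.
  • a.c: int x=15 and int y are the definitions of the local symbols x and y.
  • a.c: int a() is the definition of the local symbol a.
  • a.c: *ep+x+y contains references to the local symbols ep, x, and y.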
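Resolving an external reference means finding the one module that defines the referenced name. The sketch below shows that idea over a flat, hypothetical table of definitions collected from all input modules: a reference is undefined when no module defines the name and ambiguous when more than one does. It is a simplification (real linkers also have rules for weak and common symbols and search libraries for missing definitions), and the names `struct sym_def` and `resolve` are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One symbol definition, as a hypothetical linker might record it after
   assigning addresses: name, final address, and defining module. */
struct sym_def {
    const char *name;
    uint32_t    addr;
    const char *module;
};

enum resolve_status { RESOLVED, UNDEFINED, MULTIPLY_DEFINED };

/* Looks up name among the n definitions.  Only when exactly one module
   defines it is the symbol's address stored in *addr. */
enum resolve_status resolve(const struct sym_def *defs, size_t n,
                            const char *name, uint32_t *addr)
{
    assert(defs != NULL || n == 0);
    const struct sym_def *found = NULL;
    size_t matches = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(defs[i].name, name) == 0) {
            matches++;
            found = &defs[i];
        }
    }
    if (matches == 0)
        return UNDEFINED;
    if (matches > 1)
        return MULTIPLY_DEFINED;
    *addr = found->addr;
    return RESOLVED;
}
```

With definitions taken only from m.o and a.o, the reference to exit comes back undefined; it is satisfied only once the C library's definitions are added to the table.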

External functions

  • In main, notice that the names a and exit are external symbols.
  • The compiler knows they are functions, and the linker will resolve the references.
  • exit is just another function call:
      The compiler doesn't know anything about Unix system calls.
      The compiler knows about names and data types.

m.o Relocation Info

Disassembly of section .text:

00000000 <main>:
   0: 55                pushl %ebp
   1: 89 e5             movl %esp,%ebp
   3: e8 fc ff ff ff    call 4 <main+0x4>
                        4: R_386_PC32 a
   8: 6a 00             pushl $0x0
   a: e8 fc ff ff ff    call b <main+0xb>
                        b: R_386_PC32 exit
   f: 90                nop

Disassembly of section .data:

00000000 <e>:
   0: 07 00 00 00

source: objdump

m.c:

#include <stdlib.h>   /* declares exit */

int a(void);          /* defined in a.c */

int e=7;

int main() {
    int r = a();
    exit(0);
}
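For an R_386_PC32 entry, the i386 ELF supplement has the linker store S + A - P in the 4-byte field: S is the symbol's run-time address, A is the addend already sitting in the field (the `fc ff ff ff` bytes, i.e. -4, because a call's displacement is measured from the end of the 4-byte field, which is the start of the next instruction), and P is the run-time address of the field itself. The sketch below applies that formula to m.o's .text bytes. The addresses chosen for main, a, and exit are hypothetical, picked only for the example, and the helper names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian 32-bit load/store, the byte order x86 uses. */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void store_le32(unsigned char *p, uint32_t v)
{
    p[0] = v & 0xff;
    p[1] = (v >> 8) & 0xff;
    p[2] = (v >> 16) & 0xff;
    p[3] = (v >> 24) & 0xff;
}

/* Applies one R_386_PC32 relocation.
   sec:      the section's bytes, as copied into the executable
   sec_addr: run-time address assigned to the section's first byte
   offset:   r_offset from the relocation entry
   sym_addr: S, the resolved address of the referenced symbol */
void reloc_pc32(unsigned char *sec, uint32_t sec_addr, uint32_t offset,
                uint32_t sym_addr)
{
    assert(sec != NULL);
    unsigned char *field = sec + offset;
    uint32_t a = load_le32(field);       /* A: 0xfffffffc (-4) for a call */
    uint32_t p = sec_addr + offset;      /* P: run-time address of the field */
    store_le32(field, sym_addr + a - p); /* S + A - P, modulo 2^32 */
}
```

With main at 0x08048400 and a at 0x08048420 (both hypothetical), the field at offset 4 becomes 0x18: the call instruction ends at 0x08048408, and 0x08048408 + 0x18 = 0x08048420.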

Question

In a.c, the variables ep, x, and y are local: they are defined in the same source file that uses them.

So why can't the compiler just generate complete code? Why is relocation information necessary?

a.o Relocation Info (.data)

a.c:

extern int e;

int *ep=&e;
int x=15;
int y;

int a() {
    return *ep+x+y;
}

Disassembly of section .data:

00000000 <ep>:
   0: 00 00 00 00
                        0: R_386_32 e
00000004 <x>:
   4: 0f 00 00 00
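An R_386_32 entry asks for an absolute address instead: the linker stores S + A, the symbol's run-time address plus the addend already in the field (0 for ep). The sketch below patches a.o's .data bytes so that ep holds the address of e. The address 0x080494a0 for e is hypothetical, and the helper names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian 32-bit load/store, the byte order x86 uses. */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void store_le32(unsigned char *p, uint32_t v)
{
    p[0] = v & 0xff;
    p[1] = (v >> 8) & 0xff;
    p[2] = (v >> 16) & 0xff;
    p[3] = (v >> 24) & 0xff;
}

/* Applies one R_386_32 relocation: the field becomes S + A.
   sec:      the section's bytes, as copied into the executable
   offset:   r_offset from the relocation entry
   sym_addr: S, the resolved address of the referenced symbol */
void reloc_abs32(unsigned char *sec, uint32_t offset, uint32_t sym_addr)
{
    assert(sec != NULL);
    unsigned char *field = sec + offset;
    store_le32(field, sym_addr + load_le32(field));
}
```

Unlike the PC-relative case, the result does not depend on where ep itself ends up, so this relocation needs only S and A; x, which holds a plain constant, needs no relocation at all.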