These are course notes for CS61C, a computer science course covering binary representation, C programming, memory management, RISC-V, synchronous digital systems, the datapath, the memory hierarchy, and parallelism. Specific topics include base conversions, numeric representations, floating-point representation, C features, bitwise operators, pointers, arrays, structs, memory basics, the stack, the heap, heap management, assembly languages, RISC-V structure, instruction formats, compiling, assembling, linking, loading, registers, pipelining, combinational logic elements, timing, caching, virtual memory, data level parallelism, and thread level parallelism.

CS61C Course Notes
Anmol Parande
Fall 2019 - Professors Dan Garcia and Miki Lustig

Contents

  • 1 Binary Representation
    • 1.1 Base conversions
    • 1.2 Numeric Representations
      • 1.2.1 Sign and Magnitude
      • 1.2.2 One’s Complement
      • 1.2.3 Two’s Complement
      • 1.2.4 Bias Encoding
    • 1.3 Floating Point Representation
  • 2 C
    • 2.1 C Features
    • 2.2 Bitwise Operators
    • 2.3 Pointers
      • 2.3.1 Pointer Operators
    • 2.4 Arrays
    • 2.5 Structs
      • 2.5.1 Struct Operators
  • 3 Memory Management
    • 3.1 Memory Basics
    • 3.2 The Stack
    • 3.3 The Heap
    • 3.4 Heap Management
  • 4 RISC-V
    • 4.1 Basics of Assembly Languages
    • 4.2 RISC-V Structure
      • 4.2.1 Caller Callee Convention
      • 4.2.2 Directives
    • 4.3 Instruction Formats
      • 4.3.1 Addressing
    • 4.4 Compiling, Assembling, Linking, Loading
      • 4.4.1 Compiler
      • 4.4.2 Assembler
    • 4.5 Linker
    • 4.6 Loader
  • 5 Synchronous Digital Systems
    • 5.1 Registers
      • 5.1.1 Pipelining
    • 5.2 Combinational Logic Elements
    • 5.3 Timing
  • 6 Datapath
    • 6.1 Pipelined Datapath
      • 6.1.1 Structural Hazards
      • 6.1.2 Data Hazards
      • 6.1.3 Control Hazards
  • 7 The Memory Hierarchy
    • 7.1 Caching
      • 7.1.1 Direct Mapped Cache
      • 7.1.2 Fully Associative Cache
      • 7.1.3 N-Way Set Associative Cache
      • 7.1.4 Cache Metrics
      • 7.1.5 Writing to the Cache
      • 7.1.6 Block Replacement
    • 7.2 Virtual Memory
      • 7.2.1 Memory Access
      • 7.2.2 Virtual Memory and the Disk
      • 7.2.3 Virtual Memory Performance Metrics
  • 8 Parallelism
    • 8.1 Data Level Parallelism and SIMD
    • 8.2 Thread Level Parallelism
      • 8.2.1 Threads
      • 8.2.2 Synchronization
      • 8.2.3 Cache Coherence
      • 8.2.4 OpenMP
      • 8.2.5 Amdahl's Law
  • 9 Input/Output

1.2.3 Two’s Complement

To convert from decimal to two's complement: if the number is positive, convert to binary as normal. If the number is negative:

  1. Convert the positive version of the number to binary
  2. Invert each bit
  3. Add one to the result

If given a two’s complement number, you can find -1 times that number by following the same process (invert bits and add 1).
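The invert-and-add-one procedure above can be sketched in C. This is a minimal illustration that assumes an 8-bit width; the function names `twos_complement_negate` and `encode_int8` are made up for this example.

```c
#include <stdint.h>

/* Negate an 8-bit two's complement bit pattern: invert every bit, then add one. */
uint8_t twos_complement_negate(uint8_t bits) {
    return (uint8_t)(~bits + 1u);
}

/* Encode a decimal value in [-128, 127] as an 8-bit two's complement pattern. */
uint8_t encode_int8(int value) {
    if (value >= 0) {
        return (uint8_t)value;                       /* positive: plain binary */
    }
    return twos_complement_negate((uint8_t)(-value)); /* negative: steps 1-3 */
}
```

For example, `encode_int8(-5)` gives the pattern 0xFB (11111011), and applying `twos_complement_negate` to 0xFB gives back 5.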

1.2.4 Bias Encoding

With bias encoding, the number is equal to its unsigned representation plus a bias term. With a negative bias, we can center the unsigned range $0 \to 2^N - 1$ around 0.

1.3 Floating Point Representation

To represent non-integer numbers in binary, we use floating point representation. Any number in scientific notation has the following components:

  • Mantissa: The number in front of the point
  • Significand: The digits after the point
  • Radix: The base of the number
  • Exponent: How many times the point should be shifted to recover the original number

For example, in $1.01_2 \times 2^{-1}$ the mantissa is 1, the binary point follows it, the significand is 01, the radix is 2, and the exponent is $-1$.

If we have a 32 bit system, then by the floating point convention:

Sign Bit (1 bit)   Exponent Bits (8 bits)   Significand (23 bits)
0                  00000000                 00000000000000000000000

In the floating point representation, the mantissa is always one because we are using normalized binary scientific notation, so we don't bother storing it.

  • Sign Bit: Determines the sign of the floating point number
  • Exponent: 8 bit biased number (-127 bias)
  • Significand: 23 significand bits representing $2^{-1}, 2^{-2}, \ldots, 2^{-23}$

Given a number in floating point representation, we can convert it back to decimal by applying the following formula:

$n = (-1)^{s} \cdot (1 + \text{significand}) \cdot 2^{\text{exponent} - 127}$

Notice a couple things about this representation.

  • If the exponent is larger than 8 bits, then overflow will occur
  • If a negative exponent is more than 8 bits, then underflow occurs
  • There are 2 0’s. Positive 0 and negative 0

Because of the way that floating point is built, certain sequences are designated to be specific numbers

Exponent   Significand   Object
0          0             0
0          nonzero       denorm
1-254      anything      ± #
255        0             ±∞
255        nonzero       NaN

A denormed number is a number where we don’t have an implicit 1 as the mantissa. These numbers let us represent incredibly small numbers. They have an implicit exponent of −126.
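As a rough check of the formula and the denorm rule, here is a small C sketch that decodes a 32-bit pattern by hand. The name `decode_float32` is hypothetical, and the sketch deliberately ignores the ±∞ and NaN rows of the table (exponent 255).

```c
#include <stdint.h>

/* Decode a 32-bit floating point bit pattern using the formula above.
 * Assumption: exponent field 255 (infinity / NaN) is not handled here. */
double decode_float32(uint32_t bits) {
    uint32_t sign     = bits >> 31;           /* 1 sign bit */
    uint32_t exponent = (bits >> 23) & 0xFFu; /* 8 biased exponent bits */
    uint32_t fraction = bits & 0x7FFFFFu;     /* 23 significand bits */

    /* The significand bits stand for 2^-1, 2^-2, ..., 2^-23. */
    double value = (double)fraction / (double)(1u << 23);
    int power;

    if (exponent == 0) {
        power = -126;                 /* denorm: no implicit 1, exponent -126 */
    } else {
        value += 1.0;                 /* normalized: implicit mantissa of 1 */
        power = (int)exponent - 127;  /* remove the bias */
    }

    /* Scale by 2^power without needing the math library. */
    while (power > 0) { value *= 2.0; power--; }
    while (power < 0) { value /= 2.0; power++; }

    return sign ? -value : value;
}
```

For instance, the pattern 0x3F200000 has sign 0, exponent field 126, and significand bits 01 followed by zeros, so it decodes to $1.01_2 \times 2^{-1} = 0.625$.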

2 C

2.1 C Features

  • C is a compiled language =⇒ executables are rebuilt for each new system
  • Every uninitialized local variable holds garbage until it is assigned a value
  • Function parameters are pass by value

2.2 Bitwise Operators

Bitwise operators are operators which change the bits of integer-like objects (ints, chars, etc)

  • &: Bitwise AND. Useful for creating masks
  • |: Bitwise OR. Useful for turning bits on
  • ^: Bitwise XOR. Useful for toggling bits
  • <<: Left shifts the bits of the first operand left by the number of bits specified by the second operand.
  • >>: Right shifts the bits of the first operand right by the number of bits specified by the second operand.
  • ~: Inverts the bits.
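A few common idioms built from these operators, as a minimal sketch. The helper names are made up for illustration, and `n` is assumed to be between 0 and 31.

```c
#include <stdint.h>

/* Turn bit n on with OR. */
uint32_t set_bit(uint32_t x, unsigned n)    { return x | (1u << n); }

/* Turn bit n off by ANDing with an inverted mask. */
uint32_t clear_bit(uint32_t x, unsigned n)  { return x & ~(1u << n); }

/* Flip bit n with XOR. */
uint32_t toggle_bit(uint32_t x, unsigned n) { return x ^ (1u << n); }

/* Keep only the lowest 8 bits using AND as a mask. */
uint32_t low_byte(uint32_t x)               { return x & 0xFFu; }
```

For example, `clear_bit(0xF, 1)` yields 0xD and `low_byte(0x12345678)` yields 0x78.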

lengths and they do not check their bounds. This means whenever you are passing an array to a function, you should always be passing its size as well.

2.5 Structs

Structs are the basic data structures in C. Like classes, they are composed of simpler data structures, but there is no inheritance.

2.5.1 Struct Operators

  • ->: dereference a pointer to a struct and get a subfield

typedef can be a useful keyword with structs because it lets us name them cleanly.
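A short sketch of both ideas together. The `Point` type and `translate` function are hypothetical names used only for this example.

```c
/* typedef lets us write "Point" instead of "struct point" everywhere. */
typedef struct {
    int x;
    int y;
} Point;

/* Function parameters are pass by value, so we take a pointer in order
 * to modify the caller's struct. */
void translate(Point *p, int dx, int dy) {
    p->x += dx;   /* p->x is shorthand for (*p).x */
    p->y += dy;
}
```

Calling `translate(&pt, 3, 4)` on a `Point pt = {1, 2}` leaves `pt.x == 4` and `pt.y == 6`.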

3 Memory Management

3.1 Memory Basics

In memory, a word is 4 bytes. When objects are saved to memory, they are saved by words. How the bytes within a word are ordered depends on the type of system. In Little Endian systems, the Least Significant Byte is placed at the lowest memory address. In other words, the memory address points to the least significant byte.

The opposite is true in Big Endian systems. For example, let's say we have the number 0x12345678 stored at the memory address 0x00.

Address:        0x00  0x01  0x02  0x03
Little Endian:  78    56    34    12
Big Endian:     12    34    56    78
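The two byte orders can be written out explicitly in C. This is an illustrative sketch; the function names are assumptions, and `host_is_little_endian` simply inspects the first byte of a word on whatever machine runs it.

```c
#include <stdint.h>
#include <string.h>

/* Write a word with the least significant byte at the lowest address. */
void store_little_endian(uint32_t word, uint8_t out[4]) {
    for (int i = 0; i < 4; i++) {
        out[i] = (uint8_t)(word >> (8 * i));
    }
}

/* Write a word with the most significant byte at the lowest address. */
void store_big_endian(uint32_t word, uint8_t out[4]) {
    for (int i = 0; i < 4; i++) {
        out[i] = (uint8_t)(word >> (8 * (3 - i)));
    }
}

/* Returns 1 if this machine stores the least significant byte first. */
int host_is_little_endian(void) {
    uint32_t word = 0x12345678u;
    uint8_t first;
    memcpy(&first, &word, 1);   /* read the byte at the lowest address */
    return first == 0x78;
}
```

With the word 0x12345678, `store_little_endian` fills the buffer with 78 56 34 12 and `store_big_endian` with 12 34 56 78.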

There are four sections of memory: the stack, the heap, static, and code.

Definition 5 The Stack is where local variables are stored. This includes parameters and return addresses. It is the "highest" level of memory and grows "downward" towards heap memory.

Definition 6 The Heap is where dynamic memory is stored. Data lives here until the programmer deallocates it. It sits below the stack and grows upward toward it.

Definition 7 Static storage is where global variables are stored. This storage does not change sizes and is mostly permanent.

Definition 8 Code storage is where the "code" is located. This includes preprocessing instructions and function calls.

Definition 9 Stack overflow is when the stack grows so large that it intersects with heap memory. This is mostly unavoidable.

Definition 10 Heap pileup is when the heap grows so large that it starts to intersect with the stack. This is very avoidable because the programmer manages it.

Definition 11 All of the memory which a program uses is collectively referred to as the address space of the program.

3.2 The Stack

The stack is named that way because every time a function call is made, a stack frame is created. A stack frame includes the address of the return instruction and the parameters to the function. As the function executes, local variables are added to the frame. When the function returns, the frame is popped off. In this way, frames are handled in Last-In-First-Out (LIFO) order.

Definition 12 The stack pointer is a pointer which points to the current stack frame.

Important: Deallocated memory is not cleared. It is merely overwritten later.

3.3 The Heap

The heap is a larger pool of memory than the stack and it is not in contiguous order. Back to back allocations to the heap may be very far apart.

Definition 13 Heap fragmentation is when most of the free memory in the heap is in many small chunks.

Fragmentation is bad because if we want to allocate space for a large object, we may have enough cumulative free space on the heap, but if no single contiguous free chunk is large enough, then there is no way to create our object.

Implementation:

Every block in the heap has a header containing its size and a pointer to the next block. The free blocks of memory are stored as a circular linked list. When memory needs to be allocated to the heap, this linked list is searched. When memory is freed, adjacent empty blocks are coalesced into a single larger block. There are three different strategies which can be used to do this allocation/freeing.

Definition 14 Best-fit allocation is where the entire linked list is searched to find the smallest block large enough to fit the requirements

Definition 15 First-fit allocation is where the first block that is large enough to fit the requirement is returned
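The difference between the two strategies defined above can be sketched in C. As a simplifying assumption, the free list is modeled as a plain array of block sizes rather than the circular linked list described earlier; `first_fit` and `best_fit` are hypothetical names.

```c
#include <stddef.h>

/* First-fit: return the index of the first free block that is large
 * enough for the request, or -1 if none fits. */
int first_fit(const size_t *free_sizes, int count, size_t request) {
    for (int i = 0; i < count; i++) {
        if (free_sizes[i] >= request) {
            return i;
        }
    }
    return -1;
}

/* Best-fit: scan every free block and return the index of the smallest
 * one that is still large enough, or -1 if none fits. */
int best_fit(const size_t *free_sizes, int count, size_t request) {
    int best = -1;
    for (int i = 0; i < count; i++) {
        if (free_sizes[i] >= request &&
            (best < 0 || free_sizes[i] < free_sizes[best])) {
            best = i;
        }
    }
    return best;
}
```

With free blocks of sizes {64, 16, 32, 24} and a request for 20 bytes, first-fit picks the 64-byte block (index 0) while best-fit picks the 24-byte block (index 3), leaving less wasted space at the cost of a full scan.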

the value 0. Unlike variables, registers have no types. The operation is what determines what the content of the register is treated as.

Immediates are numerical constants. They can also be used as the operands of assembly instructions.

4.2 RISC-V Structure

The general format of a RISC-V instruction is

operation_name destination source1 source2

Labels are text in the program which denote certain locations in code. Branches change the flow of the program, usually by jumping to a label or an address in the code portion of memory.

Pseudo-instructions are instructions which are translated into different instructions by the assembler. They exist because they increase readability of the program.

In order to increase program legibility, registers are rarely referred to by their number (x15, x20, etc). Instead, they have symbolic names. Here are a few.

  • a0 - a7: The argument registers
  • s0 - s11: The saved registers
  • ra: return address register
  • sp: stack pointer register
  • pc: Program Counter

4.2.1 Caller Callee Convention

Because functions can always overwrite registers, programmers set up conventions for calling and returning from functions. The Caller is the function which calls another function. The Callee is a function which is being called.

Functional Control Flow

  1. Put parameters where the function can access them (a0-a7)
  2. Transfer control to the function (jump)
  3. Acquire the local storage resources for the function (grow the stack)
  4. Perform the desired task of the function
  5. Put the result in a0 where the calling function can access it
  6. Release local storage and restore the saved registers so the caller sees their original values (shrink the stack)
  7. Return control to the calling function (jump to ra)

Because every function must have this control flow, the caller-callee convention was set up as follows

  1. sp, gp, tp, s0-s11 are preserved across a function call
  2. t0-t6, a0-a7 are not preserved across a function call

If a register is not preserved across a function call, then the caller cannot expect its value to be the same after the callee returns. In order to preserve registers across a function call, we use the stack. The Stack Frame stores the variables which need to be saved in order to adhere to the caller-callee convention.

Every RISC-V function has a Prologue where it saves the necessary registers to the stack. An example might look like

    addi sp, sp, -16
    sw s0, 0(sp)
    sw s1, 4(sp)
    sw s2, 8(sp)
    sw ra, 12(sp)

This function must use the s0-s2 saved registers. We first create the stack frame by decrementing the stack pointer. Then we save the saved registers to the newly allocated memory. This function must be calling another function, so it has to remember its return address. That is why we save it to the stack.

The Epilogue is the part of the function before it returns where everything on the stack is put back and the stack frame is released.

    lw s0, 0(sp)
    lw s1, 4(sp)
    lw s2, 8(sp)
    lw ra, 12(sp)
    addi sp, sp, 16
    jr ra

4.2.2 Directives

Directives are special demarcations in an assembly file which designate different pieces of data.

.text: Subsequent items are put into the text segment of memory (i.e the code)

.data: Subsequent items are put into the data segment of memory (i.e static variables)

.global sym: Declares a symbol to be global, meaning it can be referenced from other files

Imm[31:12] rd opcode

J-Type

Bits:   31        30:21       20        19:12        11:7   6:0
Field:  Imm[20]   Imm[10:1]   Imm[11]   Imm[19:12]   rd     opcode

4.3.1 Addressing

Notice that the J-Type and B-Type instructions require the use of a label in code. These labels are encoded in the instruction format as an offset from the program counter. This is known as PC Relative Addressing. Since each instruction in RISC-V is 1 word, we will never have an odd address. As a result, we don't need to store the last bit of the immediate in B-Type and J-Type instructions because it is automatically 0.
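To make the scrambled J-Type immediate layout concrete, here is a hedged C sketch that packs a PC-relative byte offset into the immediate fields (bits 31:12) and recovers it again. The function names are invented for this example, and the `rd` and `opcode` fields are left as zero.

```c
#include <stdint.h>

/* Pack an even, signed 21-bit byte offset into bits 31:12 of a J-Type
 * instruction. Bit 0 of the offset is never stored because it is always 0. */
uint32_t jtype_pack_imm(int32_t offset) {
    uint32_t imm = (uint32_t)offset;
    return (((imm >> 20) & 0x1u)   << 31)   /* Imm[20]    -> bit 31      */
         | (((imm >> 1)  & 0x3FFu) << 21)   /* Imm[10:1]  -> bits 30:21  */
         | (((imm >> 11) & 0x1u)   << 20)   /* Imm[11]    -> bit 20      */
         | (((imm >> 12) & 0xFFu)  << 12);  /* Imm[19:12] -> bits 19:12  */
}

/* Recover the signed byte offset from a J-Type instruction word. */
int32_t jtype_unpack_imm(uint32_t instr) {
    uint32_t imm = (((instr >> 31) & 0x1u)   << 20)
                 | (((instr >> 21) & 0x3FFu) << 1)
                 | (((instr >> 20) & 0x1u)   << 11)
                 | (((instr >> 12) & 0xFFu)  << 12);
    if (imm & (1u << 20)) {
        imm |= 0xFFE00000u;   /* sign-extend from bit 20 */
    }
    return (int32_t)imm;
}
```

A jump back by one instruction (offset -4) packs to 0xFFDFF000, and unpacking that word returns -4.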

4.4 Compiling, Assembling, Linking, Loading

Compiling, Assembling, Linking, and Loading (CALL) are the four steps of turning a program's source code into a running program.

4.4.1 Compiler

The input to the compiler is a file written in a high level programming language. It outputs assembly language code which is built for the machine the code was compiled on. The output of the compiler may still include pseudo-instructions in assembly.

4.4.2 Assembler

The Assembler is the program which converts assembly language code to machine language code. It reads and uses directives, replaces pseudo-instructions with their real equivalents, and produces machine language code (i.e. bits) where it can.

The output of the assembler is an object file. The object file contains the following elements:

  • Object file header: The size and position of the different sections of the object file
  • Text Segment: The code
  • Data Segment: Binary representation of the static data in the source
  • Relocation Table: A special data structure which contains the lines of code needing to be fixed
  • Symbol Table: A special data structure which lists the file's global labels and static data labels
  • Debugging information

The symbol table contains information which is public to other files in the program.

  • Global function labels
  • Data Labels

The Relocation Table contains information that needs to be relocated in later steps. It essentially tracks everything that the Assembler cannot directly convert to machine code immediately because it doesn’t have enough information.

  • List of labels this file doesn’t know about
  • List of absolute labels that are jumped to
  • Any piece of data in the static section

When the assembler parses a file, instructions which don't have a label are converted into machine language. When a label is defined, its position is stored in the symbol table. When a label is encountered in code, the assembler looks to see if its position was defined in the symbol table. If it is found, the label is replaced with the immediate and converted to machine code. Otherwise, the line is marked for relocation.

In order to do its job, the assembler must take two passes through the code. This is because of the forward reference problem. If a label is used before it is defined in the same file, the first time the assembler encounters it, it won't know how to convert it to machine code. To resolve this, the assembler simply takes two passes so it finds all labels in the first pass and converts the lines it originally couldn't in the second pass.
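The two-pass idea can be sketched with a toy model in C. Everything here is an assumption made for illustration: each "line" is reduced to an optional label it defines and an optional label it jumps to, every instruction is taken to be 4 bytes, and the names `Line`, `Symbol`, and `assemble` are made up.

```c
#include <string.h>

#define MAX_SYMBOLS 16

/* One line of a toy program. */
typedef struct {
    const char *defines;  /* label defined on this line, or NULL */
    const char *target;   /* label this line jumps to, or NULL */
} Line;

typedef struct {
    const char *name;
    int address;
} Symbol;

/* Look a label up in the symbol table; returns -1 if it is unknown. */
static int lookup(const Symbol *table, int count, const char *name) {
    for (int i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0) {
            return table[i].address;
        }
    }
    return -1;
}

/* Fills offsets[i] with the PC-relative byte offset for line i (0 when the
 * line has no target or the target is unknown). Returns how many lines
 * would have to be marked for relocation. */
int assemble(const Line *lines, int n, int *offsets) {
    Symbol table[MAX_SYMBOLS];
    int count = 0;
    int unresolved = 0;

    /* Pass 1: record the address of every label defined in this file. */
    for (int i = 0; i < n; i++) {
        if (lines[i].defines && count < MAX_SYMBOLS) {
            table[count].name = lines[i].defines;
            table[count].address = 4 * i;
            count++;
        }
    }

    /* Pass 2: every local label is now known, including forward references. */
    for (int i = 0; i < n; i++) {
        offsets[i] = 0;
        if (!lines[i].target) {
            continue;
        }
        int address = lookup(table, count, lines[i].target);
        if (address < 0) {
            unresolved++;              /* defined in another file */
        } else {
            offsets[i] = address - 4 * i;
        }
    }
    return unresolved;
}
```

A jump on line 0 to a label defined on line 3 is a forward reference; it resolves to an offset of 12 only because pass 1 has already seen the whole file. A jump to a label that no line defines stays unresolved and would be left for the linker.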

4.5 Linker

The Linker is responsible for linking all of the object files together and producing the final executable. The linker operates in three steps:

  1. Put the text segments together
  2. Put the data segments together
  3. Resolve any referencing issues

After the linking step, the entire program is finally in pure machine code because all references to labels must be resolved. The linker knows the length of each text and data segment. It can use these lengths to order the segments appropriately. It assumes that the first word will be stored at 0x10000000 and can calculate the absolute address of each word from there. To resolve references, it uses the relocation table to change addresses to their appropriate values.

5.1 Registers

Registers are state elements frequently used in circuits. On the rising edge of the clock, the input d is sampled and transferred to the output q. At all other times, the input d is ignored. There are three critical timing requirements which are specific to registers:

  1. Setup Time: How long the input must be stable before the rising edge of the clock for the register to properly read it
  2. Hold Time: How long the input must be stable after the rising edge of the clock for the register to properly read it
  3. Clock-to-Q Time: How long after the rising edge of the clock it takes for the input to appear at the register's output

5.1.1 Pipelining

One place where registers become useful is in pipelining. Pipelining a circuit means placing registers after combinational logic blocks. This stops delay times from adding up because now intermediate quantities must be stored in the register before being passed to the next block. This allows for higher clock frequencies because the clock period only has to cover the slowest single block plus the register overhead, not the sum of all the logic delays.

5.2 Combinational Logic Elements

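  • Not Gate: input $x$, output $\bar{x}$
  • And Gate: inputs $x, y$, output $x \cdot y$
  • Or Gate: inputs $x, y$, output $x + y$
  • Xor Gate: inputs $x, y$, output $(x + y) \cdot \overline{(x \cdot y)}$
  • Nand Gate: inputs $x, y$, output $\overline{x \cdot y}$
  • Nor Gate: inputs $x, y$, output $\overline{x + y}$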

These are the basic gates which are used in combinational logic. Using Boolean Algebra, we can simplify complicated logic statements/circuits.

$x \cdot \bar{x} = 0$                    $x + \bar{x} = 1$
$x \cdot 0 = 0$                          $x + 1 = 1$
$x \cdot 1 = x$                          $x + 0 = x$
$x \cdot x = x$                          $x + x = x$
$x \cdot y = y \cdot x$                  $x + y = y + x$
$(xy)z = x(yz)$                          $(x + y) + z = x + (y + z)$
$x(y + z) = xy + xz$                     $x + yz = (x + y)(x + z)$
$xy + x = x$                             $(x + y)x = x$
$\overline{xy} = \bar{x} + \bar{y}$      $\overline{x + y} = \bar{x}\bar{y}$

One other important circuit element is the data multiplexor (mux). A mux takes in n different sources of data and selects one of those sources to output. It does this using select inputs. For example, if a mux has 3 select bits, then it can choose from 8 different streams of data.
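A mux is easy to model in software. The sketch below (with made-up names) shows the 3-select-bit case from the paragraph above, plus a 1-bit 2-to-1 mux written with the gate operations from this section, out = (not s and a) or (s and b).

```c
#include <stdint.h>

/* 8-to-1 mux: three select bits choose one of eight data inputs. */
uint32_t mux8(const uint32_t inputs[8], unsigned select) {
    return inputs[select & 0x7u];   /* only the low 3 select bits are used */
}

/* 2-to-1 mux on single bits, built from Not, And, and Or:
 * s = 0 passes a through, s = 1 passes b through. */
unsigned mux2_bit(unsigned a, unsigned b, unsigned s) {
    return ((~s & a) | (s & b)) & 1u;
}
```

With inputs {10, 11, ..., 17}, a select value of 5 (binary 101) outputs 15.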

5.3 Timing

In circuits, timing is incredibly important because combinational logic blocks have propagation delays (i.e. the output does not change instantaneously with the input). When used in conjunction with registers (which have their own timing requirements), if one is not careful, we could build a circuit which produces inaccurate outputs. When checking whether or not a circuit will do what it is designed to do, we need to look at two special paths:

  • Longest CL Path: The longest path of combinational logic blocks between two registers
  • Shortest CL Path: The shortest path of combinational logic blocks between two registers

This will let us analyze things like the maximum clock rate we can achieve or the maximum hold time our registers can have.

$t_{\text{delay}} = t_{\text{setup}} + t_{\text{clk-to-q}} + t_{\text{longestCL}}$

This first formula tells us the maximum amount of time it takes for an input to propagate from a register to another register. When the clock rises, the first register will read the input and output it $t_{\text{clk-to-q}}$ later. The input will then go through the combinational logic elements. Since we are looking for the maximum, we only consider the longest CL path. Finally, the output needs to be stable for $t_{\text{setup}}$ in order for the destination register to read it properly. As long as our clock period is longer than the max delay, our circuit will work properly.

$t_{\text{hold}} = t_{\text{clk-to-q}} + t_{\text{shortestCL}}$

This second formula tells us the maximum hold time which our registers can have. When the clock rises, the first register reads the input and outputs it $t_{\text{clk-to-q}}$ later. The input will then go through the combinational logic elements.
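Both formulas are simple enough to check with a few lines of C. This is only a sketch: the `RegisterTiming` struct and function names are assumptions, and all times are taken to be in nanoseconds.

```c
/* Timing parameters of a register, in nanoseconds. */
typedef struct {
    double setup_ns;
    double hold_ns;
    double clk_to_q_ns;
} RegisterTiming;

/* Smallest clock period that still works: setup + clk-to-q + longest CL path. */
double min_clock_period_ns(RegisterTiming r, double longest_cl_ns) {
    return r.setup_ns + r.clk_to_q_ns + longest_cl_ns;
}

/* Largest hold time the circuit can tolerate: clk-to-q + shortest CL path. */
double max_hold_time_ns(RegisterTiming r, double shortest_cl_ns) {
    return r.clk_to_q_ns + shortest_cl_ns;
}

/* Returns 1 if the register's hold requirement is met by the circuit. */
int hold_time_ok(RegisterTiming r, double shortest_cl_ns) {
    return r.hold_ns <= max_hold_time_ns(r, shortest_cl_ns);
}
```

For a register with 2 ns setup, 3 ns hold, and 1 ns clk-to-q, a longest CL path of 10 ns gives a minimum period of 13 ns (about 77 MHz), and a shortest CL path of 4 ns allows hold times up to 5 ns, so the 3 ns hold requirement is met.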

6.1 Pipelined Datapath

A single-cycle datapath is one where every instruction passes through the datapath one at a time. However, this is inefficient because faster stages (such as register reading) are left unused while waiting for slower stages (such as memory reading). One way to fix this is to pipeline the datapath. This allows multiple instructions to use different parts of the datapath at once, speeding up the processor because no stage is left unutilized. All we need to do is add registers after each datapath stage. However, this introduces "hazards" into the processor.

6.1.1 Structural Hazards

Definition 17 A structural hazard is when two or more instructions compete for access to the same physical resource.

One solution is to have the instructions take turns to access the resource. The other option is to add more hardware to distribute that resource. For example, a Regfile structural hazard would be when the processor needs to read registers for one instruction and write a register for another. This is solved by giving the Regfile two independent read ports and one independent write port. Another example is a memory structural hazard where instructional memory and data memory are used simultaneously. This is solved by separating them into IMem and DMem.

6.1.2 Data Hazards

Definition 18 A data hazard is when two instructions have a data dependency between them

Problem: One instruction is reading from a register that a previous instruction is writing back to.
Solution: The WB stage will always update the value first before Instruction Decode reads a new value.

Problem: The result from the ALU will take 2 cycles to be written back. An instruction might need it before then.
Solution 1: We could stall by introducing no-ops (instructions that do nothing), but this would kill performance.
Solution 2: We can add a forwarding path from the output of the ALU to the ALU input via a mux and add a control element to determine when we should use the forwarded value.

Problem: Suppose a load instruction is followed by an instruction that requires the data loaded from DMEM. The ALU thus needs the data as it is being read. This slot after a load is known as the load delay slot. If the load delay slot instruction uses the result of the load, then we need to stall for a cycle.
Solution 1: We could insert a no-op into the code.
Solution 2: Turn off all write enables and run the second instruction twice.
Solution 3: Reorder instructions so that the load delay slot doesn't use the loaded result.

6.1.3 Control Hazards

Definition 19 A control hazard is when the program flow changes so instructions in the pipeline are no longer relevant.

Notice that this is only a problem when a branch is taken because it means the pipeline must be flushed to get the wrong instructions out of the pipeline (by converting them to no-ops). A more advanced way to fix this is to implement branch prediction which is where the processor attempts to predict whether or not a branch is taken and load instructions based on that.

7 The Memory Hierarchy

Computers have several levels of memory, each one serving different purposes. On the processor itself, we have the Regfile which provides fast read and write access. Right below it, we have the caches as well as main memory (a.k.a. DRAM), which has more capacity than registers but is much slower. After DRAM, we have Disk, which is even larger but incredibly slow. Different systems control the transfer of data between different parts of memory.

  • Registers ↔ Memory: The compiler/assembly programmer
  • Cache ↔ Main Memory: Cache Controller
  • Main Memory ↔ Disk: Operating System

7.1 Caching

The closer memory is to the processor, the faster it is. However, that also makes it more expensive. The idea of caching is to copy a subset of main memory and keep it close to the processor. We can then stack multiple layers of caches (each one with more storage than the one before it) before we actually need to read DRAM.

Definition 20 Temporal locality is the idea that if data is used now, it will likely be used again soon.

Definition 21 Spatial locality is the idea that if data is used now, nearby data will likely be used soon.

Caches take advantage of both temporal and spatial locality to provide quick memory access.
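As an illustration of spatial locality, consider two ways of summing a 2D array in C. The array dimensions and function names are assumptions for this sketch. C stores arrays row by row, so the row-major loop touches consecutive addresses, while the column-major loop jumps `COLS` elements at a time; both compute the same sum, but on large arrays the first tends to hit in the cache far more often.

```c
#define ROWS 4
#define COLS 8

/* Walks memory in address order: neighbors are used right after each other. */
long sum_row_major(int grid[ROWS][COLS]) {
    long total = 0;
    for (int r = 0; r < ROWS; r++) {
        for (int c = 0; c < COLS; c++) {
            total += grid[r][c];
        }
    }
    return total;
}

/* Same result, but consecutive accesses are COLS elements apart. */
long sum_col_major(int grid[ROWS][COLS]) {
    long total = 0;
    for (int c = 0; c < COLS; c++) {
        for (int r = 0; r < ROWS; r++) {
            total += grid[r][c];
        }
    }
    return total;
}
```

The running total `total` is reused on every iteration, which is a small example of temporal locality.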