





















Course notes for CS61C, a computer science course that covers binary representation, C programming, memory management, RISC-V, synchronous digital systems, datapath, memory hierarchy, and parallelism. Specific topics include base conversions, numeric representations, floating-point representation, C features, bitwise operators, pointers, arrays, structs, memory basics, the stack, the heap, heap management, assembly languages, RISC-V structure, instruction formats, compiling, assembling, linking, loading, registers, pipelining, combinational logic elements, timing, caching, virtual memory, data level parallelism, and thread level parallelism.
Typology: Study notes






















1.2.3 Two’s Complement
To convert from decimal to two’s complement: if the number is positive, convert to binary as normal. If the number is negative, convert its magnitude to binary, invert all of the bits, and add 1.
If given a two’s complement number, you can negate it (multiply it by −1) by following the same process: invert the bits and add 1.
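The conversion steps above can be sketched in Python. This is a minimal illustration, not part of the original notes; the function names `to_twos_complement` and `negate` and the default width of 8 bits are assumptions made for the example.

```python
def to_twos_complement(value: int, bits: int = 8) -> str:
    """Return the two's complement bit string of `value` using `bits` bits."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError(f"{value} does not fit in {bits} bits")
    if value >= 0:
        # Positive numbers: convert to binary as normal.
        return format(value, f"0{bits}b")
    # Negative numbers: binary of the magnitude, invert the bits, add 1.
    magnitude = format(-value, f"0{bits}b")
    inverted = "".join("1" if b == "0" else "0" for b in magnitude)
    plus_one = (int(inverted, 2) + 1) % (1 << bits)
    return format(plus_one, f"0{bits}b")


def negate(bit_string: str) -> str:
    """Negate a two's complement bit string: invert the bits and add 1."""
    bits = len(bit_string)
    inverted = int(bit_string, 2) ^ ((1 << bits) - 1)
    return format((inverted + 1) % (1 << bits), f"0{bits}b")
```

For example, `to_twos_complement(-5)` follows the path 00000101, then 11111010, then 11111011, and `negate` applied to that result gives back the encoding of 5.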
1.2.4 Bias Encoding
With bias encoding, the number is equal to its unsigned representation plus a bias term. With a negative bias, we can center the unsigned range $0 \to 2^N - 1$ on 0.
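A short sketch of bias encoding, assuming 8 bits and a bias of −127 (the same bias the floating point exponent uses). The helper names `bias_decode` and `bias_encode` are hypothetical.

```python
def bias_decode(unsigned_value: int, bias: int) -> int:
    """The encoded number is the unsigned representation plus the bias."""
    return unsigned_value + bias


def bias_encode(value: int, bias: int, bits: int) -> int:
    """Find the unsigned representation that decodes to `value`."""
    unsigned_value = value - bias
    if not 0 <= unsigned_value < (1 << bits):
        raise ValueError(f"{value} is out of range for {bits} bits with bias {bias}")
    return unsigned_value
```

With 8 bits and a bias of −127, the unsigned range 0 to 255 maps onto −127 to 128, so the value 0 is stored as the unsigned pattern 127.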
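To represent numbers with a fractional part in binary, we use floating point representation. Any number in scientific notation has the following components: a mantissa, a binary point, a significand, a radix, and an exponent.
[Figure: a binary number in scientific notation with its mantissa, binary point, significand, radix, and exponent (−1 in the example) labeled.]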
If we have a 32-bit system, then by the floating point convention the bits are laid out as follows, from most significant to least significant:
Sign Bit: 1 bit
Exponent Bits: 8 bits (00000000)
Significand: 23 bits (00000000000000000000000)
In the floating point representation, the mantissa (the digit before the binary point) is always 1 because the number is written in normalized scientific notation, so we don’t bother storing it.
Given a number in floating point representation, we can convert it back to decimal by applying the following formula:
$n = (-1)^{s} \cdot (1 + \text{significand}) \cdot 2^{\text{exponent} - 127}$
Notice a couple things about this representation.
Because of the way that floating point is built, certain sequences are designated to be specific numbers:
Exponent   Significand   Object
0          0             0
0          nonzero       denorm
1-254      anything      ± #
255        0             ±∞
255        nonzero       NaN
A denormalized (denorm) number is a number where we don’t have an implicit 1 as the mantissa. Denorms let us represent incredibly small numbers. They have an implicit exponent of −126.
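To make the formula and the special cases concrete, here is a minimal Python sketch that decodes a 32-bit pattern by hand. The function names are hypothetical; `float_to_bits` uses the standard `struct` module only to obtain the bit pattern of a real float for comparison.

```python
import struct


def decode_float32(bits: int) -> float:
    """Decode a 32-bit floating point pattern using the rules in these notes."""
    sign = (bits >> 31) & 0x1
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    significand = fraction / (1 << 23)      # value of the 23 stored bits
    if exponent == 255:
        if fraction == 0:
            return float("-inf") if sign else float("inf")
        return float("nan")
    if exponent == 0:
        # Zero or denorm: no implicit 1, implicit exponent of -126.
        return (-1) ** sign * significand * 2.0 ** -126
    return (-1) ** sign * (1 + significand) * 2.0 ** (exponent - 127)


def float_to_bits(x: float) -> int:
    """Return the 32-bit pattern that the machine uses to store `x`."""
    return struct.unpack(">I", struct.pack(">f", x))[0]
```

For instance, the pattern 0x3F800000 has an exponent field of 127 and a significand of 0, so it decodes to 1.0, while the pattern 0x00000001 is the smallest positive denorm.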
2 C
Bitwise operators are operators which change the bits of integer-like objects (ints, chars, etc.).
>>: Shifts the bits of the first operand right by the number of bits specified by the second operand.
lengths and they do not check their bounds. This means that whenever you pass an array to a function, you should always pass its size as well.
Structs are the basic data structures in C. Like classes, they are composed of simpler data structures, but there is no inheritance.
2.5.1 Struct Operators
typedef is a useful keyword with structs because it lets us name them cleanly.
3 Memory Management
In memory, a word is 4 bytes. When objects are saved to memory, they are saved by words. How the bytes within a word are ordered depends on the type of system. In Little Endian systems, the Least Significant Byte is placed at the lowest memory address. In other words, the memory address points to the least significant byte.
The opposite is true in Big Endian systems. For example, let’s say we have the number 0x12345678 stored at the memory address 0x00. Its four bytes occupy addresses 0x00 through 0x03:
Address:         0x00   0x01   0x02   0x03
Little Endian:   78     56     34     12
Big Endian:      12     34     56     78
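The two byte orders can be checked with Python's built-in `int.to_bytes`. This is a small sketch; the helper `byte_layout` and its `base_address` parameter are assumptions made for the example.

```python
def byte_layout(value: int, byteorder: str, base_address: int = 0x00) -> dict:
    """Map each byte address to the byte of a 4-byte word stored there."""
    data = value.to_bytes(4, byteorder=byteorder)
    return {base_address + i: f"{b:02X}" for i, b in enumerate(data)}


little = byte_layout(0x12345678, "little")
big = byte_layout(0x12345678, "big")
```

Here `little[0x00]` is the least significant byte 78, while `big[0x00]` is the most significant byte 12.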
There are four sections of memory: the stack, the heap, static, and code.
Definition 5 The Stack is where local variables are stored. This includes parameters and return addresses. It is the "highest" level of memory and grows "downward" towards heap memory.
Definition 6 The Heap is where dynamic memory is stored. Data lives here until the programmer deallocates it. It sits below the stack and grows upward toward it.
Definition 7 Static storage is where global variables are stored. This storage does not change size and is mostly permanent.
Definition 8 Code storage is where the "code" is located: the machine instructions of the program, including the instructions of every function.
Definition 9 Stack overflow is when the stack grows so large that it intersects with heap memory. This is mostly unavoidable.
Definition 10 Heap pileup is when the heap grows so large that it starts to intersect with the stack. This is very avoidable because the programmer manages it.
Definition 11 All of the memory which a program uses is collectively referred to as the address space of the program.
The stack is named that way because every time a function call is made, a stack frame is created. A stack frame includes the address of the return instruction and the parameters to the function. As the function executes, local variables are added to the frame. When the function returns, the frame is popped off. In this way, frames are handled in Last-In-First-Out (LIFO) order.
Definition 12 The stack pointer is a pointer which points to the current stack frame.
Important: Deallocated memory is not cleared. It is merely overwritten later.
The heap is a larger pool of memory than the stack, and its allocations are not necessarily contiguous. Back-to-back allocations on the heap may be very far apart.
Definition 13 Heap fragmentation is when most of the free memory in the heap is in many small chunks.
Fragmentation is bad because if we want to allocate space for a large object, we may have enough cumulative space on the heap, but if no single contiguous free block is large enough, then there is no way to create our object.
Implementation:
Every block in the heap has a header containing its size and a pointer to the next block. The free blocks of memory are stored as a circular linked list. When memory needs to be allocated on the heap, this linked list is searched. When memory is freed, adjacent empty blocks are coalesced into a single larger block. There are three different strategies which can be used to do this allocation/freeing.
Definition 14 Best-fit allocation is where the entire linked list is searched to find the smallest block large enough to fit the requirements.
Definition 15 First-fit allocation is where the first block that is large enough to fit the requirement is returned.
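The difference between the two definitions above can be sketched as follows. This is a simplification: the free list is modeled as a plain Python list of block sizes rather than a circular linked list with headers, and the function names are hypothetical.

```python
def first_fit(free_blocks, request):
    """Return the index of the first free block that is large enough."""
    for index, size in enumerate(free_blocks):
        if size >= request:
            return index
    return None  # no block can satisfy the request


def best_fit(free_blocks, request):
    """Search every free block and return the index of the smallest one that fits."""
    best = None
    for index, size in enumerate(free_blocks):
        if size >= request and (best is None or size < free_blocks[best]):
            best = index
    return best


free_list = [16, 64, 24, 32]  # sizes of the free blocks, in bytes (made-up values)
```

For a 20-byte request on this free list, first-fit stops at the 64-byte block, while best-fit keeps searching and picks the 24-byte block, leaving the large block intact for later requests.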
the value 0. Unlike variables, registers have no types. The operation is what determines what the content of the register is treated as.
Immediates are numerical constants. They can also be used as the operands of assembly instructions.
The general format of a RISC-V instruction is
    operation_name destination source1 source2
Labels are text in the program which denote certain locations in code. Branches change the flow of the program, usually by jumping to a label or an address in the code portion of memory.
Pseudo-instructions are instructions which are translated into different instructions by the assembler. They exist because they increase readability of the program.
In order to increase program legibility, registers are rarely referred to by their number (x15, x20, etc.). Instead, they have symbolic names. Here are a few.
4.2.1 Caller Callee Convention
Because functions can always overwrite registers, programmers set up conventions for calling and returning from functions. The Caller is the function which calls another function. The Callee is a function which is being called.
Functional Control Flow
Because every function must have this control flow, the caller-callee convention was set up as follows
If a register is not preserved across a function call, then the caller cannot expect its value to be the same after the callee returns. In order to preserve registers across a function call, we use the stack. The Stack Frame stores the variables which need to be saved in order to adhere to the caller-callee convention. Every RISC-V function has a Prologue where it saves the necessary registers to the stack. An example might look like

    addi sp, sp, -16
    sw s0, 0(sp)
    sw s1, 4(sp)
    sw s2, 8(sp)
    sw ra, 12(sp)

This function must use the s0-s2 saved registers. We first create the stack frame by decrementing the stack pointer. Then we save the saved registers to the newly allocated memory. This function must be calling another function, so it has to remember its return address. That is why we save it to the stack. The Epilogue is the part of the function before it returns where everything saved on the stack is restored and the stack frame is released.

    lw s0, 0(sp)
    lw s1, 4(sp)
    lw s2, 8(sp)
    lw ra, 12(sp)
    addi sp, sp, 16
    jr ra
4.2.2 Directives
Directives are special demarcations in an assembly file which designate different pieces of data.
.text: Subsequent items are put into the text segment of memory (i.e., the code)
.data: Subsequent items are put into the data segment of memory (i.e., static variables)
.global sym: Declares a symbol to be global, meaning it can be referenced from other files
Imm[31:12] rd opcode
J-Type
31        30:21       20        19:12        11:7   6:0
Imm[20]   Imm[10:1]   Imm[11]   Imm[19:12]   rd     opcode
4.3.1 Addressing
Notice that the J-Type and B-Type instructions require a label in code. These labels are encoded in the instruction format as an offset from the program counter. This is known as PC Relative Addressing. Since each instruction in RISC-V is 1 word, we will never have an odd address. As a result, we don’t need to store the last bit of the immediate in B-Type and J-Type instructions because it is automatically 0.
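As an illustration of PC Relative Addressing, the sketch below packs a byte offset into the J-Type fields listed above and unpacks it again. The function names are hypothetical, and the opcode value for jal (0b1101111) is taken from the RISC-V specification rather than from these notes.

```python
JAL_OPCODE = 0b1101111  # opcode of jal in the RISC-V base ISA


def encode_jal(rd: int, offset: int) -> int:
    """Encode `jal rd, offset`, where offset is a byte offset from the PC."""
    if offset % 2 != 0:
        raise ValueError("offset must be even: bit 0 is not stored")
    if not -(1 << 20) <= offset < (1 << 20):
        raise ValueError("offset does not fit in a 21-bit signed immediate")
    imm = offset & 0x1FFFFF  # 21-bit two's complement view of the offset
    return (
        ((imm >> 20) & 0x1) << 31      # Imm[20]
        | ((imm >> 1) & 0x3FF) << 21   # Imm[10:1]
        | ((imm >> 11) & 0x1) << 20    # Imm[11]
        | ((imm >> 12) & 0xFF) << 12   # Imm[19:12]
        | (rd & 0x1F) << 7             # rd
        | JAL_OPCODE                   # opcode
    )


def decode_jal_offset(instruction: int) -> int:
    """Recover the signed byte offset from an encoded jal instruction."""
    imm = (
        ((instruction >> 31) & 0x1) << 20
        | ((instruction >> 21) & 0x3FF) << 1
        | ((instruction >> 20) & 0x1) << 11
        | ((instruction >> 12) & 0xFF) << 12
    )
    # Bit 20 is the sign bit of the 21-bit immediate.
    return imm - (1 << 21) if imm & (1 << 20) else imm
```

Note that bit 0 of the offset never appears in the encoded word: the decoder simply leaves it as 0, which is why the stored immediate covers Imm[20:1].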
Compiling, Assembling, Linking, and Loading (CALL) are the four steps of turning source code into a running program.
4.4.1 Compiler
The input to the compiler is a file written in a high level programming language. It outputs assembly language code which is built for the machine the code was compiled on. The output of the compiler may still include pseudo-instructions in assembly.
4.4.2 Assembler
The Assembler is the program which converts assembly language code to machine language code. It reads and uses directives, replaces pseudo-instructions with their real equivalents, and produces machine language code (i.e., bits) where it can.
The output of the assembler is an object file. The object file contains the following elements:
The symbol table contains information which is public to other files in the program.
The Relocation Table contains information that needs to be relocated in later steps. It essentially tracks everything that the Assembler cannot directly convert to machine code immediately because it doesn’t have enough information.
When the assembler parses a file, instructions which don’t have a label are converted into machine language. When a label is defined, its position is recorded in the symbol table. When a label is encountered in code, the assembler looks to see if its position has been recorded. If it is found, the label is replaced with the immediate and converted to machine code. Otherwise, the line is marked for relocation.
In order to do its job, the assembler must take two passes through the code. This is because of the forward reference problem. If a label is used before it is defined in the same file, the first time the assembler encounters it, it won’t know how to convert it to machine code. To resolve this, the assembler simply takes two passes: it finds all labels in the first pass and converts the lines it originally couldn’t in the second pass.
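The two-pass idea can be sketched on a toy input. This is a simplification under stated assumptions: every instruction is 4 bytes, a label sits on its own line ending in a colon, a label can only appear as the last operand, and pseudo-instructions such as `j` and `ret` are left unexpanded. The function name `resolve_labels` is hypothetical.

```python
def resolve_labels(lines):
    """Replace label operands with PC-relative byte offsets using two passes."""
    # Pass 1: record the address of every label.
    labels = {}
    instructions = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            labels[line[:-1]] = 4 * len(instructions)
        else:
            instructions.append(line)

    # Pass 2: every label is now known, including forward references.
    resolved = []
    for index, instruction in enumerate(instructions):
        pc = 4 * index
        *head, last = instruction.replace(",", " ").split()
        if last in labels:
            last = str(labels[last] - pc)
        resolved.append(" ".join(head + [last]))
    return resolved


program = [
    "beq x0, x0, end",   # forward reference: end is defined later
    "loop:",
    "addi x5, x5, 1",
    "j loop",            # backward reference
    "end:",
    "ret",
]
```

A single pass would fail on the first line, because `end` has not been seen yet. After the first pass both labels are known, so the second pass can turn `end` into the offset 12 and `loop` into the offset −4.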
The Linker is responsible for linking all of the object files together and producing the final executable. The linker operates in three steps:
After the linking step, the entire program is finally in pure machine code because all references to labels must be resolved. The linker knows the length of each text and data segment. It can use these lengths to order the segments appropriately. It assumes that the first word will be stored at 0x10000000 and can calculate the absolute address of each word from there. To resolve references, it uses the relocation table to change addresses to their appropriate values.
Registers are state elements frequently used in circuits. On the rising edge of the clock, the input d is sampled and transferred to the output q. At all other times, the input d is ignored. There are three critical timing requirements which are specific to registers.
5.1.1 Pipelining
One place where registers become useful is in pipelining. Pipelining a circuit means placing registers after combinational logic blocks. This stops delay times from adding up because now intermediate quantities must be stored in the register before being passed to the next block. This allows for higher clock frequencies because the clock period is now limited by the slowest stage between two registers rather than by the sum of all the logic delays.
Gate   Inputs   Output
Not    x        $\bar{x}$
And    x, y     $x \cdot y$
Or     x, y     $x + y$
Xor    x, y     $(x + y) \cdot \overline{(x \cdot y)}$
Nand   x, y     $\overline{x \cdot y}$
Nor    x, y     $\overline{x + y}$ (equivalently $\bar{x} \cdot \bar{y}$)
These are the basic gates which are used in combinational logic. Using Boolean Algebra, we can simplify complicated logic statements/circuits.
$x \cdot \bar{x} = 0$                $x + \bar{x} = 1$
$x \cdot 0 = 0$                      $x + 1 = 1$
$x \cdot 1 = x$                      $x + 0 = x$
$x \cdot x = x$                      $x + x = x$
$x \cdot y = y \cdot x$              $x + y = y + x$
$(xy)z = x(yz)$                      $(x + y) + z = x + (y + z)$
$x(y + z) = xy + xz$                 $x + yz = (x + y)(x + z)$
$xy + x = x$                         $(x + y)x = x$
$\overline{xy} = \bar{x} + \bar{y}$  $\overline{x + y} = \bar{x}\bar{y}$
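Since each variable can only be 0 or 1, any of these identities can be checked by trying every combination of inputs. The sketch below does this for a few of them; the helper `holds` and the `LAWS` table are names made up for this example.

```python
from itertools import product


def holds(law, arity: int) -> bool:
    """Check a Boolean identity on every possible assignment of its inputs."""
    return all(law(*values) for values in product([False, True], repeat=arity))


# Each entry: (number of variables, function returning True when both sides agree)
LAWS = {
    "x + yz = (x + y)(x + z)": (3, lambda x, y, z: (x or (y and z)) == ((x or y) and (x or z))),
    "xy + x = x": (2, lambda x, y: ((x and y) or x) == x),
    "not(xy) = not x + not y": (2, lambda x, y: (not (x and y)) == ((not x) or (not y))),
    "not(x + y) = (not x)(not y)": (2, lambda x, y: (not (x or y)) == ((not x) and (not y))),
}

results = {name: holds(law, arity) for name, (arity, law) in LAWS.items()}
```

Every entry in `results` comes out true, whereas a non-identity such as x + y = xy fails as soon as exactly one of the inputs is 1.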
One other important circuit element is the data multiplexor (mux). A mux takes in n different sources of data and selects one of those sources to output. It does this using select inputs. For example, if a mux has 3 select bits, then it can choose from 8 different streams of data.
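A mux can be modeled in software as an indexed selection. In this sketch the function name `mux` and the convention that the select bits are listed most significant bit first are assumptions.

```python
def mux(inputs, select_bits):
    """Select one of 2**len(select_bits) inputs; select bits are MSB first."""
    if len(inputs) != 2 ** len(select_bits):
        raise ValueError("a mux with k select bits needs exactly 2**k inputs")
    index = 0
    for bit in select_bits:
        index = (index << 1) | bit   # build the binary index one bit at a time
    return inputs[index]


streams = ["A", "B", "C", "D", "E", "F", "G", "H"]  # 8 data sources, 3 select bits
```

With select bits 1, 0, 1 the index is 5, so the mux passes through the sixth source, "F".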
In circuits, timing is incredibly important because combinational logic blocks have propagation delays (i.e., the output does not change instantaneously with the input). When they are used in conjunction with registers (which have their own timing requirements), if one is not careful, we could build a circuit which produces inaccurate outputs. When checking whether or not a circuit will do what it is designed to do, we need to look at two special paths:
This will let us analyze things like the maximum clock rate we can achieve or the maximum hold time our registers can have.
$t_{\text{delay}} = t_{\text{setup}} + t_{\text{clk2q}} + t_{\text{longest CL}}$
This first formula tells us the maximum amount of time it takes for an input to propagate from a register to another register. When the clock rises, the first register will read the input and output it $t_{\text{clk2q}}$ later. The input will then go through the combinational logic elements. Since we are looking for the maximum, we only consider the longest CL path. Finally, the output needs to be stable for $t_{\text{setup}}$ in order for the destination register to read it properly. As long as our clock period is longer than the max delay, our circuit will work properly.
$t_{\text{hold}} = t_{\text{clk2q}} + t_{\text{shortest CL}}$
This second formula tells us the maximum hold time which our registers can have. When the clock rises, the first register reads the input and outputs it $t_{\text{clk2q}}$ later. The input will then go through the combinational logic elements. Since we want the earliest moment at which the destination register’s input can change, we only consider the shortest CL path.
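The two formulas can be turned into a small calculation. The delay values below are made up for illustration, and the function names are hypothetical; all times are in picoseconds.

```python
def critical_path_ps(t_setup: int, t_clk2q: int, t_longest_cl: int) -> int:
    """Maximum register-to-register delay: the minimum workable clock period."""
    return t_setup + t_clk2q + t_longest_cl


def max_clock_frequency_hz(t_setup: int, t_clk2q: int, t_longest_cl: int) -> float:
    """Highest clock frequency for which the period still covers the delay."""
    period_ps = critical_path_ps(t_setup, t_clk2q, t_longest_cl)
    return 1e12 / period_ps   # 1 second = 1e12 picoseconds


def max_hold_time_ps(t_clk2q: int, t_shortest_cl: int) -> int:
    """Largest hold time the registers can have without seeing a changed input."""
    return t_clk2q + t_shortest_cl


# Assumed example values (not from the notes), in picoseconds
delay = critical_path_ps(t_setup=30, t_clk2q=20, t_longest_cl=200)
frequency = max_clock_frequency_hz(t_setup=30, t_clk2q=20, t_longest_cl=200)
hold = max_hold_time_ps(t_clk2q=20, t_shortest_cl=50)
```

With these numbers the critical path is 250 ps, which allows a clock of up to 4 GHz, and the registers may have a hold time of at most 70 ps.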
A single-cycle datapath is one where every instruction passes through the datapath one at a time. However, this is inefficient because faster stages (such as register reading) are left unused while waiting for slower stages (such as memory reading). One way to fix this is to pipeline the datapath. This allows multiple instructions to use different parts of the datapath at once, speeding up the processor because no stage is left unutilized. All we need to do is add registers after each datapath stage. However, this introduces "hazards" into the processor.
6.1.1 Structural Hazards
Definition 17 A structural hazard is when two or more instructions compete for access to the same physical resource.
One solution is to have the instructions take turns accessing the resource. The other option is to add more hardware so that the resource is no longer shared. For example, a Regfile structural hazard would be when the processor needs to read registers for one instruction and write a register for another. This is solved by giving the Regfile two independent read ports and one independent write port. Another example is a memory structural hazard where instruction memory and data memory are used simultaneously. This is solved by separating them into IMem and DMem.
6.1.2 Data Hazards
Definition 18 A data hazard is when two instructions have a data dependency between them.
Problem: One instruction is reading from a register that a previous instruction is writing back to.
Solution: The WB stage always updates the register before Instruction Decode reads it, so the reading instruction sees the new value.
Problem: The result from the ALU takes 2 cycles to be written back. An instruction might need it before then.
Solution 1: We could stall by introducing no-ops (instructions that do nothing), but this would kill performance.
Solution 2: We can add a forwarding path from the output of the ALU back to the ALU input via a mux, and add a control element to determine when we should use the forwarded value.
Problem: Suppose a load instruction is followed by an instruction that requires the data loaded from DMEM. The ALU thus needs the data as it is being read. This slot after a load is known as the load delay slot. If the load delay slot instruction uses the result of the load, then we need to stall for a cycle.
Solution 1: We could insert a no-op into the code.
Solution 2: Turn off all write enables and run the second instruction twice.
Solution 3: Reorder instructions so that the load delay slot doesn’t use the loaded result.
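The load delay slot check can be sketched as a pass over a list of instructions that inserts a no-op only where a load is immediately followed by a user of its result. This is a toy model, not how the hardware does it: the `Instr` tuple, the `NOP` constant, and `insert_load_stalls` are hypothetical names, and ALU-to-ALU dependencies are assumed to be covered by forwarding.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str                    # operation name, e.g. "lw" or "add"
    rd: Optional[str]          # destination register, if any
    sources: Tuple[str, ...]   # source registers


NOP = Instr("nop", None, ())


def insert_load_stalls(program):
    """Insert a no-op wherever the load delay slot uses the loaded register."""
    scheduled = []
    for instr in program:
        previous = scheduled[-1] if scheduled else None
        if previous is not None and previous.op == "lw" and previous.rd in instr.sources:
            scheduled.append(NOP)   # stall for one cycle
        scheduled.append(instr)
    return scheduled


program = [
    Instr("lw", "t0", ("s0",)),         # t0 is loaded from memory
    Instr("add", "t1", ("t0", "t2")),   # uses t0 in the load delay slot
    Instr("sub", "t3", ("t1", "t2")),   # uses t1: handled by forwarding
]
stalled = insert_load_stalls(program)
```

Only the `add` needs a stall here. The `sub` depends on the `add`, but that result can be forwarded from the ALU output, so no extra no-op is inserted.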
6.1.3 Control Hazards
Definition 19 A control hazard is when the program flow changes so instructions in the pipeline are no longer relevant.
Notice that this is only a problem when a branch is taken because it means the pipeline must be flushed to get the wrong instructions out of the pipeline (by converting them to no-ops). A more advanced way to fix this is to implement branch prediction, where the processor attempts to predict whether or not a branch is taken and loads instructions based on that prediction.
7 The Memory Hierarchy
Computers have several levels of memory, each one serving different purposes. On the processor itself, we have the Regfile which provides fast read and write access. Right below it, we have the caches as well as main memory (a.k.a. DRAM) which has more capacity than registers but is much slower. After DRAM, we have Disk, which is even larger but incredibly slow. Different systems control the transfer of data between different parts of memory.
The closer memory is to the processor, the faster it is. However, that also makes it more expensive. The idea of caching is to copy a subset of main memory and keep it close to the processor. We can then stack multiple layers of caches (each one with more storage than the one before it) before we actually need to read DRAM.
Definition 20 Temporal locality is the idea that if data is used now, it will likely be used again soon.
Definition 21 Spatial locality is the idea that if data is used now, nearby data will likely be used later.
Caches take advantage of both temporal and spatial locality to provide quick memory access.
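To see how locality turns into cache hits, here is a toy simulation. It assumes a direct-mapped cache with 4 blocks of 16 bytes each, which is only one possible organization and is chosen here purely for illustration; the function name `count_hits` is hypothetical.

```python
def count_hits(addresses, num_blocks: int = 4, block_size: int = 16) -> int:
    """Count cache hits for a sequence of byte addresses (direct-mapped cache)."""
    cache = [None] * num_blocks   # each slot remembers which memory block it holds
    hits = 0
    for address in addresses:
        block_number = address // block_size
        slot = block_number % num_blocks
        if cache[slot] == block_number:
            hits += 1                    # data is already close to the processor
        else:
            cache[slot] = block_number   # miss: bring in the whole block
    return hits


sequential = list(range(0, 64, 4))   # 16 word accesses to neighboring addresses
repeated = [0] * 8                   # the same address used 8 times
strided = [0, 64, 128, 192]          # addresses far apart that share one slot
```

The sequential pattern benefits from spatial locality: each miss brings in a 16-byte block, so the next three word accesses hit. The repeated pattern benefits from temporal locality, and the strided pattern has neither, so it never hits.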