IFT 212 | computer architecture and design | Dr. Fortune
Computer Architecture and Organization
Topics Covered:
TOPIC 1: MEMORY SYSTEM
The memory system is one of the most fundamental components of any digital computer. It is responsible for storing data, instructions, and intermediate results that the processor needs during execution. Without a well-designed memory system, even the most powerful processor would be unable to function effectively. The memory system is the foundation of the stored-program concept — the revolutionary idea, proposed by John von Neumann in 1945, that both program instructions and data can be stored in the same memory and processed by the same hardware. Modern computer memory is not a single, uniform storage medium. Instead, it is a carefully engineered hierarchy of different storage technologies, each occupying a distinct position in terms of speed, capacity, cost per bit, and volatility. The engineering challenge is to give the processor the illusion of a single, large, fast, and cheap memory — which cannot exist in practice. The memory hierarchy solves this by placing small amounts of very fast memory near the CPU and progressively larger, slower, and cheaper memory further away. Understanding the memory system is essential for every computer science and engineering student because virtually every aspect of computer performance — from program execution speed to data throughput — is influenced by memory behavior. Cache misses, page faults, and memory bandwidth bottlenecks are among the most common performance limiters in real systems.
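The memory hierarchy exploits two empirical observations about program behavior, collectively known as the Principle of Locality: temporal locality (recently accessed data is likely to be accessed again soon) and spatial locality (data near a recently accessed location is likely to be accessed next).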
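Figure: Memory Hierarchy, from fastest/smallest (Registers) to slowest/largest (Tertiary Storage)
The levels of the hierarchy, from top to bottom, are: registers; cache memory (L1, L2, L3); main memory (RAM); secondary storage (magnetic disk or SSD); and tertiary storage (e.g. magnetic tape).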
Cache memory is the critical bridge between the fast CPU and the relatively slow main memory. Modern general-purpose processors have multiple levels of cache (L1, L2 and often L3), and understanding how cache works is essential for writing high-performance software.
Figure: Cache Hit/Miss Operation, showing decision flow and data path
When the CPU needs to read data from memory, it first checks the cache. If the data is present (a cache hit), it is returned immediately at cache speed. If not (a cache miss), the CPU must wait while the data is fetched from the next level of the hierarchy (L2, L3, or main memory), and a copy is placed in the cache for future accesses.
Hit Ratio (h): the fraction of all memory references that result in a cache hit. Typical values: 0.90–0.99.
Average Access Time: T_avg = h × T_cache + (1 - h) × T_main. This simple model charges each miss the main-memory time T_main; if the cache is always searched before main memory is accessed, a miss costs T_cache + T_main instead.
Example: for h = 0.95, T_cache = 2 ns and T_main = 100 ns, T_avg = 0.95 × 2 + 0.05 × 100 = 1.9 + 5 = 6.9 ns.
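The average-access-time formula can be checked with a few lines of Python. This is a minimal sketch; the function name and the optional `miss_includes_cache_time` flag are illustrative, not part of any standard API. With the flag off it reproduces the formula in the text exactly; with it on, each miss is charged T_cache + T_main.

```python
def average_access_time(h: float, t_cache: float, t_main: float,
                        miss_includes_cache_time: bool = False) -> float:
    """Average memory access time for one cache level in front of main memory.

    miss_includes_cache_time=False: hits cost t_cache, misses cost t_main
    (the formula in the text). True: a miss pays for the failed cache lookup
    as well, i.e. t_cache + t_main.
    """
    if not 0.0 <= h <= 1.0:
        raise ValueError("hit ratio must be between 0 and 1")
    miss_time = t_cache + t_main if miss_includes_cache_time else t_main
    return h * t_cache + (1.0 - h) * miss_time


# Worked example from the text: h = 0.95, T_cache = 2 ns, T_main = 100 ns
print(f"{average_access_time(0.95, 2.0, 100.0):.1f} ns")        # 6.9 ns
print(f"{average_access_time(0.95, 2.0, 100.0, True):.1f} ns")  # 7.0 ns
```

Raising h from 0.95 to 0.99 in the same call shows how strongly the miss term dominates: the 100 ns penalty is paid far less often.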
Memory interleaving improves effective memory bandwidth by dividing main memory into multiple independent banks and distributing consecutive addresses across these banks. When the CPU accesses sequential addresses (as in array traversal), different banks service the requests simultaneously, pipelining the memory accesses. In a k-bank interleaved system, address A goes to bank (A mod k). The controller initiates a new bank access every (cycle_time / k) time units, potentially multiplying memory bandwidth by k. Interleaving is particularly effective for streaming workloads and DMA transfers.
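A small Python sketch of low-order interleaving, under idealized assumptions: one word per address, every bank with the same cycle time, and no bus or controller overhead. The helper names are hypothetical. `bank_of` applies the A mod k rule from the text, and `sequential_access_time` follows the "new bank access every cycle_time / k" timing, so n consecutive words finish one full cycle after the last access starts.

```python
def bank_of(address: int, k: int) -> tuple[int, int]:
    """Low-order interleaving: return (bank number, word offset within that bank)."""
    return address % k, address // k


def sequential_access_time(n: int, k: int, cycle_time: float) -> float:
    """Idealized time to read n consecutive words from k interleaved banks.

    A new access starts every cycle_time / k; the last one starts at
    (n - 1) * cycle_time / k and finishes one full cycle_time later.
    """
    if n <= 0:
        return 0.0
    return cycle_time + (n - 1) * cycle_time / k


print([bank_of(a, 4)[0] for a in range(8)])   # [0, 1, 2, 3, 0, 1, 2, 3]
print(sequential_access_time(8, 4, 100.0))    # 275.0
print(sequential_access_time(8, 1, 100.0))    # 800.0 (no interleaving)
```

Under these assumptions, eight consecutive words take 275 ns with four banks instead of 800 ns with one, approaching the k-fold bandwidth gain the text describes as n grows.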
Q1. What is the memory hierarchy? Why is it necessary, and what principle justifies its effectiveness? Answer: The memory hierarchy is a structured arrangement of different memory technologies, organized from fastest/smallest/most-expensive (registers) to slowest/largest/cheapest (tape). It is necessary because no single memory technology can simultaneously provide high speed, large capacity, and low cost — these properties are fundamentally in tension. The hierarchy works because of the Principle of Locality: programs exhibit temporal locality (recently accessed data is likely to be accessed again soon) and spatial locality (data near recently accessed locations is likely to be accessed next). By keeping recently accessed data in fast cache memory, the system achieves near-cache performance for most accesses while the bulk of data resides in cheap, slow storage.
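The effect of locality on the hit ratio can be seen with a toy direct-mapped cache simulator. This is a sketch under assumed parameters (8 lines of 16 bytes, byte addresses, reads only); real caches are usually set-associative and much larger. Sequential scanning shows spatial locality (one miss per block), a repeated small loop shows temporal locality, and a stride equal to the cache size defeats both.

```python
def hit_ratio(addresses, num_lines: int = 8, block_size: int = 16) -> float:
    """Fraction of accesses that hit in a direct-mapped cache.

    Each cache line remembers which memory block it currently holds; in this
    simplified model the block number doubles as the tag.
    """
    lines = [None] * num_lines
    hits = 0
    count = 0
    for address in addresses:
        count += 1
        block = address // block_size      # which memory block the byte belongs to
        index = block % num_lines          # direct mapping: each block has one fixed line
        if lines[index] == block:
            hits += 1                      # hit: block already cached
        else:
            lines[index] = block           # miss: load the block, replacing the old one
    return hits / count if count else 0.0


print(hit_ratio(range(256)))               # spatial locality: 0.9375
print(hit_ratio(list(range(64)) * 4))      # temporal locality: 0.984375
print(hit_ratio(range(0, 32 * 128, 128)))  # stride = cache size: 0.0
```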
TOPIC 2: MEMORY ADDRESSING
Memory addressing is the mechanism by which a processor identifies and accesses specific locations in memory. In a byte-addressable machine, every byte of the addressable memory space has a unique numerical address. Instructions must specify both the operation to be performed and the operand(s) on which to operate. The addressing mode specifies how the operand address is to be computed from the information in the instruction. The design of addressing modes is a critical aspect of Instruction Set Architecture (ISA) design. Rich addressing modes can reduce the number of instructions needed to express a computation, but they also increase hardware complexity and may require more bits in each instruction. RISC architectures tend to use simple addressing modes (immediate, register, register + offset), while CISC architectures (like x86) support many complex modes. The address width determines the maximum addressable memory space: a 32-bit address gives 2^32 bytes = 4 GB of addressable memory, while a 64-bit address gives 2^64 bytes = 16 EB (exabytes). In practice, current x86-64 processors implement 48-bit or 57-bit virtual addresses and at most 52-bit physical addresses.
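The relationship between address width and addressable space is a power of two. The sketch below assumes byte addressing and reports sizes in binary units (KiB, MiB, GiB and so on), which correspond to the KB/GB/EB figures used in the text; both helper names are hypothetical.

```python
def addressable_bytes(address_bits: int) -> int:
    """Bytes reachable with an address of the given width (byte-addressable memory)."""
    return 1 << address_bits


def format_binary(n: int) -> str:
    """Format an exact power-of-two byte count using binary units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    i = 0
    while n >= 1024 and n % 1024 == 0 and i < len(units) - 1:
        n //= 1024
        i += 1
    return f"{n} {units[i]}"


for bits in (32, 48, 52, 57, 64):
    print(bits, "bits ->", format_binary(addressable_bytes(bits)))
```

For example, 48 bits gives 256 TiB, the virtual address space available with 4-level x86-64 paging.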
Figure: Overview of CPU Addressing Modes — showing syntax, mechanism, and memory access count
Immediate addressing: the operand value is embedded directly in the instruction itself. No memory access is needed to retrieve the operand; the value is extracted from the instruction during decoding. The operand field in the instruction word contains the actual data value, not a reference to it.
Mechanism: Operand = Value field in the instruction | 0 memory accesses for the operand | Fast, but the constant is limited to the width of the field
Direct (absolute) addressing: the instruction contains the full memory address of the operand, and the processor uses this address as-is to access main memory. The effective address (EA) equals the address field in the instruction.
Mechanism: EA = Address field in instruction | 1 memory access | Simple, but the address field must be wide
Indirect addressing: the instruction contains the address of a memory location that holds the effective address of the operand. This introduces one level of indirection, a 'pointer to the operand'. Two memory accesses are required: the first fetches the effective address, the second fetches the actual operand.
Mechanism: EA = Memory[Address field] | 2 memory accesses | Supports pointer operations
Register addressing: the operand is in a CPU register. The instruction specifies only a register number, which needs just a few bits (for example, 3 bits select one of 8 registers and 5 bits one of 32). Register access is the fastest possible operand retrieval because registers are inside the CPU, with no memory-bus latency.
Mechanism: Operand = Register[reg_field] | 0 memory accesses | Fastest for in-CPU data
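The four modes above differ mainly in where the operand comes from and how many memory reads it costs. The sketch below models memory as a Python dict and counts the reads; `fetch_operand`, the mode strings and the register naming are hypothetical. The memory contents mirror the direct/indirect example used in Q2 of this topic (address 5000 holds 8000).

```python
def fetch_operand(mode: str, field, memory: dict[int, int],
                  registers: dict[str, int]) -> tuple[int, int]:
    """Return (operand, number of data-memory reads) for one addressing mode."""
    reads = 0

    def read(address: int) -> int:
        nonlocal reads
        reads += 1
        return memory[address]

    if mode == "immediate":        # operand is the instruction field itself
        operand = field
    elif mode == "direct":         # EA = field
        operand = read(field)
    elif mode == "indirect":       # EA = Memory[field], then read the operand at EA
        operand = read(read(field))
    elif mode == "register":       # operand = Register[field]
        operand = registers[field]
    else:
        raise ValueError(f"unsupported addressing mode: {mode}")
    return operand, reads


memory = {5000: 8000, 8000: 42}    # address 5000 holds a pointer to address 8000
registers = {"R2": 7}
for mode, field in [("immediate", 5), ("direct", 5000),
                    ("indirect", 5000), ("register", "R2")]:
    print(mode, fetch_operand(mode, field, memory, registers))
```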
A logical (virtual) address is generated by the CPU during program execution. It refers to the program's virtual address space, a contiguous range of addresses that the program believes it has to itself. A physical address is the actual location in the RAM hardware. The Memory Management Unit (MMU) is a hardware component that translates logical addresses to physical addresses transparently, every time memory is accessed. This translation is described by a page table (for paging systems) or a segment table (for segmentation systems).
Figure: Paging-based virtual to physical address translation showing the MMU and Page Table
Paging divides virtual memory into fixed-size pages (typically 4 KB) and physical memory into equally-sized frames. The page table maps virtual page numbers (VPN) to physical frame numbers (PFN). The virtual address is split into [VPN | Page Offset]. The VPN is looked up in the page table to get the PFN; the PFN is concatenated with the unchanged page offset to form the physical address.
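The VPN/offset split and frame lookup can be sketched with shifts and masks. This assumes 4 KB pages (a 12-bit offset), a single-level page table held in a dict, and a hypothetical `PageFault` exception for unmapped pages; real MMUs perform this in hardware, often with multi-level tables.

```python
PAGE_SIZE = 4096                              # 4 KB pages, as in the text
OFFSET_BITS = PAGE_SIZE.bit_length() - 1      # 12 offset bits


class PageFault(Exception):
    """Raised when a VPN has no valid mapping (hypothetical name)."""


def translate(virtual_address: int, page_table: dict[int, int]) -> int:
    """Translate a virtual address with a single-level page table (VPN -> PFN)."""
    vpn = virtual_address >> OFFSET_BITS          # upper bits: virtual page number
    offset = virtual_address & (PAGE_SIZE - 1)    # lower 12 bits: unchanged offset
    if vpn not in page_table:
        raise PageFault(f"VPN {vpn:#x} is not mapped")
    pfn = page_table[vpn]
    return (pfn << OFFSET_BITS) | offset          # concatenate [PFN | offset]


page_table = {0x0: 0x5, 0x1: 0x2}                 # made-up mappings
print(hex(translate(0x1234, page_table)))         # 0x2234
```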
To speed up page table lookups, modern CPUs include a Translation Lookaside Buffer (TLB): a small, fast cache (often fully associative) of recently used VPN→PFN translations. TLB hit rates above 99% are typical for workloads with good locality.
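A TLB can be modeled as a small cache of VPN→PFN entries sitting in front of the page table. The sketch assumes a fully associative TLB with least-recently-used (LRU) replacement, built on `collections.OrderedDict`; the class name and the page-table contents are hypothetical, and real TLBs may use other organizations and replacement policies. Because many consecutive accesses fall in the same page, even a tiny TLB misses only once per page in a sequential scan.

```python
from collections import OrderedDict


class TLB:
    """Fully associative TLB with LRU replacement (an assumed policy)."""

    def __init__(self, capacity: int, page_table: dict[int, int]):
        self.capacity = capacity
        self.page_table = page_table
        self.entries = OrderedDict()   # VPN -> PFN, least recently used first
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn: int) -> int:
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)       # mark as most recently used
            return self.entries[vpn]
        self.misses += 1
        pfn = self.page_table[vpn]              # TLB miss: walk the page table
        self.entries[vpn] = pfn
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)    # evict the least recently used entry
        return pfn


page_table = {vpn: 100 + vpn for vpn in range(16)}   # made-up mappings
tlb = TLB(capacity=4, page_table=page_table)
for address in range(0, 8 * 4096, 64):               # scan 32 KB in 64-byte steps
    tlb.lookup(address >> 12)                        # 4 KB pages: VPN = address >> 12
print(tlb.misses, tlb.hits)                          # 8 504
```

Here the scan touches 8 pages, giving 8 misses and 504 hits, a hit rate of about 98.4%.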
Segmentation divides the virtual address space into variable-length segments corresponding to logical program units (code segment, data segment, stack segment, heap segment). Each segment has a base address and a length limit. The virtual address is [segment number | offset]. The segment table maps segment numbers to base addresses, and the hardware checks that the offset is within the segment's limit. Many systems combine both mechanisms; this is called paged segmentation or segmented paging. x86-64 is an example: it keeps a small number of segments, but in 64-bit mode the CS, DS, ES and SS segment bases are treated as 0 (only FS and GS may have nonzero bases), which effectively disables traditional segmentation and leaves translation to a full paging system.
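Segment translation adds a bounds check that paging does not need. In this sketch the segment table is a dict mapping a segment number to a (base, limit) pair, where the limit is the segment length; the function name and the `SegmentationFault` exception are hypothetical.

```python
class SegmentationFault(Exception):
    """Raised when a segment number or offset is invalid (hypothetical name)."""


def translate_segmented(segment: int, offset: int,
                        segment_table: dict[int, tuple[int, int]]) -> int:
    """Translate [segment | offset] using a table of (base, limit) pairs."""
    if segment not in segment_table:
        raise SegmentationFault(f"no such segment: {segment}")
    base, limit = segment_table[segment]
    if not 0 <= offset < limit:                 # bounds check against the segment length
        raise SegmentationFault(f"offset {offset:#x} exceeds limit {limit:#x}")
    return base + offset


segments = {0: (0x1000, 0x400),    # e.g. code: 1 KB starting at 0x1000
            1: (0x8000, 0x100)}    # e.g. data: 256 bytes starting at 0x8000
print(hex(translate_segmented(1, 0x20, segments)))   # 0x8020
```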
Q1. What is an addressing mode? Why do processors support multiple addressing modes? Answer: An addressing mode is the method by which an instruction specifies the location of its operand. Different addressing modes offer different trade-offs between instruction size, flexibility, and speed. Processors support multiple modes because different programming constructs require different access patterns: loading a constant uses immediate mode; traversing an array uses indexed mode; dereferencing a pointer uses register indirect mode; implementing branches uses relative mode. Supporting multiple modes lets the instruction set express a wide range of operations efficiently, reducing both code size and execution time compared with a single rigid addressing scheme.
Q2. Distinguish between direct addressing and indirect addressing. Give an example of each. Answer: In direct addressing, the instruction contains the actual memory address of the operand. Example: LOAD R1, 5000 fetches the data stored at address 5000 (one memory access). In indirect addressing, the instruction contains the address of a memory location that holds the effective address of the operand. Example: LOAD R1, (5000) first reads address 5000 to obtain (say) 8000, then reads address 8000 to get the data (two memory accesses). Indirect addressing is slower, but it enables pointer-based programming: the effective address can be changed at runtime simply by updating the pointer stored at address 5000.
Q3. Explain virtual memory. How does the MMU translate a virtual address to a physical address in a paging system? Answer: Virtual memory is a memory management abstraction that gives each process the illusion of its own large, contiguous address space, regardless of how much physical RAM is available. In a paging system, the MMU splits the virtual address into a Virtual Page Number (VPN) and a Page Offset. The VPN is used as an index into the process's page table, which maps each VPN to a Physical Frame Number (PFN). The physical address is constructed as [PFN | Page Offset]. Accessing the page table in memory on every reference would at least double the number of memory accesses, so the TLB caches recent VPN→PFN translations. On a TLB hit, the translation typically completes within about a cycle; on a TLB miss, the page table is walked (by hardware or by the operating system, depending on the architecture) and the translation is loaded into the TLB.
Q4. A computer has a 20-bit virtual address space, a page size of 1 KB, and uses 4 bytes per page table entry. Calculate the size of the page table.
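One way to work Q4, assuming a single-level page table with one entry per virtual page: a 1 KB page needs a 10-bit offset, which leaves 20 - 10 = 10 bits of VPN, so the table has 2^10 = 1024 entries of 4 bytes each, i.e. 4096 bytes (4 KB). The helper below (a hypothetical name) performs the same arithmetic for any power-of-two page size.

```python
def page_table_size(va_bits: int, page_size: int, pte_bytes: int) -> tuple[int, int]:
    """Return (number of entries, table size in bytes) for a single-level page table."""
    offset_bits = page_size.bit_length() - 1
    if (1 << offset_bits) != page_size:
        raise ValueError("page size must be a power of two")
    vpn_bits = va_bits - offset_bits      # remaining high bits index the page table
    entries = 1 << vpn_bits
    return entries, entries * pte_bytes


print(page_table_size(20, 1024, 4))       # (1024, 4096)
```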
TOPIC 3: HARDWARE CONTROL AND MICRO PROGRAM CONTROL
The Control Unit (CU) is the 'conductor' of the CPU orchestra. It does not process data itself; instead, it interprets machine instructions and generates precisely timed electrical signals that direct every other component in the processor (the ALU, registers, memory interface, and I/O controllers) to perform the correct sequence of operations. Every instruction in a program must pass through the control unit for interpretation. This process follows the instruction cycle (also called the fetch-decode-execute or FDX cycle), the fundamental rhythm of all stored-program computers: fetch the instruction addressed by the PC into IR, decode its opcode, execute it, and, for instructions that produce a result, write the result back.
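The instruction cycle can be illustrated with a toy accumulator machine. Everything here is hypothetical (the LOAD/ADD/STORE/HALT opcodes, the tuple instruction format, the single ACC register); the point is only the loop structure: fetch into IR and increment PC, decode, then execute and write back.

```python
def run(memory: list, pc: int = 0, max_steps: int = 1000) -> int:
    """Run a toy accumulator program until HALT and return the final ACC value."""
    acc = 0
    for _ in range(max_steps):
        ir = memory[pc]                # FETCH: instruction at PC -> IR
        pc += 1                        #        PC now points to the next instruction
        opcode, address = ir           # DECODE: split IR into opcode and address field
        if opcode == "LOAD":           # EXECUTE (result written back to ACC or memory)
            acc = memory[address]
        elif opcode == "ADD":
            acc += memory[address]
        elif opcode == "STORE":
            memory[address] = acc
        elif opcode == "HALT":
            return acc
        else:
            raise ValueError(f"illegal opcode {opcode!r} at address {pc - 1}")
    raise RuntimeError("step limit reached without HALT")


# Program and data share one memory (the stored-program concept):
program = [("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT", None), 2, 3, 0]
print(run(program), program[6])   # 5 5
```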
In a hardwired control unit, the control logic is implemented entirely using combinational and sequential digital circuits — logic gates, flip-flops, decoders, and multiplexers. For each instruction opcode and each step in the instruction cycle, the combinational logic directly asserts the appropriate control signal outputs.
A hardwired control unit consists of the following components:
Figure: Side-by-side comparison of Hardwired (left) and Microprogrammed (right) control unit structures
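The idea that hardwired control is "just logic" can be sketched as Boolean equations over the decoder outputs: one line per opcode and one per timing step T0–T5. The opcodes, step assignments and signal names below are hypothetical; in hardware each equation would be a small AND/OR gate network evaluated in parallel, not sequential Python code.

```python
def control_signals(opcode: str, step: int) -> set[str]:
    """Combinational control logic: the signals asserted for (opcode, timing step)."""
    # Decoder outputs: exactly one opcode line and one timing line are true.
    is_load, is_add, is_store = opcode == "LOAD", opcode == "ADD", opcode == "STORE"
    t = [step == i for i in range(6)]
    mem_ref = is_load or is_add or is_store    # instructions that use the address field
    reads = is_load or is_add                  # instructions that read an operand

    equations = {
        # Fetch (T0-T2) is identical for every instruction.
        "PC_OUT":      t[0],
        "MAR_IN":      t[0] or (t[3] and mem_ref),
        "MEM_READ":    t[1] or (t[4] and reads),
        "PC_INC":      t[1],
        "MDR_OUT":     t[2] or (t[5] and reads),
        "IR_IN":       t[2],
        # Execute (T3 onwards) depends on the decoded opcode.
        "IR_ADDR_OUT": t[3] and mem_ref,
        "ACC_IN":      t[5] and is_load,
        "ALU_ADD":     t[5] and is_add,
        "ACC_OUT":     t[4] and is_store,
        "MEM_WRITE":   t[4] and is_store,
        "END":         (t[5] and reads) or (t[4] and is_store),  # reset step counter
    }
    return {name for name, asserted in equations.items() if asserted}


print(sorted(control_signals("LOAD", 3)))   # ['IR_ADDR_OUT', 'MAR_IN']
```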
Microprogrammed control, introduced by Maurice Wilkes at Cambridge University in 1951, is one of the most elegant ideas in computer architecture. Instead of implementing control logic in fixed hardware, the control behavior is stored as a program (a microprogram) in a fast internal memory called the control store. The key insight is that the control unit's job is to generate a specific pattern of binary signals at each step of each instruction's execution. This pattern can be represented as a binary word (a microinstruction) stored in memory. Reading microinstructions from memory and using their bits to drive the control lines is functionally equivalent to hardwired logic, but far more flexible.
3. This address is loaded into the CAR (control address register, also called the MPC, microprogram counter).
4. The control store is read at address CAR, and the microinstruction is placed in the CDR (control data register).
5. The bit fields of the CDR are decoded to generate control signals for the datapath.
6. The sequencer logic computes the next CAR value (sequential, branch, or start of the next instruction).
7. Repeat from step 4 until the microprogram for the current instruction completes.
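The sequencing steps above can be sketched as a loop over a control store. The microprograms, the `MAPPING` table from opcode to start address, and the three sequencing options (`seq`, `map`, `end`) are hypothetical simplifications; a real sequencer also supports conditional branches, and after `end` it would return to the fetch microroutine rather than stop.

```python
# Hypothetical control store. Each microinstruction lists the control signals it
# asserts and how the sequencer chooses the next CAR value.
CONTROL_STORE = [
    # 0-2: fetch microroutine shared by all instructions
    {"signals": {"PC_OUT", "MAR_IN"}, "next": "seq"},
    {"signals": {"MEM_READ", "PC_INC"}, "next": "seq"},
    {"signals": {"MDR_OUT", "IR_IN"}, "next": "map"},
    # 3-5: LOAD microprogram
    {"signals": {"IR_ADDR_OUT", "MAR_IN"}, "next": "seq"},
    {"signals": {"MEM_READ"}, "next": "seq"},
    {"signals": {"MDR_OUT", "ACC_IN"}, "next": "end"},
    # 6-7: STORE microprogram
    {"signals": {"IR_ADDR_OUT", "MAR_IN"}, "next": "seq"},
    {"signals": {"ACC_OUT", "MEM_WRITE"}, "next": "end"},
]
MAPPING = {"LOAD": 3, "STORE": 6}   # opcode -> start address of its microprogram


def run_microprogram(opcode: str) -> list[list[str]]:
    """Run fetch plus the microprogram for one opcode; return the signals per cycle."""
    car = 0                                   # control address register
    trace = []
    while True:
        cdr = CONTROL_STORE[car]              # read control store at CAR into CDR
        trace.append(sorted(cdr["signals"]))  # decode CDR -> asserted control lines
        if cdr["next"] == "seq":              # sequential: CAR + 1
            car += 1
        elif cdr["next"] == "map":            # jump to the opcode's microprogram
            car = MAPPING[opcode]
        else:                                 # end: a real CU would go back to fetch
            return trace


for cycle, signals in enumerate(run_microprogram("STORE")):
    print(cycle, signals)
```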
In a horizontal microinstruction format, each bit (or small group of bits) directly controls one specific control line in the CPU datapath. A wide microinstruction word (e.g. 100–200 bits) can specify many simultaneous micro-operations. The key advantage is maximum parallelism: all non-conflicting micro-operations to be performed in one clock cycle can be expressed in a single microinstruction.
Example 100-bit microinstruction: Bit 0: ALU_ADD | Bit 1: ALU_SUB | Bits 5-8: Source Reg | Bits 9-12: Dest Reg | Bit 20: MEM_READ | Bit 21: MEM_WRITE | ...
Disadvantage: most bits in most microinstructions are zero (unused), so the control store is very wide and uses its storage inefficiently.
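Decoding a horizontal word needs nothing more than bit tests, because each control line has its own bit. The sketch uses only the bit positions listed in the example format above (all other positions are left unassigned) and hypothetical helper names.

```python
# Bit positions from the example 100-bit format; all other bits are unassigned here.
SIGNAL_BITS = {"ALU_ADD": 0, "ALU_SUB": 1, "MEM_READ": 20, "MEM_WRITE": 21}
SRC_REG = (5, 8)     # bits 5-8, inclusive
DEST_REG = (9, 12)   # bits 9-12, inclusive


def get_field(word: int, lo: int, hi: int) -> int:
    """Extract bits lo..hi (inclusive) of a microinstruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_horizontal(word: int) -> tuple[set[str], int, int]:
    """Return the asserted control lines plus the source/destination register fields."""
    active = {name for name, bit in SIGNAL_BITS.items() if (word >> bit) & 1}
    return active, get_field(word, *SRC_REG), get_field(word, *DEST_REG)


# ALU_ADD and MEM_READ asserted in the same microinstruction (parallel micro-operations),
# source register 3, destination register 7.
word = (1 << 0) | (1 << 20) | (3 << 5) | (7 << 9)
active, src, dest = decode_horizontal(word)
print(sorted(active), src, dest)   # ['ALU_ADD', 'MEM_READ'] 3 7
```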
In a vertical microinstruction format, control signals are encoded into compact operation codes within the microinstruction. A decoder expands these encoded fields into actual control lines. For example, instead of one bit per possible ALU operation, a 4-bit ALU opcode field encodes one of 16 ALU operations. This results in narrower microinstructions and smaller control stores, but requires additional decoder hardware and limits the number of micro-operations that can be specified simultaneously (reducing parallelism).
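The decoder that a vertical format requires is an n-to-2^n decoder: a 4-bit ALU field expands to 16 one-hot control lines. A minimal sketch, assuming the first three ALU codes from the example in Q2 of this topic (0000 = ADD, 0001 = SUB, 0010 = AND) and leaving the other codes unnamed; the names are hypothetical.

```python
ALU_OP_NAMES = {0b0000: "ADD", 0b0001: "SUB", 0b0010: "AND"}  # other codes unnamed here


def decode_field(code: int, width: int) -> list[int]:
    """n-to-2^n decoder: expand an encoded field into one-hot control lines."""
    if not 0 <= code < (1 << width):
        raise ValueError(f"code {code} does not fit in {width} bits")
    return [1 if line == code else 0 for line in range(1 << width)]


lines = decode_field(0b0010, 4)     # 4 stored bits -> 16 control lines
print(len(lines), lines.index(1), ALU_OP_NAMES[0b0010])   # 16 2 AND
```

The trade-off described above is visible directly: the control store holds 4 bits instead of 16, but only one of the 16 lines can be active per field in any cycle.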
Q1. Describe the fetch-decode-execute cycle in detail. What role does the control unit play at each stage? Answer: Fetch: the CU asserts a memory-read control signal with the PC value as the address, loads the instruction into IR, and increments PC. Decode: the CU's instruction decoder interprets the opcode in IR to determine the instruction type, addressing mode, and required resources. Execute: the CU generates a precisely timed sequence of control signals; for a LOAD instruction, for example, it sends the operand address to MAR, asserts MEM_READ, waits for the data to appear in MDR, then asserts REG_WRITE to transfer the MDR contents to the destination register. Write-back: for instructions that compute a result (e.g. ADD), the CU routes the ALU output to the destination register by asserting a write control signal. The CU coordinates all of these activities without processing any data itself.
Q2. What is a microinstruction? Give a concrete example of what its bit fields might control. Answer: A microinstruction is a single binary word stored in the control store that specifies the control signals to be asserted during one clock cycle of instruction execution. For example, a 32-bit microinstruction might be structured as: bits [3:0] = ALU operation code (0000 = ADD, 0001 = SUB, 0010 = AND, etc.); bits [7:4] = source register A; bits [11:8] = source register B; bits [15:12] = destination register; bit [16] = MEM_READ; bit [17] = MEM_WRITE; bit [18] = REG_WRITE; bits [28:19] = next microinstruction address; bits [30:29] = branch condition; bit [31] = end of microprogram. Each clock cycle, one microinstruction is fetched and its fields are decoded to drive the datapath.
Q3. Compare hardwired and microprogrammed control. Under what circumstances would you choose each? Answer: Hardwired control implements the control logic as fixed combinational and sequential circuits. It is very fast (signals are generated within a few gate delays, with no memory access overhead) but inflexible: bugs require a hardware revision, and complex ISAs lead to unmanageable circuit complexity. It is best suited to RISC architectures with small, regular instruction sets where maximum clock speed is the priority. Microprogrammed control stores the control logic as microprograms in a control store (typically ROM). It is slower (one control-store access per micro-step) but very flexible: instruction behavior can be changed by revising the microcode and, where the control store is writable, even in the field. It is best suited to CISC architectures with complex, irregular instruction sets, or to any system where field-upgradable control logic is valuable. Many modern processors (x86 designs in particular) use a hybrid: hardwired decoding for common instructions, with a microcode fallback for rare, complex ones.
Q4. Explain the difference between horizontal and vertical microprogramming with respect to parallelism, control store width, and decoding overhead. Answer: Horizontal microprogramming: each bit in the microinstruction directly controls one control line. If the CPU has N control signals, each microinstruction needs at least N bits (plus any sequencing fields). Multiple control signals can be asserted simultaneously in one microinstruction, maximizing datapath parallelism. The control store is very wide (potentially 100–200 bits per word), but no decoder is needed between the CDR and the control lines. Vertical microprogramming: control signals are encoded into compact opcode fields; a microinstruction might use 4 bits to specify one of 16 ALU operations rather than 16 separate bits. Microinstructions are narrow (20–32 bits) and the control store is smaller, but a decoder is needed to expand the encoded fields, adding a small delay. Parallelism is limited, because each encoded field can specify only one operation per cycle. A compromise, nanoprogramming, uses a two-level scheme: short microinstructions address a nanoinstruction table that holds wide horizontal words.
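The 32-bit layout from the answer to Q2 can be packed and unpacked with shifts and masks. The field names and the `encode`/`decode` helpers are hypothetical; only the bit positions come from the text.

```python
FIELDS = {                  # name: (lowest bit, width in bits), per the Q2 layout
    "alu_op":      (0, 4),    # bits [3:0]  (0000 = ADD, 0001 = SUB, 0010 = AND, ...)
    "src_a":       (4, 4),    # bits [7:4]
    "src_b":       (8, 4),    # bits [11:8]
    "dest":        (12, 4),   # bits [15:12]
    "mem_read":    (16, 1),   # bit [16]
    "mem_write":   (17, 1),   # bit [17]
    "reg_write":   (18, 1),   # bit [18]
    "next_addr":   (19, 10),  # bits [28:19]
    "branch_cond": (29, 2),   # bits [30:29]
    "end":         (31, 1),   # bit [31]
}


def encode(**values: int) -> int:
    """Pack named field values into a 32-bit microinstruction word."""
    word = 0
    for name, value in values.items():
        low, width = FIELDS[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << low
    return word


def decode(word: int) -> dict[str, int]:
    """Unpack a 32-bit microinstruction word into its fields."""
    return {name: (word >> low) & ((1 << width) - 1)
            for name, (low, width) in FIELDS.items()}


# Final micro-step of a hypothetical ADD: R3 <- R1 + R2, then end the microprogram.
word = encode(alu_op=0b0000, src_a=1, src_b=2, dest=3, reg_write=1, end=1)
print(hex(word))              # 0x80043210
print(decode(word)["dest"])   # 3
```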