




















An insight into the RISC (reduced instruction set computer) architecture and its memory organization. RISC focuses on reducing the number and complexity of instructions, allowing for faster execution and efficient use of memory. The notes cover the role of the memory buffer register (MBR/MDR) in passing data to and retrieving data from memory, as well as the advantages of RISC over CISC (complex instruction set computer) in terms of instruction decoding time and memory bandwidth. They also touch upon the importance of registers and cache memory in RISC systems.
Typology: Study notes





















This chapter contains notes to accompany the second chapter of the textbook Structured Computer Organization (Fifth Edition) by Andrew S. Tanenbaum.

Processors: CPU Organization

A typical computer has four main components. These are:
CPU: Special–Purpose Registers

The CPU has a number of special–purpose registers that are of interest at this time.

The Program Counter (PC) contains the address of the instruction to be executed next. As a part of the fetch cycle (see the next section), the instruction at that address is fetched and the PC is incremented to point to the next instruction. Any branch involves updating the contents of the PC with the new target address. It is worth noting that the PC is better named by the nonstandard term “IP” or “Instruction Pointer” used in Intel documentation. As the textbook notes, the PC does not count anything; the origin of the terminology is quite obscure.

The MAR (Memory Address Register) is used to address RAM (Random Access Memory).

The MBR (Memory Buffer Register), also called the MDR (Memory Data Register), is used to pass data to memory and retrieve data from memory. It is worth noting that the CPU design discussed in Chapter 4 of the textbook calls for an MBR and a separate MDR, but that such usage is non–standard.

The IR (Instruction Register) is the register that holds the binary machine code for the instruction currently under execution.

We shall discuss the interaction of these registers in the next section, in which we discuss the Fetch–Execute cycle.

The structure of a typical CPU is shown in the next figure, taken from Tanenbaum’s textbook. This is also called the “data path”, referring to the fact that it shows the flow of data from one or two source registers into the ALU and then back to the destination. The symbol used for the ALU is a standard one, reflecting the fact that the ALU will have two inputs to accommodate many binary arithmetic operations (such as addition) and one output to produce the results. We shall later see that most modern CPUs internally have a three-bus structure.

Often a CPU will have one of its general purpose registers, usually called “Register 0”, set to the constant value 0. This greatly facilitates the design of the control unit.
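The way the special-purpose registers cooperate during an instruction fetch can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the design in the textbook: the class name `ToyCPU`, its methods `fetch` and `branch`, and the word-addressed list used as memory are all hypothetical, and instructions are treated as opaque integers.

```python
class ToyCPU:
    """Hypothetical model of the PC, MAR, MBR, and IR during a fetch."""

    def __init__(self, memory):
        self.memory = memory  # word-addressed RAM, modeled as a list (assumption)
        self.pc = 0           # Program Counter: address of the next instruction
        self.mar = 0          # Memory Address Register
        self.mbr = 0          # Memory Buffer Register (also called MDR)
        self.ir = 0           # Instruction Register

    def fetch(self):
        self.mar = self.pc                # place the instruction address in the MAR
        self.mbr = self.memory[self.mar]  # memory read: the word arrives in the MBR
        self.ir = self.mbr                # copy the instruction into the IR
        self.pc += 1                      # the PC now points to the next instruction
        return self.ir

    def branch(self, target):
        self.pc = target                  # a branch overwrites the PC with the target


cpu = ToyCPU([0x1A, 0x2B, 0x3C, 0x4D])
first = cpu.fetch()    # fetches the word at address 0
cpu.branch(3)          # jump to address 3
second = cpu.fetch()   # fetches the word at address 3
print(hex(first), hex(second), cpu.pc)
```

Incrementing the PC by one assumes one instruction per memory word; a byte-addressed machine with four-byte instructions would add 4 instead.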
RISC vs. CISC Computers

One of the recent developments in computer architecture is known by the acronym RISC. Under this classification, a design is either RISC or CISC, with the following definitions.

RISC   Reduced Instruction Set Computer
CISC   Complex Instruction Set Computer

The definition of CISC architecture is very simple – it is any design that does not implement RISC architecture. We now define RISC architecture and give some history of its evolution. The source for these notes is the book Computer Systems Design and Architecture, by Vincent P. Heuring and Harry F. Jordan.

One should note that while the name “RISC” is of fairly recent origin (dating to the late 1970s), the concept can be traced to the work of Seymour Cray, then of Control Data Corporation, on the CDC–6400 and related machines. Mr. Cray did not think in terms of a reduced instruction set, but in terms of a very fast computer with a well-defined purpose – to solve complex mathematical simulations. The resulting design supported only two basic data types (integers and real numbers) and had a very simple, but powerful, instruction set. Looking back at the design of this computer, we see that the CDC–6400 could have been called a RISC design.

As we shall see just below, the entire RISC vs. CISC evolution is driven by the desire to obtain maximum performance from a computer at a reasonable price. Mr. Cray’s machines maximized performance by limiting the domain of the problems they would solve.

The general characteristic of CISC architecture is the emphasis on doing more with each instruction. This may involve complex instructions and complex addressing modes; for example, the MC68020 processor supports 25 addressing modes. The ability to do more with each instruction allows more operations to be compressed into the same program size, something very desirable if memory costs are high. Some historical data will illustrate the memory issue.
Time                      Cost of memory        Cost of disk drive
Introduction of MC6800    $500 for 16 KB RAM    $55,000 for 40 MB
Introduction of MC68000   $200 for 64 KB RAM    $5,000 for 10 MB
Micron (4/10/2002)        $49 for 128 MB RAM    $149 for 20 GB
Dell (8/26/2006)          $214 for 1 GB RAM     $380 for 160 GB
                                                $650 for 750 GB

Another justification for the CISC architectures was the “semantic gap”, the difference between the structure of the assembly language and the structure of the high level languages (COBOL, C++, Visual Basic, FORTRAN, etc.) that we want the computer to support. It was expected that a more complicated instruction set (more complicated assembly language) would more closely resemble the high level language to be supported and thus facilitate the creation of a compiler that translates the high level language into that assembly language.
One of the first motivations for the RISC architecture came from a careful study of the implications of the semantic gap. Experimental studies conducted in 1971 by Donald Knuth and in 1982 by David Patterson showed that nearly 85% of a program’s statements were simple assignments, conditionals, or procedure calls. None of these required a complicated instruction set. It was further noted that typical compilers translated complex high level language constructs into simpler assembly language statements, not the complicated assembly language instructions that seemed more likely to be used.

The results of this study are quoted from an IEEE Tutorial on RISC architecture [R05]. This table shows the percentages of program statements that fall into five broad classifications.

Language     Pascal       FORTRAN   Pascal   C        SAL
Workload     Scientific   Student   System   System   System
Assignment   74           67        45       38       42
Loop         4            3         5        3        4
Call         1            3         15       12       12
If           20           11        29       43       36
GOTO         2            9         --       3        --
Other                     7         6        1        6

The authors of this study made the following comments on the results.

“There is quite good agreement in the results of this mixture of languages and applications. Assignment statements predominate, suggesting that the simple movement of data is of high importance. There is also a preponderance of conditional statements (If, Loop). These statements are implemented in machine language with some sort of compare and branch instruction. This suggests that the sequence control mechanism is important.”

The “bottom line” for the above results can be summarized as follows.
The narrative from the tutorial continues with remarks on the RISC architectures developed at the University of California at Berkeley. “Although each project [the Berkeley RISC I and RISC II and the IBM 801] had different constraints and goals, the machines they eventually created have a great deal in common.
Instruction Pre-Fetching

One advantage of the RISC architecture is seen in the process referred to as instruction pre-fetching. In this process, we view the fetch-execute process as a pipeline. In a traditional fetch-execute machine, the instruction is first fetched from memory and then executed. At least as early as the IBM Stretch (1959), it was recognized that the fetch unit should be doing something during the time interval for executing the instruction. The logical thing for the fetch unit to do was to fetch the instruction in the next memory location on the chance that it would be the instruction that would be executed next. This process has been shown to improve computer performance significantly.

The logic to pre-fetch instructions is facilitated by the RISC design philosophy that all instructions are the same size, so in a machine based on 32-bit words the pre-fetch unit just grabs the next four bytes.

Instruction pre-fetching appears rather simple, except in the presence of program jumps, such as occur in the case of conditional branches and the end of program loops. A lot of work has gone into prediction of the next instruction in such cases, where there are two instructions that could be executed next depending on some condition. It may be possible to execute both candidate instructions and discard the result of the instruction not in the true execution path.

Implications for the Control Unit

The complex instructions in a CISC computer tend to require more support in the execution than can conveniently be provided by a hardwired control unit. For this reason, most CISC computers are microprogrammed to handle the complexity of each of the instructions. As a result, most CISC instructions require a number of system clock cycles to execute.

The RISC approach emphasizes use of a simpler instruction set that can easily be supported by a hardwired control unit. As a side effect, most RISC instructions can be executed in one clock cycle.
A given computer program will compile into more RISC instructions than CISC instructions, but the CISC instructions execute more slowly than the RISC instructions. The overall effect on the computer program may be hard to predict. According to the IEEE tutorial:

“Reducing the instruction set further reduces the work a RISC processor has to do. Since RISC has fewer types of instructions than CISC, a RISC instruction requires less processing logic to interpret than a CISC instruction. The effect of such simplification is to speed up the execution rate for RISC instructions. In a RISC implementation it is theoretically possible to execute an instruction each time the computer’s logic clock ticks. In practice the clock rate of a RISC processor is usually three times that of the instruction rate.”
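The trade-off between instruction count and instruction speed can be made concrete with the standard relation: execution time = instruction count × cycles per instruction ÷ clock rate. The sketch below applies it to two imaginary machines. The function name `execution_time` and every number in it are hypothetical, chosen only to show how the comparison works; they are not measurements of any real processor.

```python
def execution_time(instruction_count, cycles_per_instruction, clock_hz):
    """Time in seconds = instructions x cycles per instruction / clock rate."""
    return instruction_count * cycles_per_instruction / clock_hz


# Hypothetical figures, chosen only to illustrate the trade-off.
cisc_time = execution_time(1_000_000, 4.0, 100e6)   # fewer, slower instructions
risc_time = execution_time(1_500_000, 1.5, 100e6)   # more, faster instructions

print(f"CISC: {cisc_time * 1000:.2f} ms")
print(f"RISC: {risc_time * 1000:.2f} ms")
```

With these made-up numbers the RISC program wins despite being 50% longer. Changing the instruction-count ratio or the cycles per instruction can reverse the outcome, which is why the overall effect is hard to predict in general.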
Digital Equipment Corporation (DEC) undertook an experiment to investigate this correlation and produced a design yielding interesting, but not surprising, results, which are again quoted from the IEEE tutorial on RISC architecture.

“DEC reported a subsetting experiment on two implementations of the VAX architecture in VLSI. The VLSI VAX has nine custom VLSI chips and implements the complete VAX–11 instruction set. DEC found that 20.0 percent of the instructions are responsible for 60.0 percent of the microcode and yet are only 0.2 percent of all instructions executed. By trapping to software to execute these instructions, the MicroVAX 32 was able to fit the subset architecture onto only one chip, with an optional floating–point processor on another chip. ... The VLSI VAX uses five to ten times the resources of the MicroVAX 32 to implement the full instruction set, yet is only 20 percent faster.”

              VLSI VAX   MicroVAX 32
VLSI Chips    9          2
Microcode     480K       64K
Transistors   1250K      101K

The result is obvious – a simpler and cheaper computer will do most of what you want. The rest can be better done in software.

Design Principles for Modern Computers

Experience with the RISC designs has led to the acceptance of a number of design practices. These practices reflect the state of the art in a number of key areas. These are:
Instruction–Level Parallelism

Everybody wants a faster computer. It is easy to describe a number of significant problems which would admit useful numerical solutions if much faster computers were available. Here is a small list of such problems.
It is often the case that the instruction execution unit is the slowest unit, forming the “bottleneck” in the pipeline. The superscalar architecture is designed to address this stage and leave the more efficient stages as single units. We use Tanenbaum’s figure to illustrate.

Processor–Level Parallelism

If one complete processor is good, why not use two or more complete processors? There are a number of arrangements for parallel processing, generally classified as multiprocessor designs and multicomputer designs.

Multiprocessor designs include the dual–core designs found on high–end CPU chips, single–bus multiprocessors, and array computers. As noted above, a dual–core design places two complete CPUs on a single CPU chip, since it is easier to run two CPUs at a given speed than one CPU at twice the speed.

Multicomputer designs call for a number of computers complete with individual memory units that communicate by some “message passing” mechanism. This may involve anything from access to a large common memory unit to the use of TCP/IP to send Internet messages.

The primary difficulty in any multiprocessor or multicomputer design is the sharing of data between the CPUs. There are many important problems that can be divided into subproblems with limited interaction. Consider weather modeling, in which the area of the continental U.S. can be divided into areas with interaction only at the boundaries. Although it would not actually be done this way, we could imagine a separate processor for each of the states Mississippi, Alabama, Florida, Georgia, South Carolina, North Carolina, Tennessee, Kentucky, and Virginia. The processor modeling the Georgia weather would communicate directly only with the processors for Alabama, Florida, North and South Carolina, and Tennessee, as these are the states directly bordering Georgia. The effects of weather in Kentucky would be transmitted to the state of Georgia indirectly through the Tennessee processor, just as the Mississippi weather would be mediated by the Alabama processor.

There are several interesting “screen saver” multicomputer applications worth mentioning. In one of these, an individual user connected to the Internet volunteers his computer as a computational server, to be used during idle time in which there is no other use for it. The user downloads a program that is linked to the screen saver and then processes discrete data collections also downloaded from a central site. One of these is the SETI program.
Primary Memory

By this we indicate memory that can be directly addressed by the CPU. This is often called “core memory”, due to the fact that it used to be implemented with a technology that used magnetic cores as binary storage devices. It is also called “store” or “storage”. The figure below shows a small core memory unit, probably from the late 1960s or the 1970s. Note the individual core units in the enlargement in the left picture.

Beginning with the ENIAC in 1946 (and some earlier machines), computer memory has been organized around binary arithmetic, storing data as binary bits. While it is true that the ENIAC was a decimal machine, performing base–10 arithmetic, each digit was stored in BCD (Binary Coded Decimal) form. In BCD, each digit is coded as four binary bits, with the obvious encodings.

Digit   BCD Code
0       0000
1       0001
2       0010
3       0011
4       0100
5       0101
6       0110
7       0111
8       1000
9       1001

There are a number of reasons for the choice of binary encoding of all data in a computer.
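The BCD encoding tabulated above is simple enough to sketch directly: each decimal digit is replaced by its own four-bit group. The function names `to_bcd` and `from_bcd` are hypothetical helpers introduced for this illustration, and the groups are kept as strings for readability rather than packed into bytes.

```python
def to_bcd(number):
    """Encode a non-negative integer as a list of 4-bit BCD strings, one per digit."""
    if number < 0:
        raise ValueError("this sketch handles non-negative integers only")
    return [format(int(digit), "04b") for digit in str(number)]


def from_bcd(groups):
    """Decode a list of 4-bit BCD strings back into an integer."""
    return int("".join(str(int(group, 2)) for group in groups))


encoded = to_bcd(1946)
print(encoded)            # one four-bit group per decimal digit
print(from_bcd(encoded))  # round-trips to the original number
```

Note that BCD spends four bits per digit, so the four digits of 1946 occupy 16 bits, whereas pure binary would need only 11 bits for the same value.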
Of course, there is no such thing as a pure Read-Only memory; at some time it must be possible to put data in the memory by writing to it, otherwise there will be no data in the memory to be read. The term “Read-Only” usually refers to the method for access by the CPU. All variants of ROM share the feature that their contents cannot be changed by normal CPU write operations. All variants of RAM (really Read-Write Memory) share the feature that their contents can be changed by normal CPU write operations. Some forms of ROM have their contents set at time of manufacture; other types, called PROM (Programmable ROM), can have their contents changed by special devices called PROM Programmers.

Registers associated with the memory system

All memory types, both RAM and ROM, can be characterized by two registers and a number of control signals. Consider a memory of 2^N words, each having M bits. Then

   the MAR (Memory Address Register) is an N-bit register used to specify the memory address, and
   the MBR (Memory Buffer Register) is an M-bit register used to hold data to be written to the memory or just read from the memory. This register is also called the MDR (Memory Data Register).

We specify the control signals to the memory unit by recalling what we need the unit to do. First consider RAM (Read-Write Memory). From the viewpoint of the CPU there are three tasks for the memory:

   CPU reads data from the memory. Memory contents are not changed.
   CPU writes data to the memory. Memory contents are updated.
   CPU does not access the memory. Memory contents are not changed.

We need two control signals to specify the three options for a RAM unit. One standard set is

   Select – the memory unit is selected.
   R/W – if 0 the CPU writes to memory, if 1 the CPU reads from memory.

We can use a truth table to specify the actions for a RAM.

Select   R/W   Action
0        0     Memory contents are not changed.
0        1     Memory contents are not changed.
1        0     CPU writes data to the memory.
1        1     CPU reads data from the memory.
Note that when Select = 0, nothing is happening to the memory. It is not being accessed by the CPU and the contents do not change. When Select = 1, the memory is active and something happens.

Consider now a ROM (Read-Only Memory). From the viewpoint of the CPU there are only two tasks for the memory:

   CPU reads data from the memory.
   CPU does not access the memory.

We need only one control signal to specify these two options. The natural choice is the Select control signal, as the R/W signal does not make sense if the memory cannot be written by the CPU. The truth table for the ROM should be obvious.

Select   Action
0        CPU is not accessing the memory.
1        CPU reads data from the memory.
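The RAM behavior under the Select and R/W signals can be sketched as a small Python model. This is an illustration under stated assumptions rather than a description of any real memory chip: the class name `ToyRAM` and its `cycle` method are hypothetical, and timing is ignored entirely.

```python
class ToyRAM:
    """Hypothetical RAM of 2**n_bits words driven by Select and R/W."""

    def __init__(self, n_bits):
        self.words = [0] * (2 ** n_bits)  # 2^N words, all initially zero
        self.mar = 0                      # Memory Address Register
        self.mbr = 0                      # Memory Buffer Register (MDR)

    def cycle(self, select, read_write):
        if select == 0:
            return                            # not selected: contents unchanged
        if read_write == 0:
            self.words[self.mar] = self.mbr   # Select = 1, R/W = 0: CPU writes
        else:
            self.mbr = self.words[self.mar]   # Select = 1, R/W = 1: CPU reads


ram = ToyRAM(n_bits=4)
ram.mar, ram.mbr = 5, 42
ram.cycle(select=0, read_write=0)   # ignored: the unit is not selected
unchanged = ram.words[5]
ram.cycle(select=1, read_write=0)   # write 42 to address 5
ram.mbr = 0
ram.cycle(select=1, read_write=1)   # read address 5 back into the MBR
print(unchanged, ram.mbr)
```

A ROM model would keep only the `select` argument and the read branch, matching the one-signal truth table.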
The Idea of Address Space

We now must distinguish between the idea of address space and physical memory. The address space defines the range of addresses (indices into the memory array) that can be generated. The size of the physical memory is usually somewhat smaller; this may be by design (see the discussion of memory-mapped I/O below) or just by accident.

An N-bit address will specify 2^N different addresses. In this sense, the address can be viewed as an N-bit unsigned integer, the range of which is 0 to 2^N – 1 inclusive. We can ask another question: given K addressable items, how many address bits are required? The answer is given by the inequality 2^(N – 1) < K ≤ 2^N, which is best solved by guessing N.

The memory address is specified by a binary number placed in the Memory Address Register (MAR). The number of bits in the MAR determines the range of addresses that can be generated. N address lines can be used to specify 2^N distinct addresses, numbered 0 through 2^N – 1. This is called the address space of the computer.

For example, we show three MAR sizes.

Computer        MAR bits   Address Range
PDP-11/20       16         0 to 65 535
Intel 8086      20         0 to 1 048 575
Intel Pentium   32         0 to 4 294 967 295

The PDP-11/20 was an elegant small machine made by the now defunct Digital Equipment Corporation. As soon as it was built, people realized that its address range was too small.

In general, the address space is much larger than the physical memory available. For example, my personal computer has an address space of 2^32 bytes (as do all Pentiums), but only 384 MB = 2^28 + 2^27 bytes of physical memory. Until recently the 32-bit address space would have been much larger than any possible amount of physical memory. At present one can go to a number of companies and order a computer with a fully populated address space; i.e., 4 GB of physical memory. Most high-end personal computers are shipped with 1 GB of memory.
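Rather than guessing N, the number of address bits for K items can be computed directly. The sketch below uses Python's built-in `int.bit_length`; the function names `address_bits` and `address_range` are hypothetical helpers introduced for this illustration.

```python
def address_bits(item_count):
    """Smallest N with 2**(N-1) < item_count <= 2**N, for item_count >= 1."""
    if item_count < 1:
        raise ValueError("need at least one addressable item")
    return (item_count - 1).bit_length()


def address_range(n_bits):
    """Lowest and highest address reachable with an n_bits-wide MAR."""
    return 0, 2 ** n_bits - 1


for name, bits in [("PDP-11/20", 16), ("Intel 8086", 20), ("Intel Pentium", 32)]:
    low, high = address_range(bits)
    print(f"{name}: {bits} bits, addresses {low} to {high}")

# Bits needed to address every byte of a 384 MB physical memory.
print(address_bits(384 * 2 ** 20))
```

The expression `(K - 1).bit_length()` works because the largest address needed is K – 1, and `bit_length` returns the number of bits required to write that value in binary.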
In a design with memory–mapped I/O, part of the address space is dedicated to addressing I/O registers and not physical memory. For example, in the original PDP-11/20, the top 4096 (2^12) addresses of the address space were dedicated to I/O registers, leading to the following memory map.

Addresses 0 – 61439       Available for physical memory
Addresses 61440 – 65535   Available for I/O registers (61440 = 65536 – 4096)

We shall return to memory–mapped I/O in a later discussion. For the moment, we need to recall that each I/O device is accessed via a set of registers, including status registers, data registers, and control registers. Each I/O device is assigned a sequence of addresses (either in a unified address space or a dedicated I/O address space) that are used to access these registers and allow the CPU to communicate with the devices.
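The address decoding implied by such a memory map can be sketched as follows, assuming a 16-bit address space with the top 4096 addresses reserved for I/O registers as on the PDP-11/20. The constant names and the function `decode` are hypothetical, introduced only for this illustration.

```python
ADDRESS_SPACE = 2 ** 16            # 16-bit addresses: 0 through 65535
IO_SIZE = 2 ** 12                  # top 4096 addresses reserved for I/O registers
IO_BASE = ADDRESS_SPACE - IO_SIZE  # first address that refers to an I/O register


def decode(address):
    """Classify an address as physical memory or an I/O register."""
    if not 0 <= address < ADDRESS_SPACE:
        raise ValueError("address outside the 16-bit address space")
    return "io" if address >= IO_BASE else "memory"


print(IO_BASE)
print(decode(IO_BASE - 1), decode(IO_BASE), decode(ADDRESS_SPACE - 1))
```

Because the I/O region is a power-of-two block at the top of the address space, real hardware can make the same decision by checking only the upper four address bits.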
Big-Endian and Little-Endian

The reference here is to a story in Gulliver’s Travels, written by Jonathan Swift, in which two groups of men went to war over which end of a boiled egg should be broken – the big end or the little end. The student should be aware that Swift did not write pretty stories for children but focused on biting satire; his work A Modest Proposal is an excellent example.

Consider the 32-bit number represented by the eight-digit hexadecimal number 0x01020304, stored at location Z in memory. In any byte-addressable memory, this number will be stored in the four consecutive addresses Z, (Z + 1), (Z + 2), and (Z + 3). The difference between big-endian and little-endian ordering is where each of the four bytes is stored.

In our example 0x01 represents bits 31 – 24, 0x02 represents bits 23 – 16, 0x03 represents bits 15 – 8, and 0x04 represents bits 7 – 0 of the word.

As a 32-bit signed integer, the number 0x01020304 can be represented in decimal notation as

1·16^6 + 0·16^5 + 2·16^4 + 0·16^3 + 3·16^2 + 0·16^1 + 4·16^0 = 16,777,216 + 131,072 + 768 + 4 = 16,909,060.

For those who like to think in bytes, this is (01)·16^6 + (02)·16^4 + (03)·16^2 + 04, arriving at the same result. Note that the number can be viewed as having a “big end” and a “little end”, as in the following figure.

The “big end” contains the most significant digits of the number and the “little end” contains the least significant digits of the number. We now consider how these bytes are stored in a byte-addressable memory. Recall that each byte, comprising two hexadecimal digits, has a unique address in a byte-addressable memory, and that a 32-bit (four-byte) entry at address Z occupies the bytes at addresses Z, (Z + 1), (Z + 2), and (Z + 3). The hexadecimal values stored in these four byte addresses are shown below.

Address   Big-Endian   Little-Endian
Z         01           04
Z + 1     02           03
Z + 2     03           02
Z + 3     04           01

Just to be complete, consider the 16-bit number represented by the four hex digits 0A0B.
Suppose that the 16-bit word is at location W; i.e., its bytes are at locations W and (W + 1). The most significant byte is 0x0A and the least significant byte is 0x0B. The values in the two addresses are shown below.

Address   Big-Endian   Little-Endian
W         0A           0B
W + 1     0B           0A
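The two byte orderings can be reproduced with Python's built-in `int.to_bytes` and `int.from_bytes`. The sketch below is only an illustration: index 0 of each byte string stands in for address Z (or W), and the variable names are arbitrary.

```python
value = 0x01020304

big = value.to_bytes(4, byteorder="big")        # most significant byte at address Z
little = value.to_bytes(4, byteorder="little")  # least significant byte at address Z

print("Address  Big-Endian  Little-Endian")
for offset in range(4):
    print(f"Z + {offset}    {big[offset]:02X}          {little[offset]:02X}")

half = 0x0A0B
print(half.to_bytes(2, "big").hex(), half.to_bytes(2, "little").hex())

# Reading big-endian bytes as if they were little-endian yields a different number.
misread = int.from_bytes(big, byteorder="little")
print(hex(value), hex(misread))
```

The final lines show why the ordering matters when data move between machines: the same four bytes interpreted with the wrong byte order give 0x04030201 instead of 0x01020304.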
The figure below shows a graphical way to view these two options for ordering the bytes copied from a register into memory. We suppose a 32-bit register with bits numbered from 31 through 0. Which end is placed first in the memory – at address Z? For big-endian, the “big end” or most significant byte is first written. For little-endian, the “little end” or least significant byte is written first. There seems to be no advantage of one system over the other. Big-endian seems more natural to most people and facilitates reading hex dumps (listings of a sequence of memory locations), although a good debugger will remove that burden from all but the unlucky. Big-endian computers include the IBM 360 series, Motorola 68xxx, and SPARC by Sun. Little-endian computers include the Intel Pentium and related computers. The big-endian vs. little-endian debate is one that does not concern most of us directly. Let the computer handle its bytes in any order desired as long as it produces good results. The only direct impact on most of us will come when trying to port data from one computer to a computer of another type. Transfer over computer networks is facilitated by the fact that the network interfaces for computers will translate to and from the network standard, which is big-endian. The major difficulty will come when trying to read different file types. The big-endian vs. little-endian debate shows in file structures when computer data are “serialized” – that is written out a byte at a time. This causes different byte orders for the same data in the same way as the ordering stored in memory. The orientation of the file structure often depends on the machine upon which the software was first developed. The following is a partial list of file types taken from a textbook once used by this author. 
Little-endian   Windows BMP, MS Paintbrush, MS RTF, GIF
Big-endian      Adobe Photoshop, JPEG, MacPaint

Some applications support both orientations, with a flag in the header record indicating which is the ordering used in writing the file.

Any student who is interested in the literary antecedents of the terms “big-endian” and “little-endian” may find a quotation at the end of this chapter.
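The idea of a header flag that records the byte order used when a file was written can be sketched with Python's standard `struct` module. The record layout here is entirely hypothetical and does not correspond to any of the file types listed above: a two-byte flag (`BE` or `LE`) followed by one 32-bit unsigned integer. The function names `write_record` and `read_record` are likewise invented for this illustration.

```python
import struct


def write_record(value, big_endian):
    """Serialize a hypothetical record: 2-byte order flag, then a 32-bit integer."""
    flag = b"BE" if big_endian else b"LE"
    fmt = ">I" if big_endian else "<I"   # '>' is big-endian, '<' is little-endian
    return flag + struct.pack(fmt, value)


def read_record(data):
    """Use the flag in the header to pick the byte order for the payload."""
    flag, payload = data[:2], data[2:6]
    if flag == b"BE":
        fmt = ">I"
    elif flag == b"LE":
        fmt = "<I"
    else:
        raise ValueError("unknown byte-order flag")
    return struct.unpack(fmt, payload)[0]


for big_endian in (True, False):
    record = write_record(0x01020304, big_endian)
    print(record.hex(), hex(read_record(record)))
```

The serialized bytes differ between the two orderings, yet the reader recovers the same value in both cases because it consults the flag before interpreting the payload.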