














Material Type: Notes; Class: Computer Organization; Subject: Computer Science; University: Columbus State University; Term: Unknown 1989;
Typology: Study notes















The Instruction Set Architecture (ISA)

The instruction set architecture (ISA) of a computer is the structure of the computer as seen by an assembly language programmer. In this chapter, we look at the computer hardware as seen at the assembly language level, discuss addressing modes, and briefly discuss assembly language. We shall then present and discuss a very simple assembly language.

We specify that the computer to be studied is a stored program computer, as are all modern computers. Such a computer executes a program that has been previously stored in the computer’s memory system, perhaps having been copied in from the disk. Only very early computers, such as the ENIAC (1945), are not classified as stored program computers. The program for the ENIAC was specified by a set of switches on one of its panels; in this design the memory and registers stored only data. Some other early machines executed programs read directly from punch cards and not stored in memory. At this point we insist that if the machine is not a stored program computer, it is ancient history and not to be studied.

At the top level, most computers have three major components: the CPU (Central Processing Unit), Memory, and the I/O (Input/Output) system. The CPU contains a number of major subcomponents: the ALU (Arithmetic Logic Unit) that does the arithmetic, the Control Unit that controls the execution of the assembly language instructions, and the register set that provides fast temporary storage for the CPU.

A closer look at the register set shows that some of the registers are visible to the assembly language programmer, while others are for use by the control unit only and are not accessible to the programmer. Common examples of the latter set are the MAR (Memory Address Register) and MBR (Memory Buffer Register), which contain, respectively, the address and the contents of the specified memory location.
Before discussing memory design and the use of these two registers, we consider the historical development of the computer as driven by the economic issues associated with the manufacture of computers, specifically the cost of both primary memory and disk memory.

RISC vs. CISC Computers

The first issue in the definition of the ISA of a computer is whether it is to be a RISC machine or a CISC machine. The definitions are as follows:

    RISC    Reduced Instruction Set Computer
    CISC    Complex Instruction Set Computer

The definition of CISC architecture is very simple – it is any design that is not a RISC architecture. We now define RISC architecture and give some history of its evolution.

It should be noted that, with the exception of some early research models (such as the Berkeley RISC I and RISC II and the IBM 801), there are no “pure RISC” computers. The RISC approach is a design philosophy that has only recently become attractive. It is to be expected that most future CPU designs will incorporate a number of RISC features, but retain many of the CISC components that have made these designs popular.

Page 1 of 22 Chapter 5 Revised July 7, 2005
One should note that while the name “RISC” is of fairly recent origin (dating to the late 1970s), the concept can be traced to the work of Seymour Cray, then of Control Data Corporation, on the CDC-6400 and related machines. Mr. Cray did not think in terms of a reduced instruction set, but in terms of a very fast computer with a well-defined purpose – to solve complex mathematical simulations. The resulting design supported only two basic data types (integers and real numbers) and had a very simple, but powerful, instruction set. Looking back at the design of this computer, we see that the CDC-6400 could have been called a RISC design.

As we shall see just below, the entire RISC vs. CISC evolution is driven by the desire to obtain maximum performance from a computer at a reasonable price. Mr. Cray’s machines maximized performance by limiting the domain of the problems they would solve.

The general characteristic of a CISC architecture is the emphasis on doing more with each instruction. This may involve complex instructions and complex addressing modes; for example, the MC68020 processor supports 25 addressing modes.

As an illustration, we give the two Pentium code listings as featured in chapter 4 of our textbook. The first listing uses simple constructs; the second uses the more complex constructs that yield more compact code. Note that the first code fragment uses 9 statements, while the second uses only 5.

            LEA EBX, Num                ; EBX gets the address of Num
            MOV ECX, Num                ; ECX is the loop counter
            MOV EAX, 0                  ; EAX holds the running sum
            MOV EDI, 0                  ; EDI is the index
    Start:  ADD EAX, [EBX + EDI + 4]    ; add the addressed operand to the sum
            INC EDI                     ; advance the index
            DEC ECX                     ; count down
            JG  Start                   ; repeat while the count is positive
            MOV Sum, EAX                ; store the result

            MOV ECX, Num                ; ECX is the loop counter
    Start:  ADD EAX, [EBX + EDI + 4]    ; add the addressed operand to the sum
            INC EDI                     ; advance the index
            LOOP Start                  ; decrement ECX and branch if it is not zero
            MOV Sum, EAX                ; store the result

The ability to do more with each instruction allows more operations to be compressed into the same program size, something very desirable if memory costs are high. Some historical data will illustrate the memory issue.
    Time                       Cost of memory         Cost of disk drive
    Introduction of MC6800     $500 for 16 KB RAM     $55,000 for 40 MB
    Introduction of MC68000    $200 for 64 KB RAM     $5,000 for 10 MB
    Micron (4/10/2002)         $49 for 128 MB RAM     $149 for 20 GB
    Gateway (7/7/2005)         $130 for 512 MB RAM    $120 for 180 GB with 8 MB cache
Thus we are facing the facts that the more complex instruction sets are not necessary and that dropping the ability to support them will yield a faster CPU. There are other factors that favor the RISC architecture, specifically the fact that speed-up techniques such as instruction pre-fetching and instruction pipelining are more easily achieved for simple instructions.

The philosophy behind the RISC approach is well described in an IEEE tutorial. Here we pick up on a narrative by a design engineer who worked on the IBM 801 project.

“About this point, several people, including those who had been working on microprogramming tools, began to rethink the architectural design principles of the 1970’s. In trying to close the ‘semantic gap’, these principles had actually introduced a ‘performance gap’. The attempt to bridge the gap with WCS’s [Writable Control Stores – microprogrammed control units] was unsuccessful.”

“A new computer design philosophy evolved: Optimizing compilers could be used to compile ‘normal’ programming languages down to instructions that were as unencumbered as microinstructions in a large virtual address space, and to make the instruction cycle time as fast as the technology would allow. These machines would have fewer instructions – a reduced set – and the remaining instructions would be simple and would generally execute in one cycle – reduced instructions – hence the name reduced instruction set computers (RISC’s). RISC’s inaugurated a new set of architectural design principles.
There are a number of other advantages to the RISC architecture. We list two: better access to memory and better support of compilers.

Better Access to Memory

According to an IEEE tutorial, “Register-oriented architectures have significantly lower data memory bandwidth. Lower data memory bandwidth is highly desirable since data access is less predictable than instruction access and can cause more performance problems.”

We note that, even at 6.4 GB/second data transfer rates, access to memory is still a bottleneck in modern computer design, so any design that reduces the requirement for memory access (here called reducing the memory bandwidth) would be advantageous.

Better Support of Compilers

According to an IEEE tutorial, “The load/store nature of these [existing RISC] architectures is very suitable for effective register allocation by the compiler; furthermore, each eliminated memory reference results in saving an entire instruction.”

Compilers attempt to increase the efficiency of the run-time code by storing variable contents in CPU registers for later access, thus avoiding another memory reference when the variable is used again. This part of the semantic analysis done in the compilation stage is called “register allocation”. The RISC designs tend to facilitate this stage in compilation by having more registers to allocate. More efficient register allocation by a compiler will usually result in faster-running code. We see this as another advantage of the RISC design.

Implications for the Control Unit

The complex instructions in a CISC computer tend to require more support in the execution than can conveniently be provided by a hardwired control unit. For this reason, most CISC computers are microprogrammed to handle the complexity of each of the instructions. As a result, most CISC instructions require a number of system clock cycles to execute. The RISC approach emphasizes use of a simpler instruction set that can easily be supported by a hardwired control unit.
As a side effect, most RISC instructions can be executed in one clock cycle. A given computer program will compile into more RISC instructions than CISC instructions, but the CISC instructions execute more slowly than the RISC instructions. The overall effect on the computer program may be hard to predict.

According to an IEEE tutorial, “Reducing the instruction set further reduces the work a RISC processor has to do. Since RISC has fewer types of instructions than CISC, a RISC instruction requires less processing logic to interpret than a CISC instruction. The effect of such simplification is to speed up the execution rate for RISC instructions. In a RISC implementation it is theoretically possible to execute an instruction each time the computer’s logic clock ticks. In practice the clock rate of a RISC processor is usually three times that of the instruction rate.”
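The trade-off between instruction count and cycles per instruction can be made concrete with the standard relation: execution time = instruction count × average cycles per instruction × clock period. The following is a minimal sketch of that relation only; the function name is hypothetical, and the notes themselves give no figures for any particular machine.

```cpp
#include <cstdint>

// Execution time in nanoseconds, using the standard relation
//   time = (instruction count) x (average cycles per instruction) x (clock period).
// Every input is supplied by the caller; nothing here is measured data.
double execution_time_ns(std::uint64_t instruction_count,
                         double cycles_per_instruction,
                         double clock_period_ns) {
    return static_cast<double>(instruction_count) * cycles_per_instruction * clock_period_ns;
}
```

As a purely hypothetical illustration, a CISC program of 1,000,000 instructions at 4 cycles each on a 10 ns clock gives execution_time_ns(1000000, 4.0, 10.0), or 40,000,000 ns. A RISC version with 1,500,000 instructions at 1 cycle each on the same clock gives 15,000,000 ns. Other assumed values can reverse the comparison, which is why the overall effect is hard to predict.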
Registers

Registers are a form of computer memory. A register set (or register file) is a set of storage locations built into the Central Processing Unit. Logically, a register is the same as a word in the computer’s main memory. Physically, the register is implemented using a faster and more costly technology. The fact that the register set is almost always on the same chip as the CPU considerably reduces the time to access the registers.

Registers are either general purpose or special purpose. A general purpose register can be used for a number of tasks, including data storage and address storage. A special purpose register has a limited use; the MAR is a special purpose register used only to store memory addresses. When we discuss the Instruction Set Architecture of a computer, we normally focus on only those registers accessible to the assembly language programmer; these tend to be general purpose registers.

Due to cost restrictions, some early computers had only one general purpose register, called an accumulator because it was used to accumulate the results of computation. This register was first seen on adding machines and is now seen, though not by that name, on pocket calculators. When we enter a series of numbers and mathematical operations, the display of the calculator continually updates the result. The accumulator in a single register CPU functions similarly.

Some Special Purpose Registers

We have already discussed two special purpose registers that are not part of the ISA:

    MAR    the Memory Address Register, and
    MBR    the Memory Buffer Register.

A stored program computer is based on the fetch / execute cycle, in which an instruction from the stored program is fetched from the memory and then executed. This structure requires two additional registers, neither of which is normally accessible to the programmer.

    PC     the Program Counter, containing the address of the instruction to execute.
           In some architectures, such as the PDP-11 and VAX, the PC is accessible.
           On the Intel 80x86, this is called the IP (Instruction Pointer).
    IR     the Instruction Register, containing the instruction that has been fetched.

We can characterize the fetch cycle in code that looks like C++, except that it is upper case.

    IR = M[PC++]    // the memory word at the address is read into the IR
                    // and the PC is incremented after the read.

In the course on computer architecture we see that the PC is incremented after the instruction is read from memory because of two facts
As an abstract data type (ADT), the stack is defined as a LIFO (Last In, First Out) data structure with the operations PUSH and POP. For this course we consider the stack as an array of memory, with the push and pop operations defined as follows.

    PUSH    M[SP++] = Input
    POP     Output = M[--SP]

A Side Remark on Secure Software

The following C++ function illustrates the common way to implement the pop operation for a stack of integers.

    int pop()               // Array m and index sp are declared globally
    {                       // Ignore index checking
        int x;
        x = m[--sp];        // Get the value from the top of the stack
        return x;
    }

This common practice leaves the value on the stack, under the assumption that it will be overwritten at some point in the future execution of the program. With all of the present concern about stack smashing and other techniques used by hackers, we may prefer to clear the stack entry before returning the value. This may sound paranoid, but here is the code.

    int pop()               // Array m is declared globally
    {                       // Ignore index checking
        int x;
        x = m[--sp];        // Get the value
        m[sp] = 0;          // Then clear the entry
        return x;
    }

Characterization of the CPU by Number of Registers

One way to characterize the ISA of a computer is by the number of general purpose registers available in the CPU. The broad categories are “one”, “few”, and “many”.

Older computers, such as one design to be discussed below, use only one register, often called the accumulator, because of the expense of registers in the early designs. Later variants of the early computers used two registers, called ACC and MQ, to support hardware implementation of multiplication and division.

To see why this should be so, consider the PDP-9 (a computer already obsolete by 1975), which had 18-bit words. From earlier discussions of computer arithmetic, we can infer that a single 18-bit accumulator would suffice to support both addition and subtraction of 18-bit integers. Now consider the product of two 18-bit integers. It is easily seen that the product could be 36 bits in length.
To convince oneself of this, consider the product of two integers of two digits each: 99 × 99 = 9801, a four digit number. Similarly, the largest unsigned 18-bit integer is 2^18 – 1 and (2^18 – 1)(2^18 – 1) = 2^36 – 2·2^18 + 1 = 2^36 – 2^19 + 1, which takes 36 bits to represent.
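The arithmetic above can be checked with a short C++ sketch. The names used here (kMaxWord18, bits_needed, multiply18, ProductHalves) are hypothetical, and the choice of which register of an ACC/MQ pair would hold which half of the product is an assumption of the sketch, not a description of the PDP-9.

```cpp
#include <cstdint>

// Largest unsigned value that fits in an 18-bit word: 2^18 - 1.
constexpr std::uint64_t kMaxWord18 = (std::uint64_t{1} << 18) - 1;

// Number of bits needed to represent an unsigned value (0 needs 0 bits here).
int bits_needed(std::uint64_t value) {
    int bits = 0;
    while (value != 0) {
        ++bits;
        value >>= 1;
    }
    return bits;
}

// A 36-bit product split into two 18-bit halves, as a two-register design
// would have to hold it. The assignment of halves to registers is assumed.
struct ProductHalves {
    std::uint32_t high;  // upper 18 bits of the product
    std::uint32_t low;   // lower 18 bits of the product
};

ProductHalves multiply18(std::uint32_t a, std::uint32_t b) {
    std::uint64_t product = static_cast<std::uint64_t>(a) * b;
    return ProductHalves{
        static_cast<std::uint32_t>(product >> 18),
        static_cast<std::uint32_t>(product & kMaxWord18)};
}
```

Calling bits_needed(kMaxWord18 * kMaxWord18) shows that the largest product does not fit in a single 18-bit register, which is the motivation for the second register.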
We now consider the implementation of the expression A = B + C in the assembly language of a load-store RISC machine. Recall that the load-store design restricts access to memory to only two instructions – register load and register store. Arithmetic operations, such as the ADD discussed above, can only be done between registers. There are a number of reasons for this decision, alluded to above but repeated here.
Another Example: The Number of Operands

We now consider another aspect of the design of an instruction set: how many arguments should be used. Consider the algebraic expression Z = X + Y. This is a common example of addition – two numbers are added together. There are three arguments implied – the two numbers to be added and the variable to hold the sum.

In the examples below, we consider a variety of register options. The following assembly language notations will be used.

    AC      In a single accumulator machine (such as the MARIE), the token AC will refer to the single accumulator. Other letters are used for variable names.
    %R1     In a machine with a number of registers, we prefix a register name with the “%” character to make the name distinct from a variable name.

The question in design of the assembly language is how to implement operations such as the simple addition shown above. There are four possibilities, each of which is shown below.
1. Opcode + 3 arguments

In this situation, the expression Z = X + Y will translate into something such as ADD Z, X, Y or ADD X, Y, Z – depending on what the argument order implies. In some computers, any of the arguments could be registers. In a RISC machine with a load-store architecture, all arguments must be registers.
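To illustrate the load-store restriction, here is a minimal C++ sketch of a toy machine in which ADD takes three register arguments and only LOAD and STORE touch memory. The machine sizes, the register numbers, the method names, and the destination-first argument order are all assumptions made for the sketch; they are not taken from any real ISA.

```cpp
#include <array>
#include <cstdint>

// A toy load-store machine: 8 registers and 16 words of memory.
// Sizes and argument order (destination first) are assumptions.
struct LoadStoreMachine {
    std::array<std::int32_t, 8> reg{};
    std::array<std::int32_t, 16> mem{};

    // LOAD %Rd, addr : the only instruction that reads memory.
    void load(int rd, int addr) { reg[rd] = mem[addr]; }
    // STORE %Rs, addr : the only instruction that writes memory.
    void store(int rs, int addr) { mem[addr] = reg[rs]; }
    // ADD %Rd, %Rs1, %Rs2 : three register arguments, no memory access.
    void add(int rd, int rs1, int rs2) { reg[rd] = reg[rs1] + reg[rs2]; }
};

// Z = X + Y, where x_addr, y_addr, and z_addr are memory addresses.
void add_in_memory(LoadStoreMachine& m, int x_addr, int y_addr, int z_addr) {
    m.load(1, x_addr);   // LOAD  %R1, X
    m.load(2, y_addr);   // LOAD  %R2, Y
    m.add(3, 1, 2);      // ADD   %R3, %R1, %R2
    m.store(3, z_addr);  // STORE %R3, Z
}
```

The single high-level statement becomes four instructions, but only two of them read memory and one writes it; the addition itself involves registers only.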
Zero-Argument Implementation

Again, we must remind ourselves that not all operations on a stack-based architecture lack operands. There must be operations to place arguments onto the stack and to remove them. As above, we postulate two basic one-argument operations.

    PUSH X    // Place the value of the location X onto the top of the stack
              // and increment the stack pointer.
    POP X     // Decrement the stack pointer and copy the value from the top
              // of the stack into location X.

Binary operations, such as addition and multiplication, remove the top two stack values to be used as operands and place the result on the top of the stack. Unary operations, such as negation, remove only one value from the stack and then replace it with the result.

    PUSH X
    PUSH Y
    MULT
    PUSH W
    PUSH U
    MULT
    ADD
    POP Z

At the end of this sequence of operations, the stack is back to its original state and the computed result (X·Y + W·U) is stored in variable Z.
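The instruction sequence above can be traced with a small C++ sketch of a stack machine. It follows the convention used in these notes (PUSH is M[SP++] = Input and POP is Output = M[--SP]); the struct name, the fixed stack depth of 16, and the absence of bounds checking are assumptions made to keep the sketch short.

```cpp
#include <array>
#include <cstddef>

// A toy zero-argument (stack) machine. The stack grows upward:
// PUSH stores at M[SP++] and POP reads from M[--SP].
struct StackMachine {
    std::array<int, 16> stack{};
    std::size_t sp = 0;  // index of the next free slot

    void push(int value) { stack[sp++] = value; }
    int pop() { return stack[--sp]; }

    // Binary operations pop two operands and push one result.
    void mult() { int b = pop(); int a = pop(); push(a * b); }
    void add()  { int b = pop(); int a = pop(); push(a + b); }
};

// Evaluates Z = X*Y + W*U with the eight-instruction sequence from the text.
int evaluate(int x, int y, int w, int u) {
    StackMachine m;
    m.push(x);       // PUSH X
    m.push(y);       // PUSH Y
    m.mult();        // MULT
    m.push(w);       // PUSH W
    m.push(u);       // PUSH U
    m.mult();        // MULT
    m.add();         // ADD
    return m.pop();  // POP Z
}
```

For example, evaluate(2, 3, 4, 5) computes 2·3 + 4·5. Note that MULT and ADD name no operands at all; the operands are implied by their position on the stack.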
Diversion: The Intel 8088 Register Set

The main reason for this section is the fact that most of us are now working on WINTEL machines (running Windows on top of an Intel processor). The registers in the Intel 8088 are divided into four sets: the Common Registers, the Segment Registers (mentioned earlier), the Instruction Pointer, and the Flag Register. All of these registers are 16-bit registers, though some can be treated as two 8-bit registers.

The Common Registers are

    AX    the Accumulator
    BX    the Base Register
    CX    the Count Register
    DX    the Data Register
    DI    the Destination Index
    SI    the Source Index
    SP    the Stack Pointer
    BP    the Base Pointer

The Segment Registers are

    DS    the Data Segment
    CS    the Code Segment
    SS    the Stack Segment
    ES    the Extra Segment

Each of the AX, BX, CX, and DX registers has two associated 8-bit registers – a “high” register containing the upper 8 bits of the register and a “low” register with the low 8 bits.

    AX    the registers are AH and AL
    BX    the registers are BH and BL
    CX    the registers are CH and CL
    DX    the registers are DH and DL

The register sets are tied together. Changing the value of the AX register might change the value of both the AH and AL registers. Changing the value of either the AH or AL register will change the value in the AX register. The formula is AX = AH × 256 + AL.

The Intel 80386 and more recent processors have expanded the general purpose registers to 32 bits; thus we have the EAX, EBX, ECX, and EDX registers. We use the AX register as an example of the relationship between the 32-bit, 16-bit, and 8-bit registers.

    Bits 31 - 16    Bits 15 - 8    Bits 7 - 0
    |------------------ EAX -----------------|
                    |---------- AX ----------|
                    |---- AH ----|---- AL ---|
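The overlap among EAX, AX, AH, and AL can be modeled with shifts and masks. The following C++ sketch is only a model of the bit relationships shown in the table; the struct and method names are hypothetical and it does not access any real processor register.

```cpp
#include <cstdint>

// Models the EAX / AX / AH / AL overlap with shifts and masks.
struct Eax {
    std::uint32_t value = 0;  // the full 32-bit register

    std::uint16_t ax() const { return static_cast<std::uint16_t>(value & 0xFFFFu); }
    std::uint8_t ah() const { return static_cast<std::uint8_t>((value >> 8) & 0xFFu); }
    std::uint8_t al() const { return static_cast<std::uint8_t>(value & 0xFFu); }

    // Writing AX replaces bits 15-0 and leaves bits 31-16 alone.
    void set_ax(std::uint16_t v) { value = (value & 0xFFFF0000u) | v; }
    // Writing AH replaces bits 15-8 only.
    void set_ah(std::uint8_t v) {
        value = (value & 0xFFFF00FFu) | (static_cast<std::uint32_t>(v) << 8);
    }
    // Writing AL replaces bits 7-0 only.
    void set_al(std::uint8_t v) { value = (value & 0xFFFFFF00u) | v; }
};
```

After set_ax(0x1234), ah() returns 0x12 and al() returns 0x34, in agreement with the formula AX = AH × 256 + AL. A later set_al(0xFF) changes AX to 0x12FF without disturbing AH.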
Register Addressing

In register addressing, a register is used for an operand. An example of register addressing would be the instruction LOAD %R2, %R1, in which the contents of register 1 are copied into register 2. Both arguments are registers in the CPU.

Direct Addressing

This method of addressing specifies the address of the operand directly in the instruction. Suppose that variable X refers to memory location 0x660. The opcode for LOAD is 0001, so LOAD X assembles to the binary machine code 0001 0110 0110 0000, or 0x1660, and we see that the last three hex digits contain the address X. In the MARIE architecture, this causes the contents of memory address X to be copied to the accumulator.

We should note here that any instruction with more than one operand can use mixed-mode addressing. Consider the following instruction, not a part of the MARIE ISA.

    LOAD %R1, X    // Register 1 gets the contents of location X

This instruction uses two addressing modes: direct addressing for the source operand and register addressing for the destination operand. Such coding is quite common.

Register Indirect Addressing

In this addressing mode, the register does not hold the operand, but rather the address of the operand, which is located in memory. Suppose that the contents of register %R1 are 0x0660 and consider the clumsily written instruction LOAD %R2, %R1, where the “” operator indicates indirection. The address 0x0660 would be found in register %R1 and the contents of memory location 0x0660 would be copied into register %R2.

Indirect Addressing

In this form of addressing, the memory location directly referenced in the instruction contains a pointer to the operand. Consider the instruction LOAD @X, where X is address 0x0660. The contents of this address contain the address of the operand. Suppose M[X] = 0x0ACE. Then the effective address of the operand is 0x0ACE and the contents of address 0x0ACE are copied into the accumulator.
Indexed Addressing

This form is somewhat similar to direct addressing, except that a register is used to specify an offset to the direct address. This offset is similar to an index in an array. Thus we might have the instruction LOAD X, %R1, used to indicate that the operand at address X + (%R1) is to be loaded into the accumulator. Suppose that X references the address 0x0660 and that register %R1 contains 0x0040 – written as (%R1) = 0x0040. Then the effective address of the operand is 0x0660 + 0x0040 = 0x06A0. Remember that 6 + 4 = A in hexadecimal.
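Two pieces of arithmetic from the addressing discussion can be checked in C++: the encoding of LOAD X as a 4-bit opcode followed by a 12-bit address, and the effective address for indexed addressing. This is a sketch under the assumption of the 16-bit instruction layout implied by the LOAD X example; the function names are hypothetical.

```cpp
#include <cstdint>

// Packs a 4-bit opcode and a 12-bit address into one 16-bit word,
// the layout implied by the LOAD X example (opcode 0001, address 0x660).
std::uint16_t encode(std::uint8_t opcode, std::uint16_t address) {
    return static_cast<std::uint16_t>(((opcode & 0xFu) << 12) | (address & 0x0FFFu));
}

// Extracts the 12-bit address field from an encoded instruction.
std::uint16_t address_field(std::uint16_t instruction) {
    return static_cast<std::uint16_t>(instruction & 0x0FFFu);
}

// Indexed addressing: effective address = base address + register contents.
std::uint16_t indexed_effective_address(std::uint16_t base, std::uint16_t index_register) {
    return static_cast<std::uint16_t>(base + index_register);
}
```

Here encode(0x1, 0x660) yields 0x1660, and indexed_effective_address(0x0660, 0x0040) yields 0x06A0, matching the worked values in the text.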
An Example of the Addressing Modes

We now consider an example to illustrate all of the addressing modes discussed above. To facilitate the example, we consider a truly bizarre architecture with one accumulator (the AC, as in the MARIE architecture) and one additional register %R1.

Suppose that register %R1 contains the value 100, written (%R1) = 100, and that we have the following memory map, where Z references the memory location 200.

    Address    Contents
    100        250
    200        400
    300        650
    400        750

To refresh our idea of notation, we recall that Z == 200 and M[Z] == 400. Thus Z refers to the address, not the contents of the address. This is somewhat different from the usage in high level languages, such as C++, where the expression w = z refers to the value at address z, which is stored at address w.

In this example, we refer to the EA (effective address), which is the address of the operand, and to the effect of the instruction, which is the value actually stored in the accumulator. Note that the concept of effective address is not applicable to Immediate Addressing or Register Addressing, neither of which involves calculation of a memory address.

Immediate Addressing    LDI 200
    EA is not applicable.
    AC gets 200.

Register Addressing    LOAD %R1    // Load accumulator from %R1
    EA is not applicable.
    AC gets 100 – the value stored in %R1.

Direct Addressing    LOAD Z
    EA = 200 (the address Z).
    AC gets 400 – the value stored at address 200.

Register Indirect Addressing    LOAD @%R1    OK, the notation stinks.
    EA = 100 (the value stored in %R1).
    AC gets 250 – the value stored at address 100.

Indirect Addressing    LOAD @Z
    EA = 400 (the value of M[200]).
    AC gets 750 – the value stored at address 400.
    Recall that M[Z], the value stored at address Z, is not the same as Z, the address. This distinction is often lost in higher level languages, where we think of variables only.

Indexed Addressing    LOAD Z, %R1
    EA = Z + (%R1) = 200 + 100 = 300. Do not confuse Z (the address) with M[Z].
    AC gets 650 – the value stored at address 300.
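The six cases of this example can be checked mechanically. The C++ sketch below builds the memory map and the register contents from the example and provides one function per addressing mode, each returning the value that the accumulator would receive. The struct and function names are hypothetical, and the use of std::map to stand in for memory is a convenience of the sketch.

```cpp
#include <map>

// The memory map and register contents from the example.
struct Machine {
    std::map<int, int> mem{{100, 250}, {200, 400}, {300, 650}, {400, 750}};
    int r1 = 100;  // (%R1) = 100
};

// Each function returns the value loaded into the accumulator.
int load_immediate(int value) { return value; }                                 // LDI 200
int load_register(const Machine& m) { return m.r1; }                            // LOAD %R1
int load_direct(const Machine& m, int z) { return m.mem.at(z); }                // LOAD Z
int load_register_indirect(const Machine& m) { return m.mem.at(m.r1); }         // LOAD @%R1
int load_indirect(const Machine& m, int z) { return m.mem.at(m.mem.at(z)); }    // LOAD @Z
int load_indexed(const Machine& m, int z) { return m.mem.at(z + m.r1); }        // LOAD Z, %R1
```

With Z = 200, the calls reproduce the accumulator values listed in the example. Note that only load_indirect reads memory twice, once to fetch the pointer and once to fetch the operand.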
In the sample single accumulator machine we have

    Input  Device_Id    // Read from the indicated device, place data into accumulator
    Output Device_Id    // Write to the indicated device from the accumulator

These two instructions are distinct from the Load and Store instructions, used as follows.

    Load  X    // Load the accumulator from address X, and
    Store X    // Store the accumulator contents into address X.

The RISC solution uses load and store instructions to transfer data to and from registers.

    ld [Device_Address], %r1    // Copy input into register %r1
    st %r1, [Device_Address]    // Copy contents of register to output device.

As indicated above, we shall return to the study of I/O systems. At present, we focus on characterizing the two major strategies.

    Isolated I/O         dedicated I/O instructions
                         dedicated I/O bus
    Memory-Mapped I/O    no I/O instructions
                         single bus with address allocation to memory and I/O devices.

Low Memory

Many computer systems dedicate low memory, defined as the memory with the lowest addresses, to system usage. This has many advantages. In systems supporting C and C++ programming, it is common to dedicate the first 512 (2^9) locations, at addresses 0 – 511, to the operating system. It is claimed that this allows the run-time system to detect some common programming problems.

The Program Status Register (aka Processor Status Register)

We close this part of the discussion by mentioning yet another register – the PSR or Program Status Register. This register contains a number of fields, including bits to indicate the status of the last arithmetic operation.

    N – the result of the last operation was negative
    Z – the result of the last operation was zero
    V – the last arithmetic operation caused an overflow
    program priority level – used to manage I/O operations
    program privilege level – is the program running at Super-User level?

In general, any user program can read this register, but only programs with super-user privileges can change its contents.
Users and Super-Users

In a typical multi-user machine, the average users are restricted in their access to system resources. While this restriction has many benefits for system security, its original purpose was to prevent users from interfering with each other accidentally.

Consider a shared printer attached to a computer with two or more users attempting to print text copy. If each user’s program could access the printer directly, we would see the output intermixed – one line from user A followed by a number of lines from user B, followed by lines from user A, etc. The solution devised by operating system designers was to allow certain programs to run with complete system privileges. For the printer, the program is the print spooler. When normal user programs attempt to print, they are redirected to writing to a disk file. When the user program completes printing, its disk file is closed and sent to the spooler, which does the actual printing. As only the spooler accesses the printer directly, there is no conflict.

Programs with complete access to system resources are often said to run in super-user mode. As indicated above, the CPU uses specific bits in the PSR to indicate what mode it is running in. It should be clear that only a super-user program can change the PSR; otherwise any user program could elevate its status to super-user by changing the PSR. In fact, back-door ways to change the PSR form one of the more common classes of hacking tricks.

Typically, the PSR is an 8-bit or a 16-bit register in which the individual bits have dedicated functions. We have named some of the bits on the previous page. The figure below shows a possible layout for a PSR, taken from one of this author’s CPU designs.
    Bit       15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
    Fields (high-order to low-order):
              TBD | CPU Privilege | CPU Priority | Reserved | I | V | Z | N | C

The bit meanings are as follows:

    C           the last arithmetic operation set the carry bit
    N           the last arithmetic operation produced a negative result
    Z           the last arithmetic operation produced a zero result
    V           the last arithmetic operation produced an overflow
    I           I/O devices are enabled to raise an interrupt – a signal that the device is ready to process an I/O transfer. We discuss these later. Interrupts are usually disabled when the interrupt handler is running.
    Reserved    additional bits, required to be 000, which can be redefined in a future design.
    Priority    the running priority of the CPU. Each I/O device also has a priority. The I/O system handles the device if its priority is higher than the CPU priority. We discuss CPU priority when discussing I/O devices.
    Privilege   the user mode for the CPU. All standard user programs will operate in mode 000 – no privilege.
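As an illustration of how the condition bits might be set, the following C++ sketch updates C, N, Z, and V after an addition. It assumes that the five single-bit flags occupy bits 4 through 0 in the order listed (I, V, Z, N, C) and that the ALU operates on 16-bit words; both are assumptions of the sketch, as are the constant and function names. The widths of the privilege, priority, and TBD fields are not needed and are not modeled.

```cpp
#include <cstdint>

// Assumed bit positions for the single-bit flags (C in bit 0 up to I in bit 4).
constexpr std::uint16_t kC = 1u << 0;  // carry
constexpr std::uint16_t kN = 1u << 1;  // negative
constexpr std::uint16_t kZ = 1u << 2;  // zero
constexpr std::uint16_t kV = 1u << 3;  // overflow
constexpr std::uint16_t kI = 1u << 4;  // interrupts enabled

// Returns the PSR with C, N, Z, and V updated for the 16-bit sum a + b.
// All other bits of the PSR, including I, are left unchanged.
std::uint16_t update_flags_after_add(std::uint16_t psr, std::uint16_t a, std::uint16_t b) {
    std::uint32_t wide = static_cast<std::uint32_t>(a) + b;
    std::uint16_t sum = static_cast<std::uint16_t>(wide);

    psr = static_cast<std::uint16_t>(psr & ~(kC | kN | kZ | kV));
    if (wide > 0xFFFFu) psr |= kC;  // carry out of bit 15
    if (sum & 0x8000u) psr |= kN;   // sign bit of the result is set
    if (sum == 0) psr |= kZ;        // result is zero
    // Signed overflow: the operands share a sign that the result does not.
    if (~(a ^ b) & (a ^ sum) & 0x8000u) psr |= kV;
    return psr;
}
```

For example, adding 0x7FFF and 0x0001 sets N and V (the signed result overflowed into the sign bit), while adding 0xFFFF and 0x0001 sets C and Z.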