The Instruction Set Architecture - Lecture Notes | CPSC 2105, Study notes of Computer Architecture and Organization

Material Type: Notes; Class: Computer Organization; Subject: Computer Science; University: Columbus State University; Term: Unknown 1989;


Uploaded on 08/04/2009

The Instruction Set Architecture (ISA)
The instruction set architecture (ISA) of a computer is the structure of the computer as seen
by an assembly language programmer. In this chapter, we look at the computer hardware as
seen at the assembly language level, discuss addressing modes, and briefly discuss assembly
language. We shall then present and discuss a very simple assembly language.
We specify that the computer to be studied is a stored program computer, as are all modern
computers. Such a computer executes a program that has been previously stored in the
computer’s memory system, perhaps having been copied in from the disk. Only very early
computers, such as the ENIAC (1945), are not classified as stored program computers. The
program for the ENIAC was specified by a set of switches on one of its panels; in this design
the memory and registers stored only data. Some other early machines executed programs
directly read from punch cards and not stored in memory. At this point we insist that if the
machine is not a stored program computer, it is ancient history and not to be studied.
At the top level, most computers have three major components: the CPU (Central Processing
Unit), Memory, and the I/O (Input/Output) system. The CPU contains a number of major
subcomponents: the ALU (Arithmetic Logic Unit) that does the arithmetic, the Control Unit
that controls the execution of the machine language instructions, and the register set that provides fast
temporary storage for the CPU. A closer look at the register set shows that some of the
registers are visible to the assembly language programmer and some of the registers are for
use by the control unit only and are not accessible to the programmer. Common examples of
the latter set are the MAR (Memory Address Register) and MBR (Memory Buffer
Register), which contain the address and contents of the specified memory location.
Before discussing memory design and the use of these two registers, we consider the
historical development of the computer as driven by the economic issues associated with the
manufacture of computers, specifically the cost of both primary memory and disk memory.
RISC vs. CISC Computers
The first issue in the definition of the ISA of a computer is whether it is to be a RISC
machine or a CISC machine. The definitions are as follows:
RISC Reduced Instruction Set Computer
CISC Complex Instruction Set Computer.
The definition of CISC architecture is very simple – it is any design that is not a RISC
architecture. We now define RISC architecture and give some history of its evolution.
It should be noted that, with the exception of some early research models (such as the
Berkeley RISC I and RISC II and the IBM 801), there are no “pure RISC” computers. The
RISC approach is a design philosophy that has only recently become attractive. It is to be
expected that most future CPU designs will incorporate a number of RISC features, but retain
many of the CISC components that have made these designs popular.
Page 1 of 22 Chapter 5 Revised July 7, 2005
Copyright © 2005 by Edward L. Bosworth, Ph.D. All rights reserved.

One should note that while the name “RISC” is of fairly recent origin (dating to the late 1970’s), the concept can be traced to the work of Seymour Cray, then of Control Data Corporation, on the CDC-6400 and related machines. Mr. Cray did not think in terms of a reduced instruction set, but in terms of a very fast computer with a well-defined purpose: to solve complex mathematical simulations. The resulting design supported only two basic data types (integers and real numbers) and had a very simple, but powerful, instruction set. Looking back at the design of this computer, we see that the CDC-6400 could have been called a RISC design.

As we shall see just below, the entire RISC vs. CISC evolution is driven by the desire to obtain maximum performance from a computer at a reasonable price. Mr. Cray’s machines maximized performance by limiting the domain of the problems they would solve.

The general characteristic of a CISC architecture is the emphasis on doing more with each instruction. This may involve complex instructions and complex addressing modes; for example, the MC68020 processor supports 25 addressing modes. As a further example, we give the two Pentium code listings featured in chapter 4 of our textbook. The first example uses simple constructs; the second uses the more complex constructs that yield more compact code. Note that the first code fragment uses 9 statements, while the second uses only 5.

            LEA  EBX, Num
            MOV  ECX, Num
            MOV  EAX, 0
            MOV  EDI, 0
    Start:  ADD  EAX, [EBX + EDI + 4]
            INC  EDI
            DEC  ECX
            JG   Start
            MOV  Sum, EAX

            MOV  ECX, Num
    Start:  ADD  EAX, [EBX + EDI + 4]
            INC  EDI
            LOOP Start
            MOV  Sum, EAX

In the second fragment, the single LOOP instruction (which decrements ECX and branches if ECX is not zero) does the work of the DEC ECX / JG Start pair.

The ability to do more with each instruction allows more operations to be compressed into the same program size, something very desirable if memory costs are high. Some historical data will illustrate the memory issue.
    Time                       Cost of memory         Cost of disk drive
    Introduction of MC6800     $500 for 16 KB RAM     $55,000 for 40 MB
    Introduction of MC68000    $200 for 64 KB RAM     $5,000 for 10 MB
    Micron (4/10/2002)         $49 for 128 MB RAM     $149 for 20 GB
    Gateway (7/7/2005)         $130 for 512 MB RAM    $120 for 180 GB with 8 MB cache
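To put the RAM column of the table on a common footing, the short C++ sketch below converts each entry to dollars per megabyte. It is a minimal illustration only: the struct and function names are hypothetical, and it assumes 1 MB = 1024 KB.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One RAM entry from the cost table; the field names are illustrative.
struct RamPrice {
    std::string when;   // row label from the table
    double dollars;     // quoted price
    double kilobytes;   // quoted capacity, in KB
};

// RAM rows of the table, with the MB capacities converted to KB.
const std::vector<RamPrice> kRamPrices = {
    {"Introduction of MC6800", 500.0, 16.0},
    {"Introduction of MC68000", 200.0, 64.0},
    {"Micron (4/10/2002)", 49.0, 128.0 * 1024.0},
    {"Gateway (7/7/2005)", 130.0, 512.0 * 1024.0},
};

// Price per megabyte, assuming 1 MB = 1024 KB.
double dollarsPerMegabyte(const RamPrice& p) {
    assert(p.kilobytes > 0.0);
    return p.dollars / p.kilobytes * 1024.0;
}
```

By this measure, RAM fell from $32,000 per megabyte at the introduction of the MC6800 to about $0.25 per megabyte in 2005, a drop of more than a factor of 100,000.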

Thus we are facing the facts that the more complex instruction sets are not necessary and that dropping the ability to support them will yield a faster CPU. There are other factors that favor the RISC architecture, specifically the fact that speed-up techniques such as instruction pre-fetching and instruction pipelining are more easily achieved for simple instructions.

The philosophy behind the RISC approach is well described in an IEEE tutorial. Here we pick up on a narrative by a design engineer who worked on the IBM 801 project.

“About this point, several people, including those who had been working on microprogramming tools, began to rethink the architectural design principles of the 1970’s. In trying to close the ‘semantic gap’, these principles had actually introduced a ‘performance gap’. The attempt to bridge the gap with WCS’s [Writable Control Stores – microprogrammed control units] was unsuccessful.”

“A new computer design philosophy evolved: Optimizing compilers could be used to compile ‘normal’ programming languages down to instructions that were as unencumbered as microinstructions in a large virtual address space, and to make the instruction cycle time as fast as the technology would allow. These machines would have fewer instructions – a reduced set – and the remaining instructions would be simple and would generally execute in one cycle – reduced instructions – hence the name reduced instruction set computers (RISC’s). RISC’s inaugurated a new set of architectural design principles.

  1. Functions should be kept simple unless there is a very good reason to do otherwise.
  2. Microinstructions should not be faster than simple instructions.
  3. Microcode is not magic.
  4. Simple decoding and pipelined execution are more important than program size.
  5. Compiler technology should be used to simplify instructions rather than to generate complex instructions.”

The narrative from the tutorial continues with remarks on the RISC architectures developed at the University of California at Berkeley. “Although each project [the Berkeley RISC I and RISC II and the IBM 801] had different constraints and goals, the machines they eventually created have a great deal in common.

  1. Operations are register-to-register, with only LOAD and STORE accessing memory.
  2. The operations and addressing modes are reduced. Operations between registers complete in one cycle, permitting a simpler, hardwired control for each RISC, instead of microcode. Multiple-cycle instructions such as floating-point arithmetic are either executed in software or in a special-purpose processor. (Without a coprocessor, RISC’s have mediocre floating-point performance.) Only two simple addressing modes, indexed and PC-relative, are provided. More complicated addressing modes can be synthesized from the simple ones.
  3. Instruction formats are simple and do not cross word boundaries. This restriction allows RISC’s to remove instruction decoding time from the critical execution path. … RISC register operands are always in the same place in the 32-bit word, so register access can take place simultaneously with opcode decoding. This removes the instruction decoding stage from the pipelined execution, making it more effective by shortening the pipeline.”

There are a number of other advantages to the RISC architecture. We list two of them: better access to memory and better support of compilers.

Better Access to Memory
According to an IEEE Tutorial, “Register-oriented architectures have significantly lower data memory bandwidth. Lower data memory bandwidth is highly desirable since data access is less predictable than instruction access and can cause more performance problems.” We note that, even at 6.4 GB/second data transfer rates, access to memory is still a bottleneck in modern computer design, so any design that reduces the requirement for memory access (here called reducing the memory bandwidth) would be advantageous.

Better Support of Compilers
According to an IEEE Tutorial, “The load/store nature of these [existing RISC] architectures is very suitable for effective register allocation by the compiler; furthermore, each eliminated memory reference results in saving an entire instruction.” Compilers attempt to increase the efficiency of the run-time code by storing variable contents in CPU registers for later access, thus avoiding another memory reference when the variable is used again. This part of the code generation done in the compilation stage is called “register allocation”. The RISC designs tend to facilitate this stage in compilation by having more registers to allocate. More efficient register allocation by a compiler will usually result in faster-running code. We see this as another advantage of the RISC design.

Implications for the Control Unit
The complex instructions in a CISC computer tend to require more support in the execution than can conveniently be provided by a hardwired control unit. For this reason, most CISC computers are microprogrammed to handle the complexity of each of the instructions. As a result, most CISC instructions require a number of system clock cycles to execute. The RISC approach emphasizes use of a simpler instruction set that can easily be supported by a hardwired control unit.
As a side effect, most RISC instructions can be executed in one clock cycle. A given computer program will compile into more RISC instructions than CISC instructions, but the CISC instructions execute more slowly than the RISC instructions. The overall effect on the computer program may be hard to predict. According to an IEEE tutorial, “Reducing the instruction set further reduces the work a RISC processor has to do. Since RISC has fewer types of instructions than CISC, a RISC instruction requires less processing logic to interpret than a CISC instruction. The effect of such simplification is to speed up the execution rate for RISC instructions. In a RISC implementation it is theoretically possible to execute an instruction each time the computer’s logic clock ticks. In practice the clock rate of a RISC processor is usually three times that of the instruction rate.”
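One way to see why the net effect is hard to predict is the standard textbook relation CPU time = instruction count × average cycles per instruction ÷ clock rate, which the notes above do not state explicitly. The C++ sketch below is a minimal version of that relation; the function name is hypothetical, and the instruction counts and cycle figures in the usage note are invented for illustration.

```cpp
#include <cassert>

// CPU time in seconds = instruction count x average cycles per instruction / clock rate (Hz).
double cpuTimeSeconds(double instructionCount, double cyclesPerInstruction, double clockHz) {
    assert(clockHz > 0.0);  // a clock rate must be positive
    return instructionCount * cyclesPerInstruction / clockHz;
}
```

Suppose, hypothetically, that a program compiles to 1000 CISC instructions averaging 4 cycles each, or to 1500 RISC instructions. On the same toy 1 kHz clock, cpuTimeSeconds(1000, 4, 1000) gives 4.0 seconds and cpuTimeSeconds(1500, 1, 1000) gives 1.5 seconds. If instead each RISC instruction takes 3 clock ticks, as the tutorial says is usual in practice, the RISC time becomes 4.5 seconds and the CISC version wins.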

Registers
Registers are a form of computer memory. A register set (or register file) is a set of storage locations built into the Central Processing Unit. Logically a register is the same as a word in the computer’s main memory. Physically, the register is implemented using a faster and more costly technology. The fact that the register set is almost always on the same chip as the CPU considerably reduces the time needed to access the registers.

Registers are either general purpose or special purpose. A general purpose register can be used for a number of tasks, including data storage and address storage. A special purpose register has a limited use; the MAR is a special purpose register used only to store memory addresses. When we discuss the Instruction Set Architecture of a computer, we normally focus on only those registers accessible to the assembly language programmer; these tend to be general purpose registers.

Due to cost restrictions, some early computers had only one general purpose register, called an accumulator because it was used to accumulate the results of computation. This register was first seen on adding machines and now is seen, not by that name, on pocket calculators. When we enter a series of numbers and mathematical operations, the display of the calculator continually updates the result. The accumulator in a single register CPU functions similarly.

Some Special Purpose Registers
We have already discussed two special purpose registers that are not part of the ISA: the MAR (Memory Address Register) and the MBR (Memory Buffer Register). A stored program computer is based on the fetch / execute cycle, in which an instruction from the stored program is fetched from the memory and then executed. This structure requires two additional registers, neither of which is normally accessible to the programmer.

    PC   the Program Counter, containing the address of the instruction to execute. In some architectures, such as the PDP-11 and VAX, the PC is accessible.
         On the Intel 80x86, this is called the IP (Instruction Pointer).
    IR   the Instruction Register, containing the instruction that has been fetched.

We can characterize the fetch cycle in code that looks like C++, except that it is upper case.

    IR = M[PC++]    // the memory word at the address in the PC is read into the IR
                    // and the PC is incremented after the read.

In the course on computer architecture we see that the PC is incremented after the instruction is read from memory because of two facts:

  1. the most common case is to execute instructions one after another, and
  2. there is usually nothing else that can be done in that clock pulse.

Many modern computers contain another register.

    SP   the Stack Pointer. This register contains the address of the top of the system stack. Most commonly, this is a general purpose register assigned a special use. In general, this register is accessible to the assembly language programmer.
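The fetch step IR = M[PC++] described above can be written as real C++. This is a minimal sketch under stated assumptions: the Cpu struct, its 16-bit word size, and any memory contents are hypothetical, and only the fetch half of the fetch / execute cycle is modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CPU state: memory M, Program Counter PC, Instruction Register IR.
struct Cpu {
    std::vector<std::uint16_t> memory;  // M, one 16-bit word per address (assumed)
    std::uint16_t pc = 0;               // address of the next instruction
    std::uint16_t ir = 0;               // the instruction most recently fetched

    // Fetch: IR = M[PC++]. The word is read first and the PC is incremented
    // afterwards, so the PC then points at the next sequential instruction.
    void fetch() {
        assert(pc < memory.size());  // stay inside the simulated memory
        ir = memory[pc++];
    }
};
```

Calling fetch() twice on a Cpu whose memory holds {0x1660, 0x3661} leaves 0x3661 in the IR and 2 in the PC, which is the "execute instructions one after another" case in the list above.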

As an abstract data type (ADT), the stack is defined as a LIFO (Last In, First Out) data structure with the operations PUSH and POP. For this course we consider the stack as an array of memory, with the push and pop operations defined as follows.

    PUSH    M[SP++] = Input
    POP     Output = M[--SP]

A Side Remark on Secure Software
The following C++ function illustrates the common way to implement a stack of integers.

    int m[1024] ;     // Stack memory; the size 1024 is assumed for illustration
    int sp = 0 ;      // Index of the next free entry

    int pop ()
    { int x ;
      x = m[--sp] ;
      return x ;
    }

This common practice leaves the value on the stack, under the assumption that it will be overwritten at some point in the future execution of the program. With all of the present concern about stack smashing and other techniques used by hackers, we may prefer to clear the stack entry before returning the value. This may sound paranoid, but here is the code.

    int pop ()          // Array m is declared globally
    { int x ;           // Ignore index checking
      x = m[--sp] ;     // Get the value
      m[sp] = 0 ;       // Then clear the entry
      return x ;
    }

Characterization of the CPU by Number of Registers
One way to characterize the ISA of a computer is the number of general purpose registers available in the CPU. The broad categories are “one”, “few”, and “many”. Older computers, such as one design to be discussed below, use only one register, often called the accumulator, because of the expense of registers in the early design. Later variants of the early computers used two registers, called ACC and MQ, to support hardware implementation of multiplication and division.

To see why this should be so, consider the PDP-9 (a computer already obsolete by 1975), which had 18-bit words. From earlier discussions of computer arithmetic, we can infer that a single 18-bit accumulator would suffice to support both addition and subtraction of 18-bit integers. Consider the product of two 18-bit integers. It is easily seen that the product could be 36 bits in length.
To convince oneself of this, consider the product of two integers of two digits each: $99 \times 99 = 9801$, a four-digit number. Similarly, the largest unsigned 18-bit integer is $2^{18} - 1$ and $(2^{18} - 1)(2^{18} - 1) = 2^{36} - 2 \cdot 2^{18} + 1 = 2^{36} - 2^{19} + 1$, which takes 36 bits to represent.
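The need for a second register can be checked with a short C++ sketch. Assuming unsigned 18-bit operands, as in the PDP-9 example, it forms the full product in a 64-bit integer and splits it into a high half and a low half, one per register. The AccMq type is hypothetical, and placing the high half in ACC and the low half in MQ is an assumption for illustration; the text does not say how the product is divided between the two registers.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kWordBits = 18;                               // PDP-9 word size
constexpr std::uint32_t kWordMask = (1u << kWordBits) - 1;  // 2^18 - 1 = 262143

// Hypothetical register pair holding a double-length product.
struct AccMq {
    std::uint32_t acc;  // high 18 bits of the product (bits 35-18)
    std::uint32_t mq;   // low 18 bits of the product (bits 17-0)
};

// Multiply two unsigned 18-bit values. The product may need 36 bits,
// so it is split across two 18-bit registers.
AccMq multiply18(std::uint32_t a, std::uint32_t b) {
    assert(a <= kWordMask && b <= kWordMask);                   // operands fit in one word
    std::uint64_t product = static_cast<std::uint64_t>(a) * b;  // up to 36 bits
    return AccMq{static_cast<std::uint32_t>(product >> kWordBits),
                 static_cast<std::uint32_t>(product & kWordMask)};
}
```

With both operands equal to $2^{18} - 1 = 262143$, the high half is 262142 and the low half is 1, which reassembles to $2^{36} - 2^{19} + 1$; no single 18-bit register could hold that result.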

We now consider the implementation of the expression A = B + C in the assembly language of a load-store RISC machine. Recall that the load-store design restricts access to memory to only two instructions: register load and register store. Arithmetic operations, such as the ADD discussed above, can only be done between registers. There are a number of reasons for this decision, alluded to above but repeated here.

  1. There is only one kind of ADD instruction: add register to register. If we allowed memory to be an argument, we would have to support at least two instructions (add register to register and add memory to register) and possibly a third (add memory to memory). Due to memory addressing requirements, the instructions would have different lengths.
  2. Register to register addition is much faster than addition directly involving a memory address. Little time is lost in loading the registers before using them.

Some reflection will convince the student that a Load-Store RISC computer must have at least three general purpose registers. The reasoning is simple. Since only load and store operations can access the memory, operations such as the ADD can access only registers. A binary addition requires three registers, one for each of the numbers being added and one for the sum. In general, RISC computers tend to have at least 32 general purpose registers. The same expression would be translated in a load-store RISC as follows.

    ld  c, %r3           // load the value at address c into register 3
    ld  b, %r2           // load the value at address b into register 2
    add %r2, %r3, %r1    // set register 1 to the sum
    st  %r1, a           // store contents of register 1 into address a

In the above example we use a specific notation to distinguish register references from memory references. Other assemblers use different conventions to do this. In this assembler any reference to r3 would be taken as a reference to a variable with that name, rather than the register. This is not a problem in higher level languages, which do not access registers directly.

We close this section with an open question on the best use of the nanoacres of chip space available to designers: do we place more registers on the chip or more L1 cache memory? The answer depends on the design goals of the computer.

Nanoacre Defined
An acre is a unit of area used by surveyors. By definition, 1 acre = 43560 square feet. As one square foot is exactly 0.09290304 square meters, 1 acre ≈ 4046.85 square meters. The joke unit nanoacre would then be $10^{-9}\ \text{acre} \approx 4046.85 \times 10^{-9}\ \text{m}^2 = 4.04685 \times 10^{-6}\ \text{m}^2 \approx (2.0117 \times 10^{-3}\ \text{m})^2$, which is a square about two millimeters on a side. This is approximately the size of the CPU chip on many modern processors.
Also, a milliacre is a square about 2 meters on a side, a bit smaller than an office cubicle.
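The four-instruction load-store translation of A = B + C given earlier can be traced with a toy machine in C++. This is a minimal sketch, not any real ISA: the LoadStoreMachine type, the use of variable names as addresses, and the count of 32 registers (the typical RISC figure mentioned above) are assumptions. As in a load-store design, only ld and st touch memory, and add works on registers only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Toy load-store machine (hypothetical): only ld and st touch memory.
struct LoadStoreMachine {
    std::map<std::string, int> memory;  // variable name used as the address (assumed)
    std::array<int, 32> reg{};          // %r0 .. %r31, zero-initialized

    // ld addr, %rN : register N gets the contents of memory at addr
    void ld(const std::string& addr, std::size_t n) {
        assert(n < reg.size());
        reg[n] = memory.at(addr);
    }
    // st %rN, addr : memory at addr gets the contents of register N
    void st(std::size_t n, const std::string& addr) {
        assert(n < reg.size());
        memory[addr] = reg[n];
    }
    // add %rA, %rB, %rD : register D gets register A + register B (registers only)
    void add(std::size_t a, std::size_t b, std::size_t d) {
        assert(a < reg.size() && b < reg.size() && d < reg.size());
        reg[d] = reg[a] + reg[b];
    }
};

// Traces the translation of A = B + C from the text.
int computeA(int b, int c) {
    LoadStoreMachine m;
    m.memory["b"] = b;
    m.memory["c"] = c;
    m.ld("c", 3);    // ld c, %r3
    m.ld("b", 2);    // ld b, %r2
    m.add(2, 3, 1);  // add %r2, %r3, %r1
    m.st(1, "a");    // st %r1, a
    return m.memory.at("a");
}
```

The sketch needs three distinct registers for the one addition, which matches the argument above that a load-store machine needs at least three general purpose registers.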

Another Example: The Number of Operands
We now consider another aspect of the design of an instruction set: how many arguments should be used. Consider the algebraic expression Z = X + Y. This is a common example of addition: two numbers are added together. There are three arguments implied: the two numbers to be added and the variable to hold the sum. In the examples below, we consider a variety of register options. The following assembly language notations will be used.

    AC     In a single accumulator machine (such as the MARIE), the token AC will refer to the single accumulator. Other letters are used for variable names.
    %R1    In a machine with a number of registers, we prefix a register name with the “%” character to make the name distinct from a variable name.

The question in design of the assembly language is how to implement operations such as the simple addition shown above. There are four possibilities, each of which is shown below.

  1. Opcode + 3 arguments. In this situation, the expression Z = X + Y will translate into something such as ADD Z, X, Y or ADD X, Y, Z, depending on what the argument order implies. In some computers, any of the arguments could be registers. In a RISC machine with a load-store architecture, all arguments must be registers.
  2. Opcode + 2 arguments. In this situation, we might have something like ADD X, Y, translating into the mathematical expression X = X + Y. Again, one or both of the arguments could be a register.
  3. Opcode + 1 argument. This usually applies to a single accumulator machine, such as the MARIE. The implied other argument is the accumulator; thus the expression ADD X stands for AC = AC + X.
  4. Opcode only. This applies to stack-based architectures. In these architectures, which are not common, we have two special operators.
         Push X    place the data value onto the stack
         Pop X     pop the stack and place the data value into X
     All other operations use the top members of the stack, implicitly executing a Pop to get the argument and a Push to place the results on top of the stack.

Rather than spend time on these definitions, we shall consider an example from the textbook. We examine four possible translations of the following expression into assembly language.

    Z = (X * Y) + (W * U)     // * is the standard multiplication operator.
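For the one-argument (single accumulator) case, a plausible translation of Z = (X * Y) + (W * U) needs a temporary memory location, because the accumulator can hold only one partial product at a time. The C++ sketch below traces one such sequence: LOAD X, MULT Y, STORE T, LOAD W, MULT U, ADD T, STORE Z. The temporary T, the MULT instruction (MARIE itself has no multiply instruction), and the AccumulatorMachine type are all assumptions, and this is not necessarily the textbook’s own translation.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical single-accumulator machine: each instruction names one memory
// operand, and the accumulator AC is the implied second operand.
struct AccumulatorMachine {
    std::map<std::string, int> memory;  // variable name -> value (assumed)
    int ac = 0;                         // the accumulator

    void load(const std::string& x)  { ac = memory.at(x); }       // AC = M[X]
    void store(const std::string& x) { memory[x] = ac; }          // M[X] = AC
    void add(const std::string& x)   { ac = ac + memory.at(x); }  // AC = AC + M[X]
    void mult(const std::string& x)  { ac = ac * memory.at(x); }  // AC = AC * M[X] (assumed instruction)
};

// One possible one-address translation of Z = (X * Y) + (W * U).
int evaluateOneAddress(int x, int y, int w, int u) {
    AccumulatorMachine m;
    m.memory = {{"X", x}, {"Y", y}, {"W", w}, {"U", u}};
    m.load("X");  m.mult("Y");  m.store("T");  // T = X * Y
    m.load("W");  m.mult("U");                 // AC = W * U
    m.add("T");                                // AC = W * U + T
    m.store("Z");                              // Z = AC
    return m.memory.at("Z");
}
```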

Zero-Argument Implementation
Again, we must remind ourselves that not all operations on a stack-based architecture lack operands. There must be operations to place arguments onto the stack and to remove them. As above, we postulate two basic one-argument operations.

    Push X    // Place the value of the location X onto the top of the stack
              // and increment the stack pointer.
    Pop X     // Decrement the stack pointer and copy the value from the top
              // of the stack into location X.

Binary operations, such as addition and multiplication, remove the top two stack values to be used as operands and place the result on the top of the stack. Unary operations, such as negation, remove only one value from the stack and then replace it with the result.

    PUSH X
    PUSH Y
    MULT
    PUSH W
    PUSH U
    MULT
    ADD
    POP Z

At the end of this sequence of operations, the stack is back to its original state and the computed result (X * Y + W * U) is stored in variable Z.
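The eight-instruction zero-argument sequence can be traced with a small C++ stack machine. This is a sketch under stated assumptions: the StackMachine type and its map from variable names to values are hypothetical, std::stack stands in for the memory-resident stack and its stack pointer, and MULT and ADD follow the rule above of removing the top two values and pushing the result.

```cpp
#include <cassert>
#include <map>
#include <stack>
#include <string>

// Hypothetical zero-address (stack) machine.
struct StackMachine {
    std::map<std::string, int> memory;  // variable name -> value (assumed)
    std::stack<int> values;             // the operand stack

    void push(const std::string& x) { values.push(memory.at(x)); }  // PUSH X
    void pop(const std::string& x) {                                  // POP X
        assert(!values.empty());
        memory[x] = values.top();
        values.pop();
    }
    // Binary operation: remove the top two values and push the result.
    template <typename Op>
    void binary(Op op) {
        assert(values.size() >= 2);
        int right = values.top(); values.pop();
        int left = values.top();  values.pop();
        values.push(op(left, right));
    }
    void mult() { binary([](int a, int b) { return a * b; }); }  // MULT
    void add()  { binary([](int a, int b) { return a + b; }); }  // ADD
};

// Traces PUSH X, PUSH Y, MULT, PUSH W, PUSH U, MULT, ADD, POP Z.
int evaluateZeroAddress(int x, int y, int w, int u) {
    StackMachine m;
    m.memory = {{"X", x}, {"Y", y}, {"W", w}, {"U", u}};
    m.push("X"); m.push("Y"); m.mult();
    m.push("W"); m.push("U"); m.mult();
    m.add();
    m.pop("Z");
    assert(m.values.empty());  // the stack is back to its original (empty) state
    return m.memory.at("Z");
}
```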

Diversion: The Intel 8088 Register Set
The main reason for this section is the fact that most of us are now working on WINTEL machines (running Windows on top of an Intel processor). The registers in the Intel 8088 are divided into four sets: the Common Registers, the Segment Registers (mentioned earlier), the Instruction Pointer, and the Flag Register. All of these registers are 16-bit registers, though some can be treated as two 8-bit registers.

The Common Registers are
    AX   the Accumulator
    BX   the Base Register
    CX   the Count Register
    DX   the Data Register
    DI   the Destination Index
    SI   the Source Index
    SP   the Stack Pointer
    BP   the Base Pointer

The Segment Registers are
    DS   the Data Segment
    CS   the Code Segment
    SS   the Stack Segment
    ES   the Extra Segment

Each of the AX, BX, CX, and DX registers has two associated 8-bit registers: a “high” register containing the upper 8 bits of the register and a “low” register with the low 8 bits.
    AX   the registers are AH and AL
    BX   the registers are BH and BL
    CX   the registers are CH and CL
    DX   the registers are DH and DL

The register sets are tied together. Changing the value of the AX register might change the value of both the AH and AL registers. Changing the value of either the AH or AL registers will change the value in the AX register. The formula is $\text{AX} = \text{AH} \times 256 + \text{AL}$.

The Intel 80386 and more recent processors have expanded the general purpose registers to 32 bits; thus we have the EAX, EBX, ECX, and EDX registers. We use the EAX register as an example of the relationship between the 32-bit, 16-bit, and 8-bit registers.

    Register   Bits 31 - 16   Bits 15 - 8   Bits 7 - 0
    EAX        x              x             x
    AX                        x             x
    AH                        x
    AL                                      x
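The way EAX, AX, AH, and AL overlap can be modeled with shifts and masks on a single 32-bit value. This C++ sketch is a hypothetical model for illustration (the RegEAX type and its accessor names are not part of any real API); it only shows which bits each name reads and writes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: EAX, AX, AH and AL are views of a single 32-bit value.
struct RegEAX {
    std::uint32_t eax = 0;

    std::uint16_t ax() const { return static_cast<std::uint16_t>(eax & 0xFFFFu); }      // bits 15-0
    std::uint8_t  ah() const { return static_cast<std::uint8_t>((eax >> 8) & 0xFFu); }  // bits 15-8
    std::uint8_t  al() const { return static_cast<std::uint8_t>(eax & 0xFFu); }         // bits 7-0

    // Each write replaces only its own bits and leaves the rest of EAX alone.
    void set_ax(std::uint16_t v) { eax = (eax & 0xFFFF0000u) | v; }
    void set_ah(std::uint8_t v)  { eax = (eax & 0xFFFF00FFu) | (static_cast<std::uint32_t>(v) << 8); }
    void set_al(std::uint8_t v)  { eax = (eax & 0xFFFFFF00u) | v; }
};
```

For example, after set_ax(0x1234), ah() returns 0x12 and al() returns 0x34, and 0x12 × 256 + 0x34 = 0x1234, which is the formula AX = AH × 256 + AL.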

Register Addressing
In register addressing, a register is used for an operand. An example of register addressing would be the instruction LOAD %R2, %R1, in which the contents of register 2 are copied into register 1. Both arguments are registers in the CPU.

Direct Addressing
This method of addressing specifies the address of the operand directly in the instruction. Suppose that variable X refers to memory location 0x660. The opcode for LOAD is 0001, so LOAD X assembles to the binary machine code 0001 0110 0110 0000, or 0x1660, and we see that the last three hex digits contain the address X. In the MARIE architecture, this causes the contents of memory address X to be copied to the accumulator.

We should note here that any instruction with more than one operand can use mixed mode addressing. Consider the following instruction, not a part of the MARIE ISA.

    LOAD %R1, X     // Register 1 gets the contents of location X

This instruction uses two addressing modes: direct addressing for the source operand and register addressing for the destination operand. Such coding is quite common.

Register Indirect Addressing
In this addressing mode, the register does not hold the operand, but rather the address of the operand, which is located in memory. Suppose that the contents of register %R1 are 0x0660 and consider the clumsily written instruction LOAD %R2, %R1, where the “” operator indicates indirection. The address 0x0660 would be found in register %R1 and the contents of memory location 0x0660 would be copied into register %R2.

Indirect Addressing
In this form of addressing, the memory location directly referenced in the instruction contains a pointer to the operand. Consider the instruction LOAD @X, where X is address 0x0660. The contents of this address contain the address of the operand. Suppose M[X] = 0x0ACE. Then the effective address of the operand is 0x0ACE and the contents of address 0x0ACE are copied into the accumulator.
Indexed Addressing
This form is somewhat similar to direct addressing except that a register is used to specify an offset to the direct address. This offset is similar to an index in an array. Thus we might have the instruction LOAD X, %R1 used to indicate that the operand at address X + (%R1) is to be loaded into the accumulator. Suppose that X references the address 0x0660 and that register %R1 contains 0x0040, written as (%R1) = 0x0040. Then the effective address of the operand is 0x0660 + 0x0040 = 0x06A0. Remember that 6 + 4 = A in hexadecimal.
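The direct-addressing encoding (LOAD X becoming 0x1660) and the indexed effective-address sum can both be checked with a few lines of C++. This sketch assumes the MARIE-style layout implied by the example, a 4-bit opcode followed by a 12-bit address in a 16-bit word; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed MARIE-style instruction format: a 4-bit opcode in bits 15-12
// followed by a 12-bit address in bits 11-0.
constexpr std::uint16_t kLoadOpcode = 0x1;  // LOAD is 0001

std::uint16_t encode(std::uint16_t opcode, std::uint16_t address) {
    assert(opcode <= 0xF && address <= 0xFFF);  // fields must fit their widths
    return static_cast<std::uint16_t>((opcode << 12) | address);
}

std::uint16_t opcodeOf(std::uint16_t word) {
    return static_cast<std::uint16_t>(word >> 12);     // top 4 bits
}

std::uint16_t addressOf(std::uint16_t word) {
    return static_cast<std::uint16_t>(word & 0x0FFF);  // low 12 bits
}

// Indexed addressing: effective address = X + (%R1).
std::uint16_t indexedEA(std::uint16_t x, std::uint16_t r1) {
    return static_cast<std::uint16_t>(x + r1);
}
```

With these definitions, encode(kLoadOpcode, 0x660) yields 0x1660, and indexedEA(0x0660, 0x0040) yields 0x06A0, matching the two worked values above.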

An Example of the Addressing Modes
We now consider an example to illustrate all of the addressing modes discussed above. To facilitate the example, we consider a truly bizarre architecture with one accumulator (the AC, as in the MARIE architecture) and one additional register, %R1. Suppose that register %R1 contains the value 100, written (%R1) = 100, and that we have the following memory map, where Z references the memory location 200.

   Address   Contents
     100       250
     200       400
     300       650
     400       750

To refresh our idea of notation, we recall that Z == 200 and M[Z] == 400. Thus Z refers to the address, not the contents of the address. This is somewhat different from the usage in high level languages, such as C++, where, in the statement w = z, the name z stands for the value stored at the address of z, and that value is copied into the location at the address of w.

For each mode we give the EA (effective address), which is the address of the operand, and the effect of the instruction, which is the value actually placed in the accumulator. Note that the concept of effective address is not applicable to Immediate Addressing or Register Addressing, neither of which involves calculation of a memory address.

Immediate Addressing           LDI 200
   EA is not applicable.
   AC gets 200.
Register Addressing            LOAD %R1     // Load accumulator from %R1
   EA is not applicable.
   AC gets 100, the value stored in %R1.
Direct Addressing              LOAD Z
   EA = 200 (the address Z).
   AC gets 400, the value stored at address 200.
Register Indirect Addressing   LOAD @%R1    (OK, the notation stinks.)
   EA = 100 (the value stored in %R1).
   AC gets 250, the value stored at address 100.
Indirect Addressing            LOAD @Z
   EA = 400 (the value of M[200]).
   AC gets 750, the value stored at address 400.
   Recall that M[Z], the value stored at address Z, is not the same as Z, the address. This distinction is often lost in higher level languages, where we think of variables only.
Indexed Addressing             LOAD Z, %R1
   EA = Z + (%R1) = 200 + 100 = 300. Do not confuse Z (the address) with M[Z].
   AC gets 650, the value stored at address 300.
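The six results in this example can be reproduced by a small Python model of the example machine. This is a sketch, not a description of MARIE or of any real instruction set: the function load and the mode names are hypothetical, memory is modelled as a dictionary holding only the four locations in the memory map, and instruction encodings are not modelled at all.

```python
from typing import Optional, Tuple

# Register and memory contents from the example (Z is the address 200).
R1 = 100
Z = 200
MEMORY = {100: 250, 200: 400, 300: 650, 400: 750}


def load(mode: str, operand: Optional[int] = None) -> Tuple[Optional[int], int]:
    """Return (EA, value placed in AC); EA is None when no memory address is computed."""
    if mode == "immediate":            # LDI 200: the operand is the value itself
        return None, operand
    if mode == "register":             # LOAD %R1: the value comes from the register
        return None, R1
    if mode == "direct":               # LOAD Z: the operand is the address
        ea = operand
    elif mode == "register_indirect":  # LOAD @%R1: the register holds the address
        ea = R1
    elif mode == "indirect":           # LOAD @Z: memory at Z holds the address
        ea = MEMORY[operand]
    elif mode == "indexed":            # LOAD Z, %R1: address Z plus contents of %R1
        ea = operand + R1
    else:
        raise ValueError(f"unknown addressing mode: {mode}")
    return ea, MEMORY[ea]


for mode, operand in [("immediate", 200), ("register", None), ("direct", Z),
                      ("register_indirect", None), ("indirect", Z), ("indexed", Z)]:
    ea, ac = load(mode, operand)
    print(f"{mode:<18} EA = {ea}  AC = {ac}")
```

Indirect addressing is the only mode here that reads memory twice: once to fetch the address and once to fetch the operand.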

In the sample single accumulator machine we have
   Input Device_Id    // Read from the indicated device, place data into accumulator
   Output Device_Id   // Write to the indicated device from the accumulator
These two instructions are distinct from the Load and Store instructions, which are used as follows:
   Load X             // Load the accumulator from address X
   Store X            // Store the accumulator contents into address X
The RISC solution uses load and store instructions to transfer data to and from registers.
   ld [Device_Address], %r1   // Copy input into register %r1
   st %r1, [Device_Address]   // Copy contents of register %r1 to output device
As indicated above, we shall return to the study of I/O systems. At present, we focus on characterizing the two major strategies.
   Isolated I/O        dedicated I/O instructions; dedicated I/O bus
   Memory-Mapped I/O   no I/O instructions; a single bus, with addresses allocated to both memory and I/O devices

Low Memory
Many computer systems dedicate low memory, defined as the memory with the lowest addresses, to system usage. This has many advantages. In systems supporting C and C++ programming, it is common to dedicate the first 512 (2^9) locations, at addresses 0 – 511, to the operating system. It is claimed that this allows the run-time system to detect some common programming problems. One such problem is the use of a null pointer: a user program that follows a pointer with value 0 (or a small offset from it) touches this reserved region, and the access can be trapped.

The Program Status Register (aka Processor Status Register)
We close this part of the discussion by mentioning yet another register, the PSR or Program Status Register. This register contains a number of fields, including bits to indicate the status of the last arithmetic operation:
   N   the result of the last operation was negative
   Z   the result of the last operation was zero
   V   the last arithmetic operation caused an overflow
and fields giving the
   program priority level    used to manage I/O operations
   program privilege level   is the program running at Super-User level?
In general, any user program can read this register, but only programs with super-user privileges can change its contents.
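Returning to the two I/O strategies described above, the memory-mapped approach can be sketched as a single bus that decides, from the address alone, whether a load or store goes to memory or to a device. Everything specific here is an assumption for illustration: the class Bus, the device addresses KEYBOARD and DISPLAY, and the split of the address space are hypothetical and do not come from the text.

```python
from collections import deque

# Hypothetical address allocation on a single bus (not from the text):
MEMORY_SIZE = 0xFF00   # 0x0000-0xFEFF is ordinary memory
KEYBOARD = 0xFF00      # reading this address takes the next input value
DISPLAY = 0xFF01       # writing this address sends a value to the output device


class Bus:
    """One bus shared by memory and devices, as in memory-mapped I/O."""

    def __init__(self) -> None:
        self.memory = [0] * MEMORY_SIZE
        self.keyboard = deque()   # input values waiting to be read
        self.display = []         # values the output device has received

    def read(self, address: int) -> int:
        if address == KEYBOARD:
            return self.keyboard.popleft()
        if 0 <= address < MEMORY_SIZE:
            return self.memory[address]
        raise ValueError(f"nothing readable at {address:#06x}")

    def write(self, address: int, value: int) -> None:
        if address == DISPLAY:
            self.display.append(value)
        elif 0 <= address < MEMORY_SIZE:
            self.memory[address] = value
        else:
            raise ValueError(f"nothing writable at {address:#06x}")


bus = Bus()
bus.keyboard.append(42)
r1 = bus.read(KEYBOARD)   # like  ld [Device_Address], %r1
bus.write(DISPLAY, r1)    # like  st %r1, [Device_Address]
bus.write(0x0100, r1)     # an ordinary store goes through the same read/write path
print(bus.display, bus.memory[0x0100])  # prints [42] 42
```

Under isolated I/O, by contrast, devices would be reached only through separate Input and Output instructions on a dedicated I/O bus, so the memory read and write paths would never see a device address.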
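The low-memory convention can be illustrated by the check a system might make on each memory access. This is a minimal sketch: in a real machine the check would be made by memory-protection hardware rather than by a function call, and the names check_access and ProtectionFault are hypothetical. The 512-location limit is the figure given in the text.

```python
LOW_MEMORY_LIMIT = 512  # addresses 0-511 reserved for the operating system


class ProtectionFault(Exception):
    """Raised when a user-mode program touches reserved low memory."""


def check_access(address: int, super_user: bool) -> None:
    """Trap user-mode accesses to low memory; system code may use it freely."""
    if not super_user and 0 <= address < LOW_MEMORY_LIMIT:
        raise ProtectionFault(f"user access to reserved address {address}")


NULL = 0
try:
    # e.g. reading a field 8 bytes into a structure through a null pointer
    check_access(NULL + 8, super_user=False)
except ProtectionFault as fault:
    print("trapped:", fault)  # prints trapped: user access to reserved address 8
```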

Users and Super-Users
In a typical multi-user machine, average users are restricted in their access to system resources. While this restriction has many benefits for system security, its original purpose was to prevent users from interfering with each other accidentally. Consider a shared printer attached to a computer with two or more users attempting to print text. If each user's program could access the printer directly, we would see the output intermixed: one line from user A, followed by a number of lines from user B, followed by more lines from user A, and so on.
The solution devised by operating system designers was to allow certain programs to run with complete system privileges. For the printer, that program is the print spooler. When normal user programs attempt to print, their output is redirected to a disk file. When a user program finishes printing, its disk file is closed and sent to the spooler, which does the actual printing. As only the spooler accesses the printer directly, there is no conflict.
Programs with complete access to system resources are often said to run in super-user mode. As indicated above, the CPU uses specific bits in the PSR to indicate what mode it is running in. Only a super-user program may be allowed to change the PSR; otherwise any user program could elevate its status to super-user simply by changing the PSR. In fact, back-door ways to change the PSR form one of the more common classes of hacking tricks.
In simple designs, the PSR is often an 8-bit or a 16-bit register in which the individual bits have dedicated functions. We have named some of the bits above. The figure below shows a possible layout for a PSR, taken from one of this author's CPU designs.
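The rule that only a super-user program may change the PSR can be sketched as follows. The key point is that the check uses the privilege level already in the PSR, not the one being requested. The classes CPU, PSR and PrivilegeViolation are hypothetical, and while mode 000 for user programs comes from the text, the super-user encoding 111 is an assumption.

```python
from dataclasses import dataclass, replace

USER = 0b000         # mode 000: no privilege (from the PSR description)
SUPER_USER = 0b111   # hypothetical encoding for super-user mode


@dataclass(frozen=True)
class PSR:
    flags: int = 0          # condition bits; their layout is not modelled here
    privilege: int = USER


class PrivilegeViolation(Exception):
    """Raised when code running without privilege tries to change the PSR."""


class CPU:
    def __init__(self, psr: PSR = PSR()) -> None:
        self.psr = psr

    def read_psr(self) -> PSR:
        return self.psr                      # any program may read the PSR

    def write_psr(self, new_psr: PSR) -> None:
        # The check uses the *current* privilege, so a user program
        # cannot grant itself super-user status by writing the PSR.
        if self.psr.privilege == USER:
            raise PrivilegeViolation("the PSR may be changed only in super-user mode")
        self.psr = new_psr


user_cpu = CPU()
try:
    user_cpu.write_psr(replace(user_cpu.read_psr(), privilege=SUPER_USER))
except PrivilegeViolation as err:
    print("blocked:", err)

system_cpu = CPU(PSR(privilege=SUPER_USER))
system_cpu.write_psr(PSR(privilege=USER))    # e.g. dropping to user mode before running a user program
print(system_cpu.read_psr().privilege)       # prints 0
```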
   Bit:    15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
   Fields, from bit 15 down to bit 0:  TBD | CPU Privilege | CPU Priority | Reserved | I | V | Z | N | C
The bit meanings are as follows:
   C          the last arithmetic operation produced a carry out of the most significant bit
   N          the last arithmetic operation produced a negative result
   Z          the last arithmetic operation produced a zero result
   V          the last arithmetic operation produced an overflow
   I          I/O devices are enabled to raise an interrupt, a signal that the device is ready to process an I/O transfer. We discuss these later. Interrupts are usually disabled while the interrupt handler is running.
   Reserved   additional bits, required to be 000, which may be given a meaning in a future design
   Priority   the running priority of the CPU. Each I/O device also has a priority. The I/O system handles the device if its priority is higher than the CPU priority. We discuss CPU priority when discussing I/O devices.
   Privilege  the privilege mode of the CPU. All standard user programs will operate in mode 000, no privilege.
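The C, N, Z and V definitions above can be made concrete for a 16-bit addition. This is a sketch under stated assumptions: a 16-bit two's-complement word, and bit positions obtained by reading the figure's right-hand fields I V Z N C as bits 4 through 0 (the figure does not survive clearly enough to be certain). The name add_with_flags is hypothetical, and the I, Reserved, Priority and Privilege fields are not modelled.

```python
from typing import Tuple

WIDTH = 16
MASK = (1 << WIDTH) - 1       # 0xFFFF
SIGN = 1 << (WIDTH - 1)       # 0x8000, the sign bit of a 16-bit word

# Assumed positions, reading the figure's order "I V Z N C" as bits 4..0.
C_BIT, N_BIT, Z_BIT, V_BIT = 0, 1, 2, 3


def add_with_flags(a: int, b: int) -> Tuple[int, int]:
    """Add two 16-bit words; return (16-bit result, C/N/Z/V bits for the PSR)."""
    full = a + b
    result = full & MASK
    carry = full > MASK                        # carry out of bit 15
    negative = bool(result & SIGN)             # sign bit of the result
    zero = result == 0
    # Two's-complement overflow: a and b have the same sign,
    # but the result's sign differs from theirs.
    overflow = bool(~(a ^ b) & (a ^ result) & SIGN)
    flags = ((carry << C_BIT) | (negative << N_BIT)
             | (zero << Z_BIT) | (overflow << V_BIT))
    return result, flags


result, flags = add_with_flags(0x7FFF, 0x0001)
print(f"{result:#06x} {flags:05b}")   # prints 0x8000 01010 (V and N set)
```

The example 0x7FFF + 0x0001 shows why V and C are separate bits: the unsigned sum fits in 16 bits (no carry), but the signed sum of two positive numbers has come out negative (overflow).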