





















Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Material Type: Notes; Professor: Bosworth; Class: Computer Architecture; Subject: Computer Science; University: Columbus State University; Term: Fall 2009;
Typology: Study notes
1 / 29
This page cannot be seen from the preview
Don't miss anything!






















We now focus on the detailed design of the CPU ( C entral P rocessing U nit) of the ASC. The CPU has two major components: the Control Unit and the ALU ( A rithmetic L ogic U nit). We first consider the control unit. Program Execution The program execution cycle is the basic Fetch / Execute cycle in which the 16-bit instruction is fetched from the memory and executed. This cycle is based on two registers: PC the Program Counter – a 16-bit address register IR the Instruction Register – a 16-bit data register. At the beginning of the instruction fetch cycle the PC contains the address of the instruction to be executed next. The fetch cycle begins by reading the memory at the address indicated by the PC and copying the memory into the IR. At this point, the PC is incremented by 1 to point to the next instruction. This is done due to the high probability that the instruction to be executed next is the instruction in the address that follows immediately; program jumps (BRU, BIP, etc.) are somewhat unusual and are handled in program execution. All instructions share a common beginning to the fetch sequence. The common fetch sequence is adapted to the relative speed of the CPU and memory. We assume that the access time of the memory unit is such that the memory contents are not available on the step following the memory read, but on the step after that. Here is the common fetch sequence. MAR PC send the address of the instruction to the memory Read Memory this causes MBR MAR[PC] PC PC + 1 cannot access memory, so might as well increment the PC IR MBR now the instruction is in the Instruction Register. When the instruction is in the IR, it is decoded and the common fetch sequence terminates. After this point, the execution sequence is specific to the instruction. This subsequent execution sequence includes calculation of the EA (Effective Address) for those instructions that take an operand. The next step in the design of the CPU is to specify the microoperations corresponding to the steps that must be executed in order for each of the assembly language instructions to be executed. Before considering these microoperations, we study several topics. the structure of the bus or busses internal to the CPU the functional requirements on the ALU We first consider the structure of the CPU internal busses.
CPU Internal Bus Structure We first consider the bus structure of the computer. Note that the computer has a number of busses at several levels. For example, there is a bus that connects the CPU to the memory unit and a bus that connects the CPU to the I/O devices. In addition to these important busses, there are often busses internal to the CPU, of which the programmer is usually unaware. We now consider the bus structure in light of the common fetch sequence. PC PC + 1 This microoperation represents the incrementing of the PC to point to the next instruction on the probability that the next instruction will be the next to be executed. Note that this one microoperation places a functional requirement on the ALU – it must implement an addition operation. We shall depart from the book’s notation and use the notation add to denote the ALU addition operation (and the control signal that causes that ALU operation) and the all uppercase ADD to denote the assembly language operation. At this point, we know that there must be at least one bus internal to the CPU so that the contents of the PC can be transferred to the ALU and the incremented value copied back to the PC. We consider a one bus solution and immediately notice a problem. The ALU must have two inputs for the add operation, one for the value of the PC and one for the value 1 used to increment the PC. If we use a single bus solution, we must allow for the fact that only one value at a time may be placed on the bus. We now present a design based on the single bus assumption. One design would add the complexity of an increment instruction for the ALU, but we avoid that and base our solution on the add operation only. We need a source of the constant 1, so we create a “1 register” to hold the number. We postulate a two input ALU with a register Z to hold the output. Since the bus can have only one value at a time, we must have a temporary register Y to hold one of the two inputs to the ALU. Here are the microoperations. CP1: 1 BUS, BUS Y CP2: PC BUS, add CP3: Z BUS, BUS PC We note that the single bus solution is rather slow. We would like another way to do this, preferably a faster one. The solution proposed by the book is to have three busses in the CPU, named BUS1, BUS2, and BUS3. With three busses, we can put one value on each of two busses that serve as input to the ALU and copy the results on the third bus, serving as input to the PC, as follows PC BUS1, 1 BUS2, add , BUS3 PC
We now consider the implication of the microoperation MAR PC. We have noted that the PC outputs to BUS1 and that BUS3 is used to transfer data to all registers. The only way for data to get to BUS3 from BUS1 is via the ALU, so we need an appropriate ALU primitive. We define the two ALU primitives for data transfer tra1 transfer the contents of BUS1 to BUS tra2 transfer the contents of BUS2 to BUS3. While we are at it, we assign the MAR to BUS1. Again, at this point, this is a bit arbitrary. We now examine the last microoperation IR MBR. We assign the MBR to BUS2, thus requiring the tra2 primitive, already defined. At this point, we review what we have discovered from these four microoperations by converting them to control signals. MAR PC PC BUS1, tra1 , BUS3 MAR Read Memory Read Memory PC PC + 1 PC BUS1, 1 BUS2, add , BUS3 PC IR MBR MBR BUS2, tra2 , BUS3 IR We now look at the CPU design as it has evolved at this time in response to the requirements to support the Common Fetch sequence. Please remember that this is not a complete description, but only a description of what we have discussed to this point. The ALU output bus, BUS3, feeds all of the registers with the exception of the constant register +1. We have a somewhat uneven division of the ALU input busses, but this will be fixed later. Two Addressing Modes: Direct and Indexed We shall soon consider all four addressing modes. For now, we consider the impact of two of the addressing modes on the CPU design. Recall that the address part of the instruction occupies the lower 8 bits: bits 7 through 0 inclusive. When a 1-operand instruction is copied into the IR (Instruction Register), the address part is IR7–0. In direct addressing, this is the address to use. In indexed addressing, the address to use is IR7–0 + IX, where IX denotes the contents of the specified address register. These addresses go to the MAR, thus Direct Addressing MAR IR7– Indexed Addressing MAR IR7–0 + IX
The net result of these two addressing modes is that we need to connect the three index registers (IX1, IX2, and IX3) to BUS2 so that the contents of the indicated Index Register can be added to the lower order 8 bits of the IR. At this point, we mention the IX0 trick – actually a standard design practice. Consider the above description of the two addressing modes, slightly rewritten. IR 9 IR 8 = 00 MAR IR7–0 + 0 IR 9 IR 8 = 01 MAR IR7–0 + IX IR 9 IR 8 = 10 MAR IR7–0 + IX IR 9 IR 8 = 11 MAR IR7–0 + IX The trick is to define register IX0 as a constant register containing the constant 0. With this constraint, the microoperation MAR IR7–0 + IX0 is the same as MAR IR7–0 + 0. The advantage of this trick is that the control unit is simplified – always a good thing. We now show the state of the CPU design as reflecting the latest requirements. Note the top index register (IX0) is shown filled with 0’s to indicate that it is really the constant 0. Since IX0 is a constant register, it has no connection to BUS3, the only function of which is to put data into registers. We should not leave this without making a brief statement on how the index registers are connected to the busses so that we can use the control signals IX BUS2 and BUS3 IX. The index registers are connected to BUS through a 4-to-1 MUX. The control signal IX BUS2 causes the output of this MUX to be placed on BUS2. BUS3 is connected to the three writable index registers through a 1-to- DEMUX operating in the expected fashion when the control signal BUS3 IX is active.
ADD The key micro-operation to be executed is ACC + MBR ACC. This immediately requires that the ACC feed a bus different from the one used by the MBR. Since the MBR has been assigned to BUS2, the ACC must be assigned to BUS1 (as done above). ALU primitives add (already specified) TCA The key micro-operation to be executed is – ACC ACC. We need a new ALU primitive; we have the choice tca or comp. Implementation of the two’s-complement as an ALU primitive tca would yield a speed boost but create problems in the ALU design, so we opt to have the ALU primitive be comp – for one’s-complement. This will cause the TCA instruction to be executed as a two-step operation: store the one’s-complement in the ACC and then add 1 to it. We have a +1 register attached to BUS2 and an add primitive defined. ALU primitives comp SHL This requires the ALU primitive shl. SHR This requires the ALU primitive shr. TIX As the index registers feed bus 2, this requires a +1 constant register to feed bus 1. TDX As the index registers feed bus 2, this requires a –1 constant register to feed bus 1. Indexed Addressing Revisited We now address the impact of the IX0 design choice. Recall the claim that this design has reduced the number of addressing modes to two: Indexed and Indexed-Indirect, so that Direct Addressing becomes Indexed by IX0, identically 0. Indirect Addressing becomes Indexed by IX0, then indirect Indexed Addressing becomes Indexed by a real index register Indexed-Indirect Addressing becomes Indexed by a real index register, then indirect We must now face the problem that execution of op-codes C, D, E, and F does not allow for indexed addressing. So we need a new criterion for choosing when to index (possibly with IX0) and when not to index. The answer is seen in the structure of the op-codes. If IR15-14 = 10 addressing is not used, so there is no indexed addressing If IR15-14 = 11 indexed addressing is not used. The design choice is to inhibit indexing if IR 15 = 1. If IR 15 = 0, the design uses indexed addressing, possibly with IX0 – leading to direct or indirect addressing. If IR 15 = 1, only direct or indirect addressing are possible. This approach simplifies the design. We now specify a design requirement for the common fetch cycle: an address is to be deposited in the MAR. This address will be the Effective Address unless indirect addressing is also used. Thus, we specify a common fetch sequence for all one-operand instructions.
Common Fetch Sequence for One-Operand Instructions CP1: PC BUS1, tra1 , BUS3 MAR, READ. CP2: PC BUS1, 1 BUS2, add , BUS3 PC CP3: MBR BUS2, tra2 , BUS3 IR CP4: If IR 15 = 0 Then IR7-0 BUS1, IX BUS2, add , BUS3 MAR Else IR7-0 BUS1, tra1 , BUS3 MAR. The zero-operand instructions will replace the CP4 step with an appropriate action. The Defer Cycle The defer cycle handles indirect addressing. If indirect addressing is used, then on entry to the defer cycle the value in the MAR is the address of a pointer to the argument. On exit, the argument address must be in the MAR. Here is the complete defer sequence. CP1: READ The address is in the MAR CP2: Wait Cannot do anything here CP3: MBR BUS2, tra2 , BUS3 MAR Now the argument address in the MAR CP4: Wait Major Cycles and Minor Cycles We now introduce the concept of major and minor cycles. In the first CPU design, we focus on three major cycles: Fetch, Defer, and Execute. In this design, each major cycle has four minor cycles. In another design, the Fetch, Defer, and Execute cycles are seen as phases in the execution of an instruction. The minor cycle is tied to the CPU clock rate. A minor cycle can contain only micro- operations that can be executed at the same time and complete in one clock time. Suppose that the ASC is a 10 MHz machine. The clock time is then 0.1 microsecond; this is the time duration for a minor cycle. Our design calls for the ASC to wait for one clock cycle between asserting an address in the MAR and retrieving data from the MBR; this corresponds to a memory access time of 100 nanoseconds – a reasonable value. We now examine two implementations of the control unit, one using only standard combinational circuits (“hardwired” control unit) and then a micro-programmed CPU. The basis for each of these designs is the complete sequence of control signals for each assembly language instruction. For convenience, we give the control signals within the context of a hardwired control unit. Note that the defer cycle is always listed for one-operand instructions even when indirect addressing is not used. If indirect addressing is not used, the defer cycle is not entered. No indirect addressing Fetch – Execute Indirect addressing used Fetch – Defer – Execute
4 TCA (Negate the Accumulator using the Two’s-Complement) F,CP1: PC BUS1, tra1 , BUS3 MAR, READ. F,CP2: PC BUS1, 1 BUS2, add , BUS3 PC F,CP3: MBR BUS2, tra2 , BUS3 IR F,CP4: Wait Defer Cycle is not entered. E,CP1: ACC BUS1, comp , BUS3 ACC E,CP2: ACC BUS1, 1 BUS2, add , BUS3 ACC E,CP3: Wait E,CP4: Wait 5 BRU (Unconditional Branch) Fetch – as in LDA Defer – if entered, the same as LDA E,CP1: MAR BUS1, tra1 , BUS3 PC E,CP2: Wait E,CP3: Wait E,CP4: Wait 6 BIP (Branch if the Accumulator is Positive) Fetch – as in LDA Defer – if entered, the same as LDA E,CP1: If ACC > 0 Then MAR BUS1, tra1 , BUS3 PC E,CP2: Wait E,CP3: Wait E,CP4: Wait 7 BIN (Branch if the Accumulator is Negative) Fetch – as in LDA Defer – if entered, the same as LDA E,CP1: If ACC < 0 Then MAR BUS1, tra1 , BUS3 PC E,CP2: Wait E,CP3: Wait E,CP4: Wait
Before giving the control signals for the RWD and WWD assembly language instructions, we first must define the “handshake” protocols used in the I/O system. The I/O system uses three flip-flops: DATA, INPUT, and OUTPUT. The uses of these flip-flops are as follows: DATA indicates the presence of data to be transferred INPUT commands the input data to place data on the DIL (data input lines) OUTPUT commands the output device to take data from the DOL (data output lines) The RWD protocol is as follows
The next two instructions levy new requirements on the CPU. The index registers are connected to BUS2. We need to increment and decrement the registers using the ALU, so we need two additional registers, +1 and –1, both connected to BUS1. E TIX (Increment and Test the Accumulator, Branch Conditionally) Fetch – same as LDX Defer – same as LDX E,CP1: +1 BUS1, IX BUS2, add , BUS3 IX E,CP2: If IX = 0 Then MAR BUS1, tra1 , BUS3 PC E,CP3: Wait E,CP4: Wait F TDX (Decrement and Test the Accumulator, Branch Conditionally) Fetch – same as LDX Defer – same as LDX E,CP1: –1 BUS1, IX BUS2, add , BUS3 IX E,CP2: If IX 0 Then MAR BUS1, tra1 , BUS3 PC E,CP3: Wait E,CP4: Wait Design of a Hardwired Control Unit We now address the issues associated with the design of a hardwired control unit. We shall study the following issues:
The instruction decoder takes the four high order bits of the IR (Instruction Register). The output of the 4-to-16 decoder is one of 16 control signals, each identified by the name of the assembly language instruction with which the control signal is associated. For simplicity we do not consider the extended instruction set as does the figure on page 238. For example, consider the STX instruction. The Opcode for this instruction is IR15-12 = 1101. This is input to the decoder, causing the output 13 to become active and assert the signal STX, which causes the logic for the STX assembly language instruction to be executed. The Indirect control signal is the output of IR 10 , bit 10 of the Instruction Register. We have already discussed the use of IR9-8 for index register selection logic. Design of the Major State Register In order to design the major state register we first must note the sequence of major states corresponding to the execution of the assembly language instructions. We have seen that there are three possible sequences Fetch only Fetch – Execute Fetch – Defer – Execute The major state register is a sequential circuit that produces the sequence of major states. For example, if the current state is Fetch, what is to be the next major state. If the instruction can execute completely in the Fetch state, the next state will also be Fetch as the next instruction is fetched and its execution begun. If the instruction cannot be completed in the Fetch state (as most cannot), the question is whether or not to enter the Defer state. The next consideration arises from the two I/O instructions RWD and WWD. The strategy will be to hold these instructions in Execute until the I/O transaction is complete. To facilitate this and the previous considerations in the design of the Major State Register, we define three signals I 1 if in Fetch, stay in Fetch (and fetch the next instruction) I 2 if in Fetch, go to Defer (and handle indirect addressing) I 3 if in Execute, stay in Execute until the I/O is complete.
Signal Generation Tree We now have the three major parts of circuits required to generate the control signals.
The microinstruction for the generic Fetch, CP4 is as follows. F, CP4: If IR 15 = 0 Then IR7-0 BUS1, IX BUS2, add , BUS3 MAR Else IR7-0 BUS1, tra1 , BUS3 MAR. We shall use this form of the instruction primarily as an illustration of the handling of IF THEN ELSE conditions in a signal control tree. After this has been done, we display the signal generation tree for Fetch, CP4 as actually found on the ASC. Here is the signal generation tree for the microinstruction as coded above. Comparing the signal generation tree to the microinstruction above, we note the following.
The design of the signal tree for the defer cycle is quite simple. Only two clock pulses of this cycle cause control signals to be emitted. Here is the design of the defer cycle. The Execute Cycle is discussed at some length in the textbook. There is no need to repeat everything in these notes, so the student is instructed to examine the textbook. Testing the State of the Accumulator We have two instructions. BIP and BIN that are conditional branches based on the status of the accumulator. At this point we show circuitry to generate the three status flags N , Z , and P. It is easier to generate N and Z directly and to generate P = N Z; if the accumulator is not negative or zero, it must be positive. The N flag is just a copy of the sign bit of the accumulator. It is easier to generate Z by first generating Z , which is 1 only if the accumulator is not zero. Then we generate Z = NOT ( (^) Z ) and P.
Similar logic is used to test the index registers, except that the question is only whether or not the specified index register is zero or not zero. Design of the ALU We now cover the design of the Arithmetic Logic Unit (ALU) of the ASC. The book presents a design on page 228. Here we present a slightly cleaner design based on dividing the ALU into three functional units – the TRA unit (for tra1 and tra2 ), the Adder, and the Shift Unit (for shl , shr , and comp ). The figure below shows the design of the TRA (Transfer) Unit of the ALU. The design is the same for each of the sixteen bits. Note the OR gate controlling the tristate buffer that feeds Bus 3 – if either tra1 = 1 or tra2 = 1 then the unit outputs on Bus 3, otherwise it does not and possibly another unit places output on the bus. The Transfer Unit