A Simple Computer Hardware Design - Lecture Notes | CPSC 5155U, Study notes of Computer Architecture and Organization

Material Type: Notes; Professor: Bosworth; Class: Computer Architecture; Subject: Computer Science; University: Columbus State University; Term: Fall 2009;

Typology: Study notes

Pre 2010

Uploaded on 08/04/2009

koofers-user-vxa
koofers-user-vxa 🇺🇸

8 documents

1 / 29

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
A Simple Computer
Hardware Design
We now focus on the detailed design of the CPU (Central Processing Unit) of the ASC. The
CPU has two major components: the Control Unit and the ALU (Arithmetic Logic Unit).
We first consider the control unit.
Program Execution
The program execution cycle is the basic Fetch / Execute cycle in which the 16-bit
instruction is fetched from the memory and executed. This cycle is based on two registers:
PC the Program Counter – a 16-bit address register
IR the Instruction Register – a 16-bit data register.
At the beginning of the instruction fetch cycle the PC contains the address of the instruction
to be executed next. The fetch cycle begins by reading the memory at the address indicated
by the PC and copying the memory into the IR. At this point, the PC is incremented by 1 to
point to the next instruction. This is done due to the high probability that the instruction to
be executed next is the instruction in the address that follows immediately; program jumps
(BRU, BIP, etc.) are somewhat unusual and are handled in program execution.
All instructions share a common beginning to the fetch sequence. The common fetch
sequence is adapted to the relative speed of the CPU and memory. We assume that the
access time of the memory unit is such that the memory contents are not available on the step
following the memory read, but on the step after that. Here is the common fetch sequence.
MAR PC send the address of the instruction to the memory
Read Memory this causes MBR MAR[PC]
PC PC + 1 cannot access memory, so might as well increment the PC
IR MBR now the instruction is in the Instruction Register.
When the instruction is in the IR, it is decoded and the common fetch sequence terminates.
After this point, the execution sequence is specific to the instruction. This subsequent
execution sequence includes calculation of the EA (Effective Address) for those instructions
that take an operand.
The next step in the design of the CPU is to specify the microoperations corresponding to
the steps that must be executed in order for each of the assembly language instructions to be
executed. Before considering these microoperations, we study several topics.
the structure of the bus or busses internal to the CPU
the functional requirements on the ALU
We first consider the structure of the CPU internal busses.
Version of 11/28/2020 Page 1
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18
pf19
pf1a
pf1b
pf1c
pf1d

Partial preview of the text

Download A Simple Computer Hardware Design - Lecture Notes | CPSC 5155U and more Study notes Computer Architecture and Organization in PDF only on Docsity!

A Simple Computer

Hardware Design

We now focus on the detailed design of the CPU ( C entral P rocessing U nit) of the ASC. The CPU has two major components: the Control Unit and the ALU ( A rithmetic L ogic U nit). We first consider the control unit. Program Execution The program execution cycle is the basic Fetch / Execute cycle in which the 16-bit instruction is fetched from the memory and executed. This cycle is based on two registers: PC the Program Counter – a 16-bit address register IR the Instruction Register – a 16-bit data register. At the beginning of the instruction fetch cycle the PC contains the address of the instruction to be executed next. The fetch cycle begins by reading the memory at the address indicated by the PC and copying the memory into the IR. At this point, the PC is incremented by 1 to point to the next instruction. This is done due to the high probability that the instruction to be executed next is the instruction in the address that follows immediately; program jumps (BRU, BIP, etc.) are somewhat unusual and are handled in program execution. All instructions share a common beginning to the fetch sequence. The common fetch sequence is adapted to the relative speed of the CPU and memory. We assume that the access time of the memory unit is such that the memory contents are not available on the step following the memory read, but on the step after that. Here is the common fetch sequence. MAR  PC send the address of the instruction to the memory Read Memory this causes MBR  MAR[PC] PC  PC + 1 cannot access memory, so might as well increment the PC IR  MBR now the instruction is in the Instruction Register. When the instruction is in the IR, it is decoded and the common fetch sequence terminates. After this point, the execution sequence is specific to the instruction. This subsequent execution sequence includes calculation of the EA (Effective Address) for those instructions that take an operand. The next step in the design of the CPU is to specify the microoperations corresponding to the steps that must be executed in order for each of the assembly language instructions to be executed. Before considering these microoperations, we study several topics. the structure of the bus or busses internal to the CPU the functional requirements on the ALU We first consider the structure of the CPU internal busses.

CPU Internal Bus Structure We first consider the bus structure of the computer. Note that the computer has a number of busses at several levels. For example, there is a bus that connects the CPU to the memory unit and a bus that connects the CPU to the I/O devices. In addition to these important busses, there are often busses internal to the CPU, of which the programmer is usually unaware. We now consider the bus structure in light of the common fetch sequence. PCPC + 1 This microoperation represents the incrementing of the PC to point to the next instruction on the probability that the next instruction will be the next to be executed. Note that this one microoperation places a functional requirement on the ALU – it must implement an addition operation. We shall depart from the book’s notation and use the notation add to denote the ALU addition operation (and the control signal that causes that ALU operation) and the all uppercase ADD to denote the assembly language operation. At this point, we know that there must be at least one bus internal to the CPU so that the contents of the PC can be transferred to the ALU and the incremented value copied back to the PC. We consider a one bus solution and immediately notice a problem. The ALU must have two inputs for the add operation, one for the value of the PC and one for the value 1 used to increment the PC. If we use a single bus solution, we must allow for the fact that only one value at a time may be placed on the bus. We now present a design based on the single bus assumption. One design would add the complexity of an increment instruction for the ALU, but we avoid that and base our solution on the add operation only. We need a source of the constant 1, so we create a “1 register” to hold the number. We postulate a two input ALU with a register Z to hold the output. Since the bus can have only one value at a time, we must have a temporary register Y to hold one of the two inputs to the ALU. Here are the microoperations. CP1: 1  BUS, BUS  Y CP2: PC  BUS, add CP3: Z  BUS, BUS  PC We note that the single bus solution is rather slow. We would like another way to do this, preferably a faster one. The solution proposed by the book is to have three busses in the CPU, named BUS1, BUS2, and BUS3. With three busses, we can put one value on each of two busses that serve as input to the ALU and copy the results on the third bus, serving as input to the PC, as follows PC  BUS1, 1  BUS2, add , BUS3  PC

We now consider the implication of the microoperation MAR  PC. We have noted that the PC outputs to BUS1 and that BUS3 is used to transfer data to all registers. The only way for data to get to BUS3 from BUS1 is via the ALU, so we need an appropriate ALU primitive. We define the two ALU primitives for data transfer tra1 transfer the contents of BUS1 to BUS tra2 transfer the contents of BUS2 to BUS3. While we are at it, we assign the MAR to BUS1. Again, at this point, this is a bit arbitrary. We now examine the last microoperation IR  MBR. We assign the MBR to BUS2, thus requiring the tra2 primitive, already defined. At this point, we review what we have discovered from these four microoperations by converting them to control signals. MAR  PC PC  BUS1, tra1 , BUS3  MAR Read Memory Read Memory PC  PC + 1 PC  BUS1, 1  BUS2, add , BUS3  PC IR  MBR MBR  BUS2, tra2 , BUS3  IR We now look at the CPU design as it has evolved at this time in response to the requirements to support the Common Fetch sequence. Please remember that this is not a complete description, but only a description of what we have discussed to this point. The ALU output bus, BUS3, feeds all of the registers with the exception of the constant register +1. We have a somewhat uneven division of the ALU input busses, but this will be fixed later. Two Addressing Modes: Direct and Indexed We shall soon consider all four addressing modes. For now, we consider the impact of two of the addressing modes on the CPU design. Recall that the address part of the instruction occupies the lower 8 bits: bits 7 through 0 inclusive. When a 1-operand instruction is copied into the IR (Instruction Register), the address part is IR7–0. In direct addressing, this is the address to use. In indexed addressing, the address to use is IR7–0 + IX, where IX denotes the contents of the specified address register. These addresses go to the MAR, thus Direct Addressing MAR  IR7– Indexed Addressing MAR  IR7–0 + IX

The net result of these two addressing modes is that we need to connect the three index registers (IX1, IX2, and IX3) to BUS2 so that the contents of the indicated Index Register can be added to the lower order 8 bits of the IR. At this point, we mention the IX0 trick – actually a standard design practice. Consider the above description of the two addressing modes, slightly rewritten. IR 9 IR 8 = 00 MAR  IR7–0 + 0 IR 9 IR 8 = 01 MAR  IR7–0 + IX IR 9 IR 8 = 10 MAR  IR7–0 + IX IR 9 IR 8 = 11 MAR  IR7–0 + IX The trick is to define register IX0 as a constant register containing the constant 0. With this constraint, the microoperation MAR  IR7–0 + IX0 is the same as MAR  IR7–0 + 0. The advantage of this trick is that the control unit is simplified – always a good thing. We now show the state of the CPU design as reflecting the latest requirements. Note the top index register (IX0) is shown filled with 0’s to indicate that it is really the constant 0. Since IX0 is a constant register, it has no connection to BUS3, the only function of which is to put data into registers. We should not leave this without making a brief statement on how the index registers are connected to the busses so that we can use the control signals IX  BUS2 and BUS3  IX. The index registers are connected to BUS through a 4-to-1 MUX. The control signal IX  BUS2 causes the output of this MUX to be placed on BUS2. BUS3 is connected to the three writable index registers through a 1-to- DEMUX operating in the expected fashion when the control signal BUS3  IX is active.

ADD The key micro-operation to be executed is ACC + MBR  ACC. This immediately requires that the ACC feed a bus different from the one used by the MBR. Since the MBR has been assigned to BUS2, the ACC must be assigned to BUS1 (as done above). ALU primitives add (already specified) TCA The key micro-operation to be executed is – ACC  ACC. We need a new ALU primitive; we have the choice tca or comp. Implementation of the two’s-complement as an ALU primitive tca would yield a speed boost but create problems in the ALU design, so we opt to have the ALU primitive be comp – for one’s-complement. This will cause the TCA instruction to be executed as a two-step operation: store the one’s-complement in the ACC and then add 1 to it. We have a +1 register attached to BUS2 and an add primitive defined. ALU primitives comp SHL This requires the ALU primitive shl. SHR This requires the ALU primitive shr. TIX As the index registers feed bus 2, this requires a +1 constant register to feed bus 1. TDX As the index registers feed bus 2, this requires a –1 constant register to feed bus 1. Indexed Addressing Revisited We now address the impact of the IX0 design choice. Recall the claim that this design has reduced the number of addressing modes to two: Indexed and Indexed-Indirect, so that Direct Addressing becomes Indexed by IX0, identically 0. Indirect Addressing becomes Indexed by IX0, then indirect Indexed Addressing becomes Indexed by a real index register Indexed-Indirect Addressing becomes Indexed by a real index register, then indirect We must now face the problem that execution of op-codes C, D, E, and F does not allow for indexed addressing. So we need a new criterion for choosing when to index (possibly with IX0) and when not to index. The answer is seen in the structure of the op-codes. If IR15-14 = 10 addressing is not used, so there is no indexed addressing If IR15-14 = 11 indexed addressing is not used. The design choice is to inhibit indexing if IR 15 = 1. If IR 15 = 0, the design uses indexed addressing, possibly with IX0 – leading to direct or indirect addressing. If IR 15 = 1, only direct or indirect addressing are possible. This approach simplifies the design. We now specify a design requirement for the common fetch cycle: an address is to be deposited in the MAR. This address will be the Effective Address unless indirect addressing is also used. Thus, we specify a common fetch sequence for all one-operand instructions.

Common Fetch Sequence for One-Operand Instructions CP1: PC  BUS1, tra1 , BUS3  MAR, READ. CP2: PC  BUS1, 1  BUS2, add , BUS3  PC CP3: MBR  BUS2, tra2 , BUS3  IR CP4: If IR 15 = 0 Then IR7-0  BUS1, IX  BUS2, add , BUS3  MAR Else IR7-0  BUS1, tra1 , BUS3  MAR. The zero-operand instructions will replace the CP4 step with an appropriate action. The Defer Cycle The defer cycle handles indirect addressing. If indirect addressing is used, then on entry to the defer cycle the value in the MAR is the address of a pointer to the argument. On exit, the argument address must be in the MAR. Here is the complete defer sequence. CP1: READ The address is in the MAR CP2: Wait Cannot do anything here CP3: MBR  BUS2, tra2 , BUS3  MAR Now the argument address in the MAR CP4: Wait Major Cycles and Minor Cycles We now introduce the concept of major and minor cycles. In the first CPU design, we focus on three major cycles: Fetch, Defer, and Execute. In this design, each major cycle has four minor cycles. In another design, the Fetch, Defer, and Execute cycles are seen as phases in the execution of an instruction. The minor cycle is tied to the CPU clock rate. A minor cycle can contain only micro- operations that can be executed at the same time and complete in one clock time. Suppose that the ASC is a 10 MHz machine. The clock time is then 0.1 microsecond; this is the time duration for a minor cycle. Our design calls for the ASC to wait for one clock cycle between asserting an address in the MAR and retrieving data from the MBR; this corresponds to a memory access time of 100 nanoseconds – a reasonable value. We now examine two implementations of the control unit, one using only standard combinational circuits (“hardwired” control unit) and then a micro-programmed CPU. The basis for each of these designs is the complete sequence of control signals for each assembly language instruction. For convenience, we give the control signals within the context of a hardwired control unit. Note that the defer cycle is always listed for one-operand instructions even when indirect addressing is not used. If indirect addressing is not used, the defer cycle is not entered. No indirect addressing Fetch – Execute Indirect addressing used Fetch – Defer – Execute

4 TCA (Negate the Accumulator using the Two’s-Complement) F,CP1: PC  BUS1, tra1 , BUS3  MAR, READ. F,CP2: PC  BUS1, 1  BUS2, add , BUS3  PC F,CP3: MBR  BUS2, tra2 , BUS3  IR F,CP4: Wait Defer Cycle is not entered. E,CP1: ACC  BUS1, comp , BUS3  ACC E,CP2: ACC  BUS1, 1  BUS2, add , BUS3  ACC E,CP3: Wait E,CP4: Wait 5 BRU (Unconditional Branch) Fetch – as in LDA Defer – if entered, the same as LDA E,CP1: MAR  BUS1, tra1 , BUS3  PC E,CP2: Wait E,CP3: Wait E,CP4: Wait 6 BIP (Branch if the Accumulator is Positive) Fetch – as in LDA Defer – if entered, the same as LDA E,CP1: If ACC > 0 Then MAR  BUS1, tra1 , BUS3  PC E,CP2: Wait E,CP3: Wait E,CP4: Wait 7 BIN (Branch if the Accumulator is Negative) Fetch – as in LDA Defer – if entered, the same as LDA E,CP1: If ACC < 0 Then MAR  BUS1, tra1 , BUS3  PC E,CP2: Wait E,CP3: Wait E,CP4: Wait

Before giving the control signals for the RWD and WWD assembly language instructions, we first must define the “handshake” protocols used in the I/O system. The I/O system uses three flip-flops: DATA, INPUT, and OUTPUT. The uses of these flip-flops are as follows: DATA indicates the presence of data to be transferred INPUT commands the input data to place data on the DIL (data input lines) OUTPUT commands the output device to take data from the DOL (data output lines) The RWD protocol is as follows

  1. The CPU resets the DATA flip-flop.
  2. The CPU sets the INPUT flip-flop
  3. The CPU waits for the DATA flip-flop to be set by the Input Unit
  4. The Input Unit places data on the DIL and sets the DATA flip-flop
  5. The CPU transfers data from the DIL to the ACC and resets the INPUT flip-flop. The WWD protocol is as follows
  6. The CPU resets the DATA flip-flop
  7. The CPU transfers data from the ACC to the DOL and sets the OUTPUT flip-flop
  8. The CPU waits for the DATA flip-flop to be set by the Output Unit.
  9. The output unit copies data from the DOL and sets the DATA flip-flop
  10. The CPU resets the OUTPUT flip-flop 8 RWD (Read Word from Input Unit into the Accumulator) F,CP1: PC  BUS1, tra1 , BUS3  MAR, READ. F,CP2: PC  BUS1, 1  BUS2, add , BUS3  PC F,CP3: MBR  BUS2, tra2 , BUS3  IR F,CP4: 0  DATA, 1  INPUT E,CP1: Wait Stay in Execute until the Input is complete. E,CP2: Wait E,CP3: Wait E,CP4: If DATA = 1 Then DIL  ACC, 0  INPUT Else E  Next State 9 WWD (Write Word to Output Unit from the Accumulator) F,CP1: PC  BUS1, tra1 , BUS3  MAR, READ. F,CP2: PC  BUS1, 1  BUS2, add , BUS3  PC F,CP3: MBR  BUS2, tra2 , BUS3  IR F,CP4: 0  DATA, 1  OUTPUT, ACC  DOL E,CP1: Wait Stay in Execute until the Output is complete E,CP2: Wait E,CP3: Wait E,CP4: If DATA = 1 Then 0  OUPUT Else E  Next State

The next two instructions levy new requirements on the CPU. The index registers are connected to BUS2. We need to increment and decrement the registers using the ALU, so we need two additional registers, +1 and –1, both connected to BUS1. E TIX (Increment and Test the Accumulator, Branch Conditionally) Fetch – same as LDX Defer – same as LDX E,CP1: +1  BUS1, IX  BUS2, add , BUS3  IX E,CP2: If IX = 0 Then MAR  BUS1, tra1 , BUS3  PC E,CP3: Wait E,CP4: Wait F TDX (Decrement and Test the Accumulator, Branch Conditionally) Fetch – same as LDX Defer – same as LDX E,CP1: –1  BUS1, IX  BUS2, add , BUS3  IX E,CP2: If IX  0 Then MAR  BUS1, tra1 , BUS3  PC E,CP3: Wait E,CP4: Wait Design of a Hardwired Control Unit We now address the issues associated with the design of a hardwired control unit. We shall study the following issues:

  1. Decoding the IR (Instruction Register)
  2. The major state register
  3. The control signal generation tree The hardwired control unit has a very regular structure. It is based on two “clocks” – one generating minor states and one generating the three major states: Fetch, Defer, and Execute. Each of the major states, if entered, takes four minor states or clock pulses: CP1, CP2, CP3, and CP4. The minor clock pulses are generated by a modulo-4 counter (see text, page 102). The system reset signal is asserted when the computer is started. When the computer is running, this generates a continuous stream of clock pulses, each defining a minor state.

The instruction decoder takes the four high order bits of the IR (Instruction Register). The output of the 4-to-16 decoder is one of 16 control signals, each identified by the name of the assembly language instruction with which the control signal is associated. For simplicity we do not consider the extended instruction set as does the figure on page 238. For example, consider the STX instruction. The Opcode for this instruction is IR15-12 = 1101. This is input to the decoder, causing the output 13 to become active and assert the signal STX, which causes the logic for the STX assembly language instruction to be executed. The Indirect control signal is the output of IR 10 , bit 10 of the Instruction Register. We have already discussed the use of IR9-8 for index register selection logic. Design of the Major State Register In order to design the major state register we first must note the sequence of major states corresponding to the execution of the assembly language instructions. We have seen that there are three possible sequences Fetch only Fetch – Execute Fetch – Defer – Execute The major state register is a sequential circuit that produces the sequence of major states. For example, if the current state is Fetch, what is to be the next major state. If the instruction can execute completely in the Fetch state, the next state will also be Fetch as the next instruction is fetched and its execution begun. If the instruction cannot be completed in the Fetch state (as most cannot), the question is whether or not to enter the Defer state. The next consideration arises from the two I/O instructions RWD and WWD. The strategy will be to hold these instructions in Execute until the I/O transaction is complete. To facilitate this and the previous considerations in the design of the Major State Register, we define three signals I 1 if in Fetch, stay in Fetch (and fetch the next instruction) I 2 if in Fetch, go to Defer (and handle indirect addressing) I 3 if in Execute, stay in Execute until the I/O is complete.

Signal Generation Tree We now have the three major parts of circuits required to generate the control signals.

  1. the major state register (F, D, and E)
  2. the minor state register (CP1, CP2, CP3, CP4)
  3. the instruction decoder In hardwired control units, these and some other condition signals are used as input to combinational circuits for generation of control signals. As an example, we consider the generation of the control signals for the first three steps of the fetch phase. When the major cycle is Fetch, then the discrete signal Fetch = 1 (True). The discrete signals CP1, CP2, and CP3 each are 1 at the appropriate clock pulse. Thus in CP1 of the Fetch cycle, we have Fetch = 1, CP1 = 1, CP2 = 0, and CP3 = 0. Only the top AND gate has both inputs 1, so its output is 1 and, as a result, the signals PC to BUS1, tra1, BUS3 to MAR, and READ are all asserted (set to 1). In Fetch, CP only the second AND gate has output of 1 and the second set of signals is asserted. F, CP3 is similar. There is one obvious remark about the above drawing. Notice that the each of the top two AND gates generates a signal labeled “PC To BUS1”. At some point in the design, these and any other identical signals are all input into an OR gate used to effect the actual transfer. The book contains the complete set of signal generation trees for the ASC. Although they may seem complicated, they are really quite simple. Figure 5.15 on page 253 shows the tree for the Defer cycle, which has the advantage of also appearing simple. The simplest way to handle the other trees in to consider five trees, one each for Fetch-CP4, Execute-CP1, Execute-CP2, Execute-CP3, and Execute-CP4. The figures in the book on pages 252 to 256 illustrate these. We now look at the signal tree for Fetch, CP4. This we do in two stages. First we consider what the signal generation tree would look like, were the generic microinstruction for F, CP4 followed. Then we shall look at the actual signal generation tree. Fetch CP CP CP BUS3 To MAR PC To BUS tra READ tra MBR To BUS BUS3 To IR 1 To BUS add BUS3 To PC PC To BUS

The microinstruction for the generic Fetch, CP4 is as follows. F, CP4: If IR 15 = 0 Then IR7-0  BUS1, IX  BUS2, add , BUS3  MAR Else IR7-0  BUS1, tra1 , BUS3  MAR. We shall use this form of the instruction primarily as an illustration of the handling of IF THEN ELSE conditions in a signal control tree. After this has been done, we display the signal generation tree for Fetch, CP4 as actually found on the ASC. Here is the signal generation tree for the microinstruction as coded above. Comparing the signal generation tree to the microinstruction above, we note the following.

  1. The microinstruction is for Fetch, CP4. The signals in this tree can be asserted only if Fetch = 1 and CP4 = 1, as they are based on output from an AND gate with input Fetch and CP4.
  2. The signals IR7-0  BUS1 and BUS3  MAR are common to both variants of the Fetch, CP4 signals as they do not depend on the state of IR 15.
  3. If IR 15 = 1, we issue the signals IX  BUS2 and add , otherwise just the signal tra. The student should note how a pair of AND gates and a single NOT gate are used to create the signals corresponding to the IF THEN ELSE condition. We now begin our examination of the real signal generation tree for Fetch, CP4.

The design of the signal tree for the defer cycle is quite simple. Only two clock pulses of this cycle cause control signals to be emitted. Here is the design of the defer cycle. The Execute Cycle is discussed at some length in the textbook. There is no need to repeat everything in these notes, so the student is instructed to examine the textbook. Testing the State of the Accumulator We have two instructions. BIP and BIN that are conditional branches based on the status of the accumulator. At this point we show circuitry to generate the three status flags N , Z , and P. It is easier to generate N and Z directly and to generate P = N  Z; if the accumulator is not negative or zero, it must be positive. The N flag is just a copy of the sign bit of the accumulator. It is easier to generate Z by first generating Z , which is 1 only if the accumulator is not zero. Then we generate Z = NOT ( (^) Z ) and P.

Similar logic is used to test the index registers, except that the question is only whether or not the specified index register is zero or not zero. Design of the ALU We now cover the design of the Arithmetic Logic Unit (ALU) of the ASC. The book presents a design on page 228. Here we present a slightly cleaner design based on dividing the ALU into three functional units – the TRA unit (for tra1 and tra2 ), the Adder, and the Shift Unit (for shl , shr , and comp ). The figure below shows the design of the TRA (Transfer) Unit of the ALU. The design is the same for each of the sixteen bits. Note the OR gate controlling the tristate buffer that feeds Bus 3 – if either tra1 = 1 or tra2 = 1 then the unit outputs on Bus 3, otherwise it does not and possibly another unit places output on the bus. The Transfer Unit