















Material Type: Notes; Class: Computer Architecture; Subject: Engineering Computer Science; University: University of California - Davis; Term: Fall 1999;
















Norman Matloff
- updated October 31, University of California at Davis

3.1 Latches and Flip-Flops
3.2 Edge-Triggering
3.3 Example: A 2-Bit Ripple Counter
3.4 Example: Tracking Counts Mod and Div 5
4 Bus-Based Circuits
5 Example: Memory Chips and Systems
5.1 An SRAM Memory Chip
5.2 A Memory System
5.3 Memory Interleaving
5.4 DRAMs
6 Example: A Simple CPU

1 Overview
As you know, all information inside a computer is processed and stored as 0-1 bits. Here we will look at the basic building blocks used to manipulate this 0-1 information.^1
2 Combinational Logic
The term combinational logic refers to circuitry that transforms bits, as opposed to storing bits. For example, the ALU portion of a CPU transforms data, e.g. transforming two input word-sized bit strings into an output which is the sum of the two inputs.
2.1.1 AND Gates
A basic AND gate has two inputs and one output.^2 Let’s call the two inputs X and Y, and the output Z. Then Z = 1 if and only if X = 1 and Y = 1, hence the name “AND.”
The AND operation is represented in boolean equations by multiplication, i.e. we write $Z = XY$.
(^1) Our course has no digital design prerequisite, and thus some of this material must be presented here. For those of you who do have this background, I ask for your patience; we will quickly move to other things.
(^2) Versions with fan-in of more than two, i.e. having more than two inputs, exist too.
2 COMBINATIONAL LOGIC 2.1 A Few Basic Gates
2.1.3 NOT Gates
A NOT gate has one input X and one output Z, with the output being the logical negation of the input. In other words, an input of 1 produces an output of 0, and vice versa.
In boolean equations, a NOT operation is indicated by an overbar: $Z = \overline{X}$
The standard symbol for a NOT gate is:
2.1.4 NAND Gates
Here there are two inputs X and Y, and one output Z. The term “NAND” stands for “not-and,” meaning that Z = 1 if the statement “X = 1 and Y = 1” is not true.
The boolean equation is $Z = \overline{XY}$
The standard symbol for a NAND gate is
Note that the little circle here means “not.”
Obviously, if on the day on which we shopped at the Gates ’R Us store they were out of NAND gates, we could synthesize a NAND by using an AND together with a NOT:
But of course, this would not be so desirable as using a real NAND. The synthesized version would probably have more transistors than the real one, and thus would be slower and take up more space on a chip, thus reducing the total number of gates we could put on the chip.
2.1.5 NOR Gates
Again, inputs X and Y, output Z, with Z being equal to 1 if the statement “X = 1 or Y = 1” is not true.
The boolean equation is $Z = \overline{X + Y}$
The symbol for a NOR gate is:
Again, the same effect could be synthesized by leading the output of an OR into a NOT.
2.1.6 XOR Gates
Here we have inputs X and Y, output Z, with Z being equal to 1 if the statement “X = 1 or Y = 1 but not both” is true. The term used for this is “exclusive-or,” abbreviated to XOR.
The boolean equation is $Z = X \oplus Y$
The symbol for an XOR gate is:
Again, the same effect could be synthesized by using two NOT gates, two AND gates and an OR gate.
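The gate definitions above are easy to check mechanically. Here is a minimal Python sketch that models each gate as a function on 0-1 values and builds NAND, NOR and XOR out of AND, OR and NOT. The function names are ours, and the XOR wiring shown (X AND NOT Y, OR-ed with NOT X AND Y) is an assumption: it is one common way to use two NOTs, two ANDs and an OR, not necessarily the only one.

```python
from itertools import product

# Primitive gates on 0-1 values.
def AND(x, y):
    return x & y

def OR(x, y):
    return x | y

def NOT(x):
    return 1 - x

# Gates synthesized from the primitives, as described in the text.
def NAND(x, y):
    return NOT(AND(x, y))      # an AND feeding a NOT

def NOR(x, y):
    return NOT(OR(x, y))       # an OR feeding a NOT

def XOR(x, y):
    # Two NOTs, two ANDs and one OR (assumed wiring).
    return OR(AND(x, NOT(y)), AND(NOT(x), y))

def truth_table(gate):
    """Outputs for (X, Y) = (0,0), (0,1), (1,0), (1,1), in that order."""
    return [gate(x, y) for x, y in product((0, 1), repeat=2)]

for name, gate in [("NAND", NAND), ("NOR", NOR), ("XOR", XOR)]:
    print(name, truth_table(gate))
```

Each synthesized gate uses more primitive gates than a "real" one would need, which is the cost the text mentions.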
2.2 Some MSI Combinational Components
“MSI” stands for “medium-scale integration.” We are integrating a moderate number of gates to form some frequently used building blocks.
2.3 Examples
The logic will look like this:
[Figure: half-adder logic diagram, with outputs Cout and Sum]
(The dark circles represent wire connections. If two lines cross in the picture but there is no dark circle at their intersection, then they do not touch each other.)
2.3.2 Full Adder
A full adder has one more input than does a half adder. We will call this input Cin, for “carry in.” The reason we need this extra input is that we will be using a full adder as a building block to do multi-bit addition. For example, consider the following addition of two 3-bit numbers, 011 and 001:
Let’s refer to the bit positions as 2, 1 and 0, from left to right. The point then is that the addition at position 0 resulted in a carry into position 1 (shown in the picture), and that carry must be incorporated into the sum performed at position 1. That carry would be the Cin for position 1 (and the Cout for position 0).
We will not draw the logic, but here are the equations (remember, we are now back to a single bit, even though the logic will be used below as a building block for a multi-bit adder):
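As a check on the idea, here is a small Python sketch of a full adder. It uses one standard form of the equations, S = X xor Y xor Cin and Cout = XY + X Cin + Y Cin; this form is our assumption and may differ in notation from the equations intended at this point in the notes. The helper ripple_add is also ours, and simply chains the single-bit logic across bit positions to reproduce the 3-bit example above.

```python
from itertools import product

def full_adder(x, y, cin):
    """Return (S, Cout) for one bit position."""
    s = x ^ y ^ cin                          # 1 when an odd number of inputs are 1
    cout = (x & y) | (x & cin) | (y & cin)   # 1 when at least two inputs are 1
    return s, cout

def ripple_add(a_bits, b_bits):
    """Add two bit lists indexed by bit position (index 0 = position 0).

    Returns (sum_bits, carries), where carries[i] is the carry out of
    position i, i.e. the Cin for position i+1.
    """
    sum_bits, carries, carry = [], [], 0
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
        carries.append(carry)
    return sum_bits, carries

# Every input combination agrees with arithmetic: X + Y + Cin = 2*Cout + S.
for x, y, cin in product((0, 1), repeat=3):
    s, cout = full_adder(x, y, cin)
    assert x + y + cin == 2 * cout + s

# The example from the text, 011 + 001, listed from position 0 upward.
sum_bits, carries = ripple_add([1, 1, 0], [1, 0, 0])
print(sum_bits, carries)   # [0, 0, 1] [1, 1, 0], i.e. the sum is 100
```

The first entry of carries is the carry out of position 0, which is exactly the Cin for position 1 described above.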
Let’s use the following (not standard) symbol for a full adder:
[Figure: full-adder symbol, with input Cin and outputs Cout and S]
2.3.3 2-Bit Adder
We can put two full adders together to form a 2-bit adder, i.e. logic which will add together two 2-bit inputs, producing a 2-bit sum and a possible carry:
Here (A1,A0) forms the first 2-bit addend, and (B1,B0) forms the second. The sum is (S1,S0), and the carry (out of bit position 1, into bit position 2) is C. Note that a constant 0 is hardwired into the Cin input of the full adder on the right.
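The wiring just described can be sketched in Python as follows. The function names are ours, and the full-adder equations are a standard form assumed for illustration; the point is only to show the right-hand adder's carry feeding the left-hand adder.

```python
from itertools import product

def full_adder(x, y, cin):
    """Return (S, Cout) for one bit position (standard form, assumed)."""
    s = x ^ y ^ cin
    cout = (x & y) | (x & cin) | (y & cin)
    return s, cout

def two_bit_adder(a1, a0, b1, b0):
    """Return (C, S1, S0) for the 2-bit addends (A1,A0) and (B1,B0)."""
    s0, c0 = full_adder(a0, b0, 0)   # right-hand adder, Cin hardwired to 0
    s1, c = full_adder(a1, b1, c0)   # left-hand adder takes the carry from the right
    return c, s1, s0

# Check all 16 input combinations against ordinary integer addition.
for a1, a0, b1, b0 in product((0, 1), repeat=4):
    c, s1, s0 = two_bit_adder(a1, a0, b1, b0)
    assert 4 * c + 2 * s1 + s0 == (2 * a1 + a0) + (2 * b1 + b0)

print(two_bit_adder(1, 1, 0, 1))   # 11 + 01 = 100, printed as (1, 0, 0)
```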
2.4 Timing
The delay of a typical gate is on the order of 10 nanoseconds (ns), i.e. 10 billionths of a second. This sounds extremely fast, almost beyond human imagination, but in view of the fact that computers perform tens of millions of operations per second, gate delays do add up into tangible amounts of time, and thus directly affect the overall speed of the machine. To get the fastest machine, digital logic must be optimized. In other
3 Sequential Logic
3.1 Latches and Flip-Flops
change to 1.
ensures that the output of the upper NOR gate is 0 (regardless of what R is), so Q will indeed continue to
pulsed.
Flip-flops are like latches, except that they are clocked, so that they accept new input only at certain times. A clock is a crystal device that pulses at regular intervals, sending 1,0,1,0,1,.... For example, a 300 megahertz (MHz) PC has a clock which pulses 300 million times per second.^4
Flip-flops, by virtue of having clocked input acceptance, allow the designer much more convenient control. After all, in a complex machine, the inputs will sometimes be garbage (still 0 or 1, maybe, but not meaningful), and the clocked nature allows us to make sure that the stored values will change only when we want them to, i.e. when nongarbage values are at the inputs. Note that this also means that in many cases the clock pulse itself is not connected to the clock input of a flip-flop, but rather that pulse is AND-ed with some other wires that represent conditions under which the input data is valid. (You will see an example of this later in this tutorial, where we build a RAM circuit.)
A D flip-flop can be constructed as follows:
[Figure: D flip-flop circuit, with inputs D and Clk]
The D input (“data”) is the new value to be stored at the time of the clock pulse. You should “walk through” an instance of the operation of this circuit, again say where the value 1 had been stored originally (Q = 1), but in which we want the new value to be 0 (D = 0). Trace through the sequence of events which will occur
the upper-right NAND making Q change to 0, as desired.
(^4) Of course, the faster the clock, the faster the machine. However, in choosing the clock speed, we have to account for all gate delays, signal propagation times along wires, and so on, so that all signals reach their destinations within one clock cycle. In other words, we must choose the clock cycle to accommodate the longest delay in the overall circuit.
3.2 Edge-Triggering
Many flip-flops are edge-triggered. What this means is that they are designed in such a way that an input value (labeled D in the picture above) will have an effect on the flip-flop only during a narrow window of time, specifically the time during which the clock pulse is rising or falling.^5
This is done to avoid feedback problems in complex circuits. The output of a flip-flop may be routed through a series of gates and ultimately fed back in to the same flip-flop as an input. For instance, consider the Intel assembly-language instruction
(If you have not worked with Intel machines before, this instruction adds the values in the AX and BX registers, and stores the sum back into AX.)
Suppose AX and BX originally contain the values 5 and 2, respectively. The new value in AX should be
3.3 Example: A 2-Bit Ripple Counter
Recall that an n-bit string can represent (unsigned) integers in the range 0, 1, 2, ..., $2^n - 1$. An n-bit ripple counter is simply a counter which will continually cycle through these values. For example, a 2-bit ripple counter will cycle through 0, 1, 2, 3, 0, 1, 2, 3, 0,..., or in bit form, 00, 01, 10, 11, 00, 01, 10, 11, 00, ...
We can construct such a counter from two D flip-flops, two half adders, and a clock:
(^5) These are called the leading (rising) edge and trailing (falling) edge of the clock pulse.
(^6) Again, remember that registers are made up of flip-flops.
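A construction from two D flip-flops, two half adders and a clock can be simulated in a few lines of Python. The wiring below is an assumption made for illustration: the first half adder adds a constant 1 to the low bit, the second adds the resulting carry to the high bit, and on each clock pulse the two flip-flops load the two sum outputs. The names half_adder, next_state and run_counter are ours.

```python
def half_adder(x, y):
    """Return (Sum, Cout) for two input bits."""
    return x ^ y, x & y

def next_state(q1, q0):
    """Values presented at the D inputs, given the current outputs (Q1, Q0)."""
    d0, c0 = half_adder(q0, 1)    # low bit plus a constant 1
    d1, _ = half_adder(q1, c0)    # high bit plus the carry from the low bit
    return d1, d0

def run_counter(pulses, q1=0, q0=0):
    """Return the list of (Q1, Q0) states, starting state included."""
    states = [(q1, q0)]
    for _ in range(pulses):
        q1, q0 = next_state(q1, q0)   # both flip-flops load on the clock pulse
        states.append((q1, q0))
    return states

print(run_counter(5))
# [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0), (0, 1)]
```

The carry out of the second half adder is simply discarded, which is what makes the count wrap from 11 back to 00.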
3.4 Example: Tracking Counts Mod and Div 5
[Figure: circuit for tracking counts mod and div 5, with the input line at the left and the init line at the top]
The input line is visible at the left of the picture. Its quiescent state is 0, but sometimes pulses, i.e. 1s, come in on that line. (The pulses might come at regular intervals, such as from a clock, or at irregular times, depending on what application we needed this circuit for.) The output pins are M0, M1 and M2, which contain c mod 5, and D0, D1, D2 and D3, which contain c div 5 (though only up to 15 for the latter).
There are two boxes labeled CTR; these are 4-bit ripple counters. The pins labeled C in them are for clearing, i.e. resetting; if this pin is pulsed, all bits of the ripple counter will be reset to 0. Similarly, there is an “init” input at the top of the picture, to clear both the mod and div counters.
The box labeled DCD is a 3-to-8 decoder. Its output pins are active low, meaning that 0 means “yes” and 1 means “no.” For example, let’s label these pins Z7,Z6,...,Z0 from left to right. Then Z7 will be equal to 0 if and only if the three input pins contain 111, the binary representation of 7.
Whenever the mod counter reaches 4, Z4 will be 0, which means that the output of the NOT gate we’ve connected to Z4 will be 1. This 1 is then ANDed with the circuit’s input line. So, the next time the input line is pulsed (which will be a count which is a multiple of 5), a pulse will be felt at the c div 5 counter. That counter will then increment, exactly what we want.
Note that the gate delay in DCD is helping us here. When the count is 4, the next pulse will change the count momentarily to 5 (and the rest of the circuit will change that to 0). However, even when the count first becomes 5, Z4 will still be equal to 0 (indicating a count of 4, not 5) for a short period of time equal to the gate delay in DCD. This delay is good, because when the pulse comes at count 4, we want Z4 to stay at 0 long enough so that it makes CTR increment. This illustrates the delicate timing issues which can arise in digital circuits.
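The intended behavior of this circuit can be sketched at a purely logical level in Python. This is a behavioral model only, under our own simplifying assumptions: it ignores gate delays, and it collapses the momentary count of 5 and its clearing into a single step. The name track_counts and the variable names are ours.

```python
def track_counts(num_pulses):
    """Feed num_pulses pulses into the circuit, starting from the init state.

    Returns a list with one (mod_count, div_count) pair per pulse.
    """
    mod_ctr = 0   # value on pins M2..M0
    div_ctr = 0   # value on pins D3..D0
    history = []
    for _ in range(num_pulses):
        z4 = 0 if mod_ctr == 4 else 1   # active-low decoder output Z4
        gate = 1 - z4                   # the NOT gate attached to Z4
        if gate == 1:
            # The pulse, ANDed with the gate, reaches the div counter,
            # and the mod counter ends up cleared back to 0.
            div_ctr = (div_ctr + 1) % 16   # a 4-bit counter holds at most 15
            mod_ctr = 0
        else:
            mod_ctr += 1
        history.append((mod_ctr, div_ctr))
    return history

print(track_counts(7))
# [(1, 0), (2, 0), (3, 0), (4, 0), (0, 1), (1, 1), (2, 1)]
```

After c pulses the model holds (c mod 5, c div 5), as long as c div 5 does not exceed 15.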
4 Bus-Based Circuits
A bus is a set of parallel wires used for transfer of data among various components. You are probably familiar with the idea of a system bus for a computer, which connects components such as the CPU, memory and I/O devices.^7
Since several components are attached to the same bus, how do we make sure that only one of them actually is connected to the bus at a time? The answer is that we use tri-state buffers, which can connect a component to a bus, or electrically isolate the component from the bus.
To illustrate this, consider the design of a very simple CPU which, for simplicity of exposition, will have only two registers, R0 and R1, each only one bit wide and implemented as a D flip-flop (DFF):
[Figure: registers R0 and R1 on the CPU's internal bus, with a Clk line]
In each register, the upper left input is for data, the lower left for the clock, and output is out the right side. Again, keep in mind that the bus shown here is inside the CPU, different from the system bus; the purpose of the bus here is for data transfer from one register (the source) to another (the destination). (There also would be an ALU, etc., but we do not show other items here.)
We see data (D) and address (AS, AD) lines here, just as we would for a system bus. However, due to the simple nature of our example, in which our word size is only one bit (!), we only need one data line; if we had, say, 32-bit words, we would need 32 data lines, D31,D30,...,D0. Similarly, since we have only two registers, we only need one address line for the source register, and one for the destination register; if we had, say, 16 registers, we would need four address lines each for source and destination.
What is new here, though, is the presence of triangles which look like NOT gates but are instead tri-state buffers; the latter are distinguished from the former by the presence of an extra input line coming in at the
(^7) In this context, the term bus often connotes not just the wires themselves, but also standards for the roles played by the wires, electrical characteristics, and so on.
[Figure: the CPU with an ALU and registers R0 through R3, with a Clk line]
We do not have room to show the connecting lines, gates and tri-state buffers; suffice it to say that they are similar to those shown earlier. (You should try drawing some for yourself, though, as a check of your understanding.) For simplicity, we are continuing to assume a one-bit word size.
The ALU has two inputs on its left, and one output on its right. Again, remember that the ALU is a combinational circuit, for example performing addition, using properly-chosen gates as we have seen before.
Note that we have added two new registers, R2 and R3. They are known as private registers, “private” in the sense that they will not be visible to the programmer. The instruction format for the machine would still have each register field (source and destination) within an instruction consist of only one bit, which would control whether R0 or R1 is involved; the programmer has no way to specify R2 or R3 here. Instead, R2 and R3 will serve as temporary storage cells which save up data destined for the ALU. As you can see, their Q outputs feed into the ALU, rather than being connected back to the D bus line as with R0 and R1.
Now consider a machine instruction ADD R0,R1, meaning that the old value of R0+R1 will now become the new value of R0. This instruction would require three clock cycles:
first clock: copy R0 to R2
second clock: copy R1 to R3
third clock: enable the tri-state buffer connecting the output of the ALU to D, and let D flow into R0
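The three clock cycles above can be traced with a small Python sketch. It is a simplified model under our own assumptions: each cycle has exactly one source driving the single data line D (the source whose tri-state buffer is enabled) and one destination register that is clocked, and the one-bit ALU keeps only the sum bit of its addition. The names alu_add, bus_cycle, execute and ADD_R0_R1 are ours.

```python
def alu_add(a, b):
    """One-bit addition; with a one-bit word only the sum bit is kept."""
    return (a + b) % 2

def bus_cycle(regs, source, dest):
    """One clock cycle: enable one driver onto D, clock one destination."""
    if source == "ALU":
        bus = alu_add(regs["R2"], regs["R3"])   # ALU inputs come from R2, R3
    else:
        bus = regs[source]                      # a register drives the bus
    new_regs = dict(regs)
    new_regs[dest] = bus                        # destination loads D
    return new_regs

# ADD R0,R1 as three (source, destination) transfers, one per clock.
ADD_R0_R1 = [("R0", "R2"), ("R1", "R3"), ("ALU", "R0")]

def execute(regs, steps):
    for source, dest in steps:
        regs = bus_cycle(regs, source, dest)
    return regs

start = {"R0": 1, "R1": 0, "R2": 0, "R3": 0}
print(execute(start, ADD_R0_R1))
# {'R0': 1, 'R1': 0, 'R2': 1, 'R3': 0}
```

Because bus_cycle accepts only one source per call, the model cannot put two values on D at once, which is the job the tri-state buffers do in the hardware.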
There are also inputs, again not shown, to the ALU determining which operation we want to perform, e.g. add, subtract, logical-and, logical-or, etc. Thus in the third clock cycle the control logic would also be putting the code for “add” on the control inputs to the ALU.^8 During the third cycle we would also enable the connection from the output of the ALU to the bus.
(^8) Though we have not set it up this way here, in most designs even a “move” operation is done through the ALU; in the case here, then, during the first and second clock cycles the control logic would put the code for “move” onto the ALU’s control input lines.
By the way, you can now begin to see how the digital logic design affects the choice of clock speed. The clock cycle length must be chosen to cover the worst case that would occur. Here that means, for instance, that the cycle must be at least as long as the time needed to do an addition, and to copy two values between a register and a bus.
How could we make this faster? Suppose we had two buses, rather than one, with all components (the registers and the ALU) connected to both buses. Then we could load both R2 and R3 at the same time, i.e. during the same clock cycle: R0 would be copied to R2 via the first bus, while at the same time, R1 would be copied to R3 via the second bus. During the second cycle we would enable the output from the ALU to a bus, say the first one, and enable the input to (in this example) R0. The instruction ADD R0,R1 would then take only two clock cycles, rather than three, a 33% savings in time!
And by adding a third bus, we could get the time down to only one cycle. Now all of this is somewhat oversimplified (we have not accounted for time needed to fetch the instruction from memory and decode it, etc.), but you can at least begin to see the principle here—a classic computer science time/space tradeoff. If we are willing to use greater amounts of precious space on the CPU chip (more buses take up more room), we can reap a savings in time.
With this example, the need for edge-triggering or other anti-feedback mechanism should be more apparent. It would be worthwhile for you now to go back and review that section earlier in this document.
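To make the feedback problem concrete, here is a toy Python model of a register whose output is added to another value and fed back into the same register, as in the AX and BX example. Everything about the timing here is a hypothetical simplification: clock_high_time and loop_delay are made-up durations in arbitrary units, and the level-sensitive register is assumed to absorb a new sum every time one arrives while the clock is high.

```python
def edge_triggered_add(ax, bx):
    """AX is loaded exactly once, at the clock edge."""
    return ax + bx

def level_sensitive_add(ax, bx, clock_high_time, loop_delay):
    """AX stays open to new input for the whole time the clock is high.

    loop_delay (hypothetical, must be positive) is the time for a value to
    travel from AX's output, through the adder, and back to AX's input.
    """
    for _ in range(0, clock_high_time, loop_delay):
        ax = ax + bx      # each trip around the loop adds BX again
    return ax

print(edge_triggered_add(5, 2))           # 7
print(level_sensitive_add(5, 2, 3, 1))    # 11
```

With AX = 5 and BX = 2, the edge-triggered version gives 7, while the level-sensitive version with these made-up timings adds BX three times. Only when the loop delay is at least as long as the clock-high time do the two agree.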
5 Example: Memory Chips and Systems
To illustrate the principles developed here, we will consider the design of simple memory chips and memory systems.
5.1 An SRAM Memory Chip
First consider the design of a “4x2” memory chip, meaning that it contains four two-bit words. (This is much, much smaller than the sizes of typical commercial memory chips, but the principles are the same.) We will call the four words Word 0, Word 1, Word 2 and Word 3. (Remember, though, that these are word numbers within this chip, not within a system constructed from this chip and others; more on this later.)
The chip will have the following pins: address pins A1 and A0 (two pins encode $2^2 = 4$ addresses), which indicate which of the four words is to be accessed; data-in pins DI1 and DI0 (for writing data to the chip); data-out pins DO1 and DO0 (for reading data from the chip); a write-enable pin WE, used to inform the chip that we wish to write to it; an output-enable pin OE, used to inform the chip that we wish to read from it; and a chip-select pin CS, to inform the chip that it, rather than some other memory chip, will be involved in the current memory transaction.
Again, the “4x2” designation for this chip means that the chip contains four words, each two bits wide. Note that this might be quite a different viewpoint from that held by the CPU of our system. The CPU might, say, view the memory as an “8x4” system, consisting of eight four-bit words. The memory for such a system could then be constructed by using four 4x2 chips in combination, as we will do later. (The CPU, though, would be unaware of the chip structure of the system.)
The use of separate pins for data input and output here is not standard, but simplifies the design somewhat.
[Figure: the 4x2 SRAM chip, including the decoder (dcdr)]
As mentioned earlier, the main part of the chip consists of a 4x2 array of the bit cells designed above. The top row is Word 0 of the chip, the row just below it is Word 1, and so on. A 2-to-4 decoder at the middle left of the diagram selects the proper row of bit cells, i.e. the proper word, depending on which address is desired.
The design is straightforward. Let us confirm, for instance, that if CS is 0, no data will be allowed into or out of this chip. First, note that if CS is 0, then the inputs to WR in the bit cells will also be 0; thus no data will be allowed into any bit cell, exactly as planned. Second, note that if CS is 0, the inputs to the two tri-state devices near the DO1 and DO0 pins will also be 0, ensuring that no data flows out of the chip. Again, all of this is important; we will be connecting several of these chips to a system bus, and must ensure that we do not have data from two chips flowing onto the same bus lines at the same time.
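The pin-level behavior described here can be summarized in a short behavioral model in Python. It is a sketch under our own assumptions rather than a gate-level rendering of the chip: the class name SRAM4x2 and the method access are ours, and a return value of None stands for the data-out pins being electrically isolated by the tri-state devices.

```python
class SRAM4x2:
    """Behavioral model of a chip holding four two-bit words."""

    def __init__(self):
        self.words = [(0, 0) for _ in range(4)]   # each word is (bit 1, bit 0)

    def access(self, cs, we, oe, a1, a0, di1=0, di0=0):
        """One memory transaction.

        Returns (DO1, DO0) if the chip drives its outputs, else None.
        """
        row = 2 * a1 + a0              # the 2-to-4 decoder picks one word
        if cs and we:                  # writes are gated by CS
            self.words[row] = (di1, di0)
        if cs and oe:                  # outputs are gated by CS as well
            return self.words[row]
        return None                    # outputs isolated from the bus

chip = SRAM4x2()
chip.access(cs=1, we=1, oe=0, a1=1, a0=0, di1=1, di0=1)   # write 11 to Word 2
print(chip.access(cs=1, we=0, oe=1, a1=1, a0=0))          # (1, 1)
print(chip.access(cs=0, we=0, oe=1, a1=1, a0=0))          # None
```

With CS at 0, neither branch fires, so nothing is stored and nothing is driven onto the outputs, matching the two observations above.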
Consistent with our earlier comment on the 1-bit cell design, the WE input here would likely be set up as the AND-ing together of, say, W and CLOCK lines in the system bus. The same comment applies to our memory system in the following subsection (even though the CLOCK line is not drawn).
5.2 A Memory System
Now, let us see how a memory system can be constructed from memory chips. Again for simplicity, we will continue to assume 4x2 chips,^9 and we will assume an overall system of eight four-bit words.^10 Here is how we can construct the system from 4x2 chips:
The bus here is a system bus like you learned in ECS 50 (or equivalent course), not the buses internal to CPUs and memory chips which we have discussed earlier. The CPU and I/O devices are also connected to this bus, but are not shown here, nor is the CLOCK line shown. We are assuming that no direct memory-to-memory access is possible; all reads from and writes to memory are performed by the CPU.
Let’s call the four chips I, II, III and IV, from left to right. By inspecting which chip pins are connected to bus lines D3-D0, you can see that chips I and III contain the lower two bits of each four-bit word, while chips II and IV contain the higher two bits. By noting the connections of bus line A2 to the CS pins of the chips, you can see that of the entire address space 0-7, now speaking from the CPU/system viewpoint, chips I and II contain system words 0-3, and chips III and IV contain system words 4-7.
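The division of labor among the four chips can be sketched in Python as follows. This is a behavioral model under our own assumptions: the class name MemorySystem8x4 and its methods are ours, each chip is reduced to a list of four 2-bit values, and the chip-select wiring is modeled simply as "A2 = 0 selects chips I and II, A2 = 1 selects chips III and IV."

```python
class MemorySystem8x4:
    """Eight four-bit words built from four 4x2 chips, I through IV."""

    def __init__(self):
        # Each chip holds four 2-bit values (0..3), indexed by A1,A0.
        self.chips = {name: [0, 0, 0, 0] for name in ("I", "II", "III", "IV")}

    def _select(self, address):
        a2 = address // 4          # bus line A2 picks the pair of chips
        local = address % 4        # A1,A0 pick the word within each chip
        low, high = ("III", "IV") if a2 else ("I", "II")
        return low, high, local

    def write(self, address, value):
        low, high, local = self._select(address)
        value = value % 16                        # four-bit word
        self.chips[low][local] = value % 4        # lower two bits (D1,D0)
        self.chips[high][local] = value // 4      # upper two bits (D3,D2)

    def read(self, address):
        low, high, local = self._select(address)
        return self.chips[high][local] * 4 + self.chips[low][local]

mem = MemorySystem8x4()
mem.write(6, 0b1101)
print(mem.chips["III"][2], mem.chips["IV"][2], mem.read(6))   # 1 3 13
```

System word 6 lands in Word 2 of chips III and IV: the lower bits 01 go to chip III and the upper bits 11 to chip IV, while the CPU sees only the four-bit value at address 6.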
Suppose, for example, we have a C program with an int variable x, and that the address of x happens to be 3 in this system. Then x will be stored in chips I (lower 2 bits) and II (upper 2 bits). If the C source code has something like
x = 5;
then on, say, an Intel CPU the compiler will produce code like
(^9) For even more simplicity, we now assume that each data pin does both input and output.
(^10) Remember, the CPU will see only the latter, and not know how the system breaks down in terms of chips.