Adding Caches to a Pipelined Processor Model (CDA 4150 Lab 3), Lab Reports of Computer Architecture and Organization

A lab assignment on adding caches to a pipelined processor model for the computer systems engineering course at the University of Central Florida. Students complete an incomplete cache model by modifying the mem.v file and implementing the cache control logic. The lab covers cache lookup, stalling the processor, and interfacing to the memory system using the provided PLI routines.


Uploaded on 11/08/2009

koofers-user-zoy
Annotations overlaid on the handout: Tuesday, Dec 1st, 11:59pm; cp ~hgao/cda4150/lab3.tgz .

CDA 4150 Project 3

Adding Caches to the Processor Model

Due Thursday, November 4th, 1:30pm

Computer Systems Laboratory, University of Central Florida Orlando, FL 32816.

1. Objective

The purpose of this lab is to add both a data cache (D$) and an instruction cache (I$) to your pipelined processor model. You will be given two Verilog models: a working pipelined model and a pipelined model with support for caching. The latter is incomplete, and your assignment is to complete it. The former can be used at your discretion as a “gold standard”: it may be useful to instrument the working version and compare your cache version against it when debugging. Neither version has a multiply/divide unit; you may add one as part of the extra credit. After adding the caches, there are 3 extra-credit sections in the lab. These are completely optional and are discussed at the end of this handout.

2. Lab Setup

  • Go to your home directory (type cd)
  • mkdir cda4150/lab3
  • cd !$
  • cp ~heinrich/cda4150/lab3.tgz .
  • Uncompress the file with tar -xzf lab3.tgz
  • You should now have a pipe/ and a pipe+cache/ directory

The pipe/ directory is the working MIPS pipelined model. It is a solution to lab2 (w/o a multiply/divide unit). Please be sure you check out the solution and understand the material from lab2. The pipe+cache/ directory is the working directory for lab3 that contains the incomplete cache model.

3. Adding the Caches

The goal of this lab is to add an 8KB direct-mapped, 32B line, I$ and an 8KB direct-mapped, 32B line, writeback D$ to the processor model in the pipe+cache/ directory. The caches have already been created via a special PLI routine in an initial block in mips.v. A later section describes all the new PLI routines in detail. To implement a correct solution for this lab, you should need only to change the mem.v file. However, do not be misled. The additions to this file can be difficult and numerous. In addition, although you do not need to modify the other files in the processor, mips.v and cpu.v have changed significantly with respect to the pipe/ model. These changes are intended to simplify the lab, and allow you to focus on implementing the cache control in mem.v.
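The cache geometry above fixes how an address splits into tag, index, and word offset. The following is a minimal Python sketch of that arithmetic, not part of the provided model. It assumes 32-bit byte addresses and 4-byte words, as in MIPS; the name split_address is hypothetical.

```python
# Geometry taken from the handout: 8KB direct-mapped cache, 32B lines.
CACHE_BYTES = 8 * 1024
LINE_BYTES = 32
WORD_BYTES = 4  # assumption: 32-bit MIPS words

NUM_LINES = CACHE_BYTES // LINE_BYTES        # 256 lines
OFFSET_BITS = LINE_BYTES.bit_length() - 1    # 5 bits of byte offset
INDEX_BITS = NUM_LINES.bit_length() - 1      # 8 bits of index


def split_address(addr):
    """Return (tag, index, word_offset) for a 32-bit byte address."""
    addr &= 0xFFFFFFFF
    word_offset = (addr % LINE_BYTES) // WORD_BYTES   # which of the 8 words
    index = (addr >> OFFSET_BITS) % NUM_LINES         # which cache line
    tag = addr >> (OFFSET_BITS + INDEX_BITS)          # remaining 19 bits
    return tag, index, word_offset
```

Under these assumptions each line holds 8 words, so a fill or writeback moves 8 words over the bus.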

3.1. The Memory System

mips.v is heavily changed from the previous labs. The main difference is that the processor/memory interface changes considerably when the processor has on-chip instruction and data caches. In particular, there is no need to have separate address and data busses for instructions and data. Instead, there is just one Bus for data and one Addr bus for the miss address. The data bus is bi-directional (keyword inout in Verilog). The control signals Read and Write are the same as in previous models. There is one additional interface signal, Valid, that is set by the memory system when it is returning valid data on the Bus. If you examine the mips.v file and read the comments, you will see that these changes to the model have already been made. From the comments, it should also be clear what the processor model needs to do to interact properly with this memory system:

  • On an I$ miss, assert Read for one cycle and drive Addr with the PC. The signals should appear on the bus one cycle after the I$ lookup determines there is a miss. Some amount of time later, the memory system will assert the Valid line and drive the Bus with the data for the entire I$ line (one word per cycle). The processor should inspect the Valid line on the posedge of the CLK.
  • On a D$ miss (load or store) where the line being replaced is not dirty, the control is identical to the I$ miss above, except that the Addr bus is driven from the MAR, not the PC.
  • On a D$ miss where the line being replaced is dirty, perform spill-before-fill. First assert Write for each cycle in which you must write back a word from the replaced line. Again, the bus transactions should begin one cycle after the D$ lookup determines there is a miss. At the time you assert Write, you must drive Addr with the replacement word address and Bus with the replacement data word. After the last write cycle, the D$ miss can proceed as in the second bullet item above.
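The three cases above differ only in whether a writeback precedes the read. As a rough illustration, the Python sketch below lists the bus operations in order for one miss. It is a behavioral model, not the Verilog you will write in mem.v. The function name, the tuple layout, and the reconstruction of the victim address from its tag and the miss index are assumptions; cycle timing and the Valid handshake are not modeled.

```python
WORDS_PER_LINE = 8   # 32B line / 4B words
OFFSET_BITS = 5      # byte offset within a 32B line
INDEX_BITS = 8       # 256 lines in an 8KB direct-mapped cache


def miss_transactions(miss_addr, victim_dirty=False, victim_tag=0,
                      victim_data=None):
    """List (signal, Addr, Bus) operations, in order, for one cache miss.

    miss_addr is the PC for an I$ miss or the MAR for a D$ miss.
    """
    ops = []
    if victim_dirty:
        # Spill-before-fill: write back every word of the replaced line.
        index = (miss_addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
        line_base = ((victim_tag << (OFFSET_BITS + INDEX_BITS))
                     | (index << OFFSET_BITS))
        for w in range(WORDS_PER_LINE):
            ops.append(("Write", line_base + 4 * w, victim_data[w]))
    # Then (or immediately, for a clean victim) request the missing line.
    ops.append(("Read", miss_addr, None))
    return ops
```

For a clean miss the list holds a single Read; for a dirty victim it holds eight Write operations followed by the Read.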

3.2. The Processor Flow Control

If you look at cpu.v, you will find that all of the flops have been grouped into a single always block. This makes it easier to stall the machine. There are two cases when the machine must stall. The first is on a decodeStall from lab2, and the second is on an I$ or a D$ stall. The pipeline in cpu.v handles this latter case by stalling whenever the signal pipeInhibit is asserted. This signal is an output of mem.v, but is not yet implemented. You must set this signal appropriately in mem.v for your design to work properly.

There is another place in cpu.v where some gating is needed. The PC-logic in the working pipelined model flops the PC on the negedge of the clock unless decodeStall is asserted. When we add caches, we must add a condition to that statement as well. This condition is already added in cpu.v, and the PC is not flopped if decodeStall or a new signal called pcInhibit is asserted. This signal is an output of mem.v, but is not yet implemented. You must set this signal appropriately in mem.v for your design to work properly.
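The gating described in this section can be written down as simple boolean conditions. The Python sketch below restates the two rules from cpu.v and adds one possible policy for mem.v. The function names are hypothetical, and the policy of asserting both signals whenever any miss is outstanding is only an assumption; working out the correct conditions and timing is part of the lab.

```python
def pc_is_flopped(decode_stall, pc_inhibit):
    """cpu.v rule: the PC updates unless decodeStall or pcInhibit is set."""
    return not (decode_stall or pc_inhibit)


def pipeline_advances(pipe_inhibit):
    """cpu.v rule: the pipeline stalls whenever pipeInhibit is asserted."""
    return not pipe_inhibit


def mem_inhibits(icache_miss_pending, dcache_miss_pending):
    """Assumed mem.v policy: hold everything while any miss is pending."""
    busy = icache_miss_pending or dcache_miss_pending
    return {"pipeInhibit": busy, "pcInhibit": busy}
```

A real design may need the two signals to differ by a cycle or more, for example around the final word of a fill, so treat this only as a starting point for reasoning.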

3.3. The Cache PLI Routines

The actual cache state, tag, and data storage arrays have already been implemented in C for you. Your task is to write the control logic in mem.v that handles the cache lookup, stalling the processor, and interfacing to the memory system. The Verilog needs read and write access to the state, tag, and data arrays of each of the caches. The following PLI routines are defined for your use:

  • $icache_tag_read(index, set)
  • $icache_state_read(index, set)
  • $icache_data_read(index, set, word_offset)
  • $icache_tag_write(index, set, tag)
  • $icache_state_write(index, set, state)
  • $icache_data_write(index, set, word_offset, data)
  • $dcache_tag_read(index, set)
  • $dcache_state_read(index, set)
  • $dcache_data_read(index, set, word_offset)
  • $dcache_tag_write(index, set, tag)
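To show how a lookup might use routines of this shape, here is a small Python stand-in for the arrays together with a hit check and a line fill. Everything in it is an assumption made for illustration: the class and function names, the state encoding (0 invalid, 1 clean, 2 dirty), and passing 0 for the set argument because the caches are direct-mapped. The real routines are the PLI calls listed above, invoked from Verilog.

```python
INVALID, CLEAN, DIRTY = 0, 1, 2  # assumed state encoding, not from the handout


class CacheArrays:
    """Python stand-in for the C-backed tag, state, and data arrays."""

    def __init__(self, num_lines=256, words_per_line=8):
        self.tags = [0] * num_lines
        self.states = [INVALID] * num_lines
        self.data = [[0] * words_per_line for _ in range(num_lines)]

    # 'way' mirrors the handout's 'set' parameter; it is ignored here
    # because the caches are direct-mapped.
    def tag_read(self, index, way=0):
        return self.tags[index]

    def state_read(self, index, way=0):
        return self.states[index]

    def data_read(self, index, way, word_offset):
        return self.data[index][word_offset]

    def tag_write(self, index, way, tag):
        self.tags[index] = tag

    def state_write(self, index, way, state):
        self.states[index] = state

    def data_write(self, index, way, word_offset, data):
        self.data[index][word_offset] = data


def split_address(addr):
    """(tag, index, word_offset) for an 8KB direct-mapped cache, 32B lines."""
    return addr >> 13, (addr >> 5) & 0xFF, (addr >> 2) & 0x7


def lookup(cache, addr):
    """Return (hit, word); word is None on a miss."""
    tag, index, word = split_address(addr)
    hit = (cache.state_read(index, 0) != INVALID
           and cache.tag_read(index, 0) == tag)
    return hit, (cache.data_read(index, 0, word) if hit else None)


def fill(cache, addr, words):
    """Install a full line returned by memory and mark it clean."""
    tag, index, _ = split_address(addr)
    for w, value in enumerate(words):
        cache.data_write(index, 0, w, value)
    cache.tag_write(index, 0, tag)
    cache.state_write(index, 0, CLEAN)
```

A hit requires both a valid state and a matching tag, so a second address that maps to the same index with a different tag is still a miss.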

that generates these instructions (or, an assembly language program). In your README file, indicate the name of your test program and include your test program in your lab submission.

Extra Credit 3: Multiply/Divide (20%)

Since the multiply and divide instructions only write the hi and lo registers, you do not need to include the multiplier or divider in any bypassing logic. Your pipelined processor should ship multiplies and divides to the multiply/divide unit and keep operating the pipeline normally even though the multiply or divide may still be in progress. The only thing you need to do is ensure that any mfhi and mflo instruction stalls in the decode stage when the multiplier or divider is busy, and that you detect structural hazards on the multiplier and divider. For full credit on each part you must implement both the signed and unsigned versions of the operation (e.g. mult and multu, div and divu).
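The stall rule described above can be summarized as a single decode-stage predicate. The Python sketch below is one reading of it, assuming separate busy flags for the multiplier and the divider; the function and set names are hypothetical. It does not address ordering of writes to hi and lo when one unit is busy and the other is started, which a real design may need to handle more conservatively.

```python
MULT_OPS = {"mult", "multu"}
DIV_OPS = {"div", "divu"}
HILO_READS = {"mfhi", "mflo"}


def md_decode_stall(opcode, mult_busy, div_busy):
    """Return True if the instruction in decode must wait."""
    if opcode in HILO_READS:
        # hi/lo are not valid until any in-flight operation finishes.
        return mult_busy or div_busy
    if opcode in MULT_OPS:
        return mult_busy   # structural hazard on the multiplier
    if opcode in DIV_OPS:
        return div_busy    # structural hazard on the divider
    return False           # all other instructions flow normally
```

Because only mfhi and mflo consume the results, no bypass paths are needed; every other instruction proceeds while the unit is still working.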

Submitting Your Lab

The submission procedure is the same as the previous labs. If your lab does not work entirely, please be sure your README file explains what works and what does not. If you did any or all of the Extra Credit, remember that your README file should include the discussions mentioned in the Extra Credit sections above.