1. Purpose
This project is intended to help you understand the instructions of a very simple assembly language
and how to assemble programs into machine-level code.
2. Problem
This project has three parts. In the first part, you will write a program to take an assembly-language
program and produce the corresponding machine code. In the second part, you will write a
behavioral simulator for arbitrary machine code. In the third part, you will demonstrate your
assembler and simulator by writing a short assembly-language program to multiply two numbers.
3. RiSC Level-0 Instruction-Set Architecture
For the first several projects, you will be gradually “building” the RiSC Level-0 (Ridiculously
Simple Computer). The RiSC is very simple, but it is general enough to solve complex problems.
For this project, you will only need to know the instruction set and instruction format of the RiSC.
The RiSC is an 8-register, 16-bit computer. All addresses are shortword-addresses (i.e. address 0
corresponds to the first two bytes of main memory, address 1 corresponds to the second two bytes
of main memory, etc.). Like the MIPS instruction-set architecture, by hardware convention, register
0 will always contain the value 0. The machine must enforce this.
There are 4 machine-code instruction formats: R-type, I-type, J-type, and O-type.
Bits:     15-13    12-10    9-7      6-3            2-0
R-type:   opcode   reg A    reg B    0 (4 bits)     reg C
I-type:   opcode   reg A    reg B    signed immediate (7 bits, -64 to 63)
J-type:   opcode   reg A    reg B    0 (7 bits)
O-type:   opcode   0 (13 bits)
Project 1 (10%): RiSC Level-0
ENEE 350: Computer Organization, Fall 1999
Assigned: Thursday, September 9; Due: Tuesday, September 28
- R-type instructions (add, nand):
      bits 15-13: opcode
      bits 12-10: reg A
      bits 9-7:   reg B
      bits 6-3:   unused (should all be 0)
      bits 2-0:   reg C
- I-type instructions (lw, sw, beq, addi):
      bits 15-13: opcode
      bits 12-10: reg A
      bits 9-7:   reg B
      bits 6-0:   immediate (a 7-bit, 2’s complement number with a range of -64 to 63)
- J-type instructions (jalr):
      bits 15-13: opcode
      bits 12-10: reg A
      bits 9-7:   reg B
      bits 6-0:   unused (should all be 0)
- O-type instructions (halt):
      bits 15-13: opcode
      bits 12-0:  unused (should all be 0)
The following table describes the different opcodes.
Mnemonic  Name and Format                 Opcode  Assembly Format   Action
add       Add, R-type                     000     add rA, rB, rC    Add contents of regB with regC,
                                                                    store result in regA.
addi      Add Immediate, I-type           001     addi rA, rB, imm  Add contents of regB with imm,
                                                                    store result in regA.
nand      Nand, R-type                    010     nand rA, rB, rC   Nand contents of regB with regC,
                                                                    store result in regA.
halt      Halt, O-type                    011     halt              Halt the machine; do nothing and
                                                                    announce that the machine has stopped.
lw        Load word, I-type               100     lw rA, rB, imm    Load value from memory into regA.
                                                                    Memory address is formed by adding
                                                                    immediate with contents of regB.
sw        Store word, I-type              101     sw rA, rB, imm    Store value from regA into memory.
                                                                    Memory address is formed by adding
                                                                    immediate with contents of regB.
beq       Branch if equal, I-type         110     beq rA, rB, imm   If the contents of regA and regB are the
                                                                    same, branch to the address
                                                                    PC+1+immediate, where PC is the
                                                                    address of the beq instruction.
jalr      Jump-and-link-register, J-type  111     jalr rA, rB       Branch to the address in regB.
                                                                    Store PC+1 into regA, where PC is the
                                                                    address of the jalr instruction.

- Note that the OLD value of PC+1 should be stored into regA, not the NEW value of PC+1 (i.e., the value of PC after it has been given the value in regB). This can be tricky if regA is the same as regB.
Anything after a pound sign (‘#’) is considered a comment and is ignored. The comment field ends
at the end of the line. Comments are vital to creating understandable assembly-language programs,
because the instructions themselves are rather cryptic.
In addition to RiSC instructions, an assembly-language program may contain directives for the
assembler. These are often called pseudo-instructions. The only assembler directives we will use
are nop and .fill (note the leading period). The nop pseudo-instruction means “do not do anything
this cycle” and is replaced with the instruction add 0,0,0 (which clearly does nothing). The .fill
directive tells the assembler to put a number into the place where the instruction would normally
be stored. The .fill directive uses one field, which can be either a numeric value or a symbolic
address. For example, “.fill 32” puts the value 32 where the instruction would normally be stored.
Using .fill with a symbolic address will store the address of the label. In the example below, the
line “.fill start” will store the value 2, because the label “start” refers to address 2.
The assembler should make two passes over the assembly-language program. In the first pass, it
will calculate the address for every symbolic label. Assume that the first instruction is at address
0. In the second pass, it will generate a machine-level instruction (in hexadecimal) for each line of
assembly language. For example, the following is an assembly-language program that counts
down from 5, stopping when it hits 0.
        lw      1,0,five        # load reg1 with 5 (uses symbolic address)
        lw      2,1,3           # load reg2 with -1 (uses numeric address)
start:  add     1,1,2           # decrement reg1
        beq     0,1,2           # goto end of program when reg1==0
        beq     0,0,start       # go back to the beginning of the loop
        nop
done:   halt                    # end of program
five:   .fill   5
neg1:   .fill   -1
startAddr: .fill start          # will contain the address of start (2)
And here is the corresponding machine-level program (each word is shown in hexadecimal, followed by
its value interpreted as a 16-bit signed and as an unsigned decimal integer):
(address 0): 8407 (signed decimal -31737, unsigned decimal 33799)
(address 1): 8883 (signed decimal -30589, unsigned decimal 34947)
(address 2): 0482 (signed decimal 1154, unsigned decimal 1154)
(address 3): c082 (signed decimal -16254, unsigned decimal 49282)
(address 4): c07d (signed decimal -16259, unsigned decimal 49277)
(address 5): 0000 (signed decimal 0, unsigned decimal 0)
(address 6): 6000 (signed decimal 24576, unsigned decimal 24576)
(address 7): 0005 (signed decimal 5, unsigned decimal 5)
(address 8): ffff (signed decimal -1, unsigned decimal 65535)
(address 9): 0002 (signed decimal 2, unsigned decimal 2)
Be sure you understand how the above assembly-language program got translated to this
machine-code file.
Since your programs will always start at a known address (address 0), your program should only
output the contents, not the addresses.
8407
8883
0482
c082
c07d
0000
6000
0005
ffff
0002
In general, acceptable RiSC assembly code is one-instruction-per-line. It is okay to have a line
that is blank, whether it is commented out (i.e., the line begins with a pound sign) or not (i.e., just
a blank line). However, a label cannot appear on a line by itself; it must be followed by a valid
instruction on the same line (a .fill directive or nop counts as an instruction). Therefore, your
assembler will never have to deal with this last situation, i.e. you will never have to deal with
labels that refer to empty lines ... trust me, this makes things significantly easier.
4.1 Running Your Assembler
You must write your program so that it is run as follows (assuming your program name is “assemble”).
assemble assembly-code-file machine-code-file
Note that the format for running the command must use command-line arguments for the file
names (rather than standard input and standard output).
The first argument is the file name where the assembly-language program is stored, and the second
argument is the file name where the output (the machine code) is written. Your program
should only store the list of hexadecimal numbers in the machine-code file, one instruction per
line—any other format will render your machine-code file ungradable. Each number can have
‘0x’ in front or not, as you wish. Any other output that you want the program to generate (e.g.
debugging output) can be printed to stdout or stderr.
4.2 Error Checking
Your assembler should catch errors in the assembly language program, as well as errors that occur
because the user ran your program incorrectly (e.g. with only 1 argument instead of 2 arguments).
For example, it should detect the use of undefined labels, duplicate labels, missing arguments to
opcodes (e.g. only giving two fields to lw ), immediate values that are out of range, unrecognized
opcodes, etc.
4.3 Assembler Hints
Printing out 16-bit numbers is done by attaching an ‘h’ to the print-format string (‘h’ stands for
half-word). For example, the following code prints out short integers correctly.
int num = 0x983475;   /* larger than a 16-bit quantity */
short hword;
hword = num & 0xffff;
printf("short int: 0x%04hx, %hd\n", hword, hword);
The corresponding output:
short int: 0x3475, 13429
The first number printed is the value of hword as a hexadecimal number (the ‘04’ tells the printf
function to pad the number on the left with zeroes if necessary, up to a total length of 4 digits).
The second number printed is the value of hword as a decimal number. If you leave off the ‘h’, or
instead use ‘l’, the value printed might not reflect the actual value of the number (this is heavily
dependent on the compiler).
Because the immediate value is a 7-bit 2’s complement number, it only holds values ranging from
-64 to 63. For symbolic addresses, your assembler will compute the immediate value so that the
instruction refers to the correct label. Remember to check that the resulting immediate value is in
range. Since Sun workstations are 64-bit machines, an int is far wider than 7 bits, and a negative
number has 1s in every bit above bit 6; you’ll have to chop off all but the lowest 7 bits (e.g.,
imm & 0x7f) before placing a negative immediate into the instruction.
5. Behavioral Simulator (4%)
The second part of this assignment is to write a program that can simulate any legal RiSC
machine-code program. The input for this part will be the machine-code file that you created with
7. Grading
We will grade primarily on functionality, including error handling, correctly assembling and
simulating all instructions, input and output format, method of executing your program, and
correctly multiplying. Be very careful to follow exactly the format for inputting the
assembly-language program, outputting the machine-code file (from the assembler), inputting the
machine-code file (into the simulator), and outputting the state of the RiSC machine while running
the program. We will be running an automated checker on your files, so they must be in exactly the
right format.
8. Turning in the Project
You should submit the following separate files:
1. C program listing for your assembler
2. C program listing for your simulator
3. Assembly file for the multiplication program
To submit your programs and output files, use the Makefile that is part of the project distribution.
At the UNIX prompt, type the following:
make submit
This will package up your programs and send them to me.
You should receive an e-mail acknowledgment for your submission, which includes a copy of the
submission, so that you can verify that what I received is what you sent. The official time of
submission for your project will be the time the last file is sent. If you send in anything after
the due date, your project will be considered late (and will use up your late days or will receive
a zero).
9. Code Fragment for Assembler
The focus of this class is machine organization, not C programming skills. To “build” your computer,
however, you will be doing a lot of C programming. To help you, here is a fragment of the C program
for the assembler. This shows how to specify command-line arguments to the program (via argc and
argv), how to parse the assembly-language file, etc. This fragment is provided strictly to help you,
though it may take a bit of effort to understand and use. You may also choose not to use this
fragment.
/* Assembler code fragment for RiSC */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXLINELENGTH 1000

char *readAndParse(FILE *, char *, char **, char **, char **, char **, char **);
int isNumber(char *);

int main(int argc, char *argv[])
{
    char *inFileString, *outFileString;
    FILE *inFilePtr, *outFilePtr;
    char *label = NULL, *opcode = NULL, *arg0 = NULL, *arg1 = NULL, *arg2 = NULL;
    char lineString[MAXLINELENGTH+1];

    if (argc != 3) {
        printf("error: usage: %s <assembly-code-file> <machine-code-file>\n", argv[0]);
        exit(1);
    }

    inFileString = argv[1];
    outFileString = argv[2];

    inFilePtr = fopen(inFileString, "r");
    if (inFilePtr == NULL) {
        printf("error in opening %s\n", inFileString);
        exit(1);
    }
    outFilePtr = fopen(outFileString, "w");
    if (outFilePtr == NULL) {
        printf("error in opening %s\n", outFileString);
        exit(1);
    }

    /* here is an example for how to use readAndParse to read a line from inFilePtr */
    if (readAndParse(inFilePtr, lineString, &label, &opcode, &arg0, &arg1, &arg2) == NULL) {
        /* reached end of file */
    } else {
        /* label is either NULL or it points to a null-terminated string in lineString
           that has the label. If label is NULL, that means the label field didn't
           exist. Same for opcode and argX. */
    }

    /* this is how to rewind the file ptr so that you start reading from the
       beginning of the file */
    rewind(inFilePtr);

    /* after doing a readAndParse, you may want to do the following to test each
       opcode (opcode is still NULL if nothing has been read yet) */
    if (opcode != NULL && !strcmp(opcode, "add")) {
        /* do whatever you need to do for opcode "add" */
    }

    fclose(inFilePtr);
    fclose(outFilePtr);
    return 0;
}

char *readAndParse(FILE *inFilePtr, char *lineString, char **labelPtr,
    char **opcodePtr, char **arg0Ptr, char **arg1Ptr, char **arg2Ptr)
{
    /* read and parse a line
       note that lineString must point to allocated memory, so that *labelPtr,
       *opcodePtr, and *argXPtr won't be pointing to readAndParse's memory
       note also that *labelPtr, *opcodePtr, and *argXPtr point to memory
       locations in lineString. When lineString changes, so will *labelPtr,
       *opcodePtr, and *argXPtr.
       function returns NULL if at end-of-file */

    char *statusString, *firsttoken;

    statusString = fgets(lineString, MAXLINELENGTH, inFilePtr);
    if (statusString != NULL) {
        firsttoken = strtok(lineString, " \t\n");
        if (firsttoken == NULL || firsttoken[0] == '#') {
            /* blank or comment-only line: skip it and read the next one */
            return readAndParse(inFilePtr, lineString, labelPtr, opcodePtr,
                arg0Ptr, arg1Ptr, arg2Ptr);
        } else if (firsttoken[strlen(firsttoken) - 1] == ':') {
            *labelPtr = firsttoken;
            *opcodePtr = strtok(NULL, " \t\n");
            firsttoken[strlen(firsttoken) - 1] = '\0';
        } else {
            *labelPtr = NULL;
            *opcodePtr = firsttoken;
        }
        *arg0Ptr = strtok(NULL, ", \t\n");
        *arg1Ptr = strtok(NULL, ", \t\n");
        *arg2Ptr = strtok(NULL, ", \t\n");
    }
    return statusString;
}

int isNumber(char *string)
{
    /* return 1 if string is a number */
    int i;
    return (sscanf(string, "%d", &i) == 1);
}
11001 (binary)
| 01011 (binary)
= 11011 (binary),
which is 27 decimal.
6. ~a is the bit-wise complement of a (a is not changed); if a = 100101, ~a = 011010.
Use these operations to create and manipulate machine-code. E.g. to look at bit 3 of the variable a ,
you might do: (a>>3) & 0x1. To look at bits 15-13 of a 16-bit word (for instance, the opcode of
each instruction), you could do: (a>>13) & 0x7. To put a 6 into bits 5-3 and a 3 into bits 2-1, you
could do: (6<<3) | (3<<1). If you’re not sure what an operation is doing, print some intermediate
results to help you debug.
12. Example Run of Simulator
The following is example output of an RiSC Level-0 simulator (160 lines of C code). This example
run simulates the original example RiSC machine code:
        lw      1,0,five        # load reg1 with 5 (uses symbolic address)
        lw      2,1,3           # load reg2 with -1 (uses numeric address)
start:  add     1,1,2           # decrement reg1
        beq     0,1,2           # goto end of program when reg1==0
        beq     0,0,start       # go back to the beginning of the loop
        nop
done:   halt                    # end of program
five:   .fill   5
neg1:   .fill   -1
startAddr: .fill start          # will contain the address of start (2)
Note that you do not have to follow this format for the output, but you do have to present this
information (after every cycle, print contents of memory, program counter, and register file). Print
out the numbers in hexadecimal, since much of it will be instruction words, and the numbers are
small enough that you should be able to translate to decimal fairly easily.
DEBUG: memory[0]=8407
DEBUG: memory[1]=8883
DEBUG: memory[2]=0482
DEBUG: memory[3]=c082
DEBUG: memory[4]=c07d
DEBUG: memory[5]=0000
DEBUG: memory[6]=6000
DEBUG: memory[7]=0005
DEBUG: memory[8]=ffff
DEBUG: memory[9]=0002

DEBUG: decoding lw: arg0=1, arg1=0, arg2=7, imm=7
State at the end of cycle 0
PC = 1
Regs: 00 0 01 5 02 0 03 0 04 0 05 0 06 0 07 0
Memory: 00 8407 01 8883 02 0482 03 c082 04 c07d 05 0000 06 6000 07 0005 08 ffff 09 0002

DEBUG: decoding lw: arg0=2, arg1=1, arg2=3, imm=3
State at the end of cycle 1
PC = 2
Regs: 00 0 01 5 02 -1 03 0 04 0 05 0 06 0 07 0
Memory: 00 8407 01 8883 02 0482 03 c082 04 c07d 05 0000 06 6000 07 0005 08 ffff 09 0002

[ ... STUFF DELETED ... ]

DEBUG: decoding halt: arg0=0, arg1=0, arg2=0, imm=0
DEBUG: Machine Halted
DEBUG: Total of 17 instructions executed
State at the end of cycle 16
PC = 7
Regs: 00 0 01 0 02 -1 03 0 04 0 05 0 06 0 07 0
Memory: 00 8407 01 8883 02 0482 03 c082 04 c07d 05 0000 06 6000 07 0005 08 ffff 09 0002
The full output is given as part of the project distribution in the file t.sim. Note that you must
follow exactly this format for your output: use the provided printState function given in the file
simulator.c. Anything printed to stderr will be ignored, as will anything preceded by the word
DEBUG. The only thing that the automated checker will look at is the following:
State at the end of cycle X
PC = X
Regs: 00 X 01 X 02 X 03 X 04 X 05 X 06 X 07 X
Memory: 00 XXXX ...