



















This project is intended to help you understand in detail how a modern microprocessor operates in concert with an operating system. You will build a precise-interrupt facility into your pipeline, you will add support for memory management via a translation lookaside buffer (TLB), and, using RiSC-16 assembly code, you will write a software TLB-miss handler: the heart of a typical virtual memory system, which happens to be one of the most fundamental services that a modern operating system provides. Therefore, you will see the interaction between OS-level software and specialized control hardware (e.g. control registers and TLBs, as opposed to simple instruction-execution hardware), and you will see how the OS uses and responds to interrupts, arguably the fundamental building block of today’s multitasking systems.
The new & improved RiSC-16 pipeline is shown in Fig. 1 on the next page. In the figure, shaded boxes represent clocked registers; thick lines represent 16-bit buses; thin lines represent smaller data paths; and dotted lines represent control paths. The pipeline is slightly different from the one illustrated and described in the previous project, reflecting the following changes:
As mentioned in class, interrupts must be handled in the writeback stage. Otherwise, interrupts could be handled out of order if back-to-back instructions cause exceptions but do so in different stages of the pipeline. If an exceptional instruction is flagged as such at the moment the exception is detected, it is safe to handle that exceptional condition during writeback, because by that time all previous instructions have finished execution and committed their state to the machine.
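The ordering problem can be illustrated with a small simulation. The sketch below is a simplification, not the project's pipeline: it assumes one instruction enters the pipeline per cycle with no stalls, and all function and variable names are invented for illustration. It compares handling each exception in the cycle it is detected with deferring every exception to writeback.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def handling_order(faults, at_writeback):
    """Return program indices in the order their exceptions get handled.

    faults: list of (program_index, stage_where_detected).
    Assumption: instruction i enters IF in cycle i and never stalls.
    """
    events = []
    for index, stage in faults:
        if at_writeback:
            cycle = index + STAGES.index("WB")   # flag travels down to WB
        else:
            cycle = index + STAGES.index(stage)  # handled where detected
        events.append((cycle, index))
    return [index for cycle, index in sorted(events)]

# Instruction 0 faults late (MEM stage); instruction 1 faults early (IF stage).
faults = [(0, "MEM"), (1, "IF")]
immediate = handling_order(faults, at_writeback=False)
deferred = handling_order(faults, at_writeback=True)
```

With immediate handling, instruction 1's exception is seen first even though instruction 0 precedes it in program order; deferring to writeback restores program order.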
For this to work, the following things must happen in the pipeline:
ENEE 446: Digital Computer Design, Fall 2000 Assigned: Wednesday, October 4; Due: Wednesday, November 1
[Fig. 1: RiSC-16 5-stage pipeline. Stages: FETCH, DECODE, EXECUTE, MEMORY, WRITEBACK. The pipeline registers between stages (IF/ID, ID/EX, EX/MEM, MEM/WB) carry an EXC field alongside fields such as OP, rT, and PC. Labeled blocks include the Program Counter, TLB & I-MEM, the REGISTER FILE, Sign-Ext, Left-Shift, the ALU, TLB & D-MEM, the multiplexers MUXpc, MUXalu, MUXout, MUXimm, MUXop, and MUXs, the Pstall and Pstomp signals, and the control modules CTL1 through CTL9.]
that this mechanism supports are actually privileged instructions that the machine handles at the time of instruction execution, instead of vectoring to a software handler routine. This includes TLB handling routines, the HALT instruction, etc.
[Figure: visible state in user mode and in kernel mode.
User mode. Register file: r0 (reads as zero) through r7, addressed with normal instructions. Memory map: USER SPACE, mapped via the TLB.
Kernel mode. Register files: r0 (reads as zero) through r7, accessed via special instructions; cr0 (reads as zero), cr1 (GPR), cr2 (GPR2/TLB), cr3 (GPR3/TLB), cr4 (PSR), cr5 (ISR), cr6 (IMR), cr7 (EPC), addressed with normal instructions. Memory map region labels: M[0] - M[7FFF]; MAPPED via TLB; User Page Tables and Process Structs; OS, Handlers, I/O Devices.
Processor Status Register (PSR): fields include the K (kernel-mode) bit and the ASID.]
As mentioned, the instruction set has changed, now that software can insert exceptions directly into the pipeline and can execute a number of new privileged instructions. The more complete instruction set is given below:
This is not the full instruction set, but it is enough to do this project.
These and related modifications/extensions are described in more detail in the following sections.
Assembly-Code Format        Meaning
add  regA, regB, regC       R[regA] <- R[regB] + R[regC]
addi regA, regB, immed      R[regA] <- R[regB] + immed
nand regA, regB, regC       R[regA] <- ~(R[regB] & R[regC])
lui  regA, immed            R[regA] <- immed & 0xffc0
sw   regA, regB, immed      R[regA] -> Mem[ R[regB] + immed ]
lw   regA, regB, immed      R[regA] <- Mem[ R[regB] + immed ]
beq  regA, regB, immed      if ( R[regA] == R[regB] ) { PC <- PC + 1 + immed (if label, PC <- label) }
jalr regA, regB             PC <- R[regB], R[regA] <- PC + 1

PSEUDO-INSTRUCTIONS:
nop                         do nothing
trap type                   trap to the operating system with vector type
halt or sys TRAP_HALT       ask operating system to stop machine & print state
lli  regA, immed            R[regA] <- R[regA] + (immed & 0x3f)
movi regA, immed            R[regA] <- immed
.fill immed                 initialized data with value immed
.space immed                zero-filled data array of size immed

PRIVILEGED INSTRUCTIONS:
tlbw regB                   write TLB entry (held in regB) to the TLB
sys class                   cause exceptional condition of specified class
sys MODE_HALT               stop machine & print state
rfe regB                    return from exception: waits until writeback to jump through regB and return processor to user mode (does not save the link pointer)
These extended opcodes are used to directly effect change in the operation of the hardware; many of them insert exceptional conditions into the pipeline, but some are used to move data around the system. All are privileged operations (i.e. they require that the processor be in kernel mode) except for the TRAP instructions, which simply invoke the operating system and thus enable kernel mode securely. This brings up an interesting point: the HALT instruction is now defined as a slightly more complex process than before. HALT is now one of several modes of execution (including RUN and SLEEP), and putting the processor in a specific mode is a privileged action; user code cannot HALT the processor. Instead, user code must ask the operating system to perform a HALT (which allows the graceful shutdown of the machine, were there a file system or something similar attached). HALT is therefore a two-stage process. First, user code calls a TRAP instruction with HALT as the argument. This causes an exceptional condition, and the machine vectors to the operating system’s corresponding TRAP handler, which in turn cleans up any system state necessary (not needed in this implementation) and then calls the MODE_HALT instruction.
The following table describes the various extended opcodes and their associated possible data values. Items that are shaded in the table represent conditions that cause hardware to vector to a software routine; all other items are essentially instructions that the hardware executes, just like ADD, ADDI, NAND, etc.
Those mechanisms that your Project 3 implementation must support are in bold. Note that in all cases but TLB_READ, TLB_WRITE, and SYS_RFE, the rA and rB fields of the JALR instruction are ignored.
Opcode Extension (EXT_OP)   Data Extension (EXT_DATA)             Semantics
SYS_MODE (000)              MODE_RUN (0)                          Normal mode; ignore (equivalent to JALR)
                            MODE_SLEEP (1)                        Low-power doze mode, awakened by interrupt
                            MODE_HALT (2)                         Halt machine
                            MODE_RFU .. MODE_RFU7 (3 .. 7)        No definition yet
                            MODE_PANIC .. MODE_PANIC15 (8 .. 15)  Halt and output panic value (8..15) (meaning is software-defined)
SYS_EXCEPTION (001)         EXC_GENERAL (0)                       General exception vector
                            EXC_TLBUMISS (1)                      User address caused TLB miss
                            EXC_TLBKMISS (2)                      Kernel address caused TLB miss
                            EXC_INVALIDOPCODE (3)                 Opcode the execute stage does not recognize
                            EXC_INVALIDADDR (4)                   Memory address is out of valid range
                            EXC_PRIVILEGES (5)                    Decoded privileged instruction in user mode
SYS_INTERRUPT (010)         INT_IO (0)                            General I/O interrupt
                            INT_CLOCK (1)                         Used to synchronize with external real-time clock
                            INT_TIMER (2)                         Raised by a watchdog timer
SYS_TRAP (011)              TRAP_GENERAL (0)                      General operating system TRAP vector
                            TRAP_HALT (1)                         Ask operating system to perform HALT
SYS_TLB (100)               TLB_READ (0)                          Probe TLB for PTE matching VPN in rB
                            TLB_WRITE (1)                         Write contents of rB to TLB (random)
                            TLB_CLEAR (2)                         Clear contents of TLB
SYS_CRMOVE (101)            Top bit specifies to/from;            Moves a value to/from the control registers from/to the general-purpose registers
                            bottom 3 bits identify CR#
SYS_RFE (110)               Data value ignored                    Return From Exception: performs a JUMP (without link) to the address held in rB (a control register) and sets USER mode in the PSR
SYS_RESERVED (111)                                                Has no definition yet
These last three instructions (TLB_READ, TLB_WRITE, and SYS_RFE), however, do use the rB and/or rA fields of the instruction, specifying a register to read and/or write.
Any of these exceptional conditions or extended opcodes can be invoked through the assembler, using the EXTEND opcode (looks like JALR with a non-zero immediate field). In most cases, there is no need to specify both EXT_OP and EXT_DATA because the EXT_DATA name uniquely identifies the exceptional condition or extended operation. The assembler supports this facility via several mechanisms:
    sys MODE_HALT           # halts the machine
    sys INT_CLOCK           # vectors to the CLOCK interrupt handler
    sys EXC_TLBUMISS        # vectors to the TLBUMISS exception handler
    sys TRAP_HALT           # vectors to the HALT trap handler (which executes a MODE_HALT)

    ext rA, rB, TLB_READ    # VPN to search for is in rB, match is written to rA
    ext rA, rB, TLB_WRITE   # reads entry from rB, rA is ignored
    ext rA, rB, MODE_HALT   # identical to “sys MODE_HALT” ... rA and rB are ignored
    ext rA, rB, SYS_RFE     # this is identical to “rfe rB”

    rfe rB                  # identical to: ext rA, rB, SYS_RFE
    trap type               # identical to: sys type
    halt                    # identical to: trap HALT or sys TRAP_HALT
    tlbw rB                 # identical to: ext r0, rB, TLB_WRITE
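To make the shorthand forms concrete, here is a minimal Python sketch of how an assembler front end might expand them and look up the extension codes. It is not the course assembler: the function names are invented, the (EXT_OP, EXT_DATA) pairs are copied from the extended-opcode table, r0 is assumed for the ignored rA field of rfe, and the data value 0 for SYS_RFE is an arbitrary choice because that value is ignored.

```python
# (EXT_OP, EXT_DATA) pairs copied from the extended-opcode table.
EXT_CODES = {
    "MODE_RUN": (0b000, 0), "MODE_SLEEP": (0b000, 1), "MODE_HALT": (0b000, 2),
    "EXC_GENERAL": (0b001, 0), "EXC_TLBUMISS": (0b001, 1),
    "EXC_TLBKMISS": (0b001, 2), "EXC_INVALIDOPCODE": (0b001, 3),
    "EXC_INVALIDADDR": (0b001, 4), "EXC_PRIVILEGES": (0b001, 5),
    "INT_IO": (0b010, 0), "INT_CLOCK": (0b010, 1), "INT_TIMER": (0b010, 2),
    "TRAP_GENERAL": (0b011, 0), "TRAP_HALT": (0b011, 1),
    "TLB_READ": (0b100, 0), "TLB_WRITE": (0b100, 1), "TLB_CLEAR": (0b100, 2),
    "SYS_RFE": (0b110, 0),  # data value is ignored; 0 is an arbitrary choice
}

def expand_pseudo(line):
    """Rewrite halt/trap/rfe/tlbw into the equivalent sys/ext form."""
    tokens = line.replace(",", " ").split()
    op, args = tokens[0], tokens[1:]
    if op == "halt":
        return "sys TRAP_HALT"
    if op == "trap":
        return "sys " + args[0]
    if op == "rfe":
        return "ext r0, " + args[0] + ", SYS_RFE"   # rA ignored; r0 assumed
    if op == "tlbw":
        return "ext r0, " + args[0] + ", TLB_WRITE"
    return line

def ext_codes(line):
    """Return the (EXT_OP, EXT_DATA) pair named by the last operand."""
    name = expand_pseudo(line).replace(",", " ").split()[-1]
    return EXT_CODES[name]
```

For example, `expand_pseudo("tlbw r1")` yields `ext r0, r1, TLB_WRITE`, and `ext_codes("halt")` resolves to the SYS_TRAP opcode with the TRAP_HALT data value. The sketch does not strip trailing `#` comments.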
3.3 Control Registers
The control registers are those extra 8 registers that are visible only in kernel mode:
    cr0 - reads as 0, read-only
    cr1 - For general-purpose use
    cr2 - For general-purpose use and TLB interface
    cr3 - For general-purpose use and TLB interface
    cr4 - Processor Status Register
    cr5 - Interrupt Status Register
    cr6 - Interrupt Mask Register
    cr7 - EPC Register
As mentioned previously, these are the default registers when kernel mode is active, i.e. when the K-mode bit in the processor status register contains a ‘1’ value. Thus, when the operating system performs instructions like the following:
add r1, r0, r
the operand values are read from control registers, and the result is written to a control register. This example moves the contents of the processor status register into cr1.
The control registers behave as follows:
cr0 Like rf[0], this is always zero.
cr1(gpr1) This register is for general-purpose use. However, if an interrupt handler is going to use the register, it should save the register’s contents before writing to it and restore the contents prior to exiting, just in case the handler happened to preempt another handler using the register.
The figure also illustrates the mechanism of address translation: virtual addresses are translated by the TLB into physical addresses. Translation consists of nothing more than replacing the virtual page number with the corresponding page frame number. The page offset is identical in both addresses (a given word is at the same location within a page, whether the page is virtual or physical).
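The translation step can be sketched in a few lines of Python. This is an illustrative model only: it assumes 256-word pages (an 8-bit page number and an 8-bit offset in a 16-bit address), and it models the TLB as a plain dictionary from virtual page number to page frame number, which is a simplification of the real hardware.

```python
PAGE_BITS = 8                        # assumed: 256-word pages
OFFSET_MASK = (1 << PAGE_BITS) - 1   # low eight bits are the page offset

def translate(vaddr, tlb):
    """Return the physical address for vaddr, or None on a TLB miss.

    tlb: dict mapping VPN -> PFN (a hypothetical stand-in for the TLB).
    """
    vpn = (vaddr >> PAGE_BITS) & 0xFF
    offset = vaddr & OFFSET_MASK
    pfn = tlb.get(vpn)
    if pfn is None:
        return None                  # the hardware would raise a TLB miss
    # Replace the page number; the offset passes through unchanged.
    return (pfn << PAGE_BITS) | offset

tlb = {0x12: 0x03}
paddr = translate(0x1234, tlb)       # VPN 0x12 -> PFN 0x03, offset 0x34 kept
```

Here virtual address 0x1234 becomes physical address 0x0334: only the top eight bits change.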
The page table format is very similar to that used in the MIPS architecture: it is a two-tiered table, where the topmost level is in physical space, wired down when the application is executing, and the lower level is pageable and addressed virtually. We will call the top level the “root” for obvious reasons and the lower level the “user page table” because it maps the user address space. The page table organization is illustrated below (note the difference in scale between the user page table and the user address space):
The full address space contains 256 pages, which requires 256 PTEs to map it. Each PTE is a single word, and 256 PTEs can thus fit in a single page. Therefore, a single page of PTEs can map the entire user space. The kernel keeps a set of pages in its virtual space, each of which holds one user page table. There are 64 of these tables (there are 64 unique ASIDs: the ASID is 6 bits wide), and the corresponding user tables are held in the top 64 virtual pages of the kernel’s address space. These are in turn mapped by root PTEs that are held in the top 64 words of page frame 0. Thus, the virtual page number of the user page table is equal to the physical address of the root PTE that maps it.
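The arithmetic in this paragraph can be checked with a short Python sketch. The constant and function names are invented for illustration, and the sketch assumes that the 64 user page tables (and their root PTEs) are laid out in ascending ASID order within the top 64 pages (and the top 64 words of page frame 0).

```python
ADDRESS_SPACE_WORDS = 1 << 16                  # 16-bit word addresses
NUM_PAGES = 256
PAGE_WORDS = ADDRESS_SPACE_WORDS // NUM_PAGES  # words per page
ASID_BITS = 6
NUM_ASIDS = 1 << ASID_BITS                     # one user page table per ASID

def user_page_table_vpn(asid):
    """VPN of the kernel virtual page holding the page table for asid."""
    return NUM_PAGES - NUM_ASIDS + asid        # top 64 virtual pages

def root_pte_address(asid):
    """Physical address, inside page frame 0, of the root PTE for asid."""
    return PAGE_WORDS - NUM_ASIDS + asid       # top 64 words of frame 0

# One-word PTEs: a full table of NUM_PAGES entries fills exactly one page.
table_fits_in_one_page = NUM_PAGES == PAGE_WORDS
```

Because both ranges start at 192 and are indexed by ASID, the two functions return the same number for every ASID, which is the identity the paragraph ends on.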
As mentioned, all user addresses are translated through the TLB, and kernel space is typically divided into regions that are translated through the TLB and other regions that map directly onto physical memory. The kernel’s translated regions typically hold data that is seldom used, for example the various data structures (including process page tables) that are used to keep track of the running processes. If a process is not currently running, then none of these structures are in use, and they need not occupy physical memory. Thus, it makes sense to put them into virtual space.
The different views of the 16-bit address space are shown below:
[Figure: views of the 16-bit address space. User Mode: USER SPACE, mapped via the TLB. Kernel Mode: the bottom half, up to 0x7FFF, maps directly onto physical memory, m[0] - m[7FFF]; the top half, up to 0xFFFF, is mapped via the TLB and holds user page tables and process structs.]

[Figure: Kernel Virtual Space. Process Structures, Dynamic Data, etc. (64 pages), ending at 0xBFFF; User Page Tables (64 pages), ending at 0xFFFF. Each page table is located by its corresponding ASID: 0xC000 + (ASID << 8).]

As mentioned, all user addresses from 0x0000 to 0xFFFF are mapped through the TLB. The top half of the kernel’s virtual space is also mapped through the TLB. The bottom half of the kernel’s address space, ranging from address 0x0000 to 0x7FFF, is mapped directly onto the bottom half of main memory: the physical address equals the virtual address. Thus, references to this space cannot cause a TLB miss. This is an important consideration to remember when designing your TLB-miss handler.

[Figure: page table organization. Virtual space: the USER ADDRESS SPACE (256 pages, numbered 0 to ff), which is the complete RiSC-16 16-bit virtual address space (64K words: 256 x 256-word pages), and the USER PAGE TABLE (256 PTEs, 1 page). Physical space: the root “table” (1 PTE, 1 word).]

Each page table entry has the 16-bit format shown below:

[Figure: PTE format. Fields: V, unused, Page Frame Number.]
When the TLB fails to find a given VPN, it raises an exception. If the address being translated is a user address (i.e., the CPU is in user mode), then the exception raised is EXC_TLBUMISS. If the CPU is in kernel mode and the top bit of the address to be translated is a ‘1’ then the exception raised is EXC_TLBKMISS. If the top bit of the address is ‘0’ the address cannot cause an exception because it is not translated through the TLB but instead is mapped directly onto physical memory.
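The decision described above can be written as a small Python function. This is a sketch of the rule as stated in the paragraph, with invented names; `vpn_in_tlb` stands in for the result of the hardware TLB lookup.

```python
def classify_reference(vaddr, kernel_mode, vpn_in_tlb):
    """Return the exception a memory reference raises, or None.

    vaddr:       16-bit address being translated
    kernel_mode: True when the K bit in the PSR is set
    vpn_in_tlb:  True when the TLB holds an entry for the address's VPN
    """
    if kernel_mode and not (vaddr & 0x8000):
        return None                  # bottom half: mapped directly, no TLB
    if vpn_in_tlb:
        return None                  # translated without incident
    return "EXC_TLBKMISS" if kernel_mode else "EXC_TLBUMISS"
```

Note that in user mode the top bit of the address does not matter: every user address goes through the TLB, so any miss is an EXC_TLBUMISS.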
Both exceptions behave as normal and perform additional functions before vectoring to the handler. When the TLBUMISS exception handler runs, its job is to find the user page table entry corresponding to the page that missed the TLB. The user page table is located in the top quarter of the address space, as shown above. The location of the PTE, given the ASID of the current user process and the VPN of the address that caused the TLB miss, is computed according to the following equation:
ADDRpte = 0xC000 + (ASID << 8) + VPN
To aid in the handling of the exception, the construction of this address is performed by hardware. This is similar to the memory-management facilities offered by MIPS processors and UltraSPARC processors. As soon as a TLBUMISS exception is detected, the hardware takes the VPN of the faulting address and the ASID currently stored in the processor status register (PSR) and performs this computation. The address is placed in cr2, control register 2. In addition, the hardware places the faulting address into cr3 and zeroes out the bottom eight bits (the page offset). After performing these steps, the hardware vectors to the UMISS handler. When the handler runs, it will use the virtual address in cr2 to reference the PTE. When the PTE is loaded, the handler obtains the PFN (see figures above & below for specifics on the PTE format). The handler then combines the VPN and PTE into the TLB’s format (this is accomplished by simply adding the contents of the two registers) and performs a tlbw (TLB write) instruction.
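A worked example may help. The Python sketch below models the hardware step and the handler's combining step under stated assumptions: the function names are invented, the valid bit is taken to be the top bit of the PTE, the unused PTE bits are assumed to be zero, and `load_word` is a hypothetical callback standing in for the handler's load through cr2.

```python
USER_PT_BASE = 0xC000

def umiss_hardware(fault_addr, asid):
    """Return (cr2, cr3) as set up before vectoring to the UMISS handler."""
    vpn = (fault_addr >> 8) & 0xFF
    cr2 = (USER_PT_BASE + (asid << 8) + vpn) & 0xFFFF   # ADDRpte
    cr3 = fault_addr & 0xFF00                           # offset zeroed
    return cr2, cr3

def umiss_handler(cr2, cr3, load_word):
    """Load the PTE through cr2 and build the value handed to tlbw."""
    pte = load_word(cr2)
    if not pte & 0x8000:
        raise RuntimeError("invalid PTE")   # a real handler would HALT
    pfn = pte & 0x7FFF       # clear the valid bit; unused bits assumed zero
    return (cr3 + pfn) & 0xFFFF             # VPN in high byte, PFN in low

# ASID 5 faults on address 0x1234; its PTE says "valid, page frame 7".
page_table = {0xC512: 0x8007}
cr2, cr3 = umiss_hardware(0x1234, asid=5)
tlb_entry = umiss_handler(cr2, cr3, page_table.__getitem__)
```

Here cr2 is 0xC512, cr3 is 0x1200, and the value written to the TLB is 0x1207.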
Note that the handler loads the PTE into the processor using a virtual address. Thus, it is possible for the handler itself to cause a TLB miss. This is what invokes the TLBKMISS handler.
When handling a kernel TLB miss (EXC_TLBKMISS), the page table needed is the kernel’s own page table that maps the top half of the address space (the kernel’s virtual space). This page table is located at address 128 in physical memory and extends to address 255. The top half of this page table (addresses 192–255) maps the user page tables referenced by the user TLB-miss handler. Because the user page tables begin at virtual address 0xC000, their VPNs range from 0xC0 to 0xFF (in decimal, 192–255). Therefore, by construction, the VPN of the virtual address for the user PTE that the umiss handler loads equals the physical address of the kernel PTE that maps the user page table.
Before vectoring to the KMISS handler, the hardware places this VPN (which, as described, is equal to the physical address required by the kmiss handler) into cr3, control register 3. Unlike the UMISS handler, the hardware places the VPN in the LOW EIGHT BITS of the register. This is so that the handler can use the value first as an address and then as the VPN later. This avoids the need for the EXC_TLBKMISS hardware to write more than one value to the register file.
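The nested miss can be traced with a toy model. The Python sketch below is an assumption-laden illustration, not the simulator: the TLB is a dictionary from VPN to PFN (ASIDs are ignored), physical memory is a dictionary of words, and the names `KernelTLBMiss`, `kernel_load`, `kmiss_handler`, and `load_user_pte` are invented.

```python
class KernelTLBMiss(Exception):
    """Models EXC_TLBKMISS; cr3 holds the VPN in its low eight bits."""
    def __init__(self, cr3):
        super().__init__(cr3)
        self.cr3 = cr3

def kernel_load(vaddr, tlb, phys):
    """Load one word in kernel mode."""
    if not vaddr & 0x8000:
        return phys[vaddr]                    # bottom half: direct mapped
    vpn = vaddr >> 8
    if vpn not in tlb:
        raise KernelTLBMiss(vpn)
    return phys[(tlb[vpn] << 8) | (vaddr & 0xFF)]

def kmiss_handler(cr3, tlb, phys):
    """Use cr3 first as a physical address, then as the VPN."""
    root_pte = phys[cr3]                      # cannot miss: direct mapped
    if not root_pte & 0x8000:
        raise RuntimeError("invalid root PTE")
    entry = ((cr3 << 8) + (root_pte & 0x7FFF)) & 0xFFFF
    tlb[entry >> 8] = entry & 0xFF            # stands in for tlbw
    return entry

def load_user_pte(pte_vaddr, tlb, phys):
    """What the UMISS handler's load does, including a nested KMISS."""
    try:
        return kernel_load(pte_vaddr, tlb, phys)
    except KernelTLBMiss as miss:
        kmiss_handler(miss.cr3, tlb, phys)
        return kernel_load(pte_vaddr, tlb, phys)   # retry now hits

# Root PTE for ASID 5 (address 0xC5) maps its page table to frame 2;
# the user PTE for VPN 0x12 then lives at physical address 0x0212.
phys = {0x00C5: 0x8002, 0x0212: 0x8007}
tlb = {}
pte = load_user_pte(0xC512, tlb, phys)
```

The first load of 0xC512 misses, the KMISS handler installs the mapping from VPN 0xC5 to frame 2, and the retried load returns the user PTE 0x8007.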
If an address in user mode causes a TLB miss, the hardware places the PTE address in cr2 and the VPN of the faulting instruction (labeled BadVPN ) in cr3. If this handler (or any kernel reference) causes a TLB miss using a virtual address with the top bit set, the hardware places the VPN of the faulting instruction into cr3. The format of the VPN in UMISS is chosen to facilitate quick generation of the TLB entry. The format of the VPN in KMISS is chosen to avoid having to place both the address and the VPN into the control registers. The handler can use the VPN first as an address, and then left-shift it eight places by successive add instructions to get the VPN into a TLB-entry format.
This is actually a fairly intricate process, and it demands careful attention on your part in the development of your TLB-miss handlers, otherwise data can get stepped on without the software realizing it (for
The steps that the KMISS handler goes through are very similar, except that the BadVPN as given to the handler is not in the correct format for the TLB entry. Thus, it must be left-shifted eight places before adding it to the page frame number (assume BadVPN is in r1 and the PTE is in r2):
    lui  r3, 0x8000       # will be used to test top bit
    nand r3, r3, r2       # r3=0111111111111111 => val; r3=1111111111111111 => inv
    nand r3, r3, r3       # r3=0x8000 => val; r3=0x0000 => inv
    beq  r3, r0, invalid
    add  r2, r2, r3       # clears top bit of PTE, PFN is left
    add  r1, r1, r1       # left-shift VPN by 1
    add  r1, r1, r1       # left-shift VPN by 1
    add  r1, r1, r1       # left-shift VPN by 1
    add  r1, r1, r1       # left-shift VPN by 1
    add  r1, r1, r1       # left-shift VPN by 1
    add  r1, r1, r1       # left-shift VPN by 1
    add  r1, r1, r1       # left-shift VPN by 1
    add  r1, r1, r1       # left-shift VPN by 1
    add  r1, r1, r2       # concatenates VPN with PFN
    tlbw r1               # writes contents of r1 to TLB, sets ‘v’ bit in TLB entry
Note that these do not represent complete handlers: for example, error-checking is missing, the beginnings of the handlers are missing (in which register contents are saved and the PTE loaded), the ends of the handlers are missing (in which register contents are restored from memory), and return-from-exception is missing. For error-checking, you can simply HALT the machine prematurely, because the PTEs you reference in your page tables should never be invalid.
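If you want to convince yourself that the KMISS instruction sequence does what its comments say, you can emulate it with 16-bit arithmetic. The Python sketch below mirrors the sequence one instruction at a time; the helper names are invented, and returning None stands in for the branch to the invalid label.

```python
MASK = 0xFFFF   # registers are 16 bits wide

def nand(a, b):
    return ~(a & b) & MASK

def add(a, b):
    return (a + b) & MASK      # carry out of bit 15 is discarded

def kmiss_build_entry(bad_vpn, pte):
    """Emulate the KMISS sequence with BadVPN in r1 and the PTE in r2."""
    r1, r2 = bad_vpn, pte
    r3 = 0x8000                # lui  r3, 0x8000
    r3 = nand(r3, r2)          # 0x7FFF if valid bit set, else 0xFFFF
    r3 = nand(r3, r3)          # 0x8000 if valid, else 0x0000
    if r3 == 0:                # beq  r3, r0, invalid
        return None
    r2 = add(r2, r3)           # 0x8000 + 0x8000 wraps: top bit cleared
    for _ in range(8):         # eight adds = left shift by eight
        r1 = add(r1, r1)
    r1 = add(r1, r2)           # VPN in high byte, PFN in low byte
    return r1                  # value passed to tlbw

entry = kmiss_build_entry(0x00C5, 0x8002)
```

With BadVPN 0xC5 and a valid PTE pointing at frame 2, the value written to the TLB is 0xC502. The trick in the fifth instruction relies on 16-bit wraparound: adding 0x8000 to a word whose top bit is set clears that bit.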
3.5 Physical Memory Map
Previous figures have illustrated the layout of virtual space. The following figure illustrates the layout of physical memory, including the all-important first page:
The kernel save area is used for saving state during handler execution, etc. The Ve, Vi, and Vt regions contain vector addresses for exceptions, interrupts, and traps, respectively. The kernel page table (KPT) maps the region of memory in which process structures and associated data are held. You will not use the KPT region in this project. The root page table (RPT) contains the mappings for the various user page tables that occupy the top quarter of the virtual address space. It is no accident that the RPT is placed in the top quarter of page frame 0: as mentioned earlier, the result is that the VPN of any kernel virtual address can be used directly as a physical address to obtain the appropriate root-level PTE.
For this project, you need to put data into page 0 (set up Ve and Vt regions, as well as one PTE in the root page table for an ASID of your choosing). Note that the PSR should be initialized appropriately to contain whatever ASID you have chosen, so that the hardware can create the correct address as part of responding to a TLBUMISS exception. You will also need to put handler code somewhere in physical memory, with the vector addresses initialized to point to the handlers using physical addresses. For example, you could very well put the handler code in page 1 and point the addresses in the Ve and Vt regions to these locations. Lastly, you must create a page table for an application. For example, you could put this page table into physical page 2 and point the root PTE corresponding to the ASID chosen to page 2.
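One way to plan this initialization is to build the memory image in a script before writing the corresponding .fill directives. The Python sketch below is only an illustration under several assumptions: the function name and its parameters are invented, the vector slot numbers (81, 82, 113) and handler addresses are example values you must check against the vector table layout, the root PTE for an ASID is taken to sit at address 192 + ASID, and a PTE is taken to be the valid bit (0x8000) plus the page frame number.

```python
VALID = 0x8000       # assumed position of the PTE valid bit
RPT_BASE = 192       # root page table: top quarter of page frame 0

def build_memory_image(asid, page_table_frame, user_mappings, vectors):
    """Return a 64K-word list holding vectors, root PTE, and page table.

    user_mappings: dict VPN -> PFN for the application
    vectors:       dict vector-slot address -> handler physical address
    """
    mem = [0] * 65536
    for slot, handler_addr in vectors.items():
        mem[slot] = handler_addr
    # Root PTE for this ASID points at the frame holding its page table.
    mem[RPT_BASE + asid] = VALID | page_table_frame
    # User page table: one PTE per mapped virtual page.
    for vpn, pfn in user_mappings.items():
        mem[(page_table_frame << 8) + vpn] = VALID | pfn
    return mem

mem = build_memory_image(
    asid=5,
    page_table_frame=2,                     # page table in physical page 2
    user_mappings={0x00: 0x03, 0x12: 0x07},
    vectors={81: 0x0100, 82: 0x0140, 113: 0x0180},  # handlers in page 1
)
```

The PSR initialization is left out because its bit layout is not modeled here.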
[Figure: layout of physical memory. Page frames 0, 1, 2, 3, ... fe, ff hold anything: application code & data, kernel code & data, handler code, etc. Page Frame 0, by word address: 0-63 kernel save area; 64-79 unlabeled; 80-95 Ve; 96-111 Vi; 112-127 Vt; 128-191 KPT; 192-255 RPT.]

3.6 Interrupt Vector Table

The interrupt vector table has a simple format: for every exceptional condition that the hardware recognizes (including exceptions, interrupts, and/or traps), there must be an address in the table that points to a handler routine. The table is located at physical address 80 in memory and has 48 entries (16 exception types, 16 interrupt types, and 16 trap types). This is illustrated in the following figure:
Those vectors that must be implemented are shaded; vectors that are not shaded do not need to be implemented in this project.
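[Figure: interrupt vector table. EXCEPTION VECTORS occupy addresses 80-95; the labeled entries at 80-85 are EXC_GENERAL, EXC_TLBUMISS, EXC_TLBKMISS, EXC_TLBPRIV, EXC_PRIVILEGES, and EXC_INVALIDADDR. INTERRUPT VECTORS occupy addresses 96-111; the labeled entries at 96-98 are INT_IO, INT_CLOCK, and INT_TIMER. The remaining vectors occupy addresses 112-127, with entries marked at 112 and 113.]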
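Locating a vector is simple arithmetic. The Python sketch below assumes that each class occupies 16 consecutive slots in the order exceptions, interrupts, traps, and that the slot within a class is the type's data value from the extended-opcode table; both assumptions and the function names are illustrative and should be checked against the figure.

```python
VECTOR_BASE = 80
SLOTS_PER_CLASS = 16
CLASS_INDEX = {"exception": 0, "interrupt": 1, "trap": 2}

def vector_slot(condition_class, type_code):
    """Return the physical address of the vector for one condition."""
    if not 0 <= type_code < SLOTS_PER_CLASS:
        raise ValueError("type code must be in 0..15")
    return (VECTOR_BASE
            + CLASS_INDEX[condition_class] * SLOTS_PER_CLASS
            + type_code)

def handler_address(mem, condition_class, type_code):
    """Read the handler's physical address out of the vector table."""
    return mem[vector_slot(condition_class, type_code)]

umiss_slot = vector_slot("exception", 1)    # EXC_TLBUMISS
timer_slot = vector_slot("interrupt", 2)    # INT_TIMER
halt_slot = vector_slot("trap", 1)          # TRAP_HALT
```

Under these assumptions the three slots are 81, 98, and 113, and the last trap slot is 127, which matches the table's 48 entries starting at address 80.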