









Main points of this past exam are: Clock Algorithm, Operating Systems, Systems Programming, Time Consuming, Aloha Algorithm, Broadcast Networking, Partially Transmitted, Remote Procedure, Procedure Calls, Little-Endian Integers










University of California, Berkeley
College of Engineering
Computer Science Division ⎯ EECS
Fall 2007, John Kubiatowicz
December 3rd, 2007
CS162: Operating Systems and Systems Programming
Your Name:
SID Number: Circle the letters of CS Login
First: a b c d e f g h i j k l m n o p q r s t u v w x y z
Second: a b c d e f g h i j k l m n o p q r s t u v w x y z
Discussion Section:
General Information: This is a closed book exam. You are allowed 1 page of hand-written notes (both sides). You have 3 hours to complete as much of the exam as possible. Make sure to read all of the questions first, as some of the questions are substantially more time consuming.
Write all of your answers directly on this paper. Make your answers as concise as possible. On programming questions, we will be looking for performance as well as correctness, so think through your answers carefully. If there is something about the questions that you believe is open to interpretation, please ask us about it!
Problem Possible Score
Total
Problem 1f[2pts]: The rate of page faults in a virtual memory system can always be reduced by adding more memory.
number of faults. Specifically: Belady’s anomaly can come into play for a FIFO replacement policy and certain access patterns.
Problem 1g[2pts]: Compulsory misses in a cache can be reduced with prefetching.
accessed the first time, there will not be a compulsory miss.
Problem 1h[2pts]: A “memoryless” probability distribution provides a poor model for any real sources of events.
can often be well approximated by a memoryless distribution—even if the individual sources are not memoryless.
Problem 1i[2pts]: Nonvolatile RAM (NVRAM) can improve the durability of a file system that uses delayed writes.
power failure occurs before the data is written out to disk, the data can be recovered from the NVRAM.
Problem 1j[2pts]: Because of the 32-bit IPv4 address space, it is impossible for more than 2^32 computers to communicate over the internet.
computers.
Problem 2: Virtual Memory and Paging [25pts] Consider a two-level memory management scheme on 22-bit virtual addresses using the following format for virtual addresses:
Virtual Page # (7 bits) | Virtual Page # (7 bits) | Offset (8 bits)
Virtual addresses are translated into 16-bit physical addresses of the following form:
Physical Page # (8 bits) | Offset (8 bits)
Page table entries are 16 bits in the following format, stored in big-endian form in memory (i.e. the MSB is first byte in memory).
Physical Page # (8 bits) | Kernel Only | Uncacheable | 0 | 0 | Dirty | Use | Write | Valid
Note that a virtual-physical translation can fail at any point if an incompatible PTE is encountered. Two types of errors can occur during translation: “invalid page” (page is not mapped at all) or “access violation” (page exists, but access was illegal).
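The address and PTE formats above can be illustrated with a minimal Python sketch. The function names and the exact bit positions of the flags within the low byte are assumptions read off the flattened figure (MSB to LSB: Kernel Only, Uncacheable, 0, 0, Dirty, Use, Write, Valid); the exam itself gives no code.

```python
OFFSET_BITS = 8   # low 8 bits: byte offset within a 256-byte page
INDEX_BITS = 7    # width of each of the two virtual page # fields

# Assumed flag positions in the low byte of a PTE (MSB to LSB):
# Kernel Only, Uncacheable, 0, 0, Dirty, Use, Write, Valid.
PTE_FLAGS = {
    "kernel_only": 0x80,
    "uncacheable": 0x40,
    "dirty": 0x08,
    "use": 0x04,
    "write": 0x02,
    "valid": 0x01,
}


def split_virtual_address(vaddr):
    """Return (first-level index, second-level index, offset)."""
    if not 0 <= vaddr < 1 << (2 * INDEX_BITS + OFFSET_BITS):
        raise ValueError("virtual address must fit in 22 bits")
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    second = (vaddr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    first = vaddr >> (OFFSET_BITS + INDEX_BITS)
    return first, second, offset


def decode_pte(first_byte, second_byte):
    """Decode a big-endian PTE: the first byte in memory is the MSB (the PPN)."""
    pte = {"ppn": first_byte}
    for name, mask in PTE_FLAGS.items():
        pte[name] = bool(second_byte & mask)
    return pte
```

For example, `split_virtual_address(0x010203)` yields the two table indices and the offset, and `decode_pte(0x12, 0x03)` describes a valid, writable mapping to physical page 0x12 under the assumed layout.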
Problem 2a[2pts]: Can you give a logical reason why the designer might have made the virtual page # fields 7 bits each?
With a 7-bit virtual page #, the size of each page table is equal to the size of one page of memory: 2^7 entries/table * 2 bytes/entry = 2^8 bytes/table, and each page is 2^8 bytes.
Grading: 2 points for recognizing that a page table fits exactly in one page; no other answer accepted.
Problem 2b[2pts]: What is the maximum amount of physical memory addressable by this system? Can you think of a way to increase the amount of available physical memory without altering the widths of the virtual addresses or PTEs?
16-bit byte-addressed physical addresses => 2^16 bytes of physical memory are addressable. You can use the two zero bits in the page table entry to address a larger number of physical pages (2^10 physical pages). You can also increase the size of pages (by increasing the offset width), or add registers for segmentation.
Grading: 1 pt. for each part.
Problem 3c[2pts]: How many total bits of storage will each entry of the TLB consume (including the tag and/or other fields)? Explain.
Simplest answer: TLB valid bit (1) + 14-bit VPN + 16-bit PTE without its two zero bits (14) = 29 bits.
Alternate: TLB valid bit (1) + 14-bit VPN + 8-bit PPN (1/2 pt. each) + dirty, used, write (1/2 pt. for all 3) = 26 bits. We ignored the kernel-only and uncacheable bits.
Grading: -1 if you included the 8-bit offset. -1 if you stated just “extra bits.” -1/2 if you included the zero bits in the PTE. If you said 16 bits for the PTE, we also required you to explicitly state the TLB valid bit.
Problem 2g[4pts]: For the following problem, assume a hypothetical machine with 4 pages of physical memory and 7 pages of virtual memory. Given the access pattern:
A B C D E F C A A F F G A B G D F F
Indicate in the following table which pages are mapped to which physical pages for each of the following policies. Assume that a blank box matches the element to the left. We have given the FIFO policy as an example.
Access→  A  B  C  D  E  F  C  A  A  F  F  G  A  B  G  D  F  F
1        A           E                          B
2           B           F                             D
3              C              A                          F
For MIN, we accepted the final D in any of the first three pages. Grading: 2 pts. each for MIN and LRU; -1 for each error.
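The three replacement policies in Problem 2g can be checked with a small simulation. This is a sketch, not part of the exam solution: the function names are illustrative, and each function only counts page faults rather than reproducing the full frame table.

```python
from collections import OrderedDict, deque


def fifo_faults(accesses, n_frames):
    """Count faults when the page resident longest is evicted first."""
    resident, order, faults = set(), deque(), 0
    for page in accesses:
        if page in resident:
            continue
        faults += 1
        if len(resident) == n_frames:
            resident.discard(order.popleft())
        resident.add(page)
        order.append(page)
    return faults


def lru_faults(accesses, n_frames):
    """Count faults when the least recently used page is evicted."""
    recency = OrderedDict()  # oldest use first, newest use last
    faults = 0
    for page in accesses:
        if page in recency:
            recency.move_to_end(page)
            continue
        faults += 1
        if len(recency) == n_frames:
            recency.popitem(last=False)
        recency[page] = True
    return faults


def min_faults(accesses, n_frames):
    """Count faults when the page used farthest in the future is evicted."""
    resident, faults = set(), 0
    for i, page in enumerate(accesses):
        if page in resident:
            continue
        faults += 1
        if len(resident) == n_frames:
            future = accesses[i + 1:]

            def next_use(p):
                # Pages never used again sort as "infinitely far".
                return future.index(p) if p in future else len(future)

            resident.discard(max(resident, key=next_use))
        resident.add(page)
    return faults


pattern = "A B C D E F C A A F F G A B G D F F".split()
```

Calling each function with `pattern` and 4 frames gives the fault count per policy. When MIN handles the final D, three resident pages are never used again, so any of them is an equally good victim, which matches the grading note above.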
Problem 2i[3pts]: What is a precise exception and why would we want a software TLB fault to generate a precise exception?
A precise exception is one where the state of the machine is preserved as if the program executed up to the offending instruction. All previous instructions have completed, and the offending instruction and all following instructions act as if they had not even started.
We would like the hardware to generate a precise exception on a software TLB fault because it makes implementation of the fault handler easier. The fault handler simply restarts execution at the offending instruction (easy). It does not need to figure out the state of the system and potentially rollback or complete execution (in software) of partially completed instructions.
Grading: 1.5 pt. for each part.
Problem 3e[2pts]: The Fast File System (FFS) of Berkeley 4.2 Unix utilized “Skip Sector Positioning” to improve performance. Explain what “Skip Sector Positioning” is and why this optimization may no longer be important.
Skip sector positioning placed successive sectors of a file on every other sector (or with multiple sectors between them). This avoids the situation in which the processor must do so much work after reading each sector that it misses the next sector and has to wait for a complete revolution. Modern disk controllers have track buffers that keep data from a complete track in memory, thereby eliminating the need for sector interleaving.
Problem 3f[3pts]: The Network File System (NFS) uses a “stateless” protocol between clients and servers. What does this mean? Name one advantage and one disadvantage of stateless filesystem protocols.
All the requests to the NFS server are self-contained: they include all the arguments required for execution and do not assume that information is maintained between successive server requests.
Advantage: Server can crash and restart transparently to the client. Disadvantage: Cannot track which clients are caching data, thus making it difficult to get clean caching semantics.
Grading: 1 point for explanation, 1 for advantage and 1 for disadvantage.
Problem 3j[6pts]: Suppose that a new disk technology provided access times that are of the same order of magnitude as memory access times. What, if anything, must be changed in the following three OS components to take advantage of the quicker access time? If something doesn’t change, be very specific why it doesn’t change. If it will change, contrast these changes with current implementation and be as specific as possible in your answers (i.e. identify what would change and why).
1. Process Scheduler: The scheduler deals with processes at the task level, in terms of arrival and run times. Run times might change, but the scheduler itself does not need to change.
2. Memory Management: Use write-through instead of write-back, since it is cheaper to go to disk. There is less need to prefetch, which saves the cost of fetching memory locations that are never used. Do more on-demand paging instead of relying on predicted access patterns and missing. Mainly, this allows memory management to balance hit cost and miss cost instead of focusing only on increasing hit rates.
3. Device Driver for the new disk: Use polling instead of interrupts for the new disk's device driver, since the device will be in heavy use due to its lower access time. Alternatively, no change to the structure of the device driver from the OS point of view.
Grading: 2 points for good reasoning for change or no change with some modification if there is a change. 1.5 points for good explanation of the choice to modify or not. 1 point for incomplete reasoning but in the right direction.
Problem 4a[3pts]: What is the maximum amount of data that the client or server can send in each packet while avoiding fragmentation along the way? Remember that the TCP+IP header is 40 bytes. Explain.
Since the minimum MTU along the path is 296 bytes, and the header is 40 bytes, the maximum amount of data that the client or server can send in each packet is 296 - 40 = 256 bytes.
Grading: 1 pt for minimum MTU, 1 pt for subtracting header, 1 pt for some explanation.
Problem 4b[2pts]: Explain how the client or server could automatically discover the answer to 4a:
The sending end starts with large packets with the “no fragment” flag set in the header and slowly reduces the size until packets start making it through without being dropped.
Grading: 1 pt for plausible endpoint-generated query, 1 pt for mentioning no frag bit.
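The probing idea in 4b can be sketched as a toy simulation. Everything here is hypothetical except the 296-byte minimum MTU and the 40-byte header from 4a: the path is modeled as a list of link MTUs, the 1500-byte values are made up for illustration, and a probe with the “no fragment” flag set is treated as dropped whenever it exceeds any link's MTU.

```python
TCP_IP_HEADER_BYTES = 40  # from Problem 4a


def probe_gets_through(packet_bytes, link_mtus):
    """A don't-fragment packet survives only if every link can carry it whole."""
    return all(packet_bytes <= mtu for mtu in link_mtus)


def discover_path_mtu(link_mtus, start_bytes, step=1):
    """Shrink the probe size until a packet is no longer dropped."""
    size = start_bytes
    while size > 0:
        if probe_gets_through(size, link_mtus):
            return size
        size -= step
    raise ValueError("no packet size fits this path")


# Hypothetical path: only the 296-byte middle link comes from the exam.
path = [1500, 296, 1500]
path_mtu = discover_path_mtu(path, start_bytes=1500)
max_payload = path_mtu - TCP_IP_HEADER_BYTES
```

Stepping down one byte at a time mirrors the “slowly reduces the size” wording of the answer; a practical sender would converge faster, for example by using feedback from the router that dropped the probe.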
Problem 4c[2pts]: Under ideal circumstances (and ignoring interrupt and copying overheads and window sizes), what is the maximum data bandwidth that could be sent from the server to the client without dropping packets, assuming that the server utilizes the mechanism in 4b? Explain.
Assuming that we use the maximum packet size from 4a (i.e. 296 bytes), which contains 256 bytes of data, we can compute the fraction of the 3.7 Mbps that we have in the middle link as: 3.7 Mbps × (256/296) = 3.2 Mbps
Grading: 1 pt for identifying the bottleneck bandwidth, 1 pt for scaling by the correct fraction.
Problem 4d[3pts]: Assume that a maximal sized packet (as in 4a) is sent from the TCP send buffer of the server to the TCP receive buffer of the client. Compute the total roundtrip latency from the point at which the server invokes the device driver to send the packet until it receives the ACK via an interrupt. Hint: don’t forget the interrupt at the receiving side and the DMA mechanism into and out of the network.
Note that the fact that the router is pipelined (see description above) implies that the router begins routing toward the output port even before it has received the complete packet. This simplifies things so that we only have to track latencies of packets through the router as 1ms. At the receiving side, we technically need to wait for the packet to be completely received before we start DMAing it into kernel memory. Also, we said that the ACK is zero length. If this is truly zero, then there is no header generation/transmission latency. Otherwise, we need to generate headers. There are two different options here: Ack latency (zero) or Ack latency (header).
We translate to ms here: 10 ns = 10^-5 ms, 10 μs = 10^-2 ms, 1 Gbps = 10^12 bits/ms, 100 Mbps = 10^11 bits/ms.
Message latency = [Gen_header + DMA_kernel→network] + [Link_2 + Route_2 + Link_middle + Route_1 + Link_1] + [Receive_server + DMA_network→kernel + Int_message]
= [40 × 10^-5 ms + 296 × 10^-5 ms] + [1ms + 1ms + 96ms + 1ms + 1ms] + [(296 × 8 bits)/(10^11 bits/ms) + 296 × 10^-5 ms + 10^-2 ms]
≅ 100.02 ms
Ack latency (zero) = [Link_2 + Route_2 + Link_middle + Route_1 + Link_1] + Int_ack = 100.01 ms
Ack latency (header) = [Gen_header + DMA_kernel→network] + Ack latency (zero) + [Receive_client + DMA_network→kernel]
= [40 × 10^-5 ms + 40 × 10^-5 ms] + 100.01 ms + [(40 × 8 bits)/(10^12 bits/ms) + 40 × 10^-5 ms]
≅ 100.01 ms
Total Answer ≅ 200.03 ms
Grading: 1 point on right track, -2 for missing all network components, -1 point for missing components of basic 200ms roundtrip, -½ for missing DMAs, -½ for missing ints, -½ for interrupt on send (not in critical path), -½ for copy into kernel (already in TCP buffer), -½ math error.
Problem 4e[3pts]: What TCP window size is necessary to achieve the bandwidth of 4c without dropping packets in the network? Use simple constants and values that you computed for 4a-4d. Explain. Make sure that you correct for units.
What is needed here is the bandwidth-latency product, where the latency is the roundtrip latency. The result will be the total amount of data that needs to be “in the network” before reception of the first ACK. Here, the window size is in data bytes (not including header) so, we use the data bandwidth, not total bandwidth. Also, we must correct for the fact that (4d) was computed in ms. Notice how all of the units cancel out properly! (we want bytes)
Window size ≅ (4c) × (4d) × (10^-3 s/ms) / (8 bits/byte)
≅ (3.2 × 10^6 bit/s) × (200.03 ms) × (10^-3 s/ms) × (0.125 bytes/bit)
≅ 80012 bytes
Grading: -1 for using an incorrect bandwidth (unless consistent with 4c), -1 for missing the conversion from bits to bytes (unless you gave an answer with units of “b,” which means bits), -1 for not following directions (using only values from 4a through 4d or simple constants).
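The arithmetic of 4c and 4e can be replayed in a few lines of Python. This is only a sketch of the bandwidth-latency product with explicit unit conversions; the function names are illustrative, and the inputs are the values quoted in the solutions above.

```python
def data_bandwidth_bps(link_bps, payload_bytes, packet_bytes):
    """Scale the raw link bandwidth by the payload fraction of each packet (4c)."""
    return link_bps * payload_bytes / packet_bytes


def window_bytes(bandwidth_bps, round_trip_ms):
    """Bandwidth-latency product, converting ms to s and bits to bytes (4e)."""
    return bandwidth_bps * (round_trip_ms / 1000) / 8


goodput = data_bandwidth_bps(3.7e6, payload_bytes=256, packet_bytes=296)
window = window_bytes(3.2e6, round_trip_ms=200.03)
```

Here `goodput` comes out at about 3.2 Mbps and `window` at about 80012 bytes, matching the answers to 4c and 4e.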
Problem 4f[6pts]: Suppose that video files are laid out in 64K (65536 bytes) chunks on the disk (i.e. 64K in successive sectors on a track). Compute the overhead for reading such a 64K chunk from a random place on the disk. Assume that the disk controller automatically DMAs the data to kernel memory in a fashion that is overlapped with reading it from the disk (so that you do not have to worry about DMA for this operation). After finishing, the controller generates an interrupt; the interrupt routine may submit another request to the controller (if one is queued on the DDRQ). Assume the disk parameters given above (repeated here):
What is the total time to read 64K chunk from a random place on the disk into memory including the interrupt? Hint: there are 5 terms here including the interrupt!
Time_server = Time_controller + Time_seek + Time_rotate + Time_transfer + Time_Int
= 1ms + 4ms + ½ [60 s/min × 1000 ms/s / 10000 rev/min] + 65536 bytes / [(50 × 10^6 bytes/sec) × 0.001 sec/ms] + 0.01 ms
= 8.01ms + 1.31ms = 9.32ms
Grading: 1 pt for each of 5 terms, 1 pt for calculation (can lose if have a unit conversion problem, like forgetting to convert from sec to ms or something similar, other reasons).
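The five-term sum in 4f can be written as a short function. The parameter names are illustrative, and the default values are the ones that appear in the worked answer above (1 ms controller time, 4 ms seek, 10000 RPM, 50 × 10^6 bytes/sec transfer rate, 0.01 ms interrupt).

```python
def disk_read_time_ms(chunk_bytes,
                      controller_ms=1.0,
                      seek_ms=4.0,
                      rpm=10_000,
                      transfer_bytes_per_s=50e6,
                      interrupt_ms=0.01):
    """Total time to read one chunk from a random place on the disk, in ms."""
    # On average the head waits half a revolution for the first sector.
    rotate_ms = 0.5 * (60_000 / rpm)
    # Convert seconds to milliseconds for the transfer term.
    transfer_ms = chunk_bytes / transfer_bytes_per_s * 1000
    return controller_ms + seek_ms + rotate_ms + transfer_ms + interrupt_ms


chunk_time = disk_read_time_ms(65536)
```

Rounded to two decimal places, `chunk_time` is 9.32 ms, the value reused as the service time in Problem 4j.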
Problem 4j[5pts]: Returning to our single-disk server, assume that 15 clients connect through network resources that are independent up to router #2. Each client has its own user-level process working on its behalf. Assume that all 15 clients send requests at a rate of (4g). If the disk response distribution can be described with C=1.5 and the aggregate network request rate can be described as exponential, what is the average length of the disk device driver request queue (DDRQ)? Hint: the service time for the disk is the time between requests to the disk controller when the DDRQ is full. Possibly useful formulas include:
Mean Service: T_server = (1/n) × Σ_{i=1..n} T_i
Variance: σ² = (1/n) × Σ_{i=1..n} (T_i − T_server)²
M/G/1 queue: T_q = T_server × ½(1 + C) × u/(1 − u)
Little’s law: L_q = λ × T_q
Solution: The important thing about solving this problem is figuring out what equations to use and which numbers to plug in. Since we are looking at the DDRQ queue, the service time is the time for one chunk (64KB) to be grabbed from the disk and to generate an interrupt. This service time is exactly what we computed in (4f). The processor overhead from (4h) is overlapped with these hardware-generated numbers, so is not included. Note that we can get average service time directly from (4f), so don’t need to use the formula for “Mean Service” given above.
multiplying (4g) by 15.
Time_server = (4f) = 9.32 ms/req (overhead from 4h is overlapped and not counted)
1pt for u; 1pt for Time_server; 1 point for calculation.
Note that we gave you credit for using numbers that you computed previously, even if they were incorrect. However, we gave -1pt if you computed a value for u that was > 1 but didn’t say anything about it (i.e. didn’t point out that you must have made a mistake).
Also, you could lose 1pt for failing to correct for units (i.e. ms/s etc).
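The M/G/1 and Little's law formulas listed in Problem 4j can be combined into one helper. This is a sketch under stated assumptions: it uses the standard form T_q = T_server × ½(1 + C) × u/(1 − u) with u = λ × T_server, the function name is illustrative, and the request rate in the usage line is a made-up value because the answer to (4g) is not reproduced here.

```python
def mg1_queue_length(arrival_rate_per_s, service_time_s, c):
    """Average queue length L_q for an M/G/1 queue.

    arrival_rate_per_s is λ, service_time_s is T_server, and c is the
    squared coefficient of variation of the service time.
    """
    u = arrival_rate_per_s * service_time_s  # utilization
    if u >= 1:
        # The queue grows without bound; as the grading note says, a
        # utilization above 1 signals a mistake in the inputs.
        raise ValueError(f"utilization {u:.2f} must be below 1")
    t_q = service_time_s * 0.5 * (1 + c) * u / (1 - u)
    return arrival_rate_per_s * t_q  # Little's law: L_q = λ × T_q


# Hypothetical aggregate rate of 60 requests/s; T_server and C are from 4f and 4j.
example_lq = mg1_queue_length(60, 9.32e-3, c=1.5)
```

Note the unit handling: the 9.32 ms service time is passed in seconds so that λ × T_server is dimensionless, which is the conversion the grading notes warn about.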