







Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Material Type: Exam; Professor: Back; Class: Operating Systems; Subject: Computer Science; University: Virginia Polytechnic Institute And State University; Term: Fall 2006;
Typology: Exams
1 / 13
This page cannot be seen from the preview
Don't miss anything!








30 Students took the final exam. The table below shows the scores for each problem. I was quite happy about the relatively high median and believe it reflects a real learning result. While grading the exam, I felt that most of you had achieved a basic understanding of the concepts being tested.
Problem 1 2 3 4 5 Total Max 14 22 28 17 16 94 Min 5 7 9 3 4 48 Med 10 16 22.5 11 12 68. Avg 10.3 16.0 22.2 10.8 11.1 70. StdDev 2.3 3.0 5.1 3.6 4.2 11.
CS 3204 Fall 06 Final Exam Back Section N=30 Med=68.
0
2
4
6
8
10
12
41-50 51-60 61-70 71-80 81-90 91-
Students
Solutions are shown in this style. Grading Comments are shown in this style.
a) (6 pts) Raymond Chen relates the following anecdote in his blog “The Old New Thing” in this entry from Dec 15, 2004:
I was reminded of a meeting that took place between Intel and Microsoft over fifteen years ago. (…)
Since Microsoft is one of Intel's biggest customers, their representatives often visit Microsoft to show off what their latest processor can do, lobby the kernel development team to support a new processor feature, and solicit feedback on what sort of features would be most useful to add.
At this meeting, the Intel representatives asked, "So if you could ask for only one thing to be made faster, what would it be?"
Without hesitation, one of the lead kernel developers replied, "Speed up faulting on an invalid instruction."
The Intel half of the room burst out laughing. "Oh, you Microsoft engineers are so funny!" And so the meeting ended with a cute little joke.
After returning to their labs, the Intel engineers ran profiles against the Windows kernel and lo and behold, they discovered that Windows spent a lot of its time dispatching invalid instruction exceptions. How absurd!
i. (2 pts) Why did Intel’s engineers think Microsoft’s lead developer was joking?
Normally, faulting on an invalid instruction is an uncommon occurrence – as such, it is not something for which architecture designers should optimize.
You should mention that an invalid instruction faults are rare, as such faults usually signal a condition that should not happen, and such rare events are not worth optimizing. Wrong answers included making vague statements such as “they thought faults were always a bad thing.” Keep in mind that this meeting took place around 1989, which was more than 20 years after the invention of virtual memory (see the “RCA has it, IBM doesn’t” ad I posted from 1971.) Engineers understood the differences between the different types of faults by
ii. (4 pts) Give a reasonable explanation for why that version of Windows spent so much time dispatching invalid instruction exceptions!
It turned out that for that processor, it was the fastest way to force a mode switch, so Microsoft used an invalid instruction as a system call trap.
ii. (4 pts) Give one reason why modern OS such as Linux or Windows ended up using multithreaded file system implementations nonetheless, even on PCs used primarily by a single user.
It turns out that today’s PC did end up having multiple, simultaneously active jobs after all: desktop programs use multiple threads, and several programs can be active at once (say Google desktop indexing your files while your browser is playing a flash animation while mp3’s are played.)
Note that the question asked for a rationale for why OS designers ultimately decided to use multithreaded file systems even in desktop OS run on single-user PCs, despite the complexity pointed out by Tanenbaum. For both parts i and ii, I was pretty generous in what answers I accepted, as long as they addressed the question and didn’t include outright wrong information.
a) (4 pts) Why is exact least-recently used (LRU) not a practical page replacement strategy for modern virtual memory systems?
An exact implementation of LRU would require updating a queue or stack data structure on every memory access. Since every instruction involves a memory access (since the instruction itself must be fetched from memory), such updates would have to happen at pipeline speed. Especially on modern processors, this is not feasible.
This question had a typo: I meant to say “practicable”, as in “feasible” or “possible.” Many of you read the question as “is strict LRU a beneficial strategy in a recent system with a unified buffer & VM cache?” and gave a counter-example of where blindly applying LRU to a buffer cache does not give optimal replacement, such as large looping accesses or the case of a page that’s accessed only once but is kept in the cache in preference to older data that is going to be used. If you correctly described one such scenario and correctly explained why LRU fails, I gave full credit. Note that a pure sequential access is not a correct example – if access is sequential, it’s all misses and the replacement policy doesn’t matter. I should point out though that except for MIN, no replacement strategy is always optimal, but that doesn’t necessarily make it impractical in the sense of losing any benefit.
b) (10 pts) Most of you implemented a simple 1-bit clock algorithm in project
i. (4 pts) In making this choice, did the clock algorithm make a good decision? Justify your answer!
Both yes and no were accepted answers, if properly justified. No: stack pages in general are likely to be reused again in the near future and as such are generally not good candidates for eviction. Yes: because in this particular test case, this one was a stack page that belonged to the page-merge-seq parent process, which was suspended at the time the child process exhausted physical memory. This process won’t be run until the child that caused page eviction had finished, so evicting its pages, even its stack pages, may be a good decision that leaves more physical memory for the child. I also accepted if you implied that a page near the top of the stack would remain unused until the program’s execution unwinds to that stack page.
For full credit, you had to justify your answer and provide a reason.
ii. (6 pts) Explain how the clock algorithm arrived at the decision it did!
In a demand-paged system such as Pintos where pages are only brought in when accessed, every frame will have the accessed bit set when the clock algorithm is invoked for the first time. The algorithm iterates through all frames, resets the accessed bits of their associated page table entries, and chooses the first frame it encounters on the second revolution of the clock hand, which is the page frame that was allocated first. In other words, in the absence of access bits that provide an estimate of how recently a page was accessed, the clock algorithm defaults to FIFO! That is why it picked the page whose frame was installed first, which happened to be a stack page because the stack is the very first page a process installs when it starts (in setup_stack()).
For full credit, you had to make it clear that you understood that Clock defaults to FIFO unless the accessed bits provide additional information. Note that the problem specifically talked about the “first” page chosen by the algorithm. At that point, the clock hand will not have advanced, so the Clock doesn’t yet approximate LRU (at least not in the model you implemented in Pintos, where eviction only happens when a page is needed. If you’ve implemented a watermark-based trigger instead, where the clock starts running before you actually run out of page frames, you would have to state that in your answer.) I gave 3 pts partial credit for properly explaining how clock works in general.
c) (4 pts) Suppose a process is thrashing in a system that uses a local page replacement policy. Would a scheduler such as the BSD4.4 scheduler you implemented in project 1 assign a high or a low dynamic priority to that process? Justify your answer!
It would be given a high priority since it is highly I/O bound. It will not use very many CPU ticks since it spends most of its time blocked waiting for a page fault
These two questions were intended to test whether you understood the purpose of the multilevel indices you had implemented at an abstract level. For part iii., I was looking for at least two different arguments.
iv. (4 pts) Multilevel index-based file systems do not suffer from external fragmentation. Explain briefly what that means and why that is the case.
It means that a file system using those indices will never have to reject an allocation request while there are still free sectors on disk – any sector can be used to extend a file. This property holds because a multi-level index does not require any contiguous area on disk.
For full credit, you had to discuss what absence of external fragmentation is and you had to specifically mention why a multi-level indexed based structure avoids it.
b) (4 pts) Recent file systems such as XFS and ReiserFS use a technique called delayed allocation , in which the on-disk position (i.e., the sector number) of a newly written file block is not decided until that block is flushed from the buffer cache to disk. Why can this approach lead to better performance?
A key performance issue is to read sequentially from continuous sector numbers whenever you can. Most current OS use preallocation mechanisms to decide how much continuous space to allocate for a file. These mechanisms may over or underestimate how many continuous sectors are needed. If you delay the allocation decision until flush time, you (usually) have more accurate information about how many continuous sectors you’ll need.
Alternative answers that were accepted for full credit included arguing that delaying the allocation (and updating the freemap & index blocks) reduces the latency of an individual write as seen by the application, and arguing that delayed allocation would allow an allocation decision that is closer to where the disk head is located, eliminating seek time. The latter, however, is only a research idea at this point for single disks – it is used for filesystems that lie on RAID structures, however, in order to skirt the small-write problem. I deducted one point if you also claimed that the disk space that is allocated in this delayed fashion could be used by other applications in the interim – this is not true, the system must fail a write if there’s no space left. We can’t speculate that there will be sufficient space by the time the write is going to disk, we must ensure this, for instance, by keeping a free space counter.
I gave no credit if you suggested that the reason is to save bringing in the metadata (index blocks etc.) – you have to examine the metadata before you can decide whether a new block needs to be allocated or not. It doesn’t matter if you do it at eviction time or at write time as far as the number of disk accesses is concerned. (Unless you decided on a strategy, also possible, where each write that overwrites an entire block would lead to a reallocation of that block. You’d have to state it.)
c) (4 pts) Consistency. Give one example of an unacceptable inconsistency that FFS’s design (or any of the newer journal or write-ordering based file systems) prevents!
Any of the inconsistencies discussed in class were accepted here. For instance, FFS ensure that a user won’t see another user’s data after a crash, because it first persistently nullifies pointers to data before reusing the data. Another example includes that if a crash occurs while a file is renamed, the file won’t be lost – in the worst case, the old and the new name will both appear for the same file.
To get credit, your answer had to refer to a fault scenario were it makes sense to distinguish between inconsistencies from which one can recover and those that one must avoid by design.
d) (10 pts) Suppose you added ACLs to Pintos to allow or restrict read, write, or execute permissions for ordinary files (ignore directories). Assume that users would authenticate using a user id, and that each process carries the user id under which it executes in its PCB.
i. (4 pts) Which on-disk data structures would you change and how?
An ACL in this case is a list or array of (user id, rwx) pairs were rwx would be a 3- bit mask. A small number of such entries could be stored in the on-disk inode; for larger ACLs, you could store a sector number to additional ACL blocks, or even an inode number if you stored the ACLs in a file. Combinations are possible. It’s also possible to simply store a reference to an entry in a global ACL file.
ii. (3 pts) Which in-memory data structures in your file system code would change, if any? If none changed, say why not.
Acceptable answers are either none, or the in-memory inode structure. The trade-off is one of memory vs. access speed vs. consistency. If you replicate the ACL in the in-memory inode, you may avoid disk accesses to retrieve it, but you need more memory and have the added task of writing changes back to disk. If you don’t, you save the memory, and the consistency comes for free if you use your buffer cache’s dirty mechanism to update them. In addition, for frequently accessed ACLs, it’s likely that they would stay cached in the buffer cache.
exclusive access to the index block into which you install a pointer to the new block.
Circular Wait: this condition would indicate a bug in your code. In a straightforward implementation of multilevel indices, there would be a natural locking order, which follows the index blocks down the tree.
I gave 1 pts each simply for mentioning the conditions. Easy points if you studied. I gave an additional point for discussing whether this condition applies. Note that the problems asked specifically “Discuss each of the four conditions.” I even indicated that the score would be 4x2. For the circular wait, I was picky and gave the second point only if your discussion specifically referred to “your implementation of indexed and extensible files.” A common mistake was about the meaning of preemption. In this context, “preemption” would mean that access to a buffer cache block could be forcefully removed. It is unrelated to whether the CPU can be preempted. A second mistake was regarding the meaning of “hold-and-wait.” Here, it would mean to hold one resource (buffer cache block) and wait for another resource (e.g., block). It doesn’t mean to hold a lock and then sleep for an indefinite amount of time. The latter would be a starvation condition, not a deadlock one.
b) (4 pts) Suppose you added to your buffer cache API a “cache_purge(disk_sector_t s)” operation. cache_purge(s) would ensure that sector s is no longer in the cache. If sector s is not in the cache at the time of invocation, it does nothing. If it was in the cache, then the block containing it is simply discarded (even if it was dirty) and subsequently marked as unreserved. If your buffer cache had such an operation, when would you use it?
You’d use it when a block is deallocated, such as when a file is removed. Doing so will avoid the eviction of other data that was less recently used, but which is still alive and could be accessed again in the future.
Some of you suggested it could be used as a kind of nullification mechanism by which a process’s writes can be undone. I gave only partial credit for that, because even though that would indeed be the effect, such a mechanism would not be reliable – the block could have already been evicted to disk. I only accepted this for full credit if you made it clear that it would not be mechanism to be relied upon, but a mechanism that could limit damage that had already occurred. Also, several of you suggested that blocks written by a killed process don’t have to be written to disk. That’s generally not true.
c) (8 pts) One of the crucial properties one must ensure when implementing a buffer cache in a multi-threaded file system is that the I/O involved when evicting a single buffer cache block does not prevent other, independent
operations from proceeding in parallel. Some groups used a “two-level locking” approach for this purpose, which relies on a global lock to protect the buffer cache array or list, and a per-block lock that protects an individual block’s sector number and internal variables. The per-block lock is held when a block is evicted, but the global lock is not. The eviction part of cache_get_block then looks like this:
Acquire global lock Find block to be evicted Acquire per-block lock Remember previous sector number, set new sector number in block Release global lock Perform actual I/O that writes dirty data to previous sector Release per-block lock
Note that the block must be rededicated to the new sector before releasing the global lock such that another thread that is seeking to access the same new sector will have to wait for the lock on the same to- be-evicted block (which is necessary to ensure that there is at most one block reserved for each sector.)
This approach is correct and thread-safe. However, it does not guarantee that independent operations on the buffer cache can proceed in parallel. Construct a scenario that demonstrates this! (Note: two threads seeking to access the same sector are not independent operations, so this scenario would not count.)
It turns out this question also had a bug, although none of you noticed it. It’s actually not correct to reset the sector number of the to-be-evicted block before evicting it, because then a thread trying to access the to-be-evicted block will not find it and mistaken believe it is on disk! However, the rest of the problem still holds. The scenario in which a thread has to wait for the I/O of an unrelated operation to complete is when one thread is trying to access a block that is being evicted. The first thread evicts the block and is blocked on I/O, while the second blocks on the per-block lock while holding the global lock. Now, any third thread arriving at that time will find the global lock held, preventing access to the buffer cache altogether. Although the operations of the first and second thread are not independent, the operation of the third thread is.
Many of you answered that the problem is that the global lock is held while it examines the block list, which causes serialization even if two threads are attempting to access different blocks. This serialization, however, is unavoidable and required for correctness. It’s also not an issue on a uniprocessor since threads will not give up the CPU while examining the list. That is, the lock is not held while waiting for I/O, which is the key issue as the problem states in the first sentence. I gave 4 pts partial credit still.
In addition to increased fault tolerance, RAID can improve latency and bandwidth of large reads and writes (RAID-0, RAID-4, RAID-5), and also the latency of small reads (RAID-1.)
“inexpensive” isn’t really a benefit anymore – these days, the I in RAID means independent. Today, multiple single disks are usually cheaper than a RAID array, because RAID arrays are composed of single disks + RAID controller.
g) (4 pts) Suppose you are hired as an embedded systems engineer and your team lead tells you that their OS provides pure user-level threading without preemption. What do you need to be careful of in the application code you write for this system?
You need to avoid long (or infinite) loops to avoid starving other threads, and you have to provide a means to cooperate when being asked to terminate. Depending on the system, you may also have to avoid making any blocking calls or risk blocking the entire process.
Although “user-level threading” technically doesn’t imply that calls are blocking (many ULT libraries provide an I/O veneer to avoid such blocking), I accepted it for full credit. I gave partial credit for mentioning the danger of deadlock because of the lack of preemption. However, preemption here means preempting the CPU, not preempting a process’s access to a resource (see also problem 4a). An infinite loop is not typically deadlock, and accesses to such resources as locks and semaphores aren’t preemptible even in systems that support preempting the CPU.