

Material Type: Assignment; Class: Data Structs & OO Development; Subject: Computer Science; University: Virginia Polytechnic Institute And State University; Term: Unknown 1989;


Due: Thursday, March 30 @ 11:00 PM for 100 points
Early bonus date: Tuesday, March 28 @ 11:00 PM for 10 point bonus
Late date: Monday, April 3 @ 11:00 PM for 20 point penalty
You will implement an external sorting algorithm for binary data. The input data file will consist of many 4-byte records, with each record consisting of two 2-byte (short) integer values in the range 1 to 30,000. The first 2-byte field is the key value (used for sorting) and the second 2-byte field contains a data value. The size of the input file is guaranteed to be a multiple of 4096 bytes. All I/O operations will be done on blocks of size 4096 bytes (i.e., 1024 logical records).
Warning: The data file is a binary file. For information on processing binary files in C++, see URL http://courses.cs.vt.edu/~cs2606/binio.html.
Your job is to sort the file (in ascending order by key), using a modified version of Heapsort. The modification comes in the interaction between the Heapsort algorithm and the file storing the data. The heap array will be the file itself, rather than an array stored in memory. All accesses to the file will be mediated by a buffer pool. The buffer pool will store 4096-byte blocks (1024 records each) and will be organized using the Least Recently Used (LRU) replacement scheme. See Section 8.3 in the book for more information about buffer pools.
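As a rough illustration of the LRU replacement scheme only, the sketch below tracks which file block occupies each buffer slot and picks the least recently used slot when a new block must be brought in. The class name `LruSlots`, the method `acquire`, and the use of a use-counter are assumptions made for this example and are not part of the assignment or the textbook; the actual reading and writing of 4096-byte blocks is deliberately left out.

```cpp
#include <cassert>

// Hypothetical helper (not from the assignment): bookkeeping for an LRU
// buffer pool. It records which block sits in each slot and when the slot
// was last used. Block I/O and dirty-block write-back are omitted.
class LruSlots {
public:
    enum { MAX_SLOTS = 20 };  // the assignment allows 1 to 20 buffers

    explicit LruSlots(int numSlots) : count_(numSlots), tick_(0) {
        assert(numSlots >= 1 && numSlots <= MAX_SLOTS);
        for (int i = 0; i < MAX_SLOTS; ++i) {
            blockId_[i] = -1;  // -1 marks an empty slot
            lastUse_[i] = 0;   // empty slots look "oldest", so they fill first
        }
    }

    // Returns the slot that holds blockId after the call. On a miss, the
    // least recently used slot is reused and `evicted` receives the id of
    // the block that was displaced (-1 if the slot was empty or on a hit).
    int acquire(long blockId, long& evicted) {
        evicted = -1;
        ++tick_;
        int victim = 0;
        for (int i = 0; i < count_; ++i) {
            if (blockId_[i] == blockId) {  // hit: just refresh its timestamp
                lastUse_[i] = tick_;
                return i;
            }
            if (lastUse_[i] < lastUse_[victim]) {
                victim = i;                // remember the oldest slot so far
            }
        }
        evicted = blockId_[victim];
        blockId_[victim] = blockId;
        lastUse_[victim] = tick_;
        return victim;
    }

private:
    int count_;                 // number of slots in use
    long tick_;                 // logical clock, incremented per request
    long blockId_[MAX_SLOTS];   // file block held by each slot
    long lastUse_[MAX_SLOTS];   // tick of the most recent request per slot
};
```

In a full buffer pool, a miss would also write the evicted block back to the file if it had been modified, and then read the requested block into the reclaimed slot.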
The primary design concern for this project will be the interaction between the logical heap as viewed by the Heapsort algorithm, and the physical representation of the heap as implemented by the disk file mediated by the buffer pool. You should pay careful attention to the interface that you design for the buffer pool, since you will be using this again in Project 4. In essence, the disk file will be the heap array, and all accesses to the heap from the Heapsort algorithm will be in the form of requests to the buffer pool for specific blocks of the file.
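To make the logical-versus-physical distinction concrete, the following sketch maps a logical heap position to the block and byte offset where that record lives in the file. It assumes a zero-based heap layout (children of position i at 2i+1 and 2i+2); the names `RecordLocation`, `locate`, `parent`, `leftChild`, and `rightChild` are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>

const long BLOCK_SIZE = 4096;                             // bytes per block (from the assignment)
const long RECORD_SIZE = 4;                               // 2-byte key + 2-byte data value
const long RECORDS_PER_BLOCK = BLOCK_SIZE / RECORD_SIZE;  // 1024 records per block

// Physical position of one record in the data file.
struct RecordLocation {
    long block;       // which 4096-byte block of the file
    long byteOffset;  // where the record starts inside that block
};

// Translate a logical heap index into a block number and an offset.
RecordLocation locate(long heapIndex) {
    assert(heapIndex >= 0);
    RecordLocation loc;
    loc.block = heapIndex / RECORDS_PER_BLOCK;
    loc.byteOffset = (heapIndex % RECORDS_PER_BLOCK) * RECORD_SIZE;
    return loc;
}

// Standard zero-based heap navigation (an assumption of this sketch).
long parent(long i)     { return (i - 1) / 2; }
long leftChild(long i)  { return 2 * i + 1; }
long rightChild(long i) { return 2 * i + 2; }
```

For example, the left child of heap position 512 is position 1025, which `locate` places in block 1 at byte offset 4, so a single sift-down step can touch two different blocks. This is why the buffer pool's replacement behavior matters for performance.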
The program will be invoked from the command line as:
heapsort <data-file-name> <num-buffers>
The data file <data-file-name> is the file to be sorted. The sorting takes place in that file, so this program does modify the input data file. Be careful to keep a copy of the original when you do your testing. The parameter <num-buffers> determines the number of buffers allocated for the buffer pool. This value will be in the range 1–20. At the end of your program, the data file (on disk) should be sorted. Do not forget to flush buffers from your buffer pool as necessary.
In addition to sorting the data file, you must report some information to the standard output stream. The output of your program must appear EXACTLY as follows. ANY deviation from this requirement will result in a significant deduction in points. The information printed will consist of two parts.
The first part will consist of the first record from each 4096-byte block, in order, from the final sorted output. The records are to be printed 8 records to a line (showing both the key value and the data value for each record), with the values separated by whitespace and formatted so that they line up in columns.
The second part will be the time that your program took to execute. Put calls to "clock()" in your program, one at the beginning and another at the end. This function is available by using "#include <time.h>". It returns a clock_t result that is compatible with long integers. The difference between the two values will be the total time in "clock ticks". Divide this number by CLOCKS_PER_SEC to get the total time in seconds.
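The two parts of the report could be produced along the following lines. This is only a sketch: the function names `printFirstRecords` and `elapsedSeconds` are hypothetical, and the column width of 6 is an assumption, since the exact spacing is not given here. Check the official sample output before relying on it.

```cpp
#include <ctime>     // std::clock_t, CLOCKS_PER_SEC
#include <iomanip>   // std::setw
#include <ostream>
#include <sstream>   // only needed to capture the output in a string when testing

// Print n records, 8 per line, each as a right-aligned key and data value.
// keys[i] and values[i] are assumed to hold the first record of block i.
void printFirstRecords(std::ostream& out, const short* keys,
                       const short* values, int n) {
    const int WIDTH = 6;  // assumed width; values up to 30000 need 5 digits
    for (int i = 0; i < n; ++i) {
        out << std::setw(WIDTH) << keys[i] << std::setw(WIDTH) << values[i];
        // End the line after every 8th record and after the last record.
        if ((i + 1) % 8 == 0 || i + 1 == n) {
            out << '\n';
        }
    }
}

// Convert two clock() readings into elapsed seconds.
double elapsedSeconds(std::clock_t start, std::clock_t end) {
    return static_cast<double>(end - start) / CLOCKS_PER_SEC;
}
```

In the program itself, the first `clock()` call would be made at the top of `main`, the second just before the time is printed, and the records would be written to `std::cout`.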
You must conform to good programming/documentation standards, as described in the Elements of Programming Style. Some specifics:
Neither the GTAs nor the instructors will help any student debug an implementation unless it is properly documented and exhibits good programming style. Be sure to begin your internal documentation right from the start.
You may only use code you have written, either specifically for this project or for earlier programs, or code taken from the textbook. Note that the textbook code is not designed for the specific purpose of this assignment, and is therefore likely to require modification. It may, however, provide a useful starting point. You may not use code from STL, MFC, or a similar library in your program.
A sample data file will be posted to the website to help you test your program. This is not the data file that will be used in grading your program. The test data provided to you will attempt to exercise the various syntactic elements of the command specifications. It makes no effort to be comprehensive in terms of testing the data structures required by the program. Thus, while the