Programming Assignment 3 – Spring 2006 - Data Structures | CS 2606, Assignments of Data Structures and Algorithms

Material Type: Assignment; Class: Data Structs & OO Development; Subject: Computer Science; University: Virginia Polytechnic Institute And State University; Term: Unknown 1989;

Uploaded on 02/13/2009

CS2606 (Spring 2006)
PROGRAMMING ASSIGNMENT #3
Due Thursday, March 30 @ 11:00 PM for 100 points
Early bonus date: Tuesday, March 28 @ 11:00 PM for 10 point bonus
Late date: Monday, April 3 at 11:00 PM for 20 point penalty
Assignment:
You will implement an external sorting algorithm for binary data. The input data file will
consist of many 4-byte records, with each record consisting of two 2-byte (short) integer values in
the range 1 to 30,000. The first 2-byte field is the key value (used for sorting) and the second 2-byte
field contains a data value. The input file is guaranteed to be a multiple of 4096 bytes. All I/O
operations will be done on blocks of size 4096 bytes (i.e., 1024 logical records).
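
The record and block sizes above can be written down as named constants. The following is a minimal sketch, not part of the original handout: the names Record, RECORD_SIZE, BLOCK_SIZE, RECORDS_PER_BLOCK and recordCount are illustrative, and the layout assumes a platform where short is 2 bytes and the struct has no padding (the static_assert checks this).

```cpp
#include <cassert>
#include <cstddef>

// Illustrative names; the handout only fixes the sizes, not the identifiers.
const std::size_t RECORD_SIZE = 4;      // bytes per record: 2-byte key + 2-byte data
const std::size_t BLOCK_SIZE = 4096;    // bytes moved by each I/O operation
const std::size_t RECORDS_PER_BLOCK = BLOCK_SIZE / RECORD_SIZE;  // 1024 records

// One logical record as it appears in the binary file.
struct Record {
    short key;   // sort key, in the range 1 to 30,000
    short data;  // data value carried along with the key
};

// Assumption: short is 2 bytes and the struct is unpadded on this platform.
static_assert(sizeof(Record) == RECORD_SIZE, "Record must be exactly 4 bytes");

// Number of records in a file of fileBytes bytes.
// Precondition: fileBytes is a multiple of BLOCK_SIZE, as the spec guarantees.
inline std::size_t recordCount(std::size_t fileBytes) {
    assert(fileBytes % BLOCK_SIZE == 0);
    return fileBytes / RECORD_SIZE;
}
```

For example, a two-block file of 8192 bytes holds recordCount(8192) = 2048 records.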
Warning: The data file is a binary file. For information on processing binary files in C++,
see URL http://courses.cs.vt.edu/~cs2606/binio.html.
Your job is to sort the file (in ascending order), using a modified version of the Heapsort. The
modification comes in the interaction between the Heapsort algorithm and the file storing the data.
The heap array will be the file itself, rather than an array stored in memory. All accesses to the
file will be mediated by a buffer pool. The buffer pool will store 4096-byte blocks (1024 records).
The buffer pool will be organized using the Least Recently Used (LRU) replacement scheme. See
Section 8.3 in the book for more information about buffer pools.
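
One possible shape for such a buffer pool is sketched below. This is a hedged illustration, not the interface from the textbook or the one the course expects: the class name BufferPool, the methods acquire, flush, diskReads and diskWrites, and the choice of an access counter to track recency are all assumptions. It uses only C standard I/O and no STL containers, and it expects a file opened for binary update (for example with mode "r+b").

```cpp
#include <cassert>
#include <cstdio>

const int BLOCK_SIZE = 4096;  // bytes per block, from the spec

// Hypothetical interface; the spec leaves the buffer pool design to you.
class BufferPool {
public:
    BufferPool(std::FILE* file, int numBuffers)
        : file_(file), count_(numBuffers), tick_(0), reads_(0), writes_(0) {
        assert(file != nullptr && numBuffers >= 1);
        buffers_ = new Buffer[count_];
        for (int i = 0; i < count_; ++i) {
            buffers_[i].block = -1;    // -1 marks an empty buffer
            buffers_[i].dirty = false;
            buffers_[i].lastUsed = 0;  // 0 sorts before any real access
        }
    }
    ~BufferPool() { flush(); delete[] buffers_; }
    BufferPool(const BufferPool&) = delete;
    BufferPool& operator=(const BufferPool&) = delete;

    // Returns the cached bytes of `block`, reading from disk on a miss.
    // Returns nullptr if the read fails. The pointer is only valid until
    // the next acquire() call, which may evict this buffer.
    char* acquire(long block, bool willModify) {
        Buffer* found = nullptr;
        Buffer* victim = &buffers_[0];
        for (int i = 0; i < count_ && found == nullptr; ++i) {
            if (buffers_[i].block == block) found = &buffers_[i];
            else if (buffers_[i].lastUsed < victim->lastUsed) victim = &buffers_[i];
        }
        if (found == nullptr) {
            writeBack(*victim);  // save the least recently used block if modified
            victim->block = -1;
            if (std::fseek(file_, block * BLOCK_SIZE, SEEK_SET) != 0) return nullptr;
            if (std::fread(victim->bytes, 1, BLOCK_SIZE, file_) !=
                static_cast<std::size_t>(BLOCK_SIZE)) return nullptr;
            victim->block = block;
            victim->dirty = false;
            ++reads_;
            found = victim;
        }
        if (willModify) found->dirty = true;
        found->lastUsed = ++tick_;  // mark as most recently used
        return found->bytes;
    }

    // Writes every modified buffer back to the file.
    void flush() {
        for (int i = 0; i < count_; ++i) writeBack(buffers_[i]);
        std::fflush(file_);
    }

    long diskReads() const { return reads_; }
    long diskWrites() const { return writes_; }

private:
    struct Buffer {
        long block;              // block number held, or -1 if empty
        bool dirty;              // true if bytes differ from the disk copy
        unsigned long lastUsed;  // tick of the most recent access
        char bytes[BLOCK_SIZE];  // the block contents
    };

    void writeBack(Buffer& b) {
        if (b.block < 0 || !b.dirty) return;
        std::fseek(file_, b.block * BLOCK_SIZE, SEEK_SET);
        std::fwrite(b.bytes, 1, BLOCK_SIZE, file_);
        b.dirty = false;
        ++writes_;
    }

    std::FILE* file_;     // data file, opened for binary update
    int count_;           // number of buffers in the pool
    Buffer* buffers_;     // the buffers themselves
    unsigned long tick_;  // access counter used to order buffers by recency
    long reads_, writes_; // disk traffic counters
};
```

With at most 20 buffers, a linear scan for the least recently used buffer is cheap. A linked list ordered by recency is a common alternative that avoids the scan.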
Design Considerations:
The primary design concern for this project will be the interaction between the logical heap as
viewed by the Heapsort algorithm, and the physical representation of the heap as implemented by
the disk file mediated by the buffer pool. You should pay careful attention to the interface that
you design for the buffer pool, since you will be using this again in Project 4. In essence, the disk
file will be the heap array, and all accesses to the heap from the Heapsort algorithm will be in the
form of requests to the buffer pool for specific blocks of the file.
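
The mapping from a logical heap position to a physical location in the file is plain integer arithmetic. The sketch below assumes a 0-based heap array; the names RecordAddress, locate, leftChild, rightChild and parent are illustrative and do not come from the handout.

```cpp
#include <cassert>

const long RECORD_SIZE = 4;           // bytes per record
const long RECORDS_PER_BLOCK = 1024;  // 4096-byte block / 4-byte record

// Where a heap element lives on disk.
struct RecordAddress {
    long block;       // block number to request from the buffer pool
    long byteOffset;  // offset of the record inside that block
};

// Translates a 0-based heap index into a block number and byte offset.
inline RecordAddress locate(long heapIndex) {
    assert(heapIndex >= 0);
    RecordAddress address;
    address.block = heapIndex / RECORDS_PER_BLOCK;
    address.byteOffset = (heapIndex % RECORDS_PER_BLOCK) * RECORD_SIZE;
    return address;
}

// Standard 0-based heap navigation.
inline long leftChild(long i)  { return 2 * i + 1; }
inline long rightChild(long i) { return 2 * i + 2; }
inline long parent(long i)     { return (i - 1) / 2; }
```

Note that a node and its children usually sit in different blocks: the children of index 511 are 1023 (last record of block 0) and 1024 (first record of block 1). This is why the number of buffers has a visible effect on running time.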
Invocation and I/O Files:
The program will be invoked from the command-line as:
heapsort <data-file-name> <num-buffers>
The data file <data-file-name> is the file to be sorted. The sorting takes place in that file, so
this program does modify the input data file. Be careful to keep a copy of the original when you
do your testing. The parameter <num-buffers> determines the number of buffers allocated for the
buffer pool. This value will be in the range 1–20.
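
Command-line handling for this invocation might look like the sketch below. The Options struct and the parseArgs function are hypothetical names; the handout does not say how invalid arguments should be reported, so this version only returns false and leaves the error message to the caller.

```cpp
#include <cassert>
#include <cstdlib>

const int MIN_BUFFERS = 1;   // smallest allowed <num-buffers>
const int MAX_BUFFERS = 20;  // largest allowed <num-buffers>

// Parsed command-line arguments (illustrative type).
struct Options {
    const char* dataFile;  // name of the file to sort in place
    int numBuffers;        // number of buffers in the pool
};

// Fills `out` and returns true if the arguments match
//   heapsort <data-file-name> <num-buffers>
// with <num-buffers> a whole number in the range 1 to 20.
bool parseArgs(int argc, char* argv[], Options& out) {
    assert(argv != nullptr);
    if (argc != 3) return false;

    char* end = nullptr;
    long value = std::strtol(argv[2], &end, 10);
    if (end == argv[2] || *end != '\0') return false;  // not a whole number
    if (value < MIN_BUFFERS || value > MAX_BUFFERS) return false;

    out.dataFile = argv[1];
    out.numBuffers = static_cast<int>(value);
    return true;
}
```

A main() function would call parseArgs(argc, argv, options) first and exit with a usage message if it returns false.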
At the end of your program, the data file (on disk) should be sorted. Do not forget to flush
buffers from your buffer pool as necessary.
In addition to sorting the data file, you must report some information to the standard output
stream. This output for your program must appear EXACTLY as follows. ANY deviation from
this requirement will result in a significant deduction in points. The information printed will consist
of two parts.
The first part will consist of the first record from each 4096-byte block, in order, from the final
sorted output. The records are to be printed 8 records to a line (showing both the key value and
the data value for each record) with the values separated by whitespace, formatted so that they line up in columns.

The second part will be the time that your program took to execute. Put calls to clock() in your program, one at the beginning and another at the end. This function is available by using #include <time.h>. It returns a clock_t result that is compatible with long integers. The difference between the two values will be the total time in "clock ticks." Divide this number by CLOCKS_PER_SEC to get total time in seconds.
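
Both parts of the report can be sketched as two small helpers. This is only an illustration: the exact required layout is not reproduced in this section, so the 5-character field width and the two-space gap between records are assumptions, and the names printBlockHeads and elapsedSeconds are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <ctime>

const int RECORDS_PER_LINE = 8;  // records printed on each output line

struct Record {
    short key;   // sort key
    short data;  // data value
};

// Prints `count` records, 8 to a line, as right-aligned key/data pairs.
// Field widths are an assumption; adjust them to the required format.
void printBlockHeads(std::FILE* out, const Record* heads, int count) {
    assert(out != nullptr && count >= 0);
    for (int i = 0; i < count; ++i) {
        std::fprintf(out, "%5d %5d", heads[i].key, heads[i].data);
        bool endOfLine = ((i + 1) % RECORDS_PER_LINE == 0) || (i + 1 == count);
        std::fputs(endOfLine ? "\n" : "  ", out);
    }
}

// Converts two clock() readings into elapsed seconds.
double elapsedSeconds(std::clock_t start, std::clock_t finish) {
    return static_cast<double>(finish - start) / CLOCKS_PER_SEC;
}
```

In use, the program would save std::clock() at the start of main(), sort, call printBlockHeads(stdout, heads, blockCount) with the first record of each block, then take a second std::clock() reading and print elapsedSeconds(start, finish). Note that clock() measures processor time used by the program, not wall-clock time.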

Programming Standards:

You must conform to good programming/documentation standards, as described in the Elements of Programming Style. Some specifics:

  • You must include a header comment, preceding main(), specifying the compiler and operating system used and the date completed.
  • Your header comment must describe what your program does; don’t just plagiarize language from this spec.
  • You must include a comment explaining the purpose of every variable or named constant you use in your program.
  • You must use meaningful identifier names that suggest the meaning or purpose of the constant, variable, function, etc.
  • Always use named constants or enumerated types instead of literal constants in the code.
  • Precede every major block of your code with a comment explaining its purpose. You don’t have to describe how it works unless you do something so sneaky it deserves special recognition.
  • You must use indentation and blank lines to make control structures more readable.
  • Precede each function and/or class method with a header comment describing what the function does, the logical significance of each parameter (if any), and pre- and post-conditions.
  • Decompose your design logically, identifying which components should be objects and what operations should be encapsulated for each.

Neither the GTAs nor the instructors will help any student debug an implementation unless it is properly documented and exhibits good programming style. Be sure to begin your internal documentation right from the start.

You may only use code you have written, either specifically for this project or for earlier programs, or code taken from the textbook. Note that the textbook code is not designed for the specific purpose of this assignment, and is therefore likely to require modification. It may, however, provide a useful starting point. You may not use code from STL, MFC, or a similar library in your program.

Testing:

A sample data file will be posted to the website to help you test your program. This is not the data file that will be used in grading your program. The test data provided to you will attempt to exercise the various syntactic elements of the command specifications. It makes no effort to be comprehensive in terms of testing the data structures required by the program. Thus, while the