CSCI-1200 Computer Science II — Fall 2008

Homework 10 — Performance and Order Notation

In this final assignment for CSII, you will carry out a series of tests on the fundamental data structures in the

Standard Template Library to evaluate the relative performance of these data structures and solidify your

understanding of order notation. Be sure to read the entire handout before beginning your implementation.

The Program

You will measure the runtime and the number of pairwise comparisons needed for a few common, moderately

compute-intensive operations: sorting, removing duplicates (without changing the overall order),

and finding the mode. You will perform these tests on STL string objects that can be read from a file

or constructed from random sequences of chars. The data structure, operation, size of the test (number

of strings), source of the input, and output file are specified on the command line. Here are two sample

input lines:

a.exe vector sort 10000 random 5 out.txt

a.exe vector mode 1000 in.txt out2.txt

The first example will generate 10,000 random strings of length 5, use a vector to sort them, and then

output the result to the file named “out.txt”. The second example will read the first 1,000 strings from the

file named “in.txt”, use a vector to find the most frequently occurring value (implemented by first sorting

the data), and then output that string (the mode) to a file named “out2.txt”. We provide a starting base

of code that performs these operations using the STL vector data structure.
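
As a rough illustration of the "random 5" input source in the first example, the sketch below builds a requested number of random strings of a given length. The helper names random_string and random_strings, the lowercase-only alphabet, and the use of a vector are assumptions made for this example; the provided code base may generate its test data differently.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: builds one string of `length` random lowercase
// letters. The a-z alphabet is an assumption for this sketch.
std::string random_string(int length) {
  assert(length >= 0);
  std::string s;
  for (int i = 0; i < length; ++i) {
    s += static_cast<char>('a' + std::rand() % 26);
  }
  return s;
}

// Hypothetical helper: builds `count` random strings, each `length` long.
std::vector<std::string> random_strings(int count, int length) {
  assert(count >= 0);
  std::vector<std::string> data;
  data.reserve(count);
  for (int i = 0; i < count; ++i) {
    data.push_back(random_string(length));
  }
  return data;
}
```

Calling random_strings(10000, 5) would correspond to the test size and string length in the first sample command. Short strings over a small alphabet produce many duplicates once the test size is large, which matters for the duplicate-removal and mode operations.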

You will extend this program to allow other data structures in place of vector, including: STL list, STL

set or map, STL priority_queue, and the cs2hashset implementation of hash tables. For extra credit

you may test other data structures. You should carefully consider the most efficient way (the one that

minimizes the running time) to use the data structure to complete the operation. Similarly, for extra credit

you may test other common operations. In the provided code base, two fixed-length arrays of string objects are

used to load and output the data. The data structure specified on the command line is the only additional

data structure that is “allowed” in the implementation of the operation. Thus, some combinations of

data structure and operation may not be feasible. In these cases, include a short write-up in your report

explaining why the pairing of data structure and operation is impractical.
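
To make the "one additional data structure" rule concrete, here is a minimal sketch of two pairings: finding the mode with only a map, and removing duplicates (keeping first-occurrence order) with only a set. The function names and signatures are hypothetical and are not taken from the provided code base; the plain input and output arrays stand in for the two fixed-length arrays described above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical helper: returns the most frequent string in input[0..n),
// using a std::map as the only additional data structure.
// Ties go to the lexicographically smallest string.
std::string mode_with_map(const std::string* input, std::size_t n) {
  assert(n > 0);
  std::map<std::string, int> counts;
  for (std::size_t i = 0; i < n; ++i) {
    ++counts[input[i]];  // O(log n) lookup or insertion per string
  }
  std::map<std::string, int>::const_iterator best = counts.begin();
  for (std::map<std::string, int>::const_iterator it = counts.begin();
       it != counts.end(); ++it) {
    if (it->second > best->second) best = it;
  }
  return best->first;
}

// Hypothetical helper: copies input[0..n) to output, skipping any string
// already seen, and returns how many strings were written. A std::set is
// the only additional data structure.
std::size_t remove_duplicates_with_set(const std::string* input,
                                       std::size_t n,
                                       std::string* output) {
  assert(n == 0 || (input != 0 && output != 0));
  std::set<std::string> seen;
  std::size_t written = 0;
  for (std::size_t i = 0; i < n; ++i) {
    // insert() reports whether the value was new.
    if (seen.insert(input[i]).second) {
      output[written++] = input[i];
    }
  }
  return written;
}
```

For the input b, a, b, c, a, b, the first function returns "b" and the second writes b, a, c. Both sketches do O(n log n) work overall; whether that beats the sort-based vector version in practice is the kind of question the measurements are meant to answer.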

Measuring the Performance and # of Comparisons

The provided code demonstrates how the clock() function can be used to measure the processing time

of the computation. The resolution of the timing mechanism is system and hardware dependent

and may be in seconds, milliseconds, or something else. If the resolution on your system is coarse, you

must base your analysis on the measurements from sufficiently large datasets. The program reports the

time to load, process, and save the data. The provided code also demonstrates how the number of pairwise

comparisons used by the operation can be counted by implementing wrapper functions for the <, >, and

== string comparison functions. Functors are also provided for use with set, map, priority_queue, and

cs2hashset. Both measurements should inform your analysis of the data structures.
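
The two measurements can be sketched as follows. This is an illustration only, not the provided code: the counter g_comparisons, the wrapper counted_less, the functor CountedLess, and the function timed_sort are all hypothetical names chosen for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <ctime>
#include <set>
#include <string>
#include <vector>

// Hypothetical global counter of pairwise string comparisons.
static unsigned long g_comparisons = 0;

// Wrapper function for <, usable with std::sort.
bool counted_less(const std::string& a, const std::string& b) {
  ++g_comparisons;
  return a < b;
}

// Functor version, usable as the comparison type of set, map,
// or priority_queue.
struct CountedLess {
  bool operator()(const std::string& a, const std::string& b) const {
    ++g_comparisons;
    return a < b;
  }
};

// Sorts data in place and returns the elapsed processor time in seconds.
// The comparison count is left in g_comparisons.
double timed_sort(std::vector<std::string>& data) {
  g_comparisons = 0;
  std::clock_t start = std::clock();
  // clock() returns (clock_t)(-1) when processor time is unavailable.
  assert(start != static_cast<std::clock_t>(-1));
  std::sort(data.begin(), data.end(), counted_less);
  std::clock_t stop = std::clock();
  return static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}
```

A set that counts its own comparisons could then be declared as std::set<std::string, CountedLess>. On a system with coarse clock resolution, a small test may report an elapsed time of 0 even though g_comparisons is nonzero, which is one reason to rely on larger datasets for the timing analysis.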

The Report

You will submit the source code for the program described above, but the bulk of the points for this

assignment will be awarded for a complete and well-written report that presents and analyzes

the results of your testing. The basic outline of the report should include: an introduction with your

initial hypotheses; the procedure which includes any nontrivial implementation details; the data from your

tests with some intermediate analysis; and a conclusion that summarizes your findings about the data

structures, in what instances the different data structures are most useful, and any surprises you found in

the results. Any unanswered questions in your analysis that require further testing should be described

as future work.

The Data

# of strings    load time (sec)    operation time (sec)    output time (sec)    # of comparisons

The Analysis

Submission