

An assignment for a computer science course in which students write a Java program that reads an input file, stores words in a hash table using chaining for collision resolution, and reports the 10 most frequently occurring words, the total number of unique words, and the length of the longest chain. The program should handle command line arguments and use LinkedList for the hash table.


Your assignment is to write, run, and test a program that does the following:
Command Line Arguments: Java provides a way for a program user to specify program arguments at run time. The code for doing this is included in the file WordCount.java, which is provided as a starting template for the assignment. This lets you specify program parameters when execution is requested rather than going through a sequence of input prompts. For example, the command
    java WordCount words.txt 17
runs the program using words.txt as the input file and 17 as the value for n. The code provided in WordCount.java checks the validity of these arguments automatically. If you are developing your program without using command line execution, comment out all of the code involving the command line arguments and manually prompt for the name of the file and the value for n before beginning the processing. However, when you submit your final program, it must handle the command line arguments as specified in the file I provide.
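For orientation, here is a minimal sketch of how two arguments like these might be read and checked. It is not the code from the WordCount.java template, which is not reproduced here; the class name ArgsSketch, the helper parseN, the error messages, and the assumption that n must be a positive integer are all hypothetical.

```java
import java.io.File;

// Hypothetical sketch only; the WordCount.java template supplies its own checks.
class ArgsSketch {
    // Returns n as an int, or -1 if the text is not a positive integer
    // (treating n <= 0 as invalid is an assumption).
    static int parseN(String text) {
        try {
            int n = Integer.parseInt(text);
            return n > 0 ? n : -1;
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("Usage: java WordCount <input file> <n>");
            return;
        }
        File input = new File(args[0]);
        int n = parseN(args[1]);
        if (!input.canRead() || n < 0) {
            System.err.println("Unreadable input file or invalid value for n");
            return;
        }
        System.out.println("Input file: " + input.getName() + ", n = " + n);
    }
}
```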
Words: A “word” is either: (1) A sequence of letters, terminated by a non-letter; or (2) a sequence of letters which contains the apostrophe, where a letter must both precede and follow the apostrophe. Examples of words are the following:
Input: now is, the999 time8dkfj couldn’t
Words: now is the time dkfj couldn’t

Input: 999xjk,isnt’ he ‘would’ve done it**
Words: xjk isnt he would’ve done it

Input: more sample’’words
Words: more sample words
You may assume that a word begins and ends on the same line of text. In order to count "The" and "the" as occurrences of the same word, you must store all words in lower case letters. Code to do that is provided in the WordCount.java template. Note that the proper representation of the apostrophe as a character constant is '\'' (i.e. apostrophe backslash apostrophe apostrophe). The method Character.isLetter(ch), which returns true if the char variable ch is a letter, will be useful.
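The word rules can be sketched as a single pass over one line of text. This is an illustration under stated assumptions, not the template's code: the class name WordSplitter and the method extractWords are hypothetical, the apostrophe is taken to be the plain ASCII character '\'', and a word such as a'b'c with several embedded apostrophes is treated as one word, which the definition does not settle.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; names are not taken from the WordCount.java template.
class WordSplitter {
    // Returns the lower-cased words found in one line of text.
    static List<String> extractWords(String line) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char ch = line.charAt(i);
            boolean nextIsLetter =
                i + 1 < line.length() && Character.isLetter(line.charAt(i + 1));
            if (Character.isLetter(ch)) {
                current.append(Character.toLowerCase(ch));
            } else if (ch == '\'' && current.length() > 0 && nextIsLetter) {
                // Keep the apostrophe only when a letter precedes and follows it.
                // The buffer always ends in a letter here, so checking that it
                // is non-empty is enough for the "precedes" condition.
                current.append(ch);
            } else if (current.length() > 0) {
                // Any other character terminates the word in progress.
                words.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            words.add(current.toString()); // word that runs to the end of the line
        }
        return words;
    }
}
```

Applied to the three sample inputs (typed with ASCII apostrophes), this sketch yields the word lists shown in the examples.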
Data Structure: Since chaining is being used for collision resolution, the hash table will be an array of lists. You will use the Java API class LinkedList for the list type. Remember that with LinkedList item positions begin with 0. The list will contain items of type WordItem. The definition of WordItem.java will be provided. Declarations for the table as well as the value for TABLESIZE are provided in WordCount.java. You will need to write the code to initialize each entry of the hash table.
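As a rough sketch of the structure described here, the table can be built as an array of empty lists, and a word is then either counted again or appended to its chain. Everything specific below is an assumption: the value 101 for TABLESIZE, the fields of the stand-in WordItem class, the helper names newTable, addWord, and countOf, and the one-line hash, which merely stands in for the hash function supplied with the assignment.

```java
import java.util.LinkedList;

// Hypothetical sketch; the real declarations come from WordCount.java and WordItem.java.
class ChainedTableSketch {
    static final int TABLESIZE = 101; // assumed value, not the template's

    // Stand-in for the provided WordItem class (its fields are an assumption).
    static class WordItem {
        final String word;
        int count = 1;
        WordItem(String word) { this.word = word; }
    }

    // Creates the table and initializes every entry with an empty chain.
    @SuppressWarnings("unchecked")
    static LinkedList<WordItem>[] newTable() {
        LinkedList<WordItem>[] table = new LinkedList[TABLESIZE];
        for (int i = 0; i < TABLESIZE; i++) {
            table[i] = new LinkedList<>();
        }
        return table;
    }

    // Placeholder hash; replace with the hash function provided in the template.
    static int hash(String word) {
        return Math.floorMod(word.hashCode(), TABLESIZE);
    }

    // Increments the count if the word is already in its chain, else appends it.
    static void addWord(LinkedList<WordItem>[] table, String word) {
        LinkedList<WordItem> chain = table[hash(word)];
        for (WordItem item : chain) {
            if (item.word.equals(word)) {
                item.count++;
                return;
            }
        }
        chain.add(new WordItem(word));
    }

    // Returns how many times the word has been added (0 if absent).
    static int countOf(LinkedList<WordItem>[] table, String word) {
        for (WordItem item : table[hash(word)]) {
            if (item.word.equals(word)) return item.count;
        }
        return 0;
    }
}
```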
Predefined functions: To assist you in developing the program, I provide two functions: (1) hash - the hash function, which takes a word and produces a hash value between 0 and TABLESIZE-1; (2) wordCopy - takes an input word and makes a copy of it, allocating the necessary memory. See WordCount.java for the details.
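To make the range 0 to TABLESIZE-1 concrete, here is one common way a string hash can be written. It is only an illustration and is not the hash function from WordCount.java; the class name HashSketch, the multiplier 31, and the value 101 for TABLESIZE are assumptions.

```java
// Hypothetical illustration; use the hash function provided in WordCount.java.
class HashSketch {
    static final int TABLESIZE = 101; // assumed value, not the template's

    // Polynomial hash reduced modulo TABLESIZE at every step, so the running
    // value never leaves the range 0..TABLESIZE-1 and cannot overflow.
    static int hash(String word) {
        int h = 0;
        for (int i = 0; i < word.length(); i++) {
            h = (h * 31 + word.charAt(i)) % TABLESIZE;
        }
        return h;
    }
}
```

Two equal words always produce the same index, which is what lets a repeated word be found in the chain where it was first stored.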
Required functions: (you may need/want others)
Technical Implementation Requirements: One of the best ways to determine whether you have mastered the principles of data abstraction is to have you apply them in your software solutions. To that end, for this assignment you must implement your solution using the following features in order to receive full credit for correctness (assuming that your program also produces correct answers):
Test Data: You need to verify your program on small test files of your own data. Run your program on the same file for various values of n. In particular, try values of n where the cutoff point falls among words that occur the same number of times. I will test your program on large files. In particular, I have an online copy of Mark Twain’s book, Huckleberry Finn, located at http://www.cs.ecu.edu/~rws/c3310/Book. Each file consists of a set of chapters; for example, the file part1.txt contains the first nine chapters. Assuming you’ve downloaded the file, typing the command line java WordCount part1.txt 10 should produce output something like the following:
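One possible way to pick the n most frequent words, once the items have been gathered from the chains, is to sort by count and take the first n. This is a sketch under assumptions: the stand-in WordItem class, the name topN, and in particular the tie-breaking rule (alphabetical order among words with equal counts) are hypothetical, since the handout does not say how ties at the cutoff should be resolved. It also assumes n has already been validated as non-negative.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; WordItem here is a stand-in for the provided class.
class TopWordsSketch {
    static class WordItem {
        final String word;
        final int count;
        WordItem(String word, int count) { this.word = word; this.count = count; }
    }

    // Returns up to n items, highest count first; ties are broken
    // alphabetically (an assumed rule, not one stated in the assignment).
    static List<WordItem> topN(List<WordItem> items, int n) {
        List<WordItem> sorted = new ArrayList<>(items);
        sorted.sort(Comparator.comparingInt((WordItem w) -> w.count).reversed()
                .thenComparing((WordItem w) -> w.word));
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }
}
```

With a fixed tie rule like this, running the same file with different values of n gives results that are prefixes of one another, which makes the cutoff cases easier to check by hand.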
The 10 most frequently occurring words were:
There were a total of 2331 unique words
The longest chain was 22
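The last two figures can be derived directly from the chains: the number of unique words is the sum of the chain lengths, and the longest chain is the maximum length. The sketch below assumes every table entry has been initialized and that each distinct word is stored exactly once; the class and method names are hypothetical, and the chains hold plain strings only to keep the example self-contained.

```java
import java.util.LinkedList;

// Hypothetical sketch; works for chains of any item type.
class TableStatsSketch {
    // Each distinct word occupies one list node, so summing sizes counts unique words.
    static <T> int uniqueWords(LinkedList<T>[] table) {
        int total = 0;
        for (LinkedList<T> chain : table) {
            total += chain.size();
        }
        return total;
    }

    // Length of the longest chain in the table (0 for an empty table).
    static <T> int longestChain(LinkedList<T>[] table) {
        int longest = 0;
        for (LinkedList<T> chain : table) {
            longest = Math.max(longest, chain.size());
        }
        return longest;
    }

    // Formats the two summary lines in the style of the sample output.
    static <T> String summary(LinkedList<T>[] table) {
        return "There were a total of " + uniqueWords(table) + " unique words\n"
             + "The longest chain was " + longestChain(table);
    }
}
```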