Computer Science Assignment: Implementing a Word Frequency Counter | Exams Computer Science

Computer Science 3310

Program 5

Your assignment is to write, run, and test a program that does the following:

1. Read an input file whose name will be specified as a command line argument, breaking the lines of the file into words.

2. Store the words in a hash table (collision resolution to be done by chaining) along with a count of how many times the word appears in the text.

3. Produce the following as output to standard output.

• The n most frequently occurring words where the value for n is specified on the command line. Output one line per word, giving the word and how many times it occurs.

• The total number of unique words found in the file.

• The length of the longest chain in the hash table.
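
As a rough illustration of the first output item, the sketch below picks the n most frequent entries from a list of word counts. The class name TopWordsSketch and the Entry type are hypothetical (Entry is only a stand-in for the provided WordItem, whose fields are not shown here), and the alphabetical tie-break is an assumption, since the assignment does not say how ties should be ordered.

```java
import java.util.ArrayList;
import java.util.List;

class TopWordsSketch {
    // Hypothetical stand-in for the provided WordItem class.
    static class Entry {
        final String word;
        final int count;

        Entry(String word, int count) {
            this.word = word;
            this.count = count;
        }
    }

    // Returns up to n entries, highest count first.
    // Assumption: ties are broken alphabetically by word.
    static List<Entry> topN(List<Entry> entries, int n) {
        List<Entry> sorted = new ArrayList<>(entries); // leave the caller's list untouched
        sorted.sort((a, b) -> a.count != b.count
                ? Integer.compare(b.count, a.count)
                : a.word.compareTo(b.word));
        int limit = Math.max(0, Math.min(n, sorted.size()));
        return new ArrayList<>(sorted.subList(0, limit));
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        entries.add(new Entry("time", 2));
        entries.add(new Entry("the", 5));
        entries.add(new Entry("now", 2));
        // One line per word: the word and how many times it occurs.
        for (Entry e : topN(entries, 2)) {
            System.out.println(e.word + " " + e.count);
        }
    }
}
```

Sorting a copy of all entries is the simplest approach; if n is much smaller than the number of unique words, repeatedly selecting the maximum would also work.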

PROGRAM DETAILS

Command Line Arguments: Java provides a way for a program user to specify program arguments at run time. The
code for doing this has been included in the file WordCount.java that is being provided for you as a starting template
for the assignment. What this allows you to do is specify program parameters when execution is requested rather
than going through a sequence of input prompts. Execution of the following command:

    java WordCount words.txt 17

will cause the program to execute, using words.txt as the input file and 17 as the value for n. The code provided
in WordCount.java checks the validity of these arguments automatically. If you are developing your program
without using command line execution, you will want to comment out all of the code involving the command line
arguments, and manually prompt for the name of the file and the value for n before beginning the processing.
However, when you submit your final program, it should handle the command line arguments as specified in the file
I provide.
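
To give a feel for what argument checking of this kind can look like, here is a minimal sketch. It is not the code from WordCount.java, which is not reproduced in this handout. The class name ArgsSketch, the method parseCount, the usage message, and the rule that n must be at least 1 are all assumptions made for illustration.

```java
class ArgsSketch {
    // Returns the value of n from args[1], or -1 if the arguments are unusable.
    // Assumption: exactly two arguments are expected and n must be at least 1.
    static int parseCount(String[] args) {
        if (args.length != 2) {
            return -1;
        }
        try {
            int n = Integer.parseInt(args[1]);
            return n >= 1 ? n : -1;
        } catch (NumberFormatException e) {
            return -1; // second argument was not an integer
        }
    }

    public static void main(String[] args) {
        int n = parseCount(args);
        if (n < 0) {
            System.err.println("Usage: java ArgsSketch <input file> <n>");
            return;
        }
        System.out.println("input file = " + args[0] + ", n = " + n);
    }
}
```

Keeping the check in a separate method makes it easy to try different argument combinations without retyping the command each time.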

Words: A “word” is either: (1) a sequence of letters, terminated by a non-letter; or (2) a sequence of letters which
contains the apostrophe, where a letter must both precede and follow the apostrophe. Examples of words are the
following:

Input                                   Words
now is, the999 time8dkfj couldn’t       now is the time dkfj couldn’t
999xjk,isnt’ he ‘would’ve done it**     xjk isnt he would’ve done it
more sample’’words                      more sample words

You may assume that a word begins and ends on the same line of text. In order to count "The" and "the" as
occurrences of the same word, you must store all words in lower case letters. Code to do that is provided in the
WordCount.java template. Note that the proper representation of apostrophe as a character constant is '\'' (i.e.
apostrophe backslash apostrophe apostrophe). The function Character.isLetter(ch), which returns true if the char
variable ch is a letter, will be useful.
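
The word rules above can be sketched as a single left-to-right scan of a line. This is one possible approach under stated assumptions, not the required solution: the class name WordSplitterSketch and the method splitWords are hypothetical, the sketch lower-cases letters itself rather than relying on the template code, and it recognizes only the plain apostrophe character '\''.

```java
import java.util.ArrayList;
import java.util.List;

class WordSplitterSketch {
    // Breaks one line of text into lower-case words.
    static List<String> splitWords(String line) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char ch = line.charAt(i);
            boolean nextIsLetter =
                    i + 1 < line.length() && Character.isLetter(line.charAt(i + 1));
            if (Character.isLetter(ch)) {
                current.append(Character.toLowerCase(ch));
            } else if (ch == '\'' && current.length() > 0 && nextIsLetter) {
                // Keep the apostrophe only when a letter precedes and follows it.
                current.append(ch);
            } else if (current.length() > 0) {
                // Any other character ends the word in progress.
                words.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            words.add(current.toString()); // word that runs to the end of the line
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(splitWords("999xjk,isnt' he 'would've done it**"));
        // prints [xjk, isnt, he, would've, done, it]
    }
}
```

Because an apostrophe is appended only when the next character is a letter, the word in progress always ends in a letter when a terminating character is reached, so no trailing apostrophe needs to be trimmed.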

Data Structure: Since chaining is being used for collision resolution, the hash table will be an array of lists. You
will use the Java API class LinkedList for the list type. Remember that LinkedList item positions begin at 0.
The list will contain items of type WordItem. The definition of WordItem.java will be provided. Declarations for
the table as well as the value for TABLESIZE are provided in WordCount.java. You will need to write the code to
initialize each entry of the hash table.
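
The following sketch shows one way an array of LinkedList chains might be initialized and used. Everything named here is an assumption for illustration: Item is a hypothetical stand-in for the provided WordItem, the TABLESIZE value of 7 is a placeholder for the value declared in WordCount.java, and the hash function (String.hashCode reduced with Math.floorMod) is just one reasonable choice, since this handout does not specify one.

```java
import java.util.LinkedList;

class ChainedTableSketch {
    // Hypothetical stand-in for the provided WordItem class.
    static class Item {
        final String word;
        int count = 1;

        Item(String word) {
            this.word = word;
        }
    }

    static final int TABLESIZE = 7; // placeholder; the real value is in WordCount.java

    // Creates the table and initializes every entry to an empty chain.
    @SuppressWarnings("unchecked")
    static LinkedList<Item>[] newTable() {
        LinkedList<Item>[] table = new LinkedList[TABLESIZE];
        for (int i = 0; i < table.length; i++) {
            table[i] = new LinkedList<>();
        }
        return table;
    }

    // Assumed hash function: floorMod keeps the index non-negative.
    static int indexOf(String word) {
        return Math.floorMod(word.hashCode(), TABLESIZE);
    }

    // Increments the count if the word is already in its chain, else appends it.
    static void add(LinkedList<Item>[] table, String word) {
        for (Item item : table[indexOf(word)]) {
            if (item.word.equals(word)) {
                item.count++;
                return;
            }
        }
        table[indexOf(word)].add(new Item(word));
    }

    static int countOf(LinkedList<Item>[] table, String word) {
        for (Item item : table[indexOf(word)]) {
            if (item.word.equals(word)) {
                return item.count;
            }
        }
        return 0;
    }

    // Total number of unique words is the sum of all chain lengths.
    static int uniqueWords(LinkedList<Item>[] table) {
        int total = 0;
        for (LinkedList<Item> chain : table) {
            total += chain.size();
        }
        return total;
    }

    static int longestChain(LinkedList<Item>[] table) {
        int longest = 0;
        for (LinkedList<Item> chain : table) {
            longest = Math.max(longest, chain.size());
        }
        return longest;
    }

    public static void main(String[] args) {
        LinkedList<Item>[] table = newTable();
        for (String w : new String[] {"the", "time", "the", "now"}) {
            add(table, w);
        }
        System.out.println("count of the: " + countOf(table, "the"));
        System.out.println("unique words: " + uniqueWords(table));
        System.out.println("longest chain: " + longestChain(table));
    }
}
```

Without the initialization loop in newTable, every array entry would be null and the first attempt to search a chain would fail with a NullPointerException.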
