Word Ladders - Data Structure and Development | CS 2605, Study Guides, Projects, Research of Data Structures and Algorithms

The final project for the fall 2008 CS2605 class. Material Type: Project; Class: Data Structs & OO Development; Subject: Computer Science; University: Virginia Polytechnic Institute And State University; Term: Spring 2008;

Uploaded on 01/13/2009 by widerman

Word Ladders

Grading

- 35% from correctness & testing (Web-CAT)
- 35% from style & coding
- 30% from report on algorithmic complexity & experiments

Goal

The word ladder problem is similar to a "6 degrees of separation" or Oracle of (Kevin) Bacon game, but using words rather than people or actors. Simply, from a given starting word, find the shortest "ladder" of single-letter changes that leads to some final word, where each intermediate state is also a word. For example, here is a ladder to turn stone into money:

stone shone shine chine chins coins corns cores cones coney money

In this assignment, you will be designing and developing a set of classes that can find one of the shortest word ladders between two words. The focus in this assignment is on practicing the following skills:

- Applying queues, trees, and graphs to solve a complex problem
- Traversing a graph using a breadth first search
- Designing your own classes and functions with a great deal of flexibility
- Designing & coding a large complex problem
- Analyzing algorithms theoretically & experimentally
- Using the strategy pattern to factor out implementations

Projects 2, 3, and 4, if completed correctly, provide you the basic data structures to build up the more complicated data structure needed for this project. However, if you do not feel comfortable using your own code, or your code for any of the past projects did not work, you may use STL with a penalty.

- If you use the STL Deque in place of your LinkedDeque from project 2, you will lose 10 points off your grade.
- If you use the STL Map in place of your BST Map from project 3, you will lose 10 points off your grade.
- No other STL data structures may be used.

Requirements

The requirements for this project are extremely minimal to provide you the maximum amount of design flexibility. For testing you are only REQUIRED to implement one class called WordBase, with a total of two methods. However, that does not mean that all your code should be in that single class, far from it. You will likely require many classes to accomplish this assignment. The class you must implement is found here, but is also duplicated below:

    class WordBase {
    public:
        WordBase(istream& input);
        Iterator* getWordLadder(const string& from, const string& to);
    };

See the actual header file for commenting and an example on usage.

The constructor serves the purpose of initializing the WordBase with a set of words from an input stream. The getWordLadders function returns an iterator of iterators. The outer iterator loops over a set of iterators. The inner iterator loops over a set of strings, which is a specific word ladder. Thus getWordLadders allows iteration over each word in each word ladder for the given starting and ending words, called "from" and "to." The starting and ending words should be included in the word ladder.

Using iterators in this manner allows you the maximum amount of design flexibility, as it puts almost no requirements on the names or responsibilities of the classes you use for this project. Because of this, it is highly recommended that you work out a fairly detailed design before you begin coding.

A Solution Strategy

To solve this problem you should rely on the Graph data structure you implemented for the previous project. The vertices in your graph should be words (which can be the string class, or a more complicated class). The edge (v1,v2) should exist in your graph only if the Hamming distance between v1 and v2 is exactly 1. The Hamming distance is the number of positions at which v1 and v2 have different letters (assume that v1 and v2 have the same number of letters). For example, the Hamming distance between "couch" and "cough" is 1 because the words differ only in the fourth letter ('c' versus 'g'), whereas the Hamming distance between "crouch" and "slouch" is 2.

We keep track of word pairs with a Hamming distance of exactly 1 because these pairs are the building blocks of word ladders. If the graph has an edge for each such pair, then every path in the graph is a word ladder. Thus, finding a word ladder between an arbitrary starting point and ending point is as simple as traversing the graph from the starting vertex to the ending vertex! Consequently, your program should perform two main tasks: building the word graph and traversing it.

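To build the graph, run the following for each dictionary word, from:

  1. FOR i=0..length(from)-1
    1. FOR j=a..z
      1. possible_neighbor = from
      2. possible_neighbor[i] = j
      3. IF possible_neighbor is in the dictionary and differs from the word from
      4. THEN add the edge (from, possible_neighbor) to the graph
    2. END
  2. END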
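The letter-substitution idea can be sketched in C++ roughly as follows. This is only an illustration under stated assumptions: it uses `std::set`, `std::map`, and `std::vector` for brevity (the project itself requires your own data structures, or a penalized subset of the STL), it assumes all words are lowercase a-z, and the names `hammingDistance`, `WordGraph`, and `buildWordGraph` are hypothetical and do not come from the course header.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Number of positions at which two equal-length words differ.
std::size_t hammingDistance(const std::string& a, const std::string& b) {
    assert(a.size() == b.size());  // only defined here for equal lengths
    std::size_t diff = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] != b[i]) ++diff;
    }
    return diff;
}

// Hypothetical adjacency list: word -> words at Hamming distance 1.
using WordGraph = std::map<std::string, std::vector<std::string>>;

// Builds the graph by trying every single-letter substitution of every word.
// Assumes the dictionary holds lowercase words only.
WordGraph buildWordGraph(const std::set<std::string>& dictionary) {
    WordGraph graph;
    for (const std::string& from : dictionary) {
        // Creating the entry first keeps isolated words as vertices.
        std::vector<std::string>& neighbors = graph[from];
        for (std::size_t i = 0; i < from.size(); ++i) {
            for (char j = 'a'; j <= 'z'; ++j) {
                if (j == from[i]) continue;  // would reproduce the word itself
                std::string possibleNeighbor = from;
                possibleNeighbor[i] = j;
                if (dictionary.count(possibleNeighbor) > 0) {
                    neighbors.push_back(possibleNeighbor);
                }
            }
        }
    }
    return graph;
}
```

For a word of length L this tries 25·L candidates, each costing one dictionary lookup, which is typically far cheaper than comparing every pair of words with `hammingDistance`.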

Traversing the Graph

Now, once you've built your word graph, finding any word ladder is as simple as traversing the graph. You may recall from a graph theory course that Dijkstra's algorithm finds the shortest path from one vertex to all other vertices in a graph. If you wish to use this algorithm, you may. Because every edge in the word graph has the same cost, a breadth first search computes the same shortest paths more simply. The general form of a breadth first search on a graph is as follows:

  1. mark the starting word as visited and enqueue it
  2. WHILE queue is not empty
    1. LET V = the head of the queue
    2. remove the head of the queue
    3. FOR EACH neighbor, N, of V
      1. IF we haven't visited N
      2. THEN mark N as visited and add N to the end of the queue
    4. END
  3. END

Because we mark a node as visited as soon as it is added to the queue, we can guarantee that we visit each node once, and only once. Because the queue hands nodes back in order of their distance from the start, there is no faster way of getting from the starting node to each node. In other words, a breadth first search inherently computes shortest paths.

The tricky part is how to remember what the path is. This can be done by allowing each node to remember what its parent node is. The parent node is the node that we came from in order to get to any specific node. In other words, in step 2.3.2, when we add N to the end of the queue, we know N's parent node is V. The word ladder from end to start is found by simply following the chain of parents back to the start, so we just need to reverse this in order to finally find the word ladder from start to end.
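A breadth first search with parent tracking might look like the following sketch. It is not the required design: it uses `std::deque` and `std::map` purely for brevity, the adjacency-list type `WordGraph` and the function `shortestLadder` are hypothetical names, and it marks a word as visited at the moment it is enqueued (the parent map doubles as the visited set) so that no word enters the queue twice.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <map>
#include <string>
#include <vector>

// Hypothetical adjacency list: word -> words at Hamming distance 1.
using WordGraph = std::map<std::string, std::vector<std::string>>;

// Returns one shortest ladder from `from` to `to` (both included),
// or an empty vector if either word is unknown or no ladder exists.
std::vector<std::string> shortestLadder(const WordGraph& graph,
                                        const std::string& from,
                                        const std::string& to) {
    if (graph.count(from) == 0 || graph.count(to) == 0) return {};

    std::map<std::string, std::string> parent;  // also serves as "visited"
    std::deque<std::string> queue;
    parent[from] = from;  // the start has no real parent; this marks it visited
    queue.push_back(from);

    while (!queue.empty()) {
        const std::string v = queue.front();
        queue.pop_front();
        if (v == to) break;  // the first time we reach `to` is via a shortest path
        const auto it = graph.find(v);
        if (it == graph.end()) continue;
        for (const std::string& n : it->second) {
            if (parent.count(n) == 0) {  // not visited yet
                parent[n] = v;           // remember how we got to n
                queue.push_back(n);
            }
        }
    }

    if (parent.count(to) == 0) return {};  // `to` was never reached

    // Follow parents from the end back to the start, then reverse.
    std::vector<std::string> ladder;
    for (std::string w = to; w != from; w = parent[w]) ladder.push_back(w);
    ladder.push_back(from);
    std::reverse(ladder.begin(), ladder.end());
    assert(ladder.front() == from && ladder.back() == to);
    return ladder;
}
```

Stopping as soon as the ending word is dequeued is an optional shortcut; letting the loop run to completion instead yields the parents needed for shortest ladders from the start to every reachable word.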

Input

The input is given as one word per line and it MAY be in alphabetical order (it is in the words.txt.zip file). You should make sure that you do not feed it into your BST in this same order, as that will cause your BST to degenerate into a linked list. This will severely degrade the performance of all algorithms that use the BST (think O(log n) => O(n) for each lookup or insertion).

Below are two approaches to permute the input so that the BST built is not linear. The first approach randomizes the order of insertion. The second approach builds an optimal tree assuming the input is in order. If the input is not in order, it will still work fine. We are assuming that the input is an array of strings called "array" with a length of "n" and that "insert" inserts a string into the BST. You should just pick one of the following; don't implement both.

Randomization

  1. FOR i=0..n-1
    1. swap(array[i], array[random(i, n-1)])
    2. tree.insert(array[i])
  2. END
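The randomization approach is a Fisher-Yates shuffle. A minimal sketch, assuming the words are held in a `std::vector` for illustration and using the standard `<random>` facilities; `shuffleWords` is a hypothetical name, and the BST insertion itself is left to the caller:

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Shuffles `words` in place. After the call, inserting words[0], words[1], ...
// in order gives a random insertion sequence for the BST.
void shuffleWords(std::vector<std::string>& words, std::mt19937& rng) {
    const std::size_t n = words.size();
    for (std::size_t i = 0; i < n; ++i) {
        // Pick a random index in [i, n-1], both ends included.
        std::uniform_int_distribution<std::size_t> pick(i, n - 1);
        std::swap(words[i], words[pick(rng)]);
        // words[i] is now final and could be inserted into the tree here.
    }
}
```

Seeding the generator with a fixed value, for example `std::mt19937 rng(42);`, keeps runs repeatable, which can help when timing experiments for the report.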

Building an Optimal Tree

  1. preOrder(lower, upper)
    1. IF lower < upper
    2. THEN
      1. LET midpoint = floor((upper+lower)/2)
      2. insert(array[midpoint])
      3. preOrder(lower, midpoint)
      4. preOrder(midpoint+1, upper)
    3. END
  2. END

Here the range runs from lower up to, but not including, upper, so an empty range inserts nothing and no element is inserted twice. The BST would be initialized by calling: preOrder(0, n).
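One way to sketch the midpoint-first insertion order in C++ is shown below. It treats the range as half-open, from `lower` up to but not including `upper`, so that the initial call with `(0, n)` stays in bounds and no element is produced twice. The function name `balancedOrder` is hypothetical, and the sketch collects the insertion order in a `std::vector` rather than calling a real BST's `insert`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Appends to `order` the sequence in which the elements of
// sorted[lower, upper) should be inserted to get a balanced BST.
void balancedOrder(const std::vector<std::string>& sorted,
                   std::size_t lower, std::size_t upper,
                   std::vector<std::string>& order) {
    assert(upper <= sorted.size());
    if (lower >= upper) return;  // empty range: nothing to insert
    const std::size_t midpoint = lower + (upper - lower) / 2;
    order.push_back(sorted[midpoint]);                  // root of this subtree
    balancedOrder(sorted, lower, midpoint, order);      // left half
    balancedOrder(sorted, midpoint + 1, upper, order);  // right half
}
```

For the sorted input a, b, c, d, e, f, g this produces the order d, b, a, c, f, e, g, which builds a tree of height 3 instead of a 7-node chain.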

Testing

For testing, we encourage you to start with small dictionaries and work up. The small dictionaries result from extracting all the 2-, 3-, 4-, 5-, 6- and 7-letter words from the large dictionary, and they are provided together as a zip file (Small Dictionaries). Start with the 2- or 3-letter dictionaries because they contain only a small number of words.

For large-scale testing, you can use the large dictionary file. This data file is found in /usr/share/dict/words on most Linux machines, and has around 250,000 words. You should delete this file before uploading to Web-CAT or you may exceed the maximum file upload size.

You will be graded on the design & thoroughness of your experiments, as well as the quality of your report.

Submitting

Submit your solution to Web-CAT under the assignment "Final Project." Be sure not to include any large files or Cxx tests that use large files. Web-CAT is unable to handle any such tests. Place your report in the same folder as your project so that it is submitted along with your code.