Project Report for Programming Languages | CS 4700, Study Guides, Projects, Research of Programming Languages

Material Type: Project; Class: Programming Languages; Subject: Computer Science; University: Utah State University; Term: Spring 2006;

Uploaded on 07/30/2009

CS 4700 Spring 2006 Program 5 (20 points)
Completion of this program (as well as all others) is required for a passing grade in CS 4700. This is not
a group project and must be written individually.
One of the most common uses for Perl is to create scripts for automating frequently performed tasks.
Often, these tasks would not be difficult for an individual to perform, but due to the number of times
they must be repeated, they can become tedious. In this assignment, you will create a Perl script to
automate the task of cleaning up the images directory for a set of html pages.
Suppose you find yourself responsible for maintaining a set of web pages created by another
individual. The individual planned ahead and placed all the html files in one directory and all the
images referenced within the html pages in a separate directory. Upon inspection, you also discover
that several of the files in the images directory are never actually being used. To aid in organization,
you would like to know which images are used and which are not.
You should create a Perl script which is capable of gathering this information for you. When your
script is run, it should first find the names of all html files stored in the current directory. Then, it should open
each file and determine which image files are being referenced within that file. You will need to keep
a list of all files that are currently being referenced. Once all files have been processed, you will need
to compare the list of currently referenced files with a list of all files within the images subdirectory.
For each image file, (1) keep track of how many times it is referenced, (2) note any file which is
referenced but does not exist, and (3) note any file which exists but is not referenced.
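
The comparison step described above can be sketched as follows. This is only one possible arrangement, not the required solution: the helper name classify_images, the variable names, and the sample data are all made up for illustration, and the sketch assumes the reference counts have already been collected into a hash.

```perl
use strict;
use warnings;

# Hypothetical helper: given reference counts (hash ref) and the names of the
# files that actually exist (array ref), return the two "problem" lists.
sub classify_images {
    my ($ref_count, $existing) = @_;
    my %exists = map { $_ => 1 } @$existing;

    # Referenced in some html file but not present in the images directory.
    my @missing = grep { !$exists{$_} } sort keys %$ref_count;
    # Present in the images directory but never referenced.
    my @unused  = grep { !$ref_count->{$_} } sort @$existing;
    return (\@missing, \@unused);
}

# Made-up sample data, for illustration only.
my %ref_count = ('images/a.gif' => 2, 'images/b.gif' => 1);
my @existing  = ('images/a.gif', 'images/c.gif');

my ($missing, $unused) = classify_images(\%ref_count, \@existing);

print "$_ referenced $ref_count{$_} time(s)\n" for sort keys %ref_count;
print "Referenced but missing: @$missing\n";
print "Present but unreferenced: @$unused\n";
```

A hash keyed by file name keeps the count and the existence check in one place; in a real script the list of existing files would come from reading the images subdirectory rather than from a literal list.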
To locate the referenced files, you will need to search within the html files for image tags. This is how
an image tag might look: <img src="images/image3.gif" width="210" />. Using regular expressions,
you should be able to locate and isolate the name of the actual referenced file (in this case,
images/image3.gif).
Be aware that tags within html are not case sensitive. The image tags may appear as “<img … />”,
“<IMG … />” or any other combination of upper and lower case. Similarly, the attributes within the
tag (height, width, etc.) need not be in any particular order, and may not be present at all. Likewise,
white space within the tag should also be ignored. <img src="images/image1.gif" /> is the same as
<img src = "images/image1.gif" / >.
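
One way to write a pattern that tolerates mixed case, extra white space, and attributes in any order is sketched below. The helper name img_sources and the sample string are hypothetical, and the pattern assumes the src value is always quoted; it is a starting point, not a complete html parser.

```perl
use strict;
use warnings;

# Hypothetical helper: return every src value found in <img ...> tags.
sub img_sources {
    my ($html) = @_;
    my @found;
    # /i ignores case, /g finds every match, /x allows the layout and
    # comments inside the pattern.
    while ($html =~ m{
            <\s*img\b        # tag name, in any mix of case
            [^>]*?           # other attributes, in any order
            \bsrc\s*=\s*     # src, with optional spaces around the equals sign
            (["'])           # opening quote, single or double
            (.*?)            # the referenced file name
            \1               # the same quote character again
        }gix) {
        push @found, $2;
    }
    return @found;
}

# Made-up sample line with two differently written tags.
my $sample = q{<img src="images/image1.gif" /> <IMG Width="210" SRC = 'images/image3.gif' / >};
print "$_\n" for img_sources($sample);
```

The back-reference \1 makes the closing quote match whichever quote character opened the value, so both single-quoted and double-quoted attributes are accepted.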
To help you in debugging your script, a zip file containing a few html files and an images directory has
been provided with this assignment. Download and unzip example.zip. Take a moment to look at the
html files, then try running your script within the directory where you extracted your files. Verify that
the information you generated is correct.
Download and unzip example2.zip. See how your Perl program deals with this set of files. This has
more difficult images to recognize, such as:
1. Two images per line.
2. Images which are referenced but NOT part of an image tag.
3. Image tags which extend over two lines.
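
Cases 1 and 3 in the list above can both be handled by reading the whole file into one string before matching, as in the sketch below. The page content is made up, and the filehandle is opened on that string only so the example runs without any files on disk; a real script would open each html file instead. Case 2 is not addressed here.

```perl
use strict;
use warnings;

# Made-up page: two tags on one line, then one tag split over two lines.
my $page = <<'HTML';
<img src="images/a.gif" /><img src="images/b.gif" />
<img width="210"
     src="images/a.gif" />
HTML

# Open a read handle on the string above (an in-memory file).
open(my $fh, '<', \$page) or die "Cannot open in-memory file: $!";
my $html = do { local $/; <$fh> };   # undefined $/ reads the whole file at once
close($fh);

my %count;
# [^>] also matches newlines, so a tag split across lines is still found;
# /g resumes scanning after each match, so several tags on one line all count.
while ($html =~ /<\s*img\b[^>]*?\bsrc\s*=\s*(["'])(.*?)\1/gi) {
    $count{$2}++;
}

print "$_: $count{$_}\n" for sort keys %count;
```

Matching against the whole file rather than line by line trades a little memory for simpler logic, which is usually acceptable for html pages of ordinary size.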

HINT

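We want to be able to match (and remember) multiple things per line. The code below counts (and prints) how many times "salary" followed by any single character (the pattern salary.) occurs in each line of a file, ignoring case. The “magic” happens because of the “g” in the pattern match: each test of the inner loop resumes where the previous match ended.

    open(INPUT, "myIn.txt") or die "Cannot open myIn.txt: $!";
    while ($line = <INPUT>) {          # read one line at a time
        $count = 0;
        while ($line =~ /(salary.)/gi) {
            print "$1\n";              # $1 holds the text matched by the parentheses
            $count++;
        }
        print "Count: $count\n";
    }
    close(INPUT);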