

Material Type: Project; Professor: Zou; Class: Malware and Software Vulnerability Analysis; Subject: Computer Applications; University: University of Central Florida; Term: Unknown 1989;
Typology: Study Guides, Projects, Research


Overview

Spam is currently a major problem on the internet. It wastes bandwidth, storage space, and time, and is a nuisance in general. Most e-mail providers now have a mechanism in place to filter out much of the spam that reaches their servers. For our CAP6135 project, we plan to study in depth the spam detection algorithms that e-mail providers can use. We plan to write a modular framework that runs multiple spam detection algorithms on sample e-mail messages and reports performance metrics for each. We will test individual algorithms and combinations of algorithms and analyze the reported metrics.

Purpose

The purpose of this project is to provide performance comparisons for a few well-known spam detection algorithms. This information is useful in the fight against the growing volume of spam. Internet e-mail providers can use it to help determine the best techniques for fighting spam. The scope of this project could be expanded to include the production of new or hybrid spam detection algorithms, and/or the production of a spam filtering service that can run on an e-mail server or client.

Goals

The primary goal of this project is to produce a flexible framework that supports multiple spam detection modules. The framework should allow each algorithm to be trained on known spam and non-spam e-mail messages and then executed on a different set of input e-mails. It should collect performance metrics for each algorithm, including accuracy, required training time, and processing time. To acquire training and test messages, we will either find a public spam database or devise a message harvesting technique.

Another goal of this project is to learn how several spam detection algorithms work. We will implement at least three of these algorithms as modules for our framework. Each algorithm will be analyzed individually, and all of them will be compared with one another.

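The framework described in the goals could be sketched roughly as follows. This is only an illustration under our own assumptions, not a committed design: the names `SpamDetector`, `KeywordDetector`, `Metrics`, and `evaluate` are hypothetical, the toy keyword module stands in for a real algorithm, and the four messages are made-up placeholders rather than data from any spam database.

```python
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass


class SpamDetector(ABC):
    """Hypothetical interface that each detection module would implement."""

    @abstractmethod
    def train(self, messages, labels):
        """Learn from messages whose labels are True (spam) or False (non-spam)."""

    @abstractmethod
    def is_spam(self, message):
        """Return True if the message is classified as spam."""


class KeywordDetector(SpamDetector):
    """Toy stand-in module: flags words seen only in spam during training."""

    def __init__(self):
        self.spam_words = set()

    def train(self, messages, labels):
        spam, ham = set(), set()
        for text, label in zip(messages, labels):
            (spam if label else ham).update(text.lower().split())
        self.spam_words = spam - ham

    def is_spam(self, message):
        return any(word in self.spam_words for word in message.lower().split())


@dataclass
class Metrics:
    accuracy: float
    training_seconds: float
    processing_seconds: float


def evaluate(detector, train_set, test_set):
    """Train on one labeled set, classify a different set, and time both steps."""
    train_messages, train_labels = zip(*train_set)
    start = time.perf_counter()
    detector.train(train_messages, train_labels)
    training_seconds = time.perf_counter() - start

    start = time.perf_counter()
    predictions = [detector.is_spam(message) for message, _ in test_set]
    processing_seconds = time.perf_counter() - start

    correct = sum(p == label for p, (_, label) in zip(predictions, test_set))
    return Metrics(correct / len(test_set), training_seconds, processing_seconds)


# Made-up placeholder messages, labeled True for spam.
train_set = [("win free money now", True), ("meeting at noon tomorrow", False)]
test_set = [("free money inside", True), ("lunch at noon", False)]
metrics = evaluate(KeywordDetector(), train_set, test_set)
print(metrics)
```

Because `evaluate` depends only on the `SpamDetector` interface, a new algorithm can be added as another subclass without changing the measurement code.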
If time permits, there are several choices for additional goals. Because we are taking a modular approach to building our system, it will be easy to add new spam detection algorithms. If possible, we will implement additional spam detection algorithms or create hybrid ones that combine two or more of the algorithms we have implemented. Another potential goal is to create a service that runs on an e-mail server and performs spam filtering. A similar option would be to write a spam filtering plug-in for an e-mail client such as Thunderbird, Evolution, or Outlook. These goals are optional and will be attempted depending on how long the primary goals take to implement and how much time the additional goals themselves would require.
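
One simple way to combine two or more detection algorithms is a majority vote. The sketch below is an assumption for illustration only; the proposal does not commit to any particular combination scheme. The function `majority_vote` and the three toy detectors are hypothetical stand-ins for real modules.

```python
def majority_vote(detectors, message):
    """Return True when more than half of the detectors flag the message."""
    votes = sum(1 for detect in detectors if detect(message))
    return votes * 2 > len(detectors)


# Three toy stand-in detectors (hypothetical; real modules would be trained).
def has_spam_keyword(message):
    return "free money" in message.lower()


def is_mostly_uppercase(message):
    letters = [c for c in message if c.isalpha()]
    return bool(letters) and sum(c.isupper() for c in letters) / len(letters) > 0.5


def has_many_exclamations(message):
    return message.count("!") >= 3


detectors = [has_spam_keyword, is_mostly_uppercase, has_many_exclamations]
print(majority_vote(detectors, "FREE MONEY!!! CLICK NOW"))
print(majority_vote(detectors, "Lunch at noon?"))
```

Other schemes, such as weighting each detector by its measured accuracy, would fit the same shape.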