Spam Detection Algorithm Analysis - Final Project Proposal | CAP 6135

Material Type: Project; Professor: Zou; Class: Malware and Software Vulnerability Analysis; Subject: Computer Applications; University: University of Central Florida; Term: Unknown 1989;

Uploaded on 11/08/2009

Spam Detection Algorithm Analysis
CAP6135 Final Project Proposal - Joe LaFata and Alex Wade
Overview
Spam is currently a major problem on the internet. It wastes bandwidth, storage space, and time, and is a
nuisance in general. Most e-mail providers now have a mechanism in place to filter out much of the
spam that reaches their servers. For our CAP6135 project, we plan to study in depth the spam
detection algorithms that e-mail providers can use. We plan to write a modular
framework that runs multiple spam detection algorithms on sample e-mail messages
and reports performance metrics for each of them. We will test individual algorithms and
combinations of algorithms and analyze the reported metrics.
Purpose
The purpose of this project is to provide performance comparisons for a few well-known algorithms.
This information is beneficial in the fight against the growing amount of spam that is sent out. Internet
e-mail providers can use this information to help them determine the best techniques for fighting spam.
The scope of this project could potentially be expanded to include the production of new or hybrid spam
detection algorithms and/or the production of a spam filtering service that can be run on an e-mail server
or client.
Goals
The primary goal of this project is to produce a flexible framework that supports multiple spam
detection modules. The framework should allow each algorithm to be trained on known spam and
non-spam e-mail messages and then executed on a different set of input e-mails. It should collect
performance metrics on each algorithm, including accuracy, required training time, and
processing time. To acquire training and test messages, we will either use a public spam database
or devise a message harvesting technique.
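As a rough illustration of the framework described above, the sketch below shows one way the module interface and metric collection could be arranged. It is only a sketch under stated assumptions: the names `SpamDetector`, `KeywordDetector`, `Metrics`, and `evaluate` are hypothetical and do not come from the proposal, and the keyword module is a toy stand-in rather than one of the algorithms the project will study.

```python
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass


class SpamDetector(ABC):
    """Hypothetical interface that every detection module would implement."""

    @abstractmethod
    def train(self, messages, labels):
        """Learn from messages; labels[i] is True when messages[i] is spam."""

    @abstractmethod
    def is_spam(self, message):
        """Classify one unseen message, returning True for spam."""


class KeywordDetector(SpamDetector):
    """Toy module (an assumption, not from the proposal): flags any message
    containing a word that appeared only in spam during training."""

    def __init__(self):
        self.spam_only_words = set()

    def train(self, messages, labels):
        spam_words, ham_words = set(), set()
        for text, spam in zip(messages, labels):
            (spam_words if spam else ham_words).update(text.lower().split())
        self.spam_only_words = spam_words - ham_words

    def is_spam(self, message):
        return any(w in self.spam_only_words for w in message.lower().split())


@dataclass
class Metrics:
    accuracy: float          # fraction of test messages classified correctly
    train_seconds: float     # wall-clock time spent in train()
    classify_seconds: float  # wall-clock time spent classifying the test set


def evaluate(detector, train_set, test_set):
    """Train on one labeled set, then score on a separate labeled set.

    Both sets are lists of (message, is_spam) pairs.
    """
    train_msgs, train_labels = zip(*train_set)
    start = time.perf_counter()
    detector.train(list(train_msgs), list(train_labels))
    train_seconds = time.perf_counter() - start

    start = time.perf_counter()
    predictions = [detector.is_spam(msg) for msg, _ in test_set]
    classify_seconds = time.perf_counter() - start

    correct = sum(p == label for p, (_, label) in zip(predictions, test_set))
    return Metrics(correct / len(test_set), train_seconds, classify_seconds)
```

Because `evaluate` depends only on the `SpamDetector` interface, a new algorithm could be measured by adding one subclass, without changing the measurement code.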
Another goal of this project is to learn how several spam detection algorithms work. We will implement
at least three of these algorithms as modules for our framework. We will analyze each algorithm
individually and compare the algorithms against one another.
If time permits, there are several choices for additional goals. Because we are taking a modular
approach when building our system, it will be easy to add new spam detection algorithms. If possible,
we will implement additional spam detection algorithms or create hybrids that combine two or more
of the algorithms we have implemented. Another potential goal for this project is to create a service that
runs on an e-mail server and performs spam filtering. A similar option would be to write a spam
filtering plug-in for an e-mail client such as Thunderbird, Evolution, or Outlook. These goals are optional
and will be attempted depending on how long the primary goals take to implement and how long these additional goals would take.
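The optional goal of combining two or more algorithms could take several forms; the proposal does not say which. A minimal sketch of one possibility, majority voting, is shown below. Everything in it is an assumption made for illustration: the `Classifier` signature, the `majority_vote` helper, and the three simplistic stand-in classifiers are hypothetical and are not the algorithms the project will implement.

```python
from typing import Callable, Sequence

# A classifier here is any callable mapping a message to True (spam) or
# False (not spam); this signature is an assumption made for the sketch.
Classifier = Callable[[str], bool]


def majority_vote(classifiers: Sequence[Classifier]) -> Classifier:
    """Build a hybrid classifier that flags a message as spam when more
    than half of the underlying classifiers flag it."""
    if not classifiers:
        raise ValueError("need at least one classifier")

    def hybrid(message: str) -> bool:
        votes = sum(1 for classify in classifiers if classify(message))
        return votes * 2 > len(classifiers)

    return hybrid


# Three deliberately simplistic stand-in classifiers (hypothetical).
def has_free(message: str) -> bool:
    return "free" in message.lower()


def is_shouting(message: str) -> bool:
    # True when more than half of the letters are uppercase.
    letters = [c for c in message if c.isalpha()]
    return bool(letters) and sum(c.isupper() for c in letters) > len(letters) / 2


def many_exclamations(message: str) -> bool:
    return message.count("!") >= 3


hybrid = majority_vote([has_free, is_shouting, many_exclamations])
```

With these stand-ins, `hybrid("FREE PRIZES!!!")` is flagged because all three classifiers agree, while `hybrid("Is the room free at noon?")` is not, since only one of the three votes spam. Weighted voting or chaining a cheap filter ahead of an expensive one would be alternative designs.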