A Rule Based Approach for SPAM Detection

dc.contributor.author: Kamboj, Ravinder
dc.contributor.supervisor: Singh, V. P.
dc.contributor.supervisor: Bhatia, Sanmeet
dc.date.accessioned: 2010-08-24T09:00:42Z
dc.date.available: 2010-08-24T09:00:42Z
dc.date.issued: 2010-08-24T09:00:42Z
dc.description [en]: M.E. (CSED)
dc.description.abstract [en]: Spam is defined as junk or unsolicited e-mail. Spam has increased tremendously in the last few years: today, more than 85% of the e-mails received by e-mail users are spam. The cost of spam can be measured in lost human time, lost server time and the loss of valuable mail. Spammers use various techniques, such as spam sent via botnets, localized spam and image spam. Following the mail delivery process, anti-spam measures can be divided into two groups: those based on the e-mail envelope and those based on the e-mail data. Blacklisting, greylisting and whitelisting can be applied to the envelope to detect spam. Techniques based on the data part of the e-mail, such as heuristic and statistical techniques, can also be used to combat spam. Bayesian filters, a statistical technique, divide the incoming message into words called tokens and check each token's probability of occurrence in spam and ham e-mails. Two approaches can be followed to detect spam e-mails: a learning approach and a rule-based approach. The learning approach requires a large dataset of spam and ham e-mails to train the spam filter. It has good time characteristics, since the filter can be retrained quickly for new spam, but poor space characteristics, and the knowledge it acquires is difficult to share with other users and mail servers. The rule-based approach is a direct approach that implements rules for the various kinds of spam. This thesis follows the rule-based approach, with the intent of implementing rules for kinds of spam such as health, adult, educational and product-offering spam. Pattern analysis is performed on a set of ham (legitimate) and spam e-mails, and the probabilities of occurrence of various words/tokens are calculated to design rules for selected tokens.
dc.description.sponsorship [en]: CSED
dc.format.extent: 1994754 bytes
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/10266/1166
dc.language.iso [en]: en
dc.subject [en]: Spam, Anti-Spam
dc.subject [en]: Bayesian Filtering
dc.title [en]: A Rule Based Approach for SPAM Detection
dc.type [en]: Thesis
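The abstract describes deriving rules from the probabilities of occurrence of tokens measured on a set of ham and spam e-mails. Because the abstract does not give the thesis's tokenizer, thresholds or rule format, the following Python sketch assumes all of them. The names `tokenize`, `select_rule_tokens` and `is_spam`, the thresholds `min_spam_prob`, `max_ham_prob` and `min_hits`, and the toy corpora are hypothetical and only illustrate the general idea.

```python
import re
from collections import Counter

# Hypothetical tokenizer: lowercase alphabetic words only. The thesis's
# actual tokenization rules are not given in the abstract.
def tokenize(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def occurrence_probabilities(messages):
    """Fraction of messages in which each token occurs at least once."""
    counts = Counter()
    for msg in messages:
        counts.update(tokenize(msg))  # a set, so each token counts once per message
    return {tok: c / len(messages) for tok, c in counts.items()}

def select_rule_tokens(spam, ham, min_spam_prob=0.5, max_ham_prob=0.1):
    """Keep tokens common in spam but rare in ham (thresholds are hypothetical)."""
    p_spam = occurrence_probabilities(spam)
    p_ham = occurrence_probabilities(ham)
    return sorted(
        tok for tok, p in p_spam.items()
        if p >= min_spam_prob and p_ham.get(tok, 0.0) <= max_ham_prob
    )

def is_spam(message, rule_tokens, min_hits=2):
    """Hypothetical rule: flag a message containing at least min_hits rule tokens."""
    return len(tokenize(message) & set(rule_tokens)) >= min_hits

# Toy corpora for illustration only; these are not data from the thesis.
SPAM = ["buy cheap pills now", "cheap pills online offer", "limited offer buy now"]
HAM = ["meeting notes attached", "lunch tomorrow at noon", "please review the attached report"]

if __name__ == "__main__":
    rules = select_rule_tokens(SPAM, HAM)
    print(rules)                                   # → ['buy', 'cheap', 'now', 'offer', 'pills']
    print(is_spam("cheap pills for you", rules))   # → True
```

Here a token becomes a rule when it appears in at least half of the spam messages and in at most 10% of the ham messages. The thresholds trade detection rate against false positives: raising `min_hits` makes the filter more conservative, since a single suspicious word no longer suffices.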
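Bayesian filtering, which the abstract lists as a statistical technique, is commonly implemented by estimating a per-token spam probability from training counts and combining the token probabilities with the naive Bayes rule. The Python sketch below uses a simplified Graham-style combination with each token probability clamped to [0.01, 0.99]. It is a generic illustration, not the method of the thesis; the class `NaiveBayesFilter`, its methods and the toy corpora are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Assumption: lowercase alphabetic words; real filters also use headers, URLs, etc.
    return set(re.findall(r"[a-z]+", text.lower()))

class NaiveBayesFilter:
    """Per-token spamicity combined with the naive Bayes rule (a sketch)."""

    def __init__(self, spam, ham):
        self.n_spam, self.n_ham = len(spam), len(ham)
        # Number of messages in each class that contain each token.
        self.spam_counts = Counter(t for m in spam for t in tokenize(m))
        self.ham_counts = Counter(t for m in ham for t in tokenize(m))

    def token_spamicity(self, token):
        fs = self.spam_counts[token] / self.n_spam
        fh = self.ham_counts[token] / self.n_ham
        if fs + fh == 0:
            return None  # unseen token: no evidence either way
        # Clamp so that no single token can force the result to 0 or 1.
        return min(0.99, max(0.01, fs / (fs + fh)))

    def spam_probability(self, text):
        probs = [p for p in map(self.token_spamicity, tokenize(text)) if p is not None]
        if not probs:
            return 0.5  # nothing known about the message
        # P = prod(p) / (prod(p) + prod(1 - p)), evaluated in log space.
        diff = sum(math.log(p) for p in probs) - sum(math.log(1 - p) for p in probs)
        if diff >= 0:  # numerically stable logistic function of diff
            return 1 / (1 + math.exp(-diff))
        z = math.exp(diff)
        return z / (1 + z)

# Toy corpora for illustration only; these are not data from the thesis.
SPAM = ["buy cheap pills now", "cheap pills online offer", "limited offer buy now"]
HAM = ["meeting notes attached", "lunch tomorrow at noon", "please review the attached report"]
```

Working in log space avoids floating-point underflow when a message contains many tokens, which a direct product of probabilities would suffer from.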
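The envelope-level techniques named in the abstract (blacklisting, greylisting and whitelisting) can be sketched as one decision function. The Python sketch below assumes a hypothetical `EnvelopeFilter` that checks the sender address and client IP against allow and deny lists and otherwise greylists the (client IP, sender, recipient) triplet: the first delivery attempt is temporarily rejected, and a retry after `greylist_delay` seconds is accepted. The 300-second default and the whitelist-before-blacklist ordering are assumptions, not details from the thesis.

```python
from enum import Enum

class Verdict(Enum):
    ACCEPT = "accept"
    REJECT = "reject"              # permanent failure, e.g. an SMTP 5xx reply
    TEMPFAIL = "temporary-reject"  # ask the sender to retry, e.g. an SMTP 4xx reply

class EnvelopeFilter:
    """Hypothetical envelope check: whitelist, then blacklist, then greylist."""

    def __init__(self, whitelist, blacklist, greylist_delay=300):
        self.whitelist = set(whitelist)
        self.blacklist = set(blacklist)
        self.greylist_delay = greylist_delay  # seconds; hypothetical default
        self.first_seen = {}  # (client_ip, sender, recipient) -> first attempt time

    def check(self, client_ip, sender, recipient, now):
        if sender in self.whitelist or client_ip in self.whitelist:
            return Verdict.ACCEPT
        if sender in self.blacklist or client_ip in self.blacklist:
            return Verdict.REJECT
        triplet = (client_ip, sender, recipient)
        first = self.first_seen.setdefault(triplet, now)  # record first attempt
        if now - first >= self.greylist_delay:
            return Verdict.ACCEPT  # sender retried after the delay
        return Verdict.TEMPFAIL
```

Greylisting relies on legitimate mail servers retrying after a temporary failure, which many bulk-spam tools do not do. Time is passed in explicitly (`now`) so the logic can be tested deterministically; a production filter would also expire old triplets.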

Files

Original bundle

Name: 1166.pdf
Size: 1.9 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.79 KB
Format: Item-specific license agreed upon to submission