aalmangour / Spamfilter-1

Spam filter : Classification of e-mail as spam or ham based on Naive Bayes and Logistic Regression

Spamfilter

How to run this?

Steps to Compile and run the program:

Naive Bayes: Download and unzip the folder containing the Naive Bayes program files: NBMain.java and NaiveBayes.java

Input folder structure (example):

F:\temp
    \test
        \spam
        \ham
    \train
        \spam
        \ham
    stopwords.txt
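Before running either program, it can help to confirm that the input folder matches the layout above. The following is a minimal sketch, not part of the repository: the class name `LayoutCheck` and the method `missingEntries` are hypothetical, and it assumes the stop-word file is named stopwords.txt and sits directly in the root folder, as in the example.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the repository): checks the input layout.
class LayoutCheck {
    // Returns the expected entries that are missing under the root folder.
    static List<String> missingEntries(Path root) {
        String[] expectedDirs = {"train/spam", "train/ham", "test/spam", "test/ham"};
        List<String> missing = new ArrayList<>();
        for (String dir : expectedDirs) {
            if (!Files.isDirectory(root.resolve(dir))) {
                missing.add(dir);
            }
        }
        // Assumption: the stop-word list lives next to the train and test folders.
        if (!Files.isRegularFile(root.resolve("stopwords.txt"))) {
            missing.add("stopwords.txt");
        }
        return missing;
    }

    public static void main(String[] args) {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> missing = missingEntries(root);
        System.out.println(missing.isEmpty()
                ? "Layout looks complete"
                : "Missing: " + missing);
    }
}
```

Running it as `java LayoutCheck F:\temp` would report any of the five expected entries that are absent.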

**Main driver program - NBMain.java**

Argument 0 - folder containing the test and train folders and stopwords.txt

Argument 1 - yes or no, indicating whether to remove stop words (yes - remove stop words; no - do not remove stop words)
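To illustrate what the yes/no argument controls, here is a small sketch of stop-word filtering. It is an assumption about the general technique, not the repository's code: the class `StopWordFilter`, the method `tokenize`, and the choice to lowercase and split on non-word characters are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch (not from the repository) of optional stop-word removal.
class StopWordFilter {
    // Lowercases the text, splits on non-word characters, and drops
    // stop words only when removeStopWords is true.
    static List<String> tokenize(String text, Set<String> stopWords, boolean removeStopWords) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue; // split can yield an empty leading token
            }
            if (removeStopWords && stopWords.contains(token)) {
                continue;
            }
            tokens.add(token);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Assumption: the second command-line argument is "yes" or "no".
        boolean remove = args.length > 1 && args[1].equalsIgnoreCase("yes");
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "is", "a"));
        System.out.println(tokenize("The offer is free", stopWords, remove));
    }
}
```

In a real run the stop-word set would be read from stopwords.txt rather than hard-coded.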

Step 1: Compile the program

javac NBMain.java

Step 2: Run the program, passing yes to remove stop words or no to keep them

java NBMain path_to_folder_containing_train_test_folders_and_stopwords.txt yes/no

For the example folder above, that would be:

java NBMain F:\temp no
java NBMain F:\temp yes
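For readers who want a rough picture of what a Naive Bayes spam classifier computes, the sketch below shows a multinomial variant with add-one (Laplace) smoothing, scored in log space. This is an illustrative assumption, not the repository's NaiveBayes.java: the class `TinyNaiveBayes` and its methods are hypothetical, and the actual program may differ in smoothing, tokenization, or handling of unseen words.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch (not from the repository) of multinomial Naive Bayes.
class TinyNaiveBayes {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> totalWords = new HashMap<>();
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Adds one training document with the given label ("spam" or "ham").
    void train(String label, List<String> tokens) {
        docCounts.merge(label, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    // log P(label) + sum of log P(token | label), with add-one smoothing.
    // Assumes the label has been seen at least once during training.
    double logScore(String label, List<String> tokens) {
        double score = Math.log((double) docCounts.get(label) / totalDocs);
        Map<String, Integer> counts = wordCounts.get(label);
        int total = totalWords.getOrDefault(label, 0);
        for (String token : tokens) {
            if (!vocabulary.contains(token)) {
                continue; // design choice in this sketch: ignore words never seen in training
            }
            int count = counts.getOrDefault(token, 0);
            score += Math.log((count + 1.0) / (total + vocabulary.size()));
        }
        return score;
    }

    // Picks the label with the higher log score; both labels must be trained.
    String classify(List<String> tokens) {
        return logScore("spam", tokens) >= logScore("ham", tokens) ? "spam" : "ham";
    }
}
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.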

-------------------------------------------------****

Logistic Regression uses the same folder structure as above:

Argument 0: path to the folder containing the test and train folders and the stopwords.txt file

Argument 1: to_filter_stopwords: yes or no

Argument 2: learning_rate_eta

Argument 3: lambda

Argument 4: num_iterations
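The roles of the learning rate (eta), lambda, and the iteration count can be sketched with a small batch gradient-ascent trainer using L2 regularization. This is an assumption about how such parameters are typically used, not the repository's LRMain.java: the class `TinyLogisticRegression`, its methods, and the choice to leave the bias term unregularized are hypothetical.

```java
// Hypothetical sketch (not from the repository) of L2-regularized
// logistic regression trained by batch gradient ascent.
class TinyLogisticRegression {
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // w[0] is the bias; w[j + 1] is the weight for feature j.
    static double predict(double[] w, double[] x) {
        double z = w[0];
        for (int j = 0; j < x.length; j++) {
            z += w[j + 1] * x[j];
        }
        return sigmoid(z);
    }

    // eta: learning rate, lambda: L2 penalty strength,
    // iterations: number of full passes over the training data.
    static double[] train(double[][] features, int[] labels,
                          double eta, double lambda, int iterations) {
        int d = features[0].length;
        double[] w = new double[d + 1];
        for (int it = 0; it < iterations; it++) {
            double[] gradient = new double[d + 1];
            for (int i = 0; i < features.length; i++) {
                double error = labels[i] - predict(w, features[i]);
                gradient[0] += error;
                for (int j = 0; j < d; j++) {
                    gradient[j + 1] += error * features[i][j];
                }
            }
            for (int j = 0; j <= d; j++) {
                // Assumption in this sketch: the bias is not penalized.
                double penalty = (j == 0) ? 0.0 : lambda * w[j];
                w[j] += eta * (gradient[j] - penalty);
            }
        }
        return w;
    }
}
```

Each iteration touches every training document and every feature, so the running time grows with the product of iterations, documents, and vocabulary size.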

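Step 1: javac LRMain.java

Step 2: java LRMain yes_or_no_to_filter_stop_words 0.01 0.1 5

Example: java LRMain no 0.01 0.01 10

Note: the commands above do not show Argument 0; per the argument list, the folder path (for example F:\temp) is expected as the first argument.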

------------------------------------------------******

Please note that the Logistic Regression program takes a long time to complete. I ran my tests on Amazon EC2 to speed up execution.
