Implementation of an ANN for email spam detection using TensorFlow
The idea is simple: given an email you have never seen before, determine whether or not it is spam.
The approach is simple but very effective: I reached 99.6% accuracy.
The code was tested on Python 2.7.11 and should work on Python 2.x.
The data provided is from a Kaggle competition.
TR.tar.gz
contains 2500 emails: ham (1721), labelled 1, and spam (779), labelled 0.
spam-mail.tr.label
contains the associated training labels.
ExtractContent.py
extracts the subject and body of each email.
In a Python-compatible environment:
1, invoke the script with the command
./ExtractContent.py
2, input the source directory -- where the source files are stored
For example C:\EMAILPro\CSDMC2010_SPAM\TEST
3, input the destination directory -- where you want the extracted bodies to be written
For example C:\EMAILPro\CSDMC2010_SPAM\TEST_NEW
4, done.
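As a rough illustration of what this extraction step involves, here is a minimal sketch built on the standard library's email package. It is not the code in ExtractContent.py: the function names extract_subject_and_body and extract_directory are hypothetical, and the sketch assumes Python 3.6+ (the repository itself targets Python 2.x) and that each file in the source directory is one raw .eml message.

```python
import email
import os
from email import policy


def extract_subject_and_body(raw_bytes):
    """Return (subject, body) for one raw RFC 822 / MIME message."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    subject = str(msg["Subject"] or "")
    # Prefer a plain-text part and fall back to HTML if there is none.
    part = msg.get_body(preferencelist=("plain", "html"))
    # Assumption: the declared charset is one Python knows; real spam
    # often violates this, so production code would need error handling.
    body = part.get_content() if part is not None else ""
    return subject, body


def extract_directory(src_dir, dst_dir):
    """Write subject + body of every file in src_dir to dst_dir."""
    os.makedirs(dst_dir, exist_ok=True)
    for name in sorted(os.listdir(src_dir)):
        src_path = os.path.join(src_dir, name)
        if not os.path.isfile(src_path):
            continue
        with open(src_path, "rb") as f:
            subject, body = extract_subject_and_body(f.read())
        with open(os.path.join(dst_dir, name), "w", encoding="utf-8") as out:
            out.write(subject + "\n" + body)
```

Reading the files as bytes leaves charset decoding to the email package, which matters for spam corpora where message encodings vary widely.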
email_input.py
vectorizes the email text and outputs trainX.csv, trainY.csv, testX.csv, and testY.csv.
data.tar.gz
contains trainX.csv, trainY.csv, testX.csv, and testY.csv.
BagOfWords.p
contains all unique words from the data, for later use.
Spam detection.ipynb
IPython notebook that trains the model and fetches emails from your Gmail account to classify.
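The vectorization step performed by email_input.py can be pictured with a small bag-of-words sketch. This is an illustration under assumptions, not the script's actual code: the helper names tokenize, build_vocabulary, and vectorize are hypothetical, the tokenizer is a simple regular expression, and word counts are used even though the source does not say whether the real features are counts or presence flags. The sorted vocabulary plays the role that BagOfWords.p has in this project.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(documents):
    """Collect every unique word across all documents, in sorted order."""
    words = set()
    for doc in documents:
        words.update(tokenize(doc))
    return sorted(words)


def vectorize(text, vocabulary):
    """Map a text to a list of word counts, one entry per vocabulary word."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


emails = ["Win money now", "Meeting notes for now"]
vocabulary = build_vocabulary(emails)
# vocabulary == ['for', 'meeting', 'money', 'notes', 'now', 'win']
features = vectorize("win win now", vocabulary)
# features == [0, 0, 0, 0, 1, 2]
```

Keeping the vocabulary fixed after training is what allows a new email, such as one fetched from Gmail, to be turned into a vector of the same length as the training rows.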
The format of the .eml file is defined in RFC 822, and information on the more recent email standard, MIME (Multipurpose Internet Mail Extensions), can be found in RFC 2045-2049.
In the notebook you will find how the model works and how to authenticate with your Gmail account.