Random Forests Spam Email Classification System
Email is a rapid and cost-effective communication tool. However, the growing number of email users has driven a sharp increase in spam over the past few decades, and spam is now one of the substantial risks on the internet. This growth makes trustworthy anti-spam filters increasingly important. Spammers typically send unsolicited emails to many recipients, and these emails are mostly identical in their characteristics. It is therefore essential to design a defense system that detects spam effectively and provides an alternative to a stand-alone filter. In this paper, a novel framework is proposed for classifying email into spam and ham using attribute-based random forests classification. The process begins with a Bayesian spamminess probability calculation for each token. A TF-IDF weighting scheme then calculates a weight for each token and for the mail, a score is calculated based on genetic fitness, and finally a random forests classifier labels each email as spam or ham. The results are compared with existing spam classification methods in terms of classification accuracy, weighted accuracy, and F1 measure. They show that the proposed system performs promisingly compared with the other existing algorithms.
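The first stages of the pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the add-one smoothing, the smoothed IDF formula, and the neutral value 0.5 for unseen tokens are all assumptions made for illustration. The genetic-fitness score is replaced here by a plain TF-IDF weighted mean of token spamminess, and the random forests stage is omitted.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def spamminess(spam_docs, ham_docs):
    """Per-token Bayesian spam probability, with add-one smoothing (assumed)."""
    spam_df = Counter(t for d in spam_docs for t in set(tokenize(d)))
    ham_df = Counter(t for d in ham_docs for t in set(tokenize(d)))
    probs = {}
    for t in set(spam_df) | set(ham_df):
        p_spam = (spam_df[t] + 1) / (len(spam_docs) + 2)
        p_ham = (ham_df[t] + 1) / (len(ham_docs) + 2)
        probs[t] = p_spam / (p_spam + p_ham)
    return probs


def tfidf(doc, corpus):
    """TF-IDF weight per token of doc, using a smoothed IDF (assumed variant)."""
    tokens = tokenize(doc)
    corpus_sets = [set(tokenize(d)) for d in corpus]
    n = len(corpus_sets)
    weights = {}
    for t, count in Counter(tokens).items():
        df = sum(1 for s in corpus_sets if t in s)
        idf = math.log((1 + n) / (1 + df)) + 1
        weights[t] = (count / len(tokens)) * idf
    return weights


def mail_score(doc, corpus, probs):
    """TF-IDF weighted mean of token spamminess.

    This is a stand-in for the genetic-fitness score, which the
    abstract does not specify. Unseen tokens get a neutral 0.5.
    """
    weights = tfidf(doc, corpus)
    total = sum(weights.values())
    if total == 0:
        return 0.5  # empty mail: no evidence either way
    return sum(w * probs.get(t, 0.5) for t, w in weights.items()) / total
```

In a full system, scores of this kind would be assembled into a feature vector per mail and passed to a random forests classifier for the final spam or ham decision.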