Volume 4, Issue 2
Spam Detection Based on BERT Model: A Descriptive Study
- Vol. 4, Issue 2, Pages: 78-84 (2023)
Published: 2023
DOI: 10.47297/taposatWSP2633-456912.20230402
School of Computer Science and Technology, Huazhong University of Science and Technology, Hubei, Wuhan, P.R. China, 430074
Hailin Xu. (2023). Spam Detection Based on BERT Model: A Descriptive Study. Theory and Practice of Science and Technology, 4(2), 78-84. DOI: 10.47297/taposatWSP2633-456912.20230402.
This paper proposes a spam detection method based on the BERT (Bidirectional Encoder Representations from Transformers) model. The spam problem is becoming increasingly serious in today's Internet environment, and effectively identifying and filtering spam is crucial for protecting users' information security and improving the quality of email services.
This study describes in detail the implementation steps of the BERT-based spam detection method. First, we use a pre-trained BERT model as a feature extractor to convert email text into BERT representations. Then, the model is fine-tuned on a labeled spam dataset: a classifier is trained to learn the decision boundary that maps these BERT representations to the spam and non-spam classes. Finally, we evaluate the method on a test dataset and compare it with other commonly used spam detection methods.
Experimental results show that the BERT-based method achieves significant improvements in metrics such as accuracy and recall. Compared with traditional rule-based or feature-engineering methods, it better captures the complex semantic and contextual information in spam, thereby improving detection accuracy.
The proposed BERT-based method shows good performance and promising application prospects. Future research can explore how to combine it with other techniques and optimization strategies to further improve the effectiveness of spam detection and to promote the method's adoption in practical applications.
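Before a pre-trained BERT model can produce a representation, the email text must be converted into BERT's input format. The sketch below is a minimal illustration under stated assumptions: it uses a tiny hypothetical vocabulary (real BERT checkpoints ship a vocabulary of roughly 30,000 WordPiece tokens), applies BERT's greedy longest-match-first WordPiece splitting, wraps the tokens in the [CLS] and [SEP] special tokens, and pads to a fixed length with an attention mask. The names `TOY_VOCAB`, `wordpiece`, `encode`, and `max_len` are illustrative only; in practice the tokenizer distributed with the chosen checkpoint would be used.

```python
from typing import Dict, List, Tuple

# Hypothetical toy vocabulary; a real BERT checkpoint ships its own vocab file.
TOY_VOCAB: Dict[str, int] = {
    "[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
    "win": 4, "a": 5, "free": 6, "prize": 7, "now": 8,
    "click": 9, "##ing": 10, "##s": 11, "call": 12,
}


def wordpiece(word: str, vocab: Dict[str, int]) -> List[str]:
    """Split one word using greedy longest-match-first WordPiece."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking until a vocab hit.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry "##"
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return ["[UNK]"]  # the whole word maps to [UNK] if any part fails
        pieces.append(match)
        start = end
    return pieces


def encode(text: str, vocab: Dict[str, int], max_len: int = 12) -> Tuple[List[int], List[int]]:
    """Return (input_ids, attention_mask) in the [CLS] ... [SEP] layout."""
    # Simplification: lowercase and split on whitespace only (real BERT also
    # splits punctuation and, for uncased models, strips accents).
    tokens = ["[CLS]"]
    for word in text.lower().split():
        tokens.extend(wordpiece(word, vocab))
    tokens = tokens[: max_len - 1] + ["[SEP]"]  # truncate, keeping room for [SEP]
    ids = [vocab[t] for t in tokens]
    mask = [1] * len(ids)  # 1 marks real tokens
    padding = max_len - len(ids)
    return ids + [vocab["[PAD]"]] * padding, mask + [0] * padding


ids, mask = encode("Win a free prize now clicking", TOY_VOCAB)
print(ids)   # [2, 4, 5, 6, 7, 8, 9, 10, 3, 0, 0, 0]
print(mask)  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
```

Here "clicking" is not in the vocabulary, so it is split into "click" and "##ing"; the attention mask tells the model to ignore the padded positions.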
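The abstract does not specify the classifier that maps BERT representations to the spam and non-spam classes. The original BERT paper adds a single classification layer on top of the final hidden state of the [CLS] token, so the sketch below assumes a logistic-regression head of that kind, trained by batch gradient descent on binary cross-entropy over fixed, precomputed feature vectors. This matches the "feature extractor" view only: in full fine-tuning the same gradient is also backpropagated into BERT's own weights, which this sketch does not do. The 3-dimensional vectors are hypothetical stand-ins for [CLS] vectors (BERT-base produces 768 dimensions), and `train_head`, `predict`, `lr`, and `epochs` are illustrative names and settings.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_head(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.5,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Fit a logistic-regression head by batch gradient descent on cross-entropy."""
    dim = len(features[0])
    w = [0.0] * dim
    b = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # derivative of binary cross-entropy w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        # Average the gradient over the batch and take one descent step.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return 1 (spam) when the predicted spam probability is at least 0.5."""
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5)


# Hypothetical 3-dimensional stand-ins for [CLS] vectors (1 = spam, 0 = ham).
X = [[0.9, 0.8, 0.1], [0.8, 0.9, 0.2], [0.7, 0.6, 0.0],
     [0.1, 0.2, 0.9], [0.2, 0.1, 0.8], [0.0, 0.3, 0.7]]
y = [1, 1, 1, 0, 0, 0]

w, b = train_head(X, y)
print([predict(w, b, x) for x in X])
```

The learned weights `w` and bias `b` define a linear decision boundary in the representation space; a richer head (for example, an extra hidden layer) would be a drop-in replacement for `train_head`.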
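The evaluation step reports metrics such as accuracy and recall. The sketch below computes them from true and predicted labels, treating spam as the positive class, and also adds precision and F1, which the abstract does not mention but which are commonly reported alongside recall. Recall is the fraction of actual spam that the filter catches, while precision is the fraction of flagged messages that really are spam, so a filter can trade one against the other. Because spam datasets are often imbalanced, accuracy alone can look high even when many spam messages slip through. The function name `evaluate` and the labels are hypothetical.

```python
from typing import Dict, Sequence


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1, treating label 1 as spam."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # spam caught
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # ham kept
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # ham flagged
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # spam missed
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical predictions on ten test messages (1 = spam, 0 = ham).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(evaluate(y_true, y_pred))
# {'accuracy': 0.8, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```

In this toy run one spam message is missed and one legitimate message is flagged, so recall and precision both drop to 0.75 while accuracy stays at 0.8.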
Keywords: BERT model; Spam detection; Feature extraction; Fine-tuning; Performance evaluation