Spam Detection Based on BERT Model: A Descriptive Study

Hailin Xu

doi:10.47297/taposatWSP2633-456912.20230402

Volume 4, Issue 2

School of Computer Science and Technology, Huazhong University of Science and Technology, Hubei, Wuhan, P.R. China, 430074
Vol. 4, Issue 2, Pages: 78-84 (2023)
Published: 2023

Xu, H. (2023). Spam Detection Based on BERT Model: A Descriptive Study. Theory and Practice of Science and Technology, 4(2), 78-84. DOI: 10.47297/taposatWSP2633-456912.20230402

Abstract

This paper proposes a spam detection method based on the BERT (Bidirectional Encoder Representations from Transformers) model. Spam is an increasingly serious problem in today's Internet environment, and effectively identifying and filtering it is crucial for protecting users' information security and improving the quality of mail services. This study describes the implementation of the BERT-based spam detection method in detail. First, we use a pre-trained BERT model as a feature extractor to convert email text into BERT representations. Then, we fine-tune the model on a labeled spam dataset, training a classifier that learns a decision boundary mapping these representations to the spam and non-spam classes. Finally, we evaluate the method on a test dataset and compare it with other commonly used spam detection methods.

Experimental results show that the BERT-based method achieves significant improvements in metrics such as accuracy and recall. Compared with traditional rule-based or feature-engineering methods, it better captures the complex semantic and contextual information in spam, which improves detection accuracy. The proposed method thus shows good performance and application prospects. Future research can explore combining it with other techniques and optimization strategies to further improve spam detection and to promote its adoption in practical applications.
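The classification stage described above can be illustrated with a minimal sketch, written under stated assumptions rather than as the author's implementation. It trains only a logistic-regression head, by full-batch gradient descent, on fixed feature vectors. The 3-dimensional vectors are hypothetical stand-ins for BERT sentence representations (BERT-base actually produces 768-dimensional hidden states). Full fine-tuning, as the abstract describes, would also update the encoder's weights, which this standard-library sketch does not do. The names `LogisticHead`, `lr` and `epochs` are illustrative.

```python
import math
import random
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticHead:
    """Hypothetical binary head mapping a fixed-size text representation to P(spam)."""

    def __init__(self, dim: int, lr: float = 0.5, epochs: int = 200, seed: int = 0):
        rng = random.Random(seed)
        self.w = [rng.uniform(-0.01, 0.01) for _ in range(dim)]
        self.b = 0.0
        self.lr = lr
        self.epochs = epochs

    def prob(self, x: Sequence[float]) -> float:
        # P(spam | x) = sigmoid(w . x + b); the decision boundary is w . x + b = 0.
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def fit(self, X: List[Sequence[float]], y: List[int]) -> "LogisticHead":
        n = len(X)
        for _ in range(self.epochs):
            grad_w = [0.0] * len(self.w)
            grad_b = 0.0
            for x, label in zip(X, y):
                # Derivative of binary cross-entropy with respect to the logit.
                err = self.prob(x) - label
                for j, xj in enumerate(x):
                    grad_w[j] += err * xj
                grad_b += err
            # One gradient-descent step on the mean loss over the whole batch.
            self.w = [wj - self.lr * gj / n for wj, gj in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n
        return self

    def predict(self, X: List[Sequence[float]], threshold: float = 0.5) -> List[int]:
        return [1 if self.prob(x) >= threshold else 0 for x in X]


# Toy vectors standing in for BERT representations (hypothetical data).
X_train = [[0.9, 0.8, 0.1], [0.8, 0.9, 0.2], [0.1, 0.2, 0.9], [0.2, 0.1, 0.8]]
y_train = [1, 1, 0, 0]  # 1 = spam, 0 = non-spam

head = LogisticHead(dim=3).fit(X_train, y_train)
print(head.predict([[0.85, 0.75, 0.15], [0.15, 0.1, 0.95]]))  # expected: [1, 0]
```

Keeping the head linear makes the "decision boundary" in the abstract concrete: it is the hyperplane w·x + b = 0 in representation space. In a real system the encoder would typically come from a deep-learning framework (for example PyTorch with Hugging Face Transformers); that tooling choice is an assumption, not something the abstract states.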

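The abstract reports accuracy and recall for the evaluation step. A minimal sketch of how these metrics can be computed follows, treating spam (label 1) as the positive class; precision and F1 are added for context and are not mentioned in the abstract. The function name `spam_metrics` and the sample labels are hypothetical.

```python
from typing import Dict, Sequence


def spam_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 with spam (label 1) as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spam correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate mail wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spam that slipped through
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate mail correctly passed
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels and predictions for 8 test messages (1 = spam, 0 = non-spam).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
print(spam_metrics(y_true, y_pred))
# accuracy 0.75, precision 1.0, recall 0.5, F1 about 0.667
```

In this example accuracy is 0.75 while recall is only 0.5: two of the four spam messages are missed. Because spam corpora are often imbalanced, reporting recall alongside accuracy exposes this kind of failure.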
Keywords

BERT model; Spam detection; Feature extraction; Fine-tuning; Performance evaluation
