H. Ijazul, Q. Weidong, G. Jie, and T. Peng, “Spammy Names Detection in Pashto Language to Prevent Fake Accounts Creation on Social Media,” presented at the 8th International Conference on Signal and Image Processing (ICSIP), Wuxi, China, 2023. (EI)

Abstract

With the widespread adoption of online social networks (OSNs), the proliferation of fake accounts has become a significant concern. Malicious users rely on fake accounts to commit fraud, send spam, and abuse the system. Detecting fake profiles after account registration can be valuable, but unlawful activities can be carried out very quickly, so it is more beneficial to identify fake accounts at an early stage, before the first abuse occurs. Additionally, some social media platforms are based on the assumption that users' online information reflects their true identity. On such platforms, a user who fills the name field with random strings violates the terms of service. This paper presents a technique for detecting spammy names associated with fake profiles before any additional user information or history is available. The proposed approach analyzes the name field for patterns commonly found in fake accounts. To achieve this, we developed a supervised machine learning model that discriminates between valid and spammy names. The model is trained on a dataset of 100K instances (words and phrases), manually labeled with two categories, “valid names” and “spammy strings”. For classification, we examined several machine learning algorithms, including Naïve Bayes (NB), k-Nearest Neighbor (KNN), Support Vector Machine (SVM), Linear Regression, and Decision Trees. Experimental results show that the Naïve Bayes algorithm performs best, yielding an F1-score of 94.1% with an accuracy of 95.3%.
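
The classification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which features were used, so character bigrams are an assumption here, and the tiny romanized training list is invented purely for illustration (the paper's dataset is 100K manually labeled Pashto instances). The names `char_ngrams` and `NaiveBayesNameClassifier` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return character n-grams of a string padded with boundary markers."""
    padded = "^" + text.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesNameClassifier:
    """Multinomial Naive Bayes over character n-grams with Laplace smoothing.

    Assumption: character bigrams as features; the abstract does not
    specify the feature set used in the paper.
    """

    def __init__(self, n=2, alpha=1.0):
        self.n = n
        self.alpha = alpha
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()

    def fit(self, names, labels):
        for name, label in zip(names, labels):
            self.class_counts[label] += 1
            for gram in char_ngrams(name, self.n):
                self.feature_counts[label][gram] += 1
                self.vocabulary.add(gram)
        return self

    def predict(self, name):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus summed log likelihoods of the observed n-grams.
            score = math.log(doc_count / total_docs)
            total_grams = sum(self.feature_counts[label].values())
            denom = total_grams + self.alpha * vocab_size
            for gram in char_ngrams(name, self.n):
                count = self.feature_counts[label][gram]
                score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data for illustration only; not taken from the paper's dataset.
valid_names = ["ahmad khan", "gul zaman", "zarina",
               "noor rahman", "farid shah", "malala"]
spammy_strings = ["xxqqzz", "asdfgh", "qwrtp",
                  "zzzxxx", "jkjkjk", "qqqq wwww"]

clf = NaiveBayesNameClassifier().fit(
    valid_names + spammy_strings,
    ["valid"] * len(valid_names) + ["spammy"] * len(spammy_strings),
)

print(clf.predict("rahman shah"))  # valid
print(clf.predict("zxzxqq"))       # spammy
```

Because the classifier needs only the name string, a check of this kind could run at registration time, before any account history exists, which is the setting the abstract targets.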

Full Paper: https://doi.org/10.1109/ICSIP57908.2023.10270845