Optimizing Cyberbullying Detection in Tweets with an Ensemble Learning Framework


Manjeet Singh, Annapurna Metta, Satyendra Patnaik

Abstract

The rapid spread of social media has led to a considerable increase in cyberbullying incidents, which can have serious psychological consequences for the people targeted. Detecting and reducing cyberbullying has therefore become a critical task for creating a safer online environment. Traditional approaches to detecting cyberbullying frequently rely on manual moderation, which is inefficient given the volume of content published daily. As a result, there is an urgent need for automated techniques that can accurately classify and manage cyberbullying content. This study focuses on classifying types of cyberbullying in tweets using a variety of machine-learning models and an improved stacking ensemble model. During preprocessing, the text is cleaned by removing URLs, mentions, hashtags, and stop words. The cleaned text is then transformed into TF-IDF features, which are used to train the models. The experiments cover several classifiers: Naive Bayes, Support Vector Machine (SVM), Decision Tree, Random Forest, k-Nearest Neighbors (k-NN), and Gradient Boosting. Each model is trained and evaluated so that its accuracy, precision, recall, and F1 score can be compared. To improve classification performance further, we propose a stacking ensemble model that combines the predictions of these base classifiers through a logistic regression meta-classifier. The ensemble model's hyperparameters are optimized using grid search. The ensemble model performs best, achieving a classification accuracy of 90.18%.
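The pipeline described in the abstract (cleaning, TF-IDF features, six base classifiers, a logistic regression meta-classifier, and grid search) can be sketched with scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact cleaning rules, the SVM kernel, the hyperparameter grid, or the cross-validation setup, so the regular expressions, the use of `LinearSVC`, the grid over the meta-classifier's `C`, and the helper names `clean_tweet` and `build_search` are all hypothetical choices.

```python
import re

from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, @mentions, and #hashtags.

    Assumed cleaning rules; stop words are removed later by the vectorizer.
    """
    text = re.sub(r"https?://\S+|www\.\S+", " ", text.lower())
    text = re.sub(r"[@#]\w+", " ", text)
    return " ".join(text.split())


def build_search(cv: int = 5) -> GridSearchCV:
    """Return a grid search over a TF-IDF + stacking ensemble pipeline."""
    # The six base classifiers named in the abstract (settings are assumptions).
    base_estimators = [
        ("nb", MultinomialNB()),
        ("svm", LinearSVC()),  # assumed linear kernel
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=3)),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ]
    # Logistic regression learns how to combine the base models'
    # out-of-fold predictions.
    stack = StackingClassifier(
        estimators=base_estimators,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=cv,
    )
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer(preprocessor=clean_tweet,
                                  stop_words="english")),
        ("stack", stack),
    ])
    # Hypothetical grid: only the meta-classifier's regularization strength.
    param_grid = {"stack__final_estimator__C": [0.1, 1.0, 10.0]}
    return GridSearchCV(pipeline, param_grid, scoring="accuracy", cv=cv)
```

Calling `build_search().fit(texts, labels)` on a list of raw tweets and their class labels would run the search, after which `best_params_` and `predict` are available on the returned object. Placing the vectorizer inside the pipeline keeps the TF-IDF vocabulary from being fitted on the validation folds during grid search.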
