Deteksi Cyberbullying dan Analisis Sentimen Berdasarkan Aspek Cyberbullying Menggunakan Machine Learning, Topic Modelling, dan Majority Voting

Salsabila, Salsabila (2024) Deteksi Cyberbullying dan Analisis Sentimen Berdasarkan Aspek Cyberbullying Menggunakan Machine Learning, Topic Modelling, dan Majority Voting. Masters thesis, Institut Teknologi Sepuluh Nopember.

6025221026-Master_Thesis.pdf - Accepted Version
Restricted to Repository staff only until 1 April 2026.

Abstract

The presence of social media and the freedom to use technology increase the risk of cyberbullying. Cyberbullying can take the form of racism, sexism, and body shaming, which can harm mental health and even lead to suicide. This research therefore aims to detect and analyze cyberbullying on social media. To detect cyberbullying, the study classifies data into cyberbullying and not cyberbullying classes using three machine learning methods: BiLSTM (Bidirectional Long Short-Term Memory), BERT (Bidirectional Encoder Representations from Transformers), and RF (Random Forest). BERT achieved an accuracy of 94% and an F1 score of 89%, making it superior in handling data imbalance. The research also categorizes cyberbullying data into four aspects (age, religion, gender, and ethnicity) using the cosine similarity between the hidden topics found by topic modeling, namely NMF (Non-Negative Matrix Factorization) and LDA (Latent Dirichlet Allocation), and the keywords of each aspect category. The aspect keywords are supplemented with TF-IDF (Term Frequency-Inverse Document Frequency) keywords and with a Borda Ranking of the keywords extracted by YAKE, BERT, and TF-IDF. The results show that adding keywords from TF-IDF and from the Borda Ranking of extracted keywords can improve accuracy up to 80%. Sentiment analysis using Majority Voting, K-Means Clustering, and BERT achieved an accuracy of 83%, with four sentiment labels: positive, negative, very positive, and very negative.
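
As an illustration only, the three combining steps named in the abstract (Borda ranking of keyword lists, cosine similarity between topic words and aspect keywords, and majority voting) can be sketched in plain Python. This is not the thesis implementation, which is not available in this record. The function names, the keyword lists, and the labels below are all hypothetical, and simple word-count vectors stand in for whatever topic and keyword representations the author actually used.

```python
from collections import Counter, defaultdict
from math import sqrt


def borda_rank(rankings):
    """Merge ranked keyword lists with a Borda count.

    Each list awards (length - position) points to its items, so the top
    item of a 3-item list gets 3 points and the last gets 1.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, keyword in enumerate(ranking):
            scores[keyword] += n - position
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda k: (-scores[k], k))


def cosine_similarity(words_a, words_b):
    """Cosine similarity between two bags of words (term-count vectors)."""
    a, b = Counter(words_a), Counter(words_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_aspect(topic_words, aspect_keywords):
    """Return the aspect whose keyword list is most similar to the topic."""
    return max(
        aspect_keywords,
        key=lambda aspect: cosine_similarity(topic_words, aspect_keywords[aspect]),
    )


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical keyword rankings from three extractors (stand-ins for
# YAKE, a BERT-based extractor, and TF-IDF).
extractor_a = ["old", "teenager", "school"]
extractor_b = ["teenager", "old", "kid"]
extractor_c = ["teenager", "school", "old"]
age_keywords = borda_rank([extractor_a, extractor_b, extractor_c])

# Hypothetical aspect keyword sets; "age" uses the merged ranking.
aspect_keywords = {
    "age": age_keywords,
    "religion": ["faith", "church", "mosque"],
    "gender": ["woman", "man", "girl"],
    "ethnicity": ["race", "nation", "immigrant"],
}

# Hypothetical top words of one hidden topic (e.g. from NMF or LDA).
topic_words = ["teenager", "kid", "class"]
aspect = assign_aspect(topic_words, aspect_keywords)

# Hypothetical sentiment labels from three separate labelers.
sentiment = majority_vote(["negative", "very negative", "negative"])
```

Here `age_keywords` places `teenager` first because it collects the most Borda points across the three lists, and the topic is assigned to the aspect with which it shares the most keywords. In practice the topic words would come from a fitted NMF or LDA model and the vectors would typically be TF-IDF weighted rather than raw counts.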

Item Type: Thesis (Masters)
Uncontrolled Keywords: analisis sentimen, cyberbullying, machine learning, majority voting, topic modelling, sentiment analysis
Subjects: B Philosophy. Psychology. Religion > BF Psychology
Q Science > Q Science (General) > Q325.5 Machine learning. Support vector machines.
Q Science > QA Mathematics > QA278.55 Cluster analysis
R Medicine > R Medicine (General) > R858 Deep Learning
Divisions: Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering > 55101-(S2) Master Thesis
Depositing User: Salsabila Salsabila
Date Deposited: 01 Feb 2024 05:17
Last Modified: 01 Feb 2024 05:17
URI: http://repository.its.ac.id/id/eprint/105887
