Multi-label Classification of Tweet Text Based on Dangerous Speech Aspects

Hadid, Aldo Fernanda (2023) Klasifikasi Multi-label Teks Tweet berdasarkan Aspek Dangerous Speech [Multi-label Classification of Tweet Text Based on Dangerous Speech Aspects]. Other thesis, Institut Teknologi Sepuluh Nopember.

04111940000074-Undergraduate-Thesis.pdf - Accepted Version
Restricted to Repository staff only until 1 September 2025.

Abstract

Social media communication has become an important part of daily life in modern society, so ensuring safety on social media platforms is essential. Dangerous language, such as physical threats, is still relatively rare in online environments, but it remains very important to detect. Although several studies have addressed the detection of offensive and hateful language, Dangerous Speech has not yet received significant attention. Dangerous Speech consists of both context and message; therefore, by understanding the context and the message of a text, we can identify Dangerous Speech rather than only Hate Speech. The seven aspects of Dangerous Speech are social context, historical context, dehumanization, accusation in the mirror, attacks on women and children, group loyalty, and threat to group integrity. The stages of this study are dataset labeling, data preprocessing, feature extraction, classification, and evaluation. The dataset comes from previous research by Pramana (2022) and contains tweets collected from Twitter using the word list available at www.hatebase.org. The tweets were then labeled manually according to the seven aspects of Dangerous Speech defined by Benesch et al. (2018). After labeling, the dataset was preprocessed with steps such as case folding, stopword removal, and stemming. Features were then extracted with TF-IDF, and classification was performed with Random Forest, k-Nearest Neighbors (KNN), Decision Tree, Naïve Bayes, and Support Vector Machine (SVM) classifiers, each of which was then evaluated. Support Vector Machine achieved the best classification result, with an accuracy of 73.2%.
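The preprocessing, TF-IDF, and multi-label classification steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the stopword list, the toy tweets, the label identifiers, k = 3, and the helper names (`preprocess`, `fit_idf`, `tfidf`, `knn_predict`, `subset_accuracy`) are all hypothetical. Stemming is omitted, and KNN (one of the five listed classifiers) stands in for the others only because it needs no external library. Each of the seven aspects is predicted independently by a majority vote among the k most similar training tweets under cosine similarity, a binary-relevance style of multi-label classification.

```python
import math
from collections import Counter

# HYPOTHETICAL stopword list; the thesis's actual list (and its stemmer) are not given.
STOPWORDS = {"a", "an", "and", "are", "is", "of", "the", "they", "to", "we"}

# The seven Dangerous Speech aspects, in the order listed in the abstract.
LABELS = ["social_context", "historical_context", "dehumanization",
          "accusation_in_mirror", "attack_women_children",
          "group_loyalty", "threat_group_integrity"]


def preprocess(text):
    """Case folding, punctuation stripping, and stopword removal (stemming omitted)."""
    tokens = "".join(ch if ch.isalnum() else " " for ch in text.lower()).split()
    return [t for t in tokens if t not in STOPWORDS]


def fit_idf(docs):
    """Smoothed IDF, log((1 + n) / (1 + df)) + 1, learned from the training tweets."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def tfidf(doc, idf):
    """L2-normalised TF-IDF vector as a sparse dict; terms unseen in training are dropped."""
    tf = Counter(t for t in doc if t in idf)
    vec = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def cosine(u, v):
    """Dot product; equals cosine similarity because both vectors are L2-normalised."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def knn_predict(query, train_vecs, train_labels, k=3):
    """Per-label majority vote among the k most similar training tweets."""
    nearest = sorted(range(len(train_vecs)),
                     key=lambda i: cosine(query, train_vecs[i]), reverse=True)[:k]
    return [int(2 * sum(train_labels[i][j] for i in nearest) > k)
            for j in range(len(LABELS))]


def subset_accuracy(y_true, y_pred):
    """Exact-match accuracy: a tweet counts only if all seven labels are correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy, made-up training data (label vectors follow the LABELS order).
train_texts = ["they are vermin", "vermin everywhere", "vermin and rats",
               "they plan to attack us first", "lovely weather today"]
DEHUM = [0, 0, 1, 0, 0, 0, 0]
train_y = [DEHUM, DEHUM, DEHUM, [0, 0, 0, 1, 0, 0, 0], [0] * 7]

docs = [preprocess(t) for t in train_texts]
idf = fit_idf(docs)
train_vecs = [tfidf(d, idf) for d in docs]
pred = knn_predict(tfidf(preprocess("rats and vermin"), idf), train_vecs, train_y)
print(dict(zip(LABELS, pred)))  # only "dehumanization" is 1
```

Exact-match (subset) accuracy is only one way to score multi-label output. The abstract does not state which accuracy definition produced the 73.2% figure, so per-label or Hamming-based metrics may be worth reporting alongside it.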
In addition, scoring the labels with the Weighted Sum Model classified 1,322 tweets (7.25%) as Dangerous Speech, with label combinations dominated by the Social Context, Dehumanization, and Accusation in the Mirror aspects.
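The Weighted Sum Model step can be sketched as follows, assuming each tweet has a binary value x_j for each of the seven aspects and is counted as Dangerous Speech when its weighted score S = Σ_j w_j x_j reaches a threshold. The weights, the 0.5 threshold, the label identifiers, and the names `ASPECT_WEIGHTS`, `wsm_score`, `is_dangerous`, and `dangerous_share` are hypothetical placeholders; the abstract does not state the values or the exact scoring rule used in the thesis.

```python
# Weighted Sum Model (WSM) over the seven Dangerous Speech aspects.
# HYPOTHETICAL: weights and threshold are placeholders, not the thesis's values.
ASPECT_WEIGHTS = {
    "social_context": 0.20,
    "historical_context": 0.10,
    "dehumanization": 0.20,
    "accusation_in_mirror": 0.15,
    "attack_women_children": 0.10,
    "group_loyalty": 0.10,
    "threat_group_integrity": 0.15,
}
DANGEROUS_THRESHOLD = 0.5  # hypothetical cut-off


def wsm_score(labels):
    """S = sum_j w_j * x_j, where x_j is the 0/1 label of aspect j (missing = 0)."""
    return sum(w * labels.get(aspect, 0) for aspect, w in ASPECT_WEIGHTS.items())


def is_dangerous(labels, threshold=DANGEROUS_THRESHOLD):
    """A tweet is flagged as Dangerous Speech when its WSM score reaches the threshold."""
    return wsm_score(labels) >= threshold


def dangerous_share(predictions):
    """Return (count, percentage) of tweets flagged as Dangerous Speech."""
    count = sum(is_dangerous(p) for p in predictions)
    return count, 100.0 * count / len(predictions)


# Toy predictions: the dominant combination reported in the abstract, and a single aspect.
combo = {"social_context": 1, "dehumanization": 1, "accusation_in_mirror": 1}
single = {"dehumanization": 1}
print(round(wsm_score(combo), 2), is_dangerous(combo))    # 0.55 True
print(round(wsm_score(single), 2), is_dangerous(single))  # 0.2 False
print(dangerous_share([combo, single]))                   # (1, 50.0)
```

With these placeholder weights, a tweet carrying the Social Context, Dehumanization, and Accusation in the Mirror labels is flagged, while a tweet carrying only Dehumanization is not; different weights or thresholds would shift the reported share of Dangerous Speech.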

Item Type: Thesis (Other)
Uncontrolled Keywords: Dangerous Speech, Classification, Twitter, Weighted Sum Model
Subjects: H Social Sciences > HD Industries. Land use. Labor > HD108 Classification (Theory. Method. Relation to other subjects)
Q Science > Q Science (General) > Q325.5 Machine learning.
Divisions: Faculty of Information and Communication Technology > Informatics > 55201-(S1) Undergraduate Thesis
Depositing User: Aldo Fernanda Hadid
Date Deposited: 15 Aug 2023 08:05
Last Modified: 15 Aug 2023 08:05
URI: http://repository.its.ac.id/id/eprint/103419
