Text-Based Lie Detection Using a Combination of BERT and CNN Models

Dewi, Mashita (2025) Deteksi Kebohongan Berbasis Teks Menggunakan Kombinasi Model Bert Dan CNN. Other thesis, Institut Teknologi Sepuluh Nopember.

Abstract

Honesty in written statements matters in many contexts, including online communication, public opinion, product reviews, and other digital interactions. Because the truthfulness of such information cannot be guaranteed, reliable methods for automatic lie detection are needed. This study proposes a text-based lie detection method that combines transfer learning with BERT and a multi-kernel convolutional architecture. Because all of the datasets are in English, the corpus was translated into Indonesian using DeepL to enable cross-lingual evaluation with a uniform vocabulary. Experiments were conducted on three open-source datasets: the Fake News Dataset, the Open Domain Deception Dataset, and the Deceptive Opinion Spam Dataset. Four BERT variants served as semantic feature extractors with 768- and 1024-dimensional outputs, which were then processed by a multi-kernel CNN using combinations of kernel sizes ([2, 3, 4], [3, 4, 5], [4, 5, 6]) and 100 or 150 filters per kernel. Several fine-tuning strategies were applied, ranging from freezing the encoder entirely to unfreezing all encoder layers. The convolution outputs were normalized with batch normalization, followed by dropout and a projection through a fully connected layer that produces a single logit distinguishing “lie” from “truth”. Optimization used AdamW with linear warmup, a batch size of 16, and early stopping based on the validation F1-score. Evaluation was performed both in-dataset and multi-source. The results show that combining BERT with a multi-kernel CNN performs consistently well across these settings. The best configuration was obtained on the Deceptive Opinion Spam Dataset using BERT Base Uncased with all encoder layers unfrozen, kernel sizes [2, 3, 4], and 100 filters, yielding an F1-score of 0.9154. This indicates strong potential for the approach in text-based lie detection systems.
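
The training controls named in the abstract (linear warmup and early stopping on the validation F1-score) could be sketched roughly as follows. This is a minimal standard-library Python illustration, not the thesis code. The names `linear_warmup_lr`, `f1_score`, and `EarlyStopping` are hypothetical, and the linear decay after warmup, the patience value, and any learning rate passed in are assumptions, since the abstract specifies none of them.

```python
def linear_warmup_lr(step, total_steps, warmup_steps, base_lr):
    """Learning rate at a given optimizer step.

    Rises linearly from 0 to base_lr over warmup_steps, then (ASSUMPTION:
    the abstract only says "linear warmup") decays linearly to 0.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


def f1_score(y_true, y_pred, positive=1):
    """Binary F1-score for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly flagged
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


class EarlyStopping:
    """Signals a stop once validation F1 has not improved for `patience` epochs.

    ASSUMPTION: the patience value is illustrative, not taken from the thesis.
    """

    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, val_f1):
        """Record one epoch's validation F1; return True if training should stop."""
        if val_f1 > self.best:
            self.best = val_f1
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `linear_warmup_lr` would be evaluated once per optimizer step and `EarlyStopping.step` once per epoch with that epoch's validation F1-score.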

Item Type: Thesis (Other)
Uncontrolled Keywords: Deception Detection, Text Classification, BERT, CNN
Subjects: T Technology > T Technology (General)
T Technology > TK Electrical engineering. Electronics. Nuclear engineering
T Technology > TK Electrical engineering. Electronics. Nuclear engineering > TK7882.P3 Pattern recognition systems
Divisions: Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering
Depositing User: Mashita Dewi
Date Deposited: 13 Jul 2025 06:33
Last Modified: 13 Jul 2025 06:33
URI: http://repository.its.ac.id/id/eprint/119520
