Word Embeddings pada Tingkat Paragraf untuk Mendeteksi Konten Persuasif Berita Native ads

Kholifah, Asiyah Nur (2025) Word Embeddings pada Tingkat Paragraf untuk Mendeteksi Konten Persuasif Berita Native ads. Masters thesis, Institut Teknologi Sepuluh Nopember.

6025221002-Master_Thesis.pdf - Accepted Version
Restricted to Repository staff only

Abstract

Berita persuasif pada media online dapat mempengaruhi sikap politik, sudut pandang, serta kepentingan finansial seseorang. Masyarakat banyak terkecoh dengan berita native ads karena memiliki struktur sebagaimana layaknya artikel berita pada umumnya. Karenanya diperlukan sebuah cara agar pembaca dapat dengan mudahnya membedakan konten persuasif pada berita umum dan berita native ads. Berdasarkan penelitian bidang sosiolinguistik, konten persuasif dapat ditemukan dengan cara memahami sintaktis dan semantik kalimat berita. Penggunaan static embedding dan contextual embeddings dapat menjadi solusi dalam mengatasi pemahaman sintaktis dan semantik kalimat berita. Konten persuasif juga diketahui terdapat pada posisi tertentu dalam sebuah artikel. Sehingga pendekatan tingkat paragraf diharapkan mampu memperjelas letak dari kalimat persuasif tersebut. Paragraf-paragraf yang telah dipisahkan dan diberi nilai karakteristik secara manual akan dilakukan praproses data teks dan hasilnya akan diubah menjadi vektor-vektor dengan word embedding. Model embedding terlatih dari GloVe dan FastText dipilih sebagai state of the art word embedding dari jenis static embedding. Sedangkan pada contextual embeddings akan dipilih model terlatih berbahasa Indonesia dari BERT dan RoBERTa. Keempat word embedding ini akan digabungkan menjadi satu untuk menghasilkan model yang diharapkan dapat menangkap informasi sintaktis dan semantik dari berita. Selanjutnya hasil penelitian ini akan dibuat sebuah model klasifikasi biner multi-label dari algoritma deep learning Bi-LSTM dan CNN. Hasil temuan menunjukkan bahwa BERT-Bi-LSTM mencapai akurasi dan presisi tertinggi sebesar 95%. Selain itu didapatkan bahwa static embedding meningkatkan kinerja saat dikombinasikan dengan model klasifikasi machine learning. Sedangkan contextual embedding menghasilkan kinerja yang sangat unggul dalam model klasifikasi deep learning.
===================================================================================================================================
Persuasive news in online media can influence a person's political attitudes, points of view, and financial interests. Many readers are misled by native ads because these are structured like ordinary news articles. A method is therefore needed that lets readers easily distinguish persuasive content in general news and in native ad news. Sociolinguistic research indicates that persuasive content can be identified by understanding the syntax and semantics of news sentences. Static and contextual embeddings offer a way to capture this syntactic and semantic information. Persuasive content is also known to occur at particular positions in an article, so a paragraph-level approach is expected to make the location of persuasive sentences clearer. Paragraphs are separated and manually labeled with characteristic values, the text is preprocessed, and the result is converted into vectors with word embeddings. Pretrained GloVe and FastText models were chosen as state-of-the-art static embeddings, and pretrained Indonesian-language BERT and RoBERTa models were chosen as contextual embeddings. These four word embeddings are combined to produce a model that is expected to capture the syntactic and semantic information of news text. A multi-label binary classification model is then built with the Bi-LSTM and CNN deep learning algorithms. The results show that BERT-Bi-LSTM achieved the highest accuracy and precision, at 95%. In addition, static embeddings improved performance when combined with machine learning classification models, whereas contextual embeddings performed very strongly with deep learning classification models.
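
The paragraph-level pipeline described in the abstract (embed each paragraph, combine vectors from several embedding sources, emit one binary decision per label) can be illustrated with a minimal sketch. It is not the thesis implementation: the toy lookup tables, the label names, the hand-picked weights, and every function name are hypothetical, mean pooling with a linear scorer stands in for the Bi-LSTM and CNN classifiers, and contextual embeddings (BERT, RoBERTa) are left out to keep the example self-contained.

```python
import math

# Hypothetical toy tables standing in for pretrained static embeddings
# (the thesis uses GloVe and FastText; real vectors have far more dimensions).
TOY_GLOVE = {"beli": [0.9, 0.1], "sekarang": [0.7, 0.3], "hujan": [0.0, 0.2]}
TOY_FASTTEXT = {
    "beli": [0.8, 0.0, 0.1],
    "sekarang": [0.6, 0.2, 0.1],
    "hujan": [0.1, 0.1, 0.9],
}

# Hypothetical label set; the abstract does not list the actual labels.
LABELS = ["call_to_action", "descriptive"]


def mean_pool(tokens, table, dim):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    vectors = [table[t] for t in tokens if t in table]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def paragraph_vector(paragraph):
    """Concatenate the pooled vectors from each embedding source."""
    tokens = paragraph.lower().split()
    return mean_pool(tokens, TOY_GLOVE, 2) + mean_pool(tokens, TOY_FASTTEXT, 3)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(vector, weights, biases, threshold=0.5):
    """Multi-label binary output: one independent sigmoid decision per label."""
    decisions = []
    for row, bias in zip(weights, biases):
        score = sigmoid(sum(w * v for w, v in zip(row, vector)) + bias)
        decisions.append(int(score >= threshold))
    return decisions


# Hand-picked illustrative weights (one row per label), not trained values.
WEIGHTS = [[2.0, 0.0, 2.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 3.0]]
BIASES = [-1.5, -1.0]

if __name__ == "__main__":
    for text in ["Beli sekarang", "Hujan"]:
        flags = predict_labels(paragraph_vector(text), WEIGHTS, BIASES)
        print(text, dict(zip(LABELS, flags)))
```

Each label gets its own sigmoid and threshold, so a paragraph can carry several labels at once or none. This is what separates multi-label binary classification from a single softmax over mutually exclusive classes.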

Item Type: Thesis (Masters)
Uncontrolled Keywords: contextual embedding, deep learning, native ads, static embedding, tingkat paragraf, paragraph-level
Subjects: P Language and Literature > P Philology. Linguistics > P325 Semantics.
Q Science > Q Science (General) > Q325.5 Machine learning. Support vector machines.
Q Science > QA Mathematics > QA336 Artificial Intelligence
Q Science > QA Mathematics > QA76.9.D343 Data mining. Querying (Computer science)
Divisions: Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering > 55101-(S2) Master Thesis
Depositing User: Asiyah Nur Kholifah
Date Deposited: 04 Aug 2025 07:43
Last Modified: 04 Aug 2025 07:45
URI: http://repository.its.ac.id/id/eprint/126204
