Ashar, Muhammad Nasry (2025) Augmentasi Teks Berbasis Prefiks dan Ensemble Learning untuk Mengatasi Keterbatasan Data dalam Analisis Sentimen. Masters thesis, Institut Teknologi Sepuluh Nopember.
Text: 6025221004-Master_Thesis.pdf - Accepted Version (2MB). Restricted to repository staff only until 1 April 2027.
Abstract
Sentiment analysis in Indonesian often suffers from limited data, which raises the risk of biased data and of overfitting or underfitting in machine learning models. To address this, the study proposes an ensemble learning approach that combines Logistic Regression (LR), Naïve Bayes (NB), and Support Vector Machine (SVM) through hard voting. In addition, text augmentation based on the prefixes "me-", "ter-", "di-", and "ber-" is applied to increase the variety and quantity of the data, with the goal of improving generalization and accuracy in sentiment analysis on limited Indonesian-language data.
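The hard voting rule mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis code: the function name `hard_vote`, the sample labels, and the tie-breaking rule (the earliest model in the list wins) are assumptions made for illustration only.

```python
from collections import Counter


def hard_vote(per_model_labels):
    """Combine label predictions from several classifiers by majority vote.

    per_model_labels: list of equal-length label lists, one list per model.
    Ties are broken in favor of the model listed first (an assumption,
    not something the abstract specifies).
    """
    if not per_model_labels:
        raise ValueError("need at least one model's predictions")
    n = len(per_model_labels[0])
    if any(len(labels) != n for labels in per_model_labels):
        raise ValueError("all models must predict the same number of samples")

    final = []
    for votes in zip(*per_model_labels):
        counts = Counter(votes)
        top = max(counts.values())
        # The first vote (in model order) whose label reaches the top count wins.
        final.append(next(v for v in votes if counts[v] == top))
    return final


# Hypothetical predictions from LR, NB, and SVM on four reviews.
lr_pred = ["pos", "neg", "pos", "neg"]
nb_pred = ["pos", "pos", "neg", "neg"]
svm_pred = ["neg", "pos", "pos", "neg"]

ensemble_pred = hard_vote([lr_pred, nb_pred, svm_pred])
print(ensemble_pred)
```

With three binary classifiers a tie cannot occur, so the tie-breaking rule only matters for more classes or an even number of models. In practice, scikit-learn's `VotingClassifier` with `voting="hard"` provides a library version of majority voting over fitted estimators.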
This study aims to improve the accuracy of an ensemble learning model for sentiment analysis. Prefix-based text augmentation enlarges the dataset by adding affixes to verbs identified through POS tagging. This approach can produce a more varied dataset without changing the meaning of the original sentences. The ensemble is described as stacking-based, to combine the strengths of the individual models, and uses hard voting so that the final prediction is determined by majority vote. The test results show that prefix-based text augmentation together with the ensemble model improves accuracy relative to the model tested without augmented data, with a reported figure of 91.29%. The ensemble method also outperforms the individual LR, NB, and SVM models.
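The augmentation step described above (attaching prefixes to verbs found by POS tagging) might look roughly like the following sketch. It is a simplified illustration and not the author's procedure: the function name `augment_with_prefixes`, the tag set, and the sample sentence are hypothetical, the input is assumed to be already POS-tagged, and prefixes are attached by plain concatenation without handling Indonesian sound changes (for example meN- surfacing as mem-, men-, or meng-).

```python
PREFIXES = ("me", "ter", "di", "ber")


def augment_with_prefixes(tagged_tokens, prefixes=PREFIXES, verb_tag="VB"):
    """Return sentence variants, each with one verb given one prefix.

    tagged_tokens: list of (word, pos_tag) pairs from any POS tagger.
    Prefixes are attached by plain concatenation; real Indonesian
    morphology is NOT handled (a simplifying assumption).
    """
    variants = []
    for i, (word, tag) in enumerate(tagged_tokens):
        if tag != verb_tag:
            continue
        # Heuristic: skip verbs that already appear to carry one of the prefixes.
        if word.lower().startswith(tuple(prefixes)):
            continue
        for prefix in prefixes:
            words = [w for w, _ in tagged_tokens]
            words[i] = prefix + word
            variants.append(" ".join(words))
    return variants


# Hypothetical pre-tagged sentence ("I see this product"); tags are illustrative.
sentence = [("saya", "PRP"), ("lihat", "VB"), ("produk", "NN"), ("ini", "DT")]
augmented = augment_with_prefixes(sentence)
for variant in augmented:
    print(variant)
```

Plain concatenation can produce forms that are not valid words, so a working pipeline would likely need a morphological rule set or a dictionary check to filter candidates. That filtering step is an assumption here and is not described in the abstract.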
Item Type: | Thesis (Masters) |
---|---|
Uncontrolled Keywords: | Sentiment Analysis, Data Limitations, Text Augmentation, Ensemble Learning, NLP |
Subjects: | T Technology > T Technology (General) > T57.5 Data Processing |
Divisions: | Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering > 55101-(S2) Master Thesis |
Depositing User: | Muhammad Nasry Ashar |
Date Deposited: | 03 Feb 2025 02:27 |
Last Modified: | 03 Feb 2025 02:27 |
URI: | http://repository.its.ac.id/id/eprint/117717 |