Sentiment Analysis of Indonesian Film Reviews Based on Semi-Supervised Learning Using the Random Forest Method

Ramadhan, Muhammad Naufal Fawwaz (2025) Analisis Sentimen Ulasan Film Indonesia Berbasis Semi-Supervised Learning Menggunakan Metode Random Forest. Other thesis, Institut Teknologi Sepuluh Nopember.

Abstract

Sentiment analysis of online reviews has become an important tool for gauging public opinion, including in the film industry. For Indonesian, however, the main challenge is the scarcity of labeled datasets large enough to train machine learning models effectively. This study proposes a framework for building a sentiment classification model for Indonesian film reviews that uses pseudo-labeling, a semi-supervised learning technique, to overcome this data limitation. The process began with the collection of 45,177 comments from YouTube, which were then pre-processed. A total of 750 reviews were randomly selected and manually labeled to form an objective test set. The remaining unlabeled data were then used in an iterative pseudo-labeling process to expand the training set. Two main experimental scenarios were run, comparing TF-IDF and CountVectorizer feature extraction across five classification algorithms: Decision Tree, SVM, Random Forest, Naive Bayes, and XGBoost. On the test set, the best performance was achieved by the Decision Tree model with TF-IDF features, with a macro F1-score of 0.764. By comparison, the best result in the CountVectorizer scenario was also achieved by the Decision Tree model, with an F1-score of 0.700. These results indicate that pseudo-labeling is effective in improving model performance and that the TF-IDF feature representation is more accurate than CountVectorizer for classification on this Indonesian film review dataset.
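
The iterative pseudo-labeling loop and the macro F1 metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis code. It assumes a small labeled seed set for the first training round (the abstract does not say where the initial training labels come from), a confidence threshold of 0.75, and a toy multinomial Naive Bayes over raw word counts standing in for the five classifiers and the TF-IDF/CountVectorizer features. All function names and parameters here are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Fit a toy multinomial Naive Bayes model on tokenized documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def predict_proba(model, tokens):
    """Return {label: posterior probability}, using Laplace smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for w in tokens:
            if w in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][w] + 1) / (total_words + len(vocab))
                )
        log_scores[label] = score
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    exp = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(exp.values())
    return {label: v / norm for label, v in exp.items()}


def pseudo_label(labeled, unlabeled, threshold=0.75, max_rounds=5):
    """Grow the training set with confident predictions on unlabeled data.

    `threshold` and `max_rounds` are assumed values, not taken from the thesis.
    """
    docs = [tokens for tokens, _ in labeled]
    labels = [label for _, label in labeled]
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = train_nb(docs, labels)
        remaining, added = [], 0
        for tokens in pool:
            probs = predict_proba(model, tokens)
            label, confidence = max(probs.items(), key=lambda kv: kv[1])
            if confidence >= threshold:
                docs.append(tokens)      # accept the pseudo-label
                labels.append(label)
                added += 1
            else:
                remaining.append(tokens)  # keep for a later round
        pool = remaining
        if added == 0:  # nothing confident enough: stop early
            break
    return train_nb(docs, labels), docs, labels, pool


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

In this sketch, `pseudo_label` would be called with a list of `(tokens, label)` pairs and a list of unlabeled token lists, and `macro_f1` would then be computed only on a manually labeled held-out set, mirroring the role of the 750-review test set described above.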

Item Type: Thesis (Other)
Uncontrolled Keywords: Sentiment Analysis, Pseudo-Labeling, Movie Reviews, Machine Learning, TF-IDF, CountVectorizer
Subjects: T Technology > T Technology (General) > T57.5 Data Processing
Divisions: Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering > 55201-(S1) Undergraduate Thesis
Depositing User: Muhammad Naufal Fawwaz Ramadhan
Date Deposited: 22 Jan 2026 02:17
Last Modified: 22 Jan 2026 02:17
URI: http://repository.its.ac.id/id/eprint/130018
