Application of Reinforcement Learning with the A3C (Asynchronous Advantage Actor-Critic) Architecture to Address the Warm-Start Problem in Recommender Systems

Salsabila, Shyfa (2025) Application of Reinforcement Learning with the A3C (Asynchronous Advantage Actor-Critic) Architecture to Address the Warm-Start Problem in Recommender Systems. Diploma thesis, Institut Teknologi Sepuluh Nopember.

Abstract

The warm-start problem in recommender systems arises when the initial interaction data for a user or item is limited, making it difficult for the system to generate accurate, personalized recommendations. This study develops an Asynchronous Advantage Actor-Critic (A3C) algorithm for a reinforcement learning-based recommender system, with three targets: improving accuracy by 2–5% over previous research, keeping model training time under 3 minutes, and keeping gradient variance low. A3C trains in parallel across multiple environments, so the network parameters are updated asynchronously and in a directed manner. This allows the model to maintain learning quality even when user preferences change significantly or interaction data is unevenly distributed. Testing used the MovieLens 100K and 1M datasets, with Hit Rate and Normalized Discounted Cumulative Gain (NDCG) as evaluation metrics. The A3C model achieved a recommendation accuracy of 56.43% in the warm-start scenario and 64% in the cold-start scenario on MovieLens 100K, and 38.06% in the warm-start scenario on MovieLens 1M. The model also showed low gradient variance, in the range of 10%–30%, indicating stable performance during training. Training time was relatively short, averaging 1.75 minutes per epoch, which supports the ability of A3C to adapt its recommendations to shifting user preferences. These results indicate that A3C is effective in improving both the accuracy and the adaptability of recommender systems during the initial stages of user interaction.
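The abstract names Hit Rate and NDCG as the evaluation metrics but does not describe the evaluation protocol. As a minimal sketch, assuming a leave-one-out setup in which each user has exactly one held-out item and the model returns a ranked list, the two metrics could be computed as follows. The function names, the cutoff `k`, and the dictionary layout are illustrative assumptions, not taken from the thesis.

```python
import math


def hit_rate_at_k(ranked_items, held_out_item, k=10):
    """Return 1.0 if the held-out item appears in the top-k list, else 0.0."""
    return 1.0 if held_out_item in ranked_items[:k] else 0.0


def ndcg_at_k(ranked_items, held_out_item, k=10):
    """NDCG@k for a single relevant item.

    With one relevant item the ideal DCG is 1, so NDCG reduces to
    1 / log2(rank + 1), where rank is the 1-based position of the item.
    """
    for position, item in enumerate(ranked_items[:k]):
        if item == held_out_item:
            return 1.0 / math.log2(position + 2)
    return 0.0


def evaluate(recommendations, held_out, k=10):
    """Average Hit Rate@k and NDCG@k over all users in `held_out`.

    recommendations: dict mapping user id -> ranked list of item ids (assumed layout)
    held_out:        dict mapping user id -> single held-out item id (assumed layout)
    """
    users = list(held_out)
    hr = sum(hit_rate_at_k(recommendations[u], held_out[u], k) for u in users) / len(users)
    ndcg = sum(ndcg_at_k(recommendations[u], held_out[u], k) for u in users) / len(users)
    return hr, ndcg


# Toy usage: user "u1" has its held-out item at rank 2, user "u2" has a miss.
recommendations = {"u1": [3, 1, 2], "u2": [5, 6, 7]}
held_out = {"u1": 1, "u2": 9}
hr, ndcg = evaluate(recommendations, held_out, k=3)
```

Under this assumption, Hit Rate only records whether the held-out item was retrieved, while NDCG also rewards placing it nearer the top of the list.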

Item Type:	Thesis (Diploma)
Uncontrolled Keywords:	A3C, Reinforcement Learning, Sistem Rekomendasi, Recommender Systems, Warm-Start.
Subjects:	T Technology > T Technology (General); T Technology > T Technology (General) > T57.5 Data Processing; T Technology > T Technology (General) > T57.6 Operations research--Mathematics. Goal programming; T Technology > T Technology (General) > T57.62 Simulation; T Technology > T Technology (General) > T57.8 Nonlinear programming. Support vector machine. Wavelets. Hidden Markov models; T Technology > T Technology (General) > T57.83 Dynamic programming; T Technology > T Technology (General) > T57.84 Heuristic algorithms; T Technology > T Technology (General) > T58.5 Information technology. IT--Auditing; T Technology > T Technology (General) > T58.62 Decision support systems; T Technology > T Technology (General) > T58.8 Productivity. Efficiency
Divisions:	Faculty of Vocational > 36304-Automation Electronic Engineering
Depositing User:	Shyfa Salsabila
Date Deposited:	01 Aug 2025 02:48
Last Modified:	01 Aug 2025 02:48
URI:	http://repository.its.ac.id/id/eprint/124112
