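Prediksi Curah Hujan Bulanan di Kota Tanjungpinang Menggunakan Stacking Ensemble Learning Berbasis Data Iklim Multivariat (Monthly Rainfall Prediction in Tanjungpinang City Using Stacking Ensemble Learning Based on Multivariate Climate Data)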

Zulfa, Ahmad (2026) Prediksi Curah Hujan Bulanan di Kota Tanjungpinang Menggunakan Stacking Ensemble Learning Berbasis Data Iklim Multivariat. Masters thesis, Institut Teknologi Sepuluh Nopember.

6025232005-Master_Thesis.pdf - Accepted Version
Restricted to Repository staff only


Abstract

Rainfall prediction in Tanjungpinang City is challenging because the city lies in a Non-Seasonal Zone (Non-ZOM). In such a zone the boundary between the wet and dry seasons is not clearly defined, so precipitation patterns are highly stochastic. This research develops a robust prediction model using a Stacking Ensemble Learning architecture trained on multivariate climate data from 1991–2024. The methodology applies comprehensive feature engineering in three parts. Monthly means and standard deviations are aggregated to capture climate volatility. Global climate indices (SOI and IOD) are integrated. A log1p transformation is applied to reduce the strong right skew of the rainfall distribution. Model performance was evaluated on held-out test data from 2024 (unseen data). The Stacking Ensemble model uses Random Forest, CatBoost, and Gradient Boosting as Base-Learners and Linear Regression as the Meta-Learner. It achieved the highest accuracy among the evaluated models, with R² = 0.891, RMSE = 56.86 mm, and MAPE = 24.50%. This outperforms the best single models (Random Forest and CatBoost), which reached an R² of approximately 0.77. The ablation study reveals an important effect of feature selection. Feature Selection (FS) yielded a slightly higher baseline score (0.78) than no selection (0.77). However, using the complete feature set without selection (No Feature Selection) was the key factor that allowed the ensemble to reach its peak performance (0.89). This indicates that features eliminated by conventional selection still carry latent information that tree-based stacking models can extract to improve generalization.
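The feature-engineering and evaluation steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis code. The following choices are all hypothetical:

- the daily input format `(year, month, value)`;
- the helper names (`monthly_mean_std`, `attach_indices`, `to_log_target`, `from_log_target`, `r2`, `rmse`, `mape`);
- the use of the population standard deviation;
- dropping months that lack an SOI or IOD value;
- skipping 0 mm months when computing MAPE.

Predictions are mapped back to millimetres with `expm1` before scoring, which is consistent with RMSE being reported in mm.

```python
import math
import statistics
from collections import defaultdict


def monthly_mean_std(daily):
    """Aggregate daily (year, month, value) records into {(year, month): (mean, std)}.

    Assumption: population standard deviation; the thesis may use the sample estimator.
    """
    groups = defaultdict(list)
    for year, month, value in daily:
        groups[(year, month)].append(value)
    return {key: (statistics.fmean(vals), statistics.pstdev(vals))
            for key, vals in sorted(groups.items())}


def attach_indices(monthly, soi, iod):
    """Join local monthly features with SOI and IOD values keyed by (year, month).

    Assumption: months missing either global index are dropped.
    """
    rows = {}
    for key, (mean, std) in monthly.items():
        if key in soi and key in iod:
            rows[key] = [mean, std, soi[key], iod[key]]
    return rows


def to_log_target(rain_mm):
    # log1p compresses the long right tail and stays defined for 0 mm months.
    return [math.log1p(v) for v in rain_mm]


def from_log_target(pred_log):
    # expm1 is the exact inverse of log1p, returning predictions in mm.
    return [math.expm1(v) for v in pred_log]


def r2(y, yhat):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean = statistics.fmean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def rmse(y, yhat):
    # Root mean squared error, in the same unit as y (mm here).
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


def mape(y, yhat):
    # MAPE in percent; undefined when the observed value is 0 mm,
    # so those months are skipped here (assumption).
    pairs = [(a, b) for a, b in zip(y, yhat) if a != 0]
    return 100.0 * sum(abs(a - b) / a for a, b in pairs) / len(pairs)
```

For example, two January readings of 80 and 90 aggregate to a monthly mean of 85.0 and a standard deviation of 5.0. Observed rainfall of `[100, 200, 300]` mm against predictions of `[110, 190, 300]` mm gives R² = 0.99 and MAPE = 5%.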
Feature importance analysis identifies daytime Relative Humidity (RH13) and temperature stability (T13_std) as the dominant predictors. This indicates that local climate dynamics have a much stronger influence than global climate indices at the monthly scale in the study area.
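The stacking architecture and the feature-importance readout can be sketched with scikit-learn. This is a hedged sketch on synthetic data, not the thesis pipeline. The column names, the synthetic generator, `n_estimators=100`, `cv=5`, and the helper names (`make_synthetic`, `build_stack`, `run_demo`) are assumptions. CatBoost is left out to avoid an extra dependency. Permutation importance is used only as one common way to rank features, because the abstract does not state which importance measure was used.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical column names echoing the abstract; the thesis feature set is larger.
FEATURES = ["RH13", "RH13_std", "T13", "T13_std", "SOI", "IOD"]


def make_synthetic(n_years=34, seed=0):
    """Synthetic monthly rows shaped like 1991-2024; this is NOT the thesis data."""
    rng = np.random.default_rng(seed)
    years = np.repeat(np.arange(1991, 1991 + n_years), 12)
    X = rng.normal(size=(years.size, len(FEATURES)))
    # By construction, rain depends mostly on RH13 and T13_std.
    log_rain = 5.0 + 0.6 * X[:, 0] - 0.3 * X[:, 3] + 0.1 * rng.normal(size=years.size)
    rain_mm = np.expm1(np.clip(log_rain, 0.0, None))  # keep rainfall non-negative
    return years, X, rain_mm


def build_stack(seed=0):
    # CatBoost is omitted to keep this sketch scikit-learn only;
    # catboost.CatBoostRegressor(verbose=0) could be added as a third base learner.
    base_learners = [
        ("rf", RandomForestRegressor(n_estimators=100, random_state=seed)),
        ("gb", GradientBoostingRegressor(random_state=seed)),
    ]
    # The linear meta-learner is fit on 5-fold out-of-fold base-learner predictions.
    return StackingRegressor(
        estimators=base_learners, final_estimator=LinearRegression(), cv=5
    )


def run_demo():
    years, X, rain_mm = make_synthetic()
    train, test = years < 2024, years == 2024  # hold out 2024 as unseen data
    model = build_stack()
    model.fit(X[train], np.log1p(rain_mm[train]))  # train on the log1p target
    pred_mm = np.expm1(model.predict(X[test]))  # back to mm before scoring
    rmse = mean_squared_error(rain_mm[test], pred_mm) ** 0.5
    r2 = r2_score(rain_mm[test], pred_mm)
    # Permutation importance: mean drop in R^2 (log scale) when one column is shuffled.
    imp = permutation_importance(
        model, X[test], np.log1p(rain_mm[test]), n_repeats=10, random_state=0
    )
    ranking = sorted(
        zip(FEATURES, imp.importances_mean), key=lambda item: item[1], reverse=True
    )
    return {"pred": pred_mm, "rmse": rmse, "r2": r2, "ranking": ranking}
```

`run_demo()` returns the 2024 predictions in mm, the RMSE, the R², and a feature ranking. With only 12 held-out months the permutation ranking is noisy, so it should be read as indicative only.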

Item Type: Thesis (Masters)
Uncontrolled Keywords: Ablation Study, Multivariate Climate Data, Non-Seasonal Zone (Non-ZOM), Rainfall Prediction, Stacking Ensemble Learning
Subjects: Q Science
Q Science > Q Science (General) > Q325.5 Machine learning. Support vector machines.
Q Science > QA Mathematics
Q Science > QA Mathematics > QA336 Artificial Intelligence
Q Science > QA Mathematics > QA76.F56 Data structures (Computer science)
T Technology > T Technology (General)
T Technology > T Technology (General) > T174 Technological forecasting
T Technology > T Technology (General) > T57.5 Data Processing
T Technology > TD Environmental technology. Sanitary engineering > TD171.75 Climate change mitigation
Divisions: Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering > 55101-(S2) Master Thesis
Depositing User: Ahmad Zulfa
Date Deposited: 28 Jan 2026 04:40
Last Modified: 28 Jan 2026 04:40
URI: http://repository.its.ac.id/id/eprint/130670
