Prediksi Produksi Dan Harga Tanaman Komoditas Strategis Menggunakan Model Ensemble Learning (Prediction of Production and Prices of Strategic Commodity Crops Using Ensemble Learning Models)

Ridha, Zulaikha Ulhaq (2026) Prediksi Produksi Dan Harga Tanaman Komoditas Strategis Menggunakan Model Ensemble Learning. Masters thesis, Institut Teknologi Sepuluh Nopember.

Full text: 6025232029-Master_Thesis.pdf (12 MB), restricted to repository staff only

Abstract

Instability in the consumer prices of strategic commodities, which are basic necessities for Indonesians, is influenced by weather conditions, which can reduce production. Prior studies have modeled production volumes for various crops, but they have not yet considered how environmental factors such as land area relate to market prices. The relationships among the environmental factors for each crop type are non-linear. This study builds models that predict the production volume and price of strategic commodities using an Ensemble Learning approach, which can handle non-linear relationships between variables. The data comprise agroclimatological variables (temperature, rainfall, humidity), land area, production volume, consumption volume, population, and consumer prices for all provinces in Indonesia over 2018–2023, covering rice, shallots, garlic, red chilies, and cayenne pepper. Each column was pre-processed with its own approach. Because land area, production volume, or similar values may be missing for a given crop in some regions, KNN imputation was applied to those columns. Where these values also showed large variance, winsorization was applied to handle outliers. Categorical data such as crop type and province were one-hot encoded. Hyperparameters were tuned with GridSearchCV to improve model performance, and SHAP was used to identify important features, which were then used in the ensemble models (XGBoost, Gradient Boosting, Random Forest, Stacking, and Voting). For comparison, Linear Regression, Ridge Regression, SVR, and ANN were also tested; these models require different pre-processing, namely Min–Max scaling.
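The column-wise pre-processing described above can be illustrated with a minimal plain-Python sketch. It is not the thesis code: the abstract does not state the number of neighbors, the distance metric, the winsorization percentiles, or the variance criterion, so k=2, Euclidean distance over jointly observed columns, and 5th/95th-percentile clipping are assumptions. In practice, scikit-learn's KNNImputer, OneHotEncoder, and MinMaxScaler cover the same steps, and features would normally be scaled before computing KNN distances.

```python
import math


def knn_impute(rows, k=2):
    """Fill each None with the mean of that column over the k nearest rows.

    Assumption: distance is plain Euclidean over the columns both rows have
    observed (a simplified stand-in for scikit-learn's nan-aware KNNImputer).
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []  # (distance, donor value) pairs
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue  # skip self and rows that also lack this column
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    dist = math.sqrt(sum((a - b) ** 2 for a, b in shared))
                    candidates.append((dist, other[j]))
            nearest = sorted(candidates, key=lambda c: c[0])[:k]
            if nearest:  # leave the gap if no donor row exists
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


def percentile(sorted_vals, q):
    """Linear-interpolated percentile of an ascending list, q in [0, 100]."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def winsorize(values, lower=5, upper=95):
    """Clip values to the [lower, upper] percentile range (5/95 are assumed defaults)."""
    s = sorted(values)
    lo, hi = percentile(s, lower), percentile(s, upper)
    return [min(max(v, lo), hi) for v in values]


def one_hot(labels):
    """Return the sorted category list and one 0/1 vector per label."""
    vocab = sorted(set(labels))
    return vocab, [[1 if label == c else 0 for c in vocab] for label in labels]


def min_max(values):
    """Rescale to [0, 1], as used for the non-ensemble baseline models."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]
```

For example, `winsorize([1, 2, 3, 4, 100], lower=25, upper=75)` clips every value to the range [2, 4], so the outlier 100 becomes 4. In the pipeline the abstract describes, `one_hot` would apply to province and crop type, and `min_max` only to the inputs of the Linear Regression, Ridge Regression, SVR, and ANN baselines.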
The evaluation showed that XGBoost performed best for production prediction in the all-features scenario, because its stage-wise boosting can handle non-linear relationships and large variation between regions. For price prediction, ANN performed best with all features, while SHAP-based feature selection (threshold > 0.005) improved Random Forest's performance by suppressing noise. Overall, the models predicted prices better than production, because the price data had lower variation and a more uniform distribution. The stacking model performed worst: the base learners' predictions tended to be similar, so the meta-learner failed to combine them into any gain over the base models.
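The SHAP threshold mentioned in the results can be read as a simple filter on global feature importance. The sketch below assumes that importance is the mean absolute SHAP value per feature (a common convention; the abstract states only the 0.005 threshold). The SHAP matrix would normally come from the `shap` package, for example `shap.TreeExplainer(model).shap_values(X)` for a tree ensemble; here it is a toy list of lists, and the feature names are hypothetical.

```python
def mean_abs_shap(shap_values):
    """Global importance of each feature: mean |SHAP| over all samples."""
    n_samples = len(shap_values)
    n_features = len(shap_values[0])
    return [sum(abs(row[j]) for row in shap_values) / n_samples
            for j in range(n_features)]


def select_features(feature_names, shap_values, threshold=0.005):
    """Keep features whose mean |SHAP| exceeds the threshold (assumed rule)."""
    importance = mean_abs_shap(shap_values)
    return [name for name, imp in zip(feature_names, importance)
            if imp > threshold]


# Toy example: rows are samples, columns are hypothetical features.
names = ["rainfall", "land_area", "province_Aceh"]
toy_shap = [[0.02, 0.10, 0.001],
            [-0.04, -0.06, -0.003]]
kept = select_features(names, toy_shap)
```

Here the one-hot province column has a mean |SHAP| of 0.002 and is dropped, while `rainfall` and `land_area` are kept; the retained names would then be used to retrain a model such as Random Forest on the reduced feature set.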

Item Type: Thesis (Masters)
Uncontrolled Keywords: Ensemble Learning, Gradient Boosting, Price Prediction, Production Prediction, SHAP, Strategic Commodities, XGBoost
Subjects: Q Science > Q Science (General) > Q325.5 Machine learning. Support vector machines.
Q Science > QA Mathematics > QA278.2 Regression Analysis. Logistic regression
Q Science > QA Mathematics > QA76.9.D343 Data mining. Querying (Computer science)
S Agriculture > S Agriculture (General)
Divisions: Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering > 55101-(S2) Master Thesis
Depositing User: Zulaikha Ulhaq Ridha
Date Deposited: 29 Jan 2026 02:36
Last Modified: 29 Jan 2026 02:36
URI: http://repository.its.ac.id/id/eprint/130767
