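Studi Performansi Individual dan Hybrid Sampling Pada Klasifikasi Status Kualitas Air Sungai Menggunakan Random Forest [Performance Study of Individual and Hybrid Sampling in River Water Quality Status Classification Using Random Forest]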

Fadhilah, Rahmi (2025) Studi Performansi Individual dan Hybrid Sampling Pada Klasifikasi Status Kualitas Air Sungai Menggunakan Random Forest. Masters thesis, Institut Teknologi Sepuluh Nopember.

6003231016-Master_Thesis.pdf - Accepted Version
Restricted to Repository staff only until 1 April 2027.

Abstract

Water quality that is safe and can be monitored accurately is essential for environmental and human health. This study evaluates the performance of the Random Forest algorithm in classifying river water quality status when resampling techniques are used to address class imbalance: Random Under Sampling (RUS), Rapidly Converging Gibbs Sampler (RACOG), and the hybrid RACOG-RUS. A theoretical review of variable importance estimation in Random Forest shows that Mean Decrease Accuracy and Mean Decrease Gini are the two main measures of a variable's contribution to the model. Mean Decrease Accuracy measures how much model accuracy drops when the information in a particular feature is removed, while Mean Decrease Gini measures the reduction in node impurity that the feature produces across the decision trees; both indicate how important the feature is for classification. Preliminary analysis showed that, without feature selection, the Random Forest models with RACOG and RACOG-RUS achieved very high accuracies of 92.63% and 92.48%, respectively, with AUCs above 93%, indicating a strong ability to handle the imbalanced data and to separate the positive and negative classes. In contrast, the model with RUS had much lower accuracy (54.63%) and AUC (77.00%), showing that RUS is less effective for this classification task than RACOG and RACOG-RUS. Feature selection changed the results considerably: the RACOG and RACOG-RUS models showed stable accuracy between 82.64% and 84.03% and F1 scores between 90.49% and 91.32%. However, feature selection reduced the AUC, indicating that the models became less able to distinguish the classes consistently. For the RUS model, accuracy rose to 57.87% with an F1 score of 72.96%, but AUC decreased, indicating a trade-off between detecting the majority and minority classes.
Overall, RACOG-RUS with feature selection provided better balance and stability in water quality classification despite its lower AUC, suggesting that feature selection improved the balance between precision and recall.
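
As an illustration only, the two ideas the abstract leans on most, random undersampling and the gap between accuracy and F1 on imbalanced data, can be sketched in a few lines of standard-library Python. This is not the thesis code: the function names `random_undersample` and `binary_metrics`, the toy labels, and the choice of which class counts as positive are all assumptions made for this example, and the thesis may have used different tooling and a different positive class.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop examples so every class is as small as the rarest one.

    Hypothetical helper illustrating Random Under Sampling (RUS).
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    # Size of the smallest class; every class is cut down to this size.
    target = min(len(members) for members in by_class.values())
    new_rows, new_labels = [], []
    for label, members in by_class.items():
        for row in rng.sample(members, target):
            new_rows.append(row)
            new_labels.append(label)
    return new_rows, new_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 with one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy imbalanced data (invented): 90 majority-class and 10 minority-class samples.
labels = [1] * 90 + [0] * 10
rows = [[i] for i in range(100)]

# After RUS both classes have the size of the minority class.
balanced_rows, balanced_labels = random_undersample(rows, labels)
class_counts = Counter(balanced_labels)

# A classifier that always predicts the majority class on the original data.
always_majority = [1] * len(labels)
metrics = binary_metrics(labels, always_majority, positive=1)
```

On this toy data RUS keeps only 20 of the 100 samples, which hints at why discarding majority-class data can cost accuracy. The always-majority predictions also show that accuracy and F1 need not move together when classes are imbalanced, so the two metrics are best read alongside AUC, precision and recall rather than in isolation.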

Item Type: Thesis (Masters)
Uncontrolled Keywords: Data Imbalance, Random Under Sampling (RUS), Rapidly Converging Gibbs Sampler (RACOG), RACOG-RUS, Random Forest (RF)
Subjects: H Social Sciences > HA Statistics > HA31.7 Estimation
Divisions: Faculty of Mathematics, Computation, and Data Science > Statistics > 49101-(S2) Master Thesis
Depositing User: Rahmi Fadhilah
Date Deposited: 23 Jan 2025 03:05
Last Modified: 23 Jan 2025 03:05
URI: http://repository.its.ac.id/id/eprint/116698
