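Simbolon, Jonathan Christian (2025) Perbandingan Model XGBoost dan Random Forest dalam Klasifikasi Penyakit Hipertensi dengan Interpretasi Shapley Additive Explanations (SHAP) [Comparison of XGBoost and Random Forest Models for Hypertension Classification with Shapley Additive Explanations (SHAP) Interpretation]. Other thesis, Institut Teknologi Sepuluh Nopember.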
Full text: 5002211128-Undergraduate_Thesis.pdf (Accepted Version, 4MB; restricted to repository staff only)
Abstract
In today's digital era, health data is increasingly important for identifying disease risk quickly and accurately. Hypertension is a common, high-risk cardiovascular disease, so accurate classification of hypertension status is essential. This study compares two ensemble learning models, XGBoost and Random Forest, for hypertension classification based on health indicator survey data. Both models were evaluated on performance across varying amounts of data, robustness to missing values, and computational efficiency. In testing, XGBoost was slightly ahead, with an accuracy of 87.59% and an F1-score of 87.4%, and it trained far faster: 2.52 seconds compared with 80.69 seconds for Random Forest. In the scenario where missing values had been imputed, both models remained robust, with only a relatively small drop in performance. Shapley Additive Explanations (SHAP) were used not only to explain feature contributions globally and locally, but also as a basis for feature selection. After feature selection, Random Forest achieved an accuracy of 87.54%, an F1-score of 77.5%, a precision of 73.42%, and a recall of 82.07%, while XGBoost achieved an accuracy of 86.77%, an F1-score of 74.8%, a precision of 74.51%, and a recall of 75.1%. The results indicate that health indicators such as BMI, age, health condition, diabetes, and cholesterol are fairly influential for hypertension. Overall, the study contributes a comparison of model performance and an application of SHAP interpretation as a feature selection approach that is consistent with the medical literature.
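The F1-scores reported after feature selection can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The short Python sketch below does that arithmetic and also computes the ratio of the two reported training times. It uses only the figures quoted in the abstract; the helper name `f1_from_precision_recall` and the variable names are illustrative and are not taken from the thesis.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (result is in the same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Post-feature-selection figures quoted in the abstract, in percent.
reported = {
    "Random Forest": {"precision": 73.42, "recall": 82.07, "f1": 77.5},
    "XGBoost": {"precision": 74.51, "recall": 75.10, "f1": 74.8},
}

# Recompute F1 from precision and recall, rounded to one decimal like the abstract.
recomputed = {
    name: round(f1_from_precision_recall(m["precision"], m["recall"]), 1)
    for name, m in reported.items()
}

# Training times quoted in the abstract, in seconds (Random Forest / XGBoost).
training_time_ratio = 80.69 / 2.52

for name, value in recomputed.items():
    print(f"{name}: reported F1 = {reported[name]['f1']}, recomputed F1 = {value}")
print(f"Random Forest training time / XGBoost training time = {training_time_ratio:.1f}")
```

Both recomputed values agree with the reported F1-scores to one decimal place, and the training-time figures correspond to Random Forest taking roughly 32 times as long as XGBoost to train.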
Item Type: | Thesis (Other) |
---|---|
Uncontrolled Keywords: | Classification, Hypertension, XGBoost, Random Forest, SHAP, Klasifikasi, Hipertensi |
Subjects: | R Medicine > RA Public aspects of medicine > RA971 Health services administration. |
Divisions: | Faculty of Mathematics, Computation, and Data Science > Mathematics > 44201-(S1) Undergraduate Thesis |
Depositing User: | Jonathan Christian Simbolon |
Date Deposited: | 04 Aug 2025 08:02 |
Last Modified: | 04 Aug 2025 08:02 |
URI: | http://repository.its.ac.id/id/eprint/127155 |