Diabetes Prediction Using a Convolutional Neural Network on Pima Indian Data

Najmi, Nisa Salvia (2025) Prediksi Diabetes Menggunakan Convolutional Neural Network Pada Data Pima Indian [Diabetes Prediction Using a Convolutional Neural Network on Pima Indian Data]. Other thesis, Institut Teknologi Sepuluh Nopember.

Abstract

Diabetes mellitus is a chronic disease that often goes undetected in its early stages, which increases the risk of complications. Machine learning, and convolutional neural networks (CNNs) in particular, can help automate early detection. This final project designs and implements a one-dimensional CNN (1D-CNN) optimized to improve diabetes prediction accuracy on the Pima Indian dataset. The research stages are data preprocessing (missing-value imputation, exploration of outlier handling with the IQR method and Winsorization, feature selection based on Pearson correlation and the ANOVA F-test, and normalization), class balancing with SMOTE, construction of the 1D-CNN architecture (convolution, pooling, dropout, and fully connected layers with ReLU and sigmoid activations), and model training and validation with early stopping. The best model, the 8-feature CNN with IQR outlier handling applied to SkinThickness, Insulin, and BMI and trained on balanced data, achieved 86% accuracy, 79% recall, 80% precision, an F1-score of 0.80, and an area under the curve (AUC) of 0.94. For comparison, a CNN model from a previous study using the same method and data reached only 81.47% accuracy and an AUC below 0.90. As a final output, a Streamlit web application was developed that lets users obtain real-time diabetes risk predictions. However, when the model was tested with default data (missing health inputs filled in automatically), performance dropped to 69.29% accuracy, 23.40% recall, and a 33.85% F1-score, which underlines the importance of entering complete data for accurate predictions. Confusion-matrix evaluation confirms the model's reliability in classifying diabetic and non-diabetic patients.
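The metrics quoted in the abstract all derive from a confusion matrix, so a short sketch may help show how they relate. The function name `classification_metrics` and the counts below are hypothetical: this record does not show the thesis's actual confusion matrix. The counts were chosen only because they reproduce the percentages reported for the default-data test.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators so the function never divides by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts (an assumption, not taken from the thesis): they are
# simply one set of integers consistent with the reported default-data figures.
default_inputs = classification_metrics(tp=11, fp=7, fn=36, tn=86)

for name, value in default_inputs.items():
    print(f"{name}: {value:.2%}")
```

Under these assumed counts, most diabetic cases are missed (36 false negatives against 11 true positives), which is why recall falls much further than accuracy: accuracy is held up by the larger non-diabetic class. This is a general property of imbalanced test sets, not a statement about the thesis's own evaluation split.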

Item Type: Thesis (Other)
Uncontrolled Keywords: Diabetes Prediction, 1D Convolutional Neural Network (CNN), Imbalanced Data, SMOTE, Web-based Application
Subjects: R Medicine > R Medicine (General) > R858 Deep Learning
Divisions: Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Information System > 57201-(S1) Undergraduate Thesis
Depositing User: Nisa Salvia Najmi
Date Deposited: 11 Jul 2025 08:16
Last Modified: 11 Jul 2025 08:16
URI: http://repository.its.ac.id/id/eprint/119514