Penerapan Metode Combine Sampling Pada Klasifikasi Imbalanced Data Biner Status Ketertinggalan Desa Di Jawa Timur

Dewi Lutfia, Pratiwi (2018) Penerapan Metode Combine Sampling Pada Klasifikasi Imbalanced Data Biner Status Ketertinggalan Desa Di Jawa Timur. Undergraduate thesis, Institut Teknologi Sepuluh Nopember.

06211440000054_Dewi Lutfia Pratiwi.pdf - Accepted Version

Abstract

Development gaps between regions in Indonesia remain a concern: underdeveloped villages persist in several provinces, including East Java. This study classifies underdeveloped villages in East Java using the 5 districts (kabupaten) with the highest percentage of underdeveloped villages, so that underdeveloped villages can be identified accurately. A common problem in classification is an imbalanced class composition (imbalanced data), which can be addressed by combining SMOTE oversampling with Tomek Links undersampling. Comparing the classification results on the imbalanced and the balanced data shows that AUC, G-mean, and sensitivity all increase after balancing for Logistic Regression, Ridge Logistic Regression, and Kernel Discriminant Analysis. The largest gain is in sensitivity, which increases by a factor of 23, and the three classifiers achieve comparable classification accuracy. Among the three, Ridge Logistic Regression has the higher AUC, G-mean, and overall accuracy on the balanced data when all variables are included: 78%, 77.91%, and 78.1%, respectively. Ridge Logistic Regression is therefore well suited to classifying village underdevelopment status.
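The combine sampling idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names (`smote`, `tomek_links`, `combine_sampling`), the neighbour count `k`, the toy data, and the choice to drop both members of each Tomek link are all assumptions made for illustration, using only the Python standard library.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    point and one of its k nearest minority neighbours (assumed k)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def tomek_links(points, labels):
    """Return pairs (i, j) that are mutual nearest neighbours with
    different class labels."""
    n = len(points)
    nn = [min((j for j in range(n) if j != i),
              key=lambda j: math.dist(points[i], points[j]))
          for i in range(n)]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]


def combine_sampling(X, y, minority_label=1, seed=0):
    """Oversample the minority class to parity, then remove both members
    of every Tomek link (one common convention, assumed here)."""
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    n_new = len(X) - 2 * len(minority)  # majority count minus minority count
    X_bal = list(X) + smote(minority, n_new, seed=seed)
    y_bal = list(y) + [minority_label] * n_new
    drop = {i for pair in tomek_links(X_bal, y_bal) for i in pair}
    keep = [i for i in range(len(X_bal)) if i not in drop]
    return [X_bal[i] for i in keep], [y_bal[i] for i in keep]


# Hypothetical toy data: 6 majority points (label 0), 3 minority points (label 1).
X = [(0.0, 0.0), (0.5, 0.2), (1.0, 0.1), (0.2, 0.8), (0.9, 0.9), (0.4, 0.5),
     (3.0, 3.0), (3.5, 3.2), (3.1, 3.6)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]
X_res, y_res = combine_sampling(X, y)
print(len(X_res), y_res.count(0), y_res.count(1))
```

Because each point has a single nearest neighbour, it can belong to at most one Tomek link, so removing both members of every link takes the same number of points from each class and the resampled classes stay balanced.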

Item Type: Thesis (Undergraduate)
Uncontrolled Keywords: Classification, Combine Sampling, Imbalanced Data, Underdeveloped Villages
Subjects: Q Science > QA Mathematics > QA353.K47 Kernel functions (analysis)
Q Science > QA Mathematics > QA76.9.D343 Data mining
Divisions: Faculty of Mathematics, Computation, and Data Science > Statistics > 49201-(S1) Undergraduate Thesis
Depositing User: Dewi Lutfia Pratiwi
Date Deposited: 08 Jul 2021 09:05
Last Modified: 08 Jul 2021 09:05
URI: https://repository.its.ac.id/id/eprint/57196
