A Synthetic Minority Oversampling Technique Approach to Binary Imbalanced Data Classification (Case Study: Village Underdevelopment Status in East Java)

Imanwardhani, Canggih Shoffi (2018) Pendekatan Synthetic Minority Oversampling Technique Dalam Menangani Klasifikasi Imbalanced Data Biner (Studi Kasus: Status Ketertinggalan Desa Di Jawa Timur). Undergraduate thesis, Institut Teknologi Sepuluh Nopember.

Full text: 06211440000051-Undergraduate_Theses.pdf (Accepted Version, 2MB)

Abstract

Classifying imbalanced data yields poor classification performance on the minority class, because the classifier tends to predict the majority class. To balance the class proportions, the minority class was resampled with the Synthetic Minority Oversampling Technique (SMOTE). The classification methods used are Logistic Regression (LR), Ridge Logistic Regression (LR Ridge), and Kernel Discriminant Analysis (ADK). The aim of this study is to analyze how effectively SMOTE improves classification accuracy.

The data cover 1,122 villages in five regencies of East Java, 115 of which have underdeveloped-village status. Performance was evaluated with stratified 10-fold cross-validation. On the imbalanced data, the classifiers achieved high overall accuracy but low mean AUC, G-mean, and sensitivity. After the minority class was resampled with SMOTE, the mean AUC, G-mean, and sensitivity rose to about 76%, and their standard deviations were smaller than in the imbalanced-data classification. On the balanced data, LR with all variables gave AUC (76.4%) and G-mean (76.35%) values slightly higher than those of the other methods. The largest improvement among the classification metrics was in sensitivity, which increased as much as 22-fold. The largest gains in G-mean and sensitivity came from combining SMOTE with LR Ridge using all variables.
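
For readers unfamiliar with SMOTE, the interpolation step it relies on can be sketched in NumPy as below. This is not the thesis's code: the function name `smote`, the neighbour count `k=5`, and the random placeholder features are assumptions, and the sketch treats every feature as continuous and measures closeness with Euclidean distance. Each synthetic point is placed at a random position on the line segment between a minority sample and one of its k nearest minority-class neighbours. The usage lines generate 1,007 - 115 = 892 synthetic rows, the number an exact 1:1 balance of the abstract's class counts would need if the whole dataset were oversampled at once; the abstract does not state the target ratio or whether oversampling was restricted to the training folds of the cross-validation.

```python
import numpy as np


def smote(X_min, n_synthetic, k=5, rng=None):
    """Return n_synthetic SMOTE-style points built from minority-class rows X_min.

    Assumes all features are continuous and uses Euclidean distance.
    """
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)  # a sample cannot have more neighbours than other minority rows

    # Pairwise squared Euclidean distances between minority rows
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # exclude each row from its own neighbour list
    neighbours = np.argsort(dist, axis=1)[:, :k]  # k nearest minority neighbours per row

    base = rng.integers(0, n, size=n_synthetic)  # minority row each new point starts from
    nb = neighbours[base, rng.integers(0, k, size=n_synthetic)]  # one random neighbour of it
    gap = rng.random((n_synthetic, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])


# Placeholder data: 115 minority rows with 4 made-up continuous features (not the thesis data)
X_min = np.random.default_rng(0).normal(size=(115, 4))
X_syn = smote(X_min, n_synthetic=1007 - 115, k=5, rng=1)
print(X_syn.shape)  # (892, 4)
```

The imbalanced-learn package provides the same technique as `imblearn.over_sampling.SMOTE(k_neighbors=5).fit_resample(X, y)`; the hand-written version only keeps the interpolation step visible.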

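The evaluation metrics named in the abstract can be computed from a binary confusion matrix as sketched below, taking the underdeveloped villages as the positive (minority) class. The helper name `imbalance_metrics` is hypothetical, and its `auc` entry uses the single-threshold form (sensitivity + specificity) / 2, which equals the area under a ROC curve drawn through one operating point; the abstract does not say whether the thesis used this form or a score-based ROC AUC. The usage lines reproduce the pattern the abstract reports for imbalanced data: on its class counts, a classifier that always predicts the majority class reaches about 89.75% accuracy while its sensitivity and G-mean are zero.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix metrics for a binary classifier; `positive` marks the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sens = tp / (tp + fn)  # recall on the minority (positive) class
    spec = tn / (tn + fp)  # recall on the majority (negative) class
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sens,
        "specificity": spec,
        "g_mean": math.sqrt(sens * spec),  # geometric mean of the two class-wise recalls
        # single-threshold AUC: area under the ROC polyline (0,0) -> (1-spec, sens) -> (1,1)
        "auc": (sens + spec) / 2,
    }


# Class counts from the abstract: 115 underdeveloped (1) and 1,007 other (0) villages
y_true = [1] * 115 + [0] * (1122 - 115)
y_all_majority = [0] * 1122  # a classifier that always predicts the majority class
m = imbalance_metrics(y_true, y_all_majority)
print({name: round(v, 4) for name, v in m.items()})
# {'accuracy': 0.8975, 'sensitivity': 0.0, 'specificity': 1.0, 'g_mean': 0.0, 'auc': 0.5}
```

Because G-mean multiplies the two class-wise recalls, it drops to zero whenever either class is never predicted correctly, which is why it exposes majority-class bias that overall accuracy hides.
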
Item Type: Thesis (Undergraduate)
Uncontrolled Keywords: Imbalanced Data, Kernel Discriminant Analysis, Logistic Regression, Ridge Logistic Regression, SMOTE
Subjects: H Social Sciences > HA Statistics > HA31.3 Regression. Correlation
H Social Sciences > HD Industries. Land use. Labor > HD108 Classification (Theory. Method. Relation to other subjects )
H Social Sciences > HD Industries. Land use. Labor > HD30.23 Decision making. Business requirements analysis.
H Social Sciences > HN Social history and conditions. Social problems. Social reform
H Social Sciences > HT Communities. Classes. Races > HT133 City and Towns. Land use,urban
H Social Sciences > HV Social pathology. Social and public welfare
Divisions: Faculty of Mathematics, Computation, and Data Science > Statistics > 49201-(S1) Undergraduate Thesis
Depositing User: Canggih Shoffi Imanwardhani
Date Deposited: 18 Jul 2021 21:25
Last Modified: 18 Jul 2021 21:25
URI: https://repository.its.ac.id/id/eprint/57168
