A Hybrid Oversampling and Undersampling Method for Handling Imbalanced Academic Failure Data at XYZ University

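Shabrina, Choirunnisa (2019) Metode Hibrida Oversampling Dan Undersampling Untuk Menangani Ketidakseimbangan Data Kegagalan Akademik Pada Universitas XYZ [A Hybrid Oversampling and Undersampling Method for Handling Imbalanced Academic Failure Data at XYZ University]. Master's thesis, Institut Teknologi Sepuluh Nopember.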

Full text: 05111750010029-Master_Thesis.pdf (PDF, 6MB)

Abstract

Data imbalance occurs in many kinds of data, including the academic data of XYZ University. If this academic data is processed computationally (for example, by classification), the imbalance can cause misclassification because the majority class dominates the minority class. Combining oversampling with undersampling is one way to address such imbalance. This study tackles the imbalance in the academic data by pairing an oversampling method with an undersampling method to obtain more representative synthetic data. Adaptive Semi-unsupervised Weighted Oversampling (A-SUWO) is used as the oversampling method, and the undersampling methods are Random Undersampling (RUS), Neighborhood Cleaning Rule (NCL), and Tomek Links. After undersampling and oversampling, the data is classified with the C4.5 decision tree algorithm, and performance is evaluated using precision, recall, and accuracy. The highest average accuracy, 76.55%, with a precision of 87.04% and a recall of 80.35%, was achieved by the combined A-SUWO + Tomek Links method on the academic dataset. On the KEEL dataset, the accuracy, precision, and recall obtained were 85.41%, 93.18%, and 90.54%, respectively.
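Random Undersampling (RUS), the simplest of the undersampling methods listed above, discards randomly chosen majority-class records. The sketch below is a minimal, hypothetical illustration: the function name `random_undersample`, the `pass`/`fail` labels, and the choice to shrink every class to the minority size are assumptions, not details from the thesis. Because the thesis pairs RUS with A-SUWO oversampling, the sampling ratio it actually used may differ.

```python
import random
from collections import Counter


def random_undersample(samples, labels, seed=0):
    """Randomly keep only as many samples per class as the smallest class has.

    Hypothetical helper: full balancing to the minority size is an assumption.
    """
    rng = random.Random(seed)
    target = min(Counter(labels).values())
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    out_x, out_y = [], []
    for y, xs in by_class.items():
        for x in rng.sample(xs, target):  # sampling without replacement
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Hypothetical data: 8 "pass" records (majority) and 2 "fail" records.
samples = list(range(10))
labels = ["pass"] * 8 + ["fail"] * 2
xs, ys = random_undersample(samples, labels)
print(Counter(ys))
```

The fixed `seed` only makes the illustration reproducible; a different seed keeps a different subset of the majority class.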
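Tomek Links undersampling, used in the best-performing combination above, looks for pairs of records from opposite classes that are each other's nearest neighbour and removes the majority-class member of each pair, cleaning the class boundary. The dependency-free sketch below assumes numeric features, Euclidean distance, and binary labels; the function names are hypothetical and this is not the thesis's implementation (the imbalanced-learn library offers a ready-made `TomekLinks` class in `imblearn.under_sampling`).

```python
import math
from collections import Counter


def nearest_neighbour(points, i):
    """Index of the closest other point to points[i] (Euclidean distance)."""
    best_j, best_d = None, math.inf
    for j, p in enumerate(points):
        if j != i:
            d = math.dist(points[i], p)
            if d < best_d:
                best_j, best_d = j, d
    return best_j


def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that are Tomek links."""
    nn = [nearest_neighbour(points, i) for i in range(len(points))]
    links = []
    for i, j in enumerate(nn):
        # A Tomek link: mutual nearest neighbours with different labels.
        if j is not None and i < j and nn[j] == i and labels[i] != labels[j]:
            links.append((i, j))
    return links


def remove_tomek_majority(points, labels):
    """Drop the majority-class member of every Tomek link (binary labels assumed)."""
    majority = Counter(labels).most_common(1)[0][0]
    drop = {i if labels[i] == majority else j for i, j in tomek_links(points, labels)}
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


# Hypothetical 2-D example: points 0 and 1 form the only Tomek link.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (5.0, 5.0)]
labels = [0, 1, 0, 0, 0]
print(tomek_links(points, labels))  # [(0, 1)]
new_points, new_labels = remove_tomek_majority(points, labels)
print(new_labels)  # [1, 0, 0, 0]
```

The brute-force neighbour search is quadratic in the number of records, which is fine for illustration but not for large datasets.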
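C4.5, the classifier applied after resampling, chooses each split by gain ratio: information gain divided by split information, which penalises attributes with many distinct values. The sketch below computes only that criterion for one categorical attribute; it omits threshold search on continuous attributes, missing-value handling, and pruning, and the example attribute `gpa_band` and its labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5 gain ratio of splitting `labels` by the categorical `values`."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy after the split, weighted by branch size.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information: entropy of the branch sizes themselves.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical attribute that separates the two classes perfectly.
gpa_band = ["low", "low", "high", "high"]
status = ["fail", "fail", "pass", "pass"]
print(gain_ratio(gpa_band, status))  # 1.0
```

An attribute with four distinct values that also separates the classes perfectly would score only 0.5 here, because its split information is 2 bits; that penalty is the point of using gain ratio instead of raw information gain.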
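The evaluation reports precision, recall, and accuracy. For a binary task these follow from the confusion-matrix counts: precision = TP / (TP + FP), recall = TP / (TP + FN), and accuracy = (TP + TN) / N. The sketch below assumes the minority class (a hypothetical `fail` label) is treated as positive; the abstract does not state which class the reported precision and recall refer to or how the averages were computed.

```python
def evaluate(y_true, y_pred, positive):
    """Precision, recall, and accuracy with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn  # everything else is a true negative
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical predictions for six students; "fail" is the minority class.
y_true = ["fail", "pass", "pass", "fail", "pass", "pass"]
y_pred = ["fail", "pass", "fail", "pass", "pass", "pass"]
metrics = evaluate(y_true, y_pred, positive="fail")
print(metrics)
```

Here TP = 1, FP = 1, FN = 1, and TN = 3, so precision and recall are both 0.5 while accuracy is 4/6; on imbalanced data, accuracy alone can hide poor minority-class performance.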

Item Type: Thesis (Masters)
Additional Information: RTIf 004.35 Cho m-1 2019
Uncontrolled Keywords: Imbalance, undersampling, oversampling, RUS, NCL, Tomek Link, A-SUWO, academic data
Subjects: T Technology > T Technology (General)
T Technology > T Technology (General) > T57.5 Data Processing
Divisions: Faculty of Information Technology > Informatics Engineering > 55101-(S2) Master Thesis
Depositing User: Shabrina Choirunnisa
Date Deposited: 15 Jun 2021 03:12
Last Modified: 15 Jun 2021 03:12
URI: http://repository.its.ac.id/id/eprint/60454
