Metode Hibrida Oversampling untuk Menangani Imbalanced Multi-label (A Hybrid Oversampling Method for Handling Imbalanced Multi-label Data)

Tursina, Dara (2024) Metode Hibrida Oversampling untuk Menangani Imbalanced Multi-label. Masters thesis, Institut Teknologi Sepuluh Nopember.

6025202008-Master_Thesis.pdf - Accepted Version
Restricted to Repository staff only until 1 April 2026.


Abstract

Data and information continue to grow with the development of digital technology, and the available data are becoming increasingly voluminous and complex. Imbalanced data cause classification errors because the majority class dominates the minority class. Imbalance is not limited to binary classification: it is also common in multi-label data, which has become increasingly important in recent years because of its wide range of applications. Class imbalance is in fact a characteristic of many complex multi-label datasets, which makes it the focus of this research, and the handling of imbalanced multi-label data still has considerable room for development. Two approaches have been developed for this problem: Synthetic Oversampling of Multi-Label Data Based on Local Label Distribution (MLSOL) and Integrating Unsupervised Clustering and Label-specific Oversampling to Tackle Imbalanced Multi-Label Data (UCLSO). UCLSO focuses only on the majority class, which can lead to data imbalance and excessive overfitting; although it is effective in preventing majority-class dominance, it cannot address the lack of variation within the minority class. MLSOL, in contrast, focuses on the minority classes, which introduces variation into the multi-label data and significantly improves classification performance. This research addresses data imbalance by combining the MLSOL and UCLSO oversampling methods, with the expectation that the hybrid exploits the strengths and reduces the weaknesses of each method and thereby yields a significant performance improvement.

Experimental results show that the hybrid oversampling method achieves the highest score on the biology dataset, with an F1 score of 88%, whereas the single oversampling methods UCLSO and MLSOL obtain F1 scores of 67% and 62%, respectively, on the same data.
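The abstract does not spell out how MLSOL picks seed instances or builds synthetic ones. The sketch below is a minimal, simplified MLSOL-style generation step in Python with NumPy, offered only to illustrate the idea of minority-focused, locally weighted oversampling; it is not the thesis's implementation and not the reference MLSOL code. Assumptions: the function names `knn_indices` and `mlsol_style_oversample` are hypothetical, neighbours are found by brute-force Euclidean distance, instances whose neighbours all disagree on a label (C = 1) are treated as outliers with zero weight, and each synthetic instance simply takes the label vector of whichever parent it lies closer to, in place of MLSOL's instance-type-based label rule.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of the k nearest neighbours of every row (Euclidean), excluding the row itself."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is never its own neighbour
    return np.argsort(d, axis=1)[:, :k]


def mlsol_style_oversample(X, Y, n_new, k=5, seed=0):
    """Simplified MLSOL-style oversampling (hypothetical helper, not the original algorithm).

    X: (n, d) features, Y: (n, q) binary label matrix.
    Returns n_new synthetic feature rows and label rows.
    """
    rng = np.random.default_rng(seed)
    n, q = Y.shape
    nn = knn_indices(X, k)

    # Minority value per label: 1 when positives are the rarer value, otherwise 0.
    minority = (Y.mean(axis=0) <= 0.5).astype(Y.dtype)

    # C[i, j]: fraction of i's neighbours whose value for label j differs from i's.
    C = (Y[nn] != Y[:, None, :]).mean(axis=1)

    # Weight only minority-value entries; C == 1 is treated as an outlier (assumption).
    mask = (Y == minority) & (C < 1.0)
    S = np.where(mask, C, 0.0)
    col_sums = S.sum(axis=0)
    S = np.divide(S, col_sums, out=np.zeros_like(S), where=col_sums > 0)

    # Seed-selection weight per instance: sum of its normalised per-label difficulty.
    w = S.sum(axis=1)
    if w.sum() == 0:
        w = np.ones(n)  # fall back to uniform sampling if no label gives any weight
    p = w / w.sum()

    X_new = np.empty((n_new, X.shape[1]))
    Y_new = np.empty((n_new, q), dtype=Y.dtype)
    for t in range(n_new):
        s = rng.choice(n, p=p)            # difficult minority instances are picked more often
        r = nn[s, rng.integers(k)]        # random reference among the seed's neighbours
        gap = rng.random()
        X_new[t] = X[s] + gap * (X[r] - X[s])  # interpolate on the segment seed -> reference
        # Simplified label rule: copy the label vector of the nearer parent.
        Y_new[t] = Y[s] if gap <= 0.5 else Y[r]
    return X_new, Y_new


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(30, 4))
    Y = (rng.random((30, 3)) < [0.5, 0.2, 0.1]).astype(int)
    X_syn, Y_syn = mlsol_style_oversample(X, Y, n_new=20, k=5)
    print(X_syn.shape, Y_syn.shape)
```

A hybrid in the spirit of this thesis would append these synthetic rows, together with those produced by a second oversampler such as UCLSO, to the original training set. How the thesis splits the synthetic budget between the two methods or merges their outputs is not stated in the abstract, so that step is deliberately left out of the sketch.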

Item Type: Thesis (Masters)
Uncontrolled Keywords: imbalanced data, ensemble oversampling, imbalanced multi-label
Subjects: T Technology > T Technology (General) > T57.5 Data Processing
T Technology > T Technology (General) > T57.6 Operations research--Mathematics. Goal programming
Divisions: Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering > 55101-(S2) Master Thesis
Depositing User: Dara Tursina
Date Deposited: 11 Feb 2024 14:41
Last Modified: 11 Feb 2024 14:41
URI: http://repository.its.ac.id/id/eprint/106591
