Augmentasi Data Terstruktur Menggunakan Generative Adversarial Network Untuk Penanganan Ketidakseimbangan Data

Rahmatullah, Mohammad Luthfi (2025) Augmentasi Data Terstruktur Menggunakan Generative Adversarial Network Untuk Penanganan Ketidakseimbangan Data. Other thesis, Institut Teknologi Sepuluh Nopember.

Abstract

Data imbalance is a common problem when training machine learning models. It occurs when the number of samples per class in a dataset differs substantially. The consequence is low model accuracy on the minority class, which is often the class that matters most, for example in medical diagnosis, fraud detection, and cybersecurity. Several methods have been developed to address data imbalance, but each has drawbacks. Oversampling risks adding noise or producing synthetic data that does not reflect real conditions, while undersampling can discard important information from the majority class. Cost-sensitive learning requires carefully tuned weights, and deep learning methods require substantial computing power and large amounts of data. In response, data augmentation based on generative adversarial networks (GANs) has begun to be explored. GANs can generate synthetic data realistic enough to produce minority-class samples that resemble the original data. In this work, the synthetic data is combined with the original data and used to train several traditional machine learning models. The results show an improvement in the evaluation metrics of approximately five percent.
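
The augmentation step described in the abstract (generate minority-class samples, merge them with the original data, then train a classifier on the result) can be illustrated with a minimal sketch. This is not the thesis implementation. The function names (`synthetic_needed`, `augment_minority`, `make_placeholder_generator`) are hypothetical, the target of fully balancing the classes is an assumption, and the generator shown is only a stand-in that jitters real minority rows with Gaussian noise. It is not a GAN; a trained GAN generator would be passed in its place.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Row = List[float]


def synthetic_needed(labels: Sequence[int], minority: int) -> int:
    """Number of synthetic minority rows needed to match the largest class."""
    counts = Counter(labels)
    return max(counts.values()) - counts[minority]


def augment_minority(
    rows: Sequence[Row],
    labels: Sequence[int],
    minority: int,
    generate: Callable[[int], List[Row]],
) -> Tuple[List[Row], List[int]]:
    """Append generated minority rows to the original data until classes balance.

    `generate(n)` is assumed to return n synthetic feature rows; in the
    setting of the abstract it would wrap a trained GAN generator.
    """
    n_new = synthetic_needed(labels, minority)
    new_rows = generate(n_new)
    if len(new_rows) != n_new:
        raise ValueError("generator returned an unexpected number of rows")
    # Original rows come first and are left untouched.
    return list(rows) + new_rows, list(labels) + [minority] * n_new


def make_placeholder_generator(
    minority_rows: Sequence[Row], seed: int = 0, scale: float = 0.05
) -> Callable[[int], List[Row]]:
    """Stand-in generator (NOT a GAN): real minority rows plus Gaussian noise."""
    rng = random.Random(seed)

    def generate(n: int) -> List[Row]:
        return [
            [x + rng.gauss(0.0, scale) for x in rng.choice(minority_rows)]
            for _ in range(n)
        ]

    return generate


if __name__ == "__main__":
    # Toy data: 8 majority rows (label 0) and 2 minority rows (label 1).
    rows = [[0.0, 0.0]] * 8 + [[1.0, 1.0], [1.2, 0.9]]
    labels = [0] * 8 + [1] * 2
    generator = make_placeholder_generator(rows[8:], seed=1)
    X, y = augment_minority(rows, labels, minority=1, generate=generator)
    print(Counter(y))  # class counts after augmentation
```

The augmented `X` and `y` would then be used to fit a traditional classifier. Keeping the generator as a plain callable means the merging logic stays the same regardless of which generative model produces the synthetic rows.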

Item Type: Thesis (Other)
Uncontrolled Keywords: data imbalance, data augmentation, GAN
Subjects: T Technology > T Technology (General) > T57.5 Data Processing
Divisions: Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering > 55201-(S1) Undergraduate Thesis
Depositing User: Mohammad Luthfi Rahmatullah
Date Deposited: 05 Aug 2025 07:35
Last Modified: 05 Aug 2025 07:35
URI: http://repository.its.ac.id/id/eprint/126877
