Comparison of the Quality of Synthetic Tabular Data Generated by Various GAN Methods (Perbandingan Kualitas Data Tabular Sintetis yang Dihasilkan oleh Berbagai Metode GAN)

Hermawan, Sony (2025) Perbandingan Kualitas Data Tabular Sintetis yang Dihasilkan oleh Berbagai Metode GAN. Other thesis, Institut Teknologi Sepuluh Nopember.

Text: 5025211226_Undergraduate_Thesis.pdf - Accepted Version (13MB), restricted to repository staff only

Abstract

This study compares and evaluates the quality of synthetic tabular data generated by several Generative Adversarial Network (GAN) methods. Synthetic data has emerged as a promising solution to several challenges in data analysis, including data imbalance and privacy protection. However, ensuring that synthetic data is of sufficient quality to be used effectively in machine learning remains a major challenge. Three GAN models, CTGAN, CTAB-GAN, and ADS-GAN, were implemented on the Adult, Credit Card, and Wine Quality datasets. Quality was evaluated through visual analysis with histograms and correlation heatmaps, statistical testing with the Kolmogorov-Smirnov test (KSTest) and the Chi-Square test (CSTest), the TabSynDex metric for comprehensive evaluation, and an ML Efficacy comparison using four machine learning models. The results show that each GAN method has specific advantages in different scenarios. ADS-GAN proved superior in preserving statistical fidelity and correlation, CTGAN excelled at handling complex and multimodal data distributions, and CTAB-GAN showed the best consistency and robustness, with stable ML Efficacy performance across all datasets. These findings underscore that no single model is dominant in every respect, and that the best choice of method depends on the characteristics of the dataset and the evaluation priorities.
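As a rough illustration of the two statistical tests named in the abstract, the sketch below computes a two-sample Kolmogorov-Smirnov statistic for a numeric column and a Pearson chi-square statistic for a categorical column, comparing real values against synthetic ones. It is not the thesis's implementation: the function names `ks_statistic` and `chi_square_statistic` and the toy columns are assumptions made for this example, and only the raw statistics are computed (for p-values, library routines such as `scipy.stats.ks_2samp` and `scipy.stats.chisquare` could be used instead).

```python
from bisect import bisect_right
from collections import Counter


def ks_statistic(real, synthetic):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    real_sorted = sorted(real)
    synth_sorted = sorted(synthetic)
    gap = 0.0
    # Both ECDFs only change at observed values, so checking those points suffices.
    for x in set(real_sorted) | set(synth_sorted):
        cdf_real = bisect_right(real_sorted, x) / len(real_sorted)
        cdf_synth = bisect_right(synth_sorted, x) / len(synth_sorted)
        gap = max(gap, abs(cdf_real - cdf_synth))
    return gap


def chi_square_statistic(real, synthetic):
    """Pearson chi-square statistic of the synthetic category counts against
    the counts expected from the real category proportions.

    Assumption for this sketch: categories that appear only in the synthetic
    column are ignored, because their expected count would be zero.
    """
    real_counts = Counter(real)
    synth_counts = Counter(synthetic)
    n_real, n_synth = len(real), len(synthetic)
    stat = 0.0
    for category, count in real_counts.items():
        expected = count / n_real * n_synth
        stat += (synth_counts.get(category, 0) - expected) ** 2 / expected
    return stat


# Hypothetical toy columns, not taken from the Adult, Credit Card, or Wine Quality data.
real_age = [23, 35, 41, 52, 60]
synth_age = [25, 33, 45, 50, 58]
real_label = ["a"] * 6 + ["b"] * 4
synth_label = ["a"] * 5 + ["b"] * 5

ks_value = ks_statistic(real_age, synth_age)
chi2_value = chi_square_statistic(real_label, synth_label)
print(f"KS statistic: {ks_value:.3f}")
print(f"Chi-square statistic: {chi2_value:.3f}")
```

In both cases a value closer to zero means the synthetic column tracks the real one more closely; identical columns give exactly zero.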

Item Type: Thesis (Other)
Uncontrolled Keywords: ADS-GAN, CTAB-GAN, CTGAN, Data Analysis, Generative Adversarial Networks, Synthetic Data, Tabular Data
Subjects: Q Science > Q Science (General) > Q325.5 Machine learning. Support vector machines.
Q Science > QA Mathematics > QA336 Artificial Intelligence
Q Science > QA Mathematics > QA76.87 Neural networks (Computer Science)
Q Science > QA Mathematics > QA76.9.I52 Information visualization
T Technology > T Technology (General) > T57.5 Data Processing
Divisions: Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering > 55201-(S1) Undergraduate Thesis
Depositing User: Sony Hermawan
Date Deposited: 25 Jul 2025 07:04
Last Modified: 25 Jul 2025 07:09
URI: http://repository.its.ac.id/id/eprint/121441
