GAN-LSTM for Feature Enhancement and Data Generation on Imbalance Dataset in IoT Malware Analysis

Setiawan, Gregorius Edo Satriatama Eka (2025) GAN-LSTM for Feature Enhancement and Data Generation on Imbalance Dataset in IoT Malware Analysis. Masters thesis, Institut Teknologi Sepuluh Nopember.

6025231050-Master_Thesis.pdf

Download (8MB)

Abstract

The expansion of internet services and the growing use of the Internet of Things (IoT), driven by advances in 5G cellular networks, have transformed multiple sectors, including communication, data sharing, and e-commerce. This growth amplifies cybersecurity risks: the massive data flows and the autonomous nature of IoT systems make them susceptible to cyber threats. In 2024, over 1.2 billion instances of malware and potentially unwanted applications (PUAs) were detected, reflecting a surge in threats at the application layer. Although encryption methods such as Transport Layer Security (TLS) safeguard data privacy, they also make malicious traffic harder to identify, raising the demand for advanced detection mechanisms. Data imbalance within datasets is a further concern, because imbalanced data degrades malware detection performance and must therefore be addressed. This research introduces the FE-CGAN-LSTM model, which addresses data imbalance and improves feature representation. On the original dataset, the classifiers yield moderate malware detection results, with accuracies of around 90.7–91.0% and lower F1 scores. On the dataset enhanced by FE-CGAN-LSTM, performance improves, with all classifiers achieving 99.99% on accuracy, precision, recall, and F1-score.

Item Type: Thesis (Masters)
Uncontrolled Keywords: Imbalanced dataset, Generative Adversarial Networks, IoT-23, Long Short-Term Memory, Malware, Synthetic data generation, Feature enhancement
Subjects: Q Science > QA Mathematics > QA76.87 Neural networks (Computer Science)
Divisions: Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering > 55101-(S2) Master Thesis
Depositing User: Gregorius Edo Satriatama Eka Setiawan
Date Deposited: 04 Aug 2025 10:24
Last Modified: 04 Aug 2025 10:24
URI: http://repository.its.ac.id/id/eprint/124800
