Testing the Performance Robustness of Stacked Autoencoder Integration with Deep Learning against Unclean Datasets in Detecting Brute Force Attacks on the SSH and FTP Protocols

Julian Arrizki, Deka (2025) Menguji Ketahanan Performa dari Integrasi Stacked Autoencoder dengan Deep Learning terhadap Unclean Dataset Dalam Mendeteksi Serangan Brute Force pada Protokol SSH dan FTP. Masters thesis, Institut Teknologi Sepuluh Nopember.

Abstract

Brute force attacks are among the most significant and persistent cybersecurity threats. Protocols such as Secure Shell (SSH) and File Transfer Protocol (FTP) are primary targets because they are vulnerable to repeated login attempts. Detection mechanisms therefore need to be not only accurate but also robust to unclean datasets, meaning datasets whose features or labels have been manipulated. Existing methods typically address unclean datasets by considering only one aspect, either the features or the labels, without considering the correlation between the two. To simulate these conditions, this study applies label flipping to the data labels and a perturbation-based adversarial attack, the Fast Gradient Sign Method (FGSM), to the data features. The study aims to evaluate the robustness of deep learning models in detecting brute force attacks on the SSH and FTP protocols when tested with unclean datasets, under both label-only manipulation and combined feature-label manipulation. Because brute force attacks unfold over a short time, the feature set is complex, and detection must be fast, the model reduces the data dimensionality using a stacked autoencoder (SAE). The evaluated models include deep learning approaches, namely Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and transformer, as well as baseline models for comparison, covering traditional and ensemble machine learning methods.
The experiments follow a three-phase design. Phase-0 compares model performance and computation time on the original data and on the data reduced by the stacked autoencoder. Phase-1 evaluates how label flipping applied to the training data at varying percentages (0.01% to 10%) affects model performance. Phase-2 introduces a combined attack scenario, applying label flipping to the training labels and the FGSM attack to the feature values in the test data.
The experimental results show that the stacked autoencoder reduces deep learning computation time by a factor of three to five. In Phase-1, ensemble learning achieved the best performance. In Phase-2, however, the deep learning models were more robust than the baseline models, indicating that deep learning architectures are better at handling datasets in which both features and labels are unclean.
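
The two manipulations named in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `flip_labels` and `fgsm`, the toy data, and the weights are hypothetical, and a logistic-regression model stands in for the CNN, LSTM, and transformer models so that the FGSM gradient can be written in closed form. Labels are assumed to be binary (0 = benign, 1 = brute force).

```python
import math
import random


def flip_labels(labels, rate, seed=0):
    """Return a copy of binary labels with a fraction `rate` flipped (0 <-> 1)."""
    rng = random.Random(seed)
    n_flip = round(rate * len(labels))
    flipped = list(labels)
    # Choose distinct positions so exactly n_flip labels change.
    for i in rng.sample(range(len(labels)), n_flip):
        flipped[i] = 1 - flipped[i]
    return flipped


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(x, weights, bias):
    """Linear score w.x + b of the stand-in logistic-regression model."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def fgsm(x, y, weights, bias, epsilon):
    """FGSM on one sample for a logistic-regression stand-in model.

    For binary cross-entropy, the loss gradient with respect to the input
    is (sigmoid(w.x + b) - y) * w. Each feature is shifted by epsilon in
    the direction of that gradient's sign, which increases the loss.
    """
    p = sigmoid(score(x, weights, bias))
    grad = [(p - y) * w for w in weights]
    return [xi + epsilon * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]


if __name__ == "__main__":
    labels = [0, 1] * 500                    # 1,000 toy training labels
    noisy = flip_labels(labels, rate=0.10)   # 10%, the upper end of the Phase-1 range
    print("flipped labels:", sum(a != b for a, b in zip(labels, noisy)))

    weights, bias = [2.0, -1.0, 0.5], 0.0    # hypothetical trained weights
    x, y = [0.5, 0.2, 0.1], 1                # toy test sample, true class 1
    x_adv = fgsm(x, y, weights, bias, epsilon=0.1)
    print("score before:", score(x, weights, bias))
    print("score after: ", score(x_adv, weights, bias))
```

In a Phase-2 style setup, `flip_labels` would be applied to the training labels before fitting and `fgsm` to the test features at evaluation time. With a deep model the input gradient would come from automatic differentiation rather than the closed form used here.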

Item Type:	Thesis (Masters)
Uncontrolled Keywords:	Brute Force, Deep Learning, Robustness, Stacked Autoencoder, Unclean Dataset
Subjects:	T Technology > T Technology (General) > T57.8 Nonlinear programming. Support vector machine. Wavelets. Hidden Markov models; T Technology > T Technology (General) > T57.84 Heuristic algorithms; T Technology > T Technology (General) > T58.5 Information technology. IT--Auditing
Divisions:	Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering > 55101-(S2) Master Thesis
Depositing User:	Deka Julian Arrizki
Date Deposited:	04 Aug 2025 11:54
Last Modified:	04 Aug 2025 11:55
URI:	http://repository.its.ac.id/id/eprint/125011
