A Comparative Study of Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE) in Software Defect Detection

Ghaffaru, Ahda Filza (2025) Studi Perbandingan Principal Component Analysis (PCA) Dan T-Distributed Stochastic Neighbor Embedding (T-SNE) Dalam Deteksi Cacat Perangkat Lunak. Other thesis, Institut Teknologi Sepuluh Nopember.

5025211144-Undergraduate_Thesis.pdf - Accepted Version
Restricted to Repository staff only


Abstract

Software defects are a critical factor affecting the reliability and quality of software systems. Detecting defects accurately, especially in high-dimensional software metric data, requires a dimensionality reduction technique that simplifies the feature representation without losing essential information. This study compares the effectiveness of Principal Component Analysis (PCA) and parametric t-Distributed Stochastic Neighbor Embedding (parametric t-SNE) as dimensionality reduction methods in support of software defect classification. Parametric t-SNE is used instead of standard t-SNE because it learns a parametric function that can transform new data, which standard t-SNE cannot do. The study begins with a correlation analysis between the features and the defect label on the NASA MDP datasets. The data are then reduced with PCA and parametric t-SNE and classified with eight machine learning algorithms, which are evaluated on accuracy, precision, recall, and F1-score. The experimental results show that PCA generally yields higher and more stable classification performance than parametric t-SNE, in both the combined-dataset and separate-dataset scenarios. In addition, Random Forest and CatBoost are the classifiers that most consistently perform best after hyperparameter tuning. In the combined-dataset scenario, PCA with Random Forest achieves an F1-score of 0.8309. The highest F1-score overall, 1.0000, is obtained on the PC2 dataset. The study is intended to provide insight into selecting dimensionality reduction techniques and classification models for software defect detection.
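
The reduce-then-classify workflow described in the abstract can be illustrated with a minimal sketch. It is not the thesis code: it assumes scikit-learn, uses a synthetic imbalanced dataset from `make_classification` as a stand-in for the NASA MDP data, and the function name `evaluate_pca_random_forest`, the number of components, and all hyperparameters are hypothetical choices made only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def evaluate_pca_random_forest(n_components=10, seed=0):
    """Reduce features with PCA, classify with Random Forest, report four metrics."""
    # Assumption: synthetic data standing in for a software-metrics table
    # (this is NOT the NASA MDP data); class 1 plays the role of "defective".
    X, y = make_classification(
        n_samples=600, n_features=30, n_informative=8,
        weights=[0.8, 0.2], random_state=seed,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # The scaler and PCA are fitted on the training split only; the fitted
    # projection is then applied to the unseen test split inside predict().
    model = make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components, random_state=seed),
        RandomForestClassifier(n_estimators=200, random_state=seed),
    )
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
    }


scores = evaluate_pca_random_forest()
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

The sketch also shows why the ability to transform new data matters: PCA is fitted on the training split and reused on the test split. scikit-learn's `TSNE` offers `fit_transform` but no separate `transform` for unseen samples, which is the limitation the abstract gives as the reason for using a parametric variant.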

Item Type: Thesis (Other)
Uncontrolled Keywords: PCA, t-SNE, parametric t-SNE, Software defect detection, Dimensionality reduction
Subjects: Q Science > Q Science (General) > Q325.5 Machine learning. Support vector machines.
Q Science > QA Mathematics > QA336 Artificial Intelligence
Q Science > QA Mathematics > QA76.758 Software engineering
Divisions: Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering > 55201-(S1) Undergraduate Thesis
Depositing User: Ahda Filza Ghaffaru
Date Deposited: 31 Jul 2025 01:37
Last Modified: 31 Jul 2025 01:37
URI: http://repository.its.ac.id/id/eprint/123831
