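Ghaffaru, Ahda Filza (2025) Studi Perbandingan Principal Component Analysis (PCA) Dan T-Distributed Stochastic Neighbor Embedding (T-SNE) Dalam Deteksi Cacat Perangkat Lunak [A Comparative Study of Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE) in Software Defect Detection]. Other thesis, Institut Teknologi Sepuluh Nopember.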
Text: 5025211144-Undergraduate_Thesis.pdf (6MB) - Accepted Version. Restricted to repository staff only.
Abstract
Software defects are a critical factor affecting the reliability and quality of software systems. To detect defects more accurately, especially in high-dimensional software metric data, a suitable dimensionality reduction technique is needed to simplify the feature representation without losing important information. This study compares the effectiveness of Principal Component Analysis (PCA) and parametric t-Distributed Stochastic Neighbor Embedding (parametric t-SNE) as two dimensionality reduction approaches in support of software defect classification. Parametric t-SNE is used as an alternative to classical t-SNE because it learns a parametric function that can transform new data, which standard t-SNE cannot do. The study begins with a correlation analysis between the features and the defect label using the NASA MDP datasets. The data is then reduced with PCA and parametric t-SNE and classified with eight machine learning algorithms, which are evaluated on accuracy, precision, recall, and F1-score. The experimental results show that PCA generally yields more stable and higher classification performance than parametric t-SNE in both the combined-dataset and the separate-dataset scenarios. In addition, Random Forest and CatBoost are the classifiers that most consistently show the best performance after hyperparameter tuning. In the combined-dataset scenario, the combination of PCA and Random Forest achieves an F1-score of 0.8309. The highest F1-score overall, 1.0000, is obtained on the PC2 dataset. This research is expected to provide insight into the selection of effective dimensionality reduction techniques and classification models for software defect detection.
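As a rough illustration of the four evaluation metrics named in the abstract, the following minimal sketch computes accuracy, precision, recall, and F1-score for a binary defect label. It is not taken from the thesis: the function name `binary_metrics`, the label encoding (1 = defective, 0 = clean), and the sample labels are all assumptions made for this example, and the abstract does not state which averaging scheme the study used.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for one positive class.

    Hypothetical helper for illustration; not the thesis implementation.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels (assumption): 1 = defective module, 0 = clean module.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

With these made-up labels there are 3 true positives, 1 false positive, 1 false negative, and 3 true negatives, so all four metrics come out to 0.75. On imbalanced defect data the four values usually diverge, which is one reason to report F1-score alongside accuracy.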
Item Type: | Thesis (Other) |
---|---|
Uncontrolled Keywords: | PCA, t-SNE, Parametric t-SNE, Software defect detection, Dimensionality reduction |
Subjects: | Q Science > Q Science (General) > Q325.5 Machine learning. Support vector machines; Q Science > QA Mathematics > QA336 Artificial Intelligence; Q Science > QA Mathematics > QA76.758 Software engineering |
Divisions: | Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering > 55201-(S1) Undergraduate Thesis |
Depositing User: | Ahda Filza Ghaffaru |
Date Deposited: | 31 Jul 2025 01:37 |
Last Modified: | 31 Jul 2025 01:37 |
URI: | http://repository.its.ac.id/id/eprint/123831 |