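Slamet, Joko (2025) Pengembangan Model Deteksi Code Smell Berbasis Graph Neural Network Untuk Mendeteksi Long Method, Large Class Dan Duplicated Code [Development of a Graph Neural Network-Based Code Smell Detection Model for Detecting Long Method, Large Class, and Duplicated Code]. Masters thesis, Institut Teknologi Sepuluh Nopember.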
Text: 6025231063-Master_Thesis.pdf - Accepted Version (3MB). Restricted to repository staff only.
Abstract
Code smells often degrade software quality and maintainability, so they need to be detected early to prevent the accumulation of technical debt. One common detection approach relies on conventional machine learning, but it remains limited in capturing the complex structure and relationships within source code. This creates a need for methods that capture the structural context of a program more comprehensively. This study proposes a code smell detection model based on a Graph Neural Network (GNN) that represents the structure of Python code as a Function Call Graph (FCG) built from the Abstract Syntax Tree (AST). Each node represents a function; node features are extracted automatically from structural characteristics of the code and then classified by a Graph Convolutional Network (GCN) as Long Method, Large Class, or No Smell. Training uses Cross Entropy Loss and the Adam optimizer, and evaluation uses accuracy, precision, recall, F1-score, the confusion matrix, and ROC-AUC. The evaluation results show that the proposed GNN model consistently outperforms conventional machine learning algorithms on both the Long Method and Large Class categories. For Long Method detection, the GNN model achieved an accuracy of 96.3%, compared with 95.9% for the best-performing machine learning algorithm, Decision Tree. For Large Class detection, the GNN achieved 95.2%, compared with 92.7% for the best-performing machine learning model, Random Forest. The model also effectively identified Long Method, Large Class, and Duplicated Code when tested on two industrial-scale open-source projects: it achieved an accuracy of 98.07% on the ERPNext dataset and 96.65% on the Odoo dataset.
These findings confirm that the GNN-based approach is not only effective at detecting code smells in Python programs but also offers significant performance improvements over conventional techniques, making a practical contribution to improving code quality and maintainability.
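The graph construction step described in the abstract can be illustrated with a minimal sketch that uses only Python's standard `ast` module. It is not the thesis's implementation: the function name `build_call_graph`, the three node features (`loc`, `n_params`, `n_branches`), and the sample source are assumptions chosen for illustration, since the abstract does not list the actual feature set. Calls are resolved by name only, so methods with the same name in different classes collapse into a single node, and calls made inside a nested function are also attributed to the enclosing function.

```python
import ast


def build_call_graph(source: str):
    """Return (features, edges) for the functions defined in `source`.

    features: dict mapping function name -> dict of structural features
    edges:    sorted list of (caller, callee) pairs between defined functions
    """
    tree = ast.parse(source)

    # Collect every function definition, including methods inside classes.
    funcs = {
        node.name: node
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }

    features = {}
    edges = set()
    for name, node in funcs.items():
        # Hypothetical node features; the real feature set is not stated
        # in the abstract.
        features[name] = {
            "loc": node.end_lineno - node.lineno + 1,
            "n_params": len(node.args.args),
            "n_branches": sum(
                isinstance(n, (ast.If, ast.For, ast.While, ast.Try))
                for n in ast.walk(node)
            ),
        }
        for sub in ast.walk(node):
            if not isinstance(sub, ast.Call):
                continue
            callee = sub.func
            # Resolve plain calls `f()` and attribute calls `obj.f()` by name.
            if isinstance(callee, ast.Name):
                target = callee.id
            elif isinstance(callee, ast.Attribute):
                target = callee.attr
            else:
                continue
            # Keep only edges to functions defined in the same source.
            if target in funcs:
                edges.add((name, target))
    return features, sorted(edges)


# Hypothetical sample input used only to exercise the sketch.
SAMPLE = '''
def load(path):
    return open(path).read()

def parse(text):
    if not text:
        return []
    return text.split()

def main(path):
    text = load(path)
    for token in parse(text):
        print(token)
'''

features, edges = build_call_graph(SAMPLE)
print(edges)
print(features)
```

In a full pipeline, the feature dictionaries would become the rows of a node feature matrix and the edge list would become the graph's adjacency structure, which is the input a GCN layer expects.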
| Item Type: | Thesis (Masters) |
|---|---|
| Uncontrolled Keywords: | Model Deteksi Code Smell, Code Smell Model Detection, Duplicated Code, Graph Neural Network, Large Class, Long Method |
| Subjects: | Q Science > QA Mathematics > QA76.758 Software engineering |
| Divisions: | Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering > 55101-(S2) Master Thesis |
| Depositing User: | Joko Slamet |
| Date Deposited: | 31 Jul 2025 07:22 |
| Last Modified: | 31 Jul 2025 07:22 |
| URI: | http://repository.its.ac.id/id/eprint/124784 |
