Integrating an Efficient Transformer into the Enhanced Group Convolutional Neural Network to Improve Single Image Super-Resolution Performance

Rosyidan, Fikri Yoma (2025) Integrasi Efficient Transformer ke dalam Enhanced Group Convolutional Neural Network untuk Meningkatkan Kinerja Single Image Super-Resolution. Masters thesis, Institut Teknologi Sepuluh Nopember.

Abstract

Single Image Super-Resolution (SISR) is a central problem in digital image processing: reconstructing a high-resolution image with sharp detail and high visual fidelity from a low-resolution input. Demand for high-quality images is pressing across many fields, from medical diagnosis, which relies on detailed interpretation of MRI and CT scans, to security surveillance and digital entertainment, which depend on visual sharpness. Many approaches build on Convolutional Neural Networks (CNNs), such as the Enhanced Group Convolutional Neural Network (EGCNN), but their limited ability to capture long-range relationships between pixels holds back reconstruction quality. Transformers, with their self-attention mechanism, can model global context, but their computational complexity is often a constraint. This research therefore proposes integrating an Efficient Transformer (ET) into the EGCNN architecture to address both shortcomings, yielding an SISR model that reconstructs fine texture detail and global patterns while remaining computationally efficient. The proposed approach uses EGCNN to extract local features through group convolution and then enriches those features with global context through the efficient self-attention mechanism in ET. The features processed by ET are converted back into a spatial representation through reconstruction and upsampling stages, producing a high-resolution image that closely approximates the ground truth. The model is evaluated using quantitative metrics such as Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) on standard test datasets including Set5, Set14, BSD100, and Urban100, supported by qualitative analysis.
The results show that integrating ET significantly improves performance over the baseline EGCNN, particularly at the ×2 scaling factor, and offers a competitive, efficient alternative to state-of-the-art models such as SwinIR.
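
For readers unfamiliar with the PSNR metric named in the abstract, the following is a minimal sketch of its standard definition, PSNR = 10 · log10(MAX² / MSE). The function name `psnr`, the flat-sequence input, and the 8-bit peak value of 255 are assumptions made for illustration; this is not the thesis's evaluation code. Super-resolution benchmarks often compute PSNR on the luminance channel after cropping image borders, and the abstract does not say which protocol the thesis follows.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Return PSNR in dB between two equally sized sequences of pixel intensities.

    `reference` is the ground-truth image and `reconstructed` the model output,
    both flattened to 1-D sequences (an assumption made to keep the sketch
    dependency-free).
    """
    if len(reference) != len(reconstructed) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means a smaller pixel-wise error relative to the peak intensity. Because it measures only pixel-wise error, it is usually reported alongside a structural metric such as SSIM, as the abstract does.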

Item Type:	Thesis (Masters)
Uncontrolled Keywords:	Single Image Super-Resolution, Enhanced Group Convolutional Neural Network, Efficient Transformer
Subjects:	Q Science > Q Science (General) > Q325.5 Machine learning. Support vector machines; Q Science > Q Science (General) > Q337.5 Pattern recognition systems; Q Science > QA Mathematics > QA336 Artificial Intelligence; Q Science > QA Mathematics > QA76.87 Neural networks (Computer Science); T Technology > T Technology (General) > T57.5 Data Processing
Divisions:	Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Information System > 59101-(S2) Master Thesis
Depositing User:	Fikri Yoma Rosyidan
Date Deposited:	24 Jun 2025 09:14
Last Modified:	24 Jun 2025 09:14
URI:	http://repository.its.ac.id/id/eprint/119249
