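Pembangkitan Report Chest X-Ray Berbasis Data Efficient Image Transformer (DeiT) dan Distilled-GPT2 (DistilGPT2) [Chest X-Ray Report Generation Based on the Data-efficient Image Transformer (DeiT) and Distilled-GPT2 (DistilGPT2)]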

Pamungkas, Ilham (2024) Pembangkitan Report Chest X-Ray Berbasis Data Efficient Image Transformer (DeiT) dan Distilled-GPT2 (DistilGPT2). Other thesis, Institut Teknologi Sepuluh Nopember.

5002201143-Undergraduate-Thesis.pdf - Accepted Version
Restricted to Repository staff only until 1 October 2026.

Abstract

Radiologists face a growing workload as the medical imaging data they must interpret for diagnosis becomes more complex. This motivates tools that reduce that workload by generating reports from medical images automatically, a task known as medical image captioning. Existing approaches include template-based and retrieval-based methods, but these often struggle to detect abnormalities accurately and to maintain an appropriate reporting style. An alternative is generative modeling with deep neural networks, most commonly in an encoder-decoder architecture. Previous studies that applied encoder-decoder models to feature extraction, such as the M2 Transformer and the Convolutional Vision Transformer (CvT), reported positive results but still tended to describe abnormalities that are not actually present (false positives). This undergraduate thesis (Tugas Akhir) therefore proposes an encoder-decoder model that uses the Data-efficient Image Transformer (DeiT) for efficient image feature extraction and DistilGPT2 for generating natural-language reports. The work focuses on integrating DeiT and DistilGPT2 for chest X-ray (CXR) report generation, a task within radiology. The model is evaluated on the public Indiana University Chest X-Ray Collection (IU X-Ray) dataset. The best experiment, which applies max pooling to the CXR image features and uses the DeiT-base-distilled model as the encoder, achieves 48.00% BLEU-1, 31.34% BLEU-2, 22.66% BLEU-3, 17.38% BLEU-4, 29.85% average BLEU, and 0.3954 ROUGE-L.
Keywords: Medical image captioning, chest X-Ray, Data-efficient Image Transformer, DistilGPT2
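
The abstract reports BLEU-1 to BLEU-4 and an average BLEU without saying how they were computed. The sketch below is one plausible reading, not the thesis's evaluation code: it assumes a single reference report, whitespace tokenization, uniform n-gram weights, and no smoothing, and the names `ngrams` and `bleu` are hypothetical. It also checks that the reported 29.85% average is consistent with the arithmetic mean of the four reported BLEU scores.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n):
    """Sentence-level BLEU up to max_n (assumed setup: one reference,
    uniform weights, brevity penalty, no smoothing)."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)


# Toy sentences for illustration only; they are not taken from IU X-Ray.
reference = "the heart size is normal".split()
candidate = "the heart is normal in size".split()
bleu_1 = bleu(candidate, reference, 1)

# BLEU scores as reported in the abstract, in percent.
reported_bleu = [48.00, 31.34, 22.66, 17.38]
average_bleu = sum(reported_bleu) / len(reported_bleu)  # 29.845
```

Under these assumptions, BLEU-n is the geometric mean of the clipped 1-gram to n-gram precisions multiplied by the brevity penalty, and the average BLEU is the plain mean of BLEU-1 to BLEU-4. Corpus-level scoring, as implemented in common evaluation toolkits, aggregates n-gram counts over all reports before taking the ratio, so its results can differ from averaged sentence-level scores.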

Item Type: Thesis (Other)
Uncontrolled Keywords: Medical image captioning, chest X-Ray, Data-efficient Image Transformer, DistilGPT2
Subjects: Q Science > QA Mathematics > QA336 Artificial Intelligence
Q Science > QA Mathematics > QA76.6 Computer programming.
Q Science > QA Mathematics > QA76.87 Neural networks (Computer Science)
Divisions: Faculty of Mathematics and Science > Mathematics > 44201-(S1) Undergraduate Thesis
Depositing User: Ilham Dirgantara Laksana Pamungkas
Date Deposited: 06 Aug 2024 18:51
Last Modified: 06 Aug 2024 18:51
URI: http://repository.its.ac.id/id/eprint/111679
