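Hasanah, Miftahul (2025) Pengembangan Model Multi-Task Learning Untuk Klasifikasi Dan Pembuatan Medical Report Pada Citra Medis [Development of a Multi-Task Learning Model for Classification and Medical Report Generation on Medical Images]. Masters thesis, Institut Teknologi Sepuluh Nopember.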
Full text: 6025231065-Master_Thesis.pdf (Accepted Version, 5MB), restricted to repository staff only.
Abstract
Manually writing medical reports takes up a significant share of radiologists' working time, especially when they must review a large number of medical images in a short period. This increases the risk of diagnostic errors caused by fatigue and by declining performance under a heavy workload. A medical image captioning model that automatically generates medical reports is therefore needed to help radiologists improve both the accuracy and the efficiency of report writing.
This research proposes a multi-task learning framework that integrates a co-attention mechanism, a hierarchical long short-term memory (LSTM) network, and multi-label classification to automatically generate medical reports while simultaneously predicting disease labels. The proposed model uses a pre-trained residual network (ResNet) to extract visual features from medical images. The extracted feature vectors are then passed to the multi-label classification module, which predicts the associated disease labels. Embeddings of the predicted labels are combined with the visual features through the co-attention mechanism to obtain a context vector that represents both modalities. This context vector serves as the input to a sentence-level LSTM, which generates a topic vector for each sentence; each topic vector is then passed to a word-level LSTM that produces the sequence of words describing the image. With this approach, the model is expected to generate medical reports that are not only consistent with the visual content of the image but also reflect the clinical context of the predicted disease labels.
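The co-attention step described above can be illustrated with a minimal sketch. This is a simplified, hypothetical example, not the thesis's implementation: it assumes dot-product scoring against the previous sentence-level LSTM hidden state, assumes the visual features and label embeddings share a single dimension, and replaces any learned attention weights and fusion layer with plain concatenation. The function names and toy vectors are illustrative only.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, items: List[Vector]) -> Tuple[List[float], Vector]:
    """Dot-product attention: weight each item by its similarity to the query
    and return (weights, weighted sum of the items)."""
    weights = softmax([dot(query, item) for item in items])
    dim = len(items[0])
    context = [sum(w * item[d] for w, item in zip(weights, items)) for d in range(dim)]
    return weights, context


def co_attention(hidden: Vector, visual_feats: List[Vector], label_embs: List[Vector]) -> Vector:
    """Attend over image-region features and over predicted-label embeddings
    with the same query, then concatenate the two attended vectors.
    Assumption (hypothetical simplification): all vectors share one dimension
    and no learned projection matrices are used."""
    _, visual_ctx = attend(hidden, visual_feats)
    _, label_ctx = attend(hidden, label_embs)
    return visual_ctx + label_ctx  # list concatenation: [visual context; label context]


if __name__ == "__main__":
    # Hypothetical toy inputs: 3 image regions and 2 predicted-label embeddings, all 4-dimensional.
    prev_sentence_state = [0.2, -0.1, 0.5, 0.0]
    regions = [[0.1, 0.3, 0.9, 0.0], [0.4, 0.4, 0.1, 0.2], [0.0, 0.1, 0.2, 0.7]]
    labels = [[0.6, 0.0, 0.3, 0.1], [0.0, 0.5, 0.0, 0.5]]
    ctx = co_attention(prev_sentence_state, regions, labels)
    print(len(ctx), [round(x, 3) for x in ctx])
```

Because the query in this sketch is the sentence-level LSTM state, the attention weights would change from one sentence to the next, letting each generated sentence focus on different image regions and different predicted labels.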
The model is trained and evaluated on the Indiana University (IU X-Ray) dataset under two scenarios: 1) the complete dataset and 2) a subset restricted to pulmonary diseases. In addition, several threshold-based approaches for the multi-label classification module are implemented to assess their effect on model performance. The quality of the generated reports is measured with bilingual evaluation understudy (BLEU), metric for evaluation of translation with explicit ordering (METEOR), and recall-oriented understudy for gisting evaluation (ROUGE), while the predicted disease labels are evaluated with precision, recall, and F1-score. Experimental results show that the proposed model outperforms several previous state-of-the-art methods in medical image captioning, achieving a BLEU-3 of 0.274, a BLEU-4 of 0.218, a METEOR of 0.455, and a ROUGE of 0.526.
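The threshold-based label prediction and its evaluation can be illustrated with a minimal sketch. The abstract does not say which threshold-based approaches were compared, so this example assumes two common ones: a single fixed threshold (0.5) for every label, and per-label thresholds chosen by grid search to maximize each label's F1 on held-out data. It also assumes micro-averaged precision, recall, and F1. All function names and the toy data are hypothetical.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Labels = List[List[int]]


def binarize(probs: Matrix, thresholds: List[float]) -> Labels:
    """Convert per-label probabilities to 0/1 predictions, one threshold per label."""
    return [[int(p >= t) for p, t in zip(row, thresholds)] for row in probs]


def counts(y_true: Labels, y_pred: Labels) -> Tuple[int, int, int]:
    """Return (TP, FP, FN) summed over every sample and label (micro-averaging)."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    return tp, fp, fn


def micro_prf(y_true: Labels, y_pred: Labels) -> Dict[str, float]:
    tp, fp, fn = counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def tune_thresholds(probs: Matrix, y_true: Labels, candidates: List[float]) -> List[float]:
    """For each label independently, pick the candidate threshold with the best F1
    (ties go to the earliest candidate in the list)."""
    best = []
    for j in range(len(probs[0])):
        # Treat label j as its own single-column problem.
        col_p = [[row[j]] for row in probs]
        col_t = [[row[j]] for row in y_true]
        best.append(max(candidates, key=lambda t: micro_prf(col_t, binarize(col_p, [t]))["f1"]))
    return best


if __name__ == "__main__":
    # Toy sigmoid outputs for 3 images x 3 hypothetical disease labels.
    probs = [[0.9, 0.2, 0.6], [0.4, 0.7, 0.1], [0.8, 0.3, 0.35]]
    truth = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
    fixed = micro_prf(truth, binarize(probs, [0.5, 0.5, 0.5]))
    grid = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    tuned = tune_thresholds(probs, truth, grid)
    print(fixed, tuned, micro_prf(truth, binarize(probs, tuned)))
```

For brevity the toy example tunes and scores on the same three samples, which inflates the tuned result; in practice the grid search would run on a validation split and the chosen thresholds would then be applied unchanged to the test split.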
Field | Value |
---|---|
Item Type: | Thesis (Masters) |
Uncontrolled Keywords: | IU X-Ray, medical image captioning, multi-label classification, multi-task learning |
Subjects: | T Technology > TA Engineering (General). Civil engineering (General) > TA1637 Image processing--Digital techniques. Image analysis--Data processing. |
Divisions: | Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering > 55101-(S2) Master Thesis |
Depositing User: | Miftahul Hasanah |
Date Deposited: | 30 Jul 2025 04:05 |
Last Modified: | 30 Jul 2025 04:05 |
URI: | http://repository.its.ac.id/id/eprint/123153 |