Automatic Interpretation of Brain Medical Images Using a Combination of Hierarchical Classification and Image Captioning Models

Mayzura, Wan Sabrina (2025) Interpretasi Otomatis Citra Medis Otak Menggunakan Kombinasi Model Klasifikasi Berjenjang Dan Image Captioning. Other thesis, Institut Teknologi Sepuluh Nopember.

Text
5025211023-Undergraduate_Thesis.pdf - Accepted Version
Restricted to Repository staff only

Abstract

Brain medical imaging provides a visual representation of the structure and function of the human brain and plays a vital role in diagnosing neurological conditions. Manual interpretation of medical images, however, is repetitive and carries a risk of diagnostic error. Medical image descriptions also rely on highly specific clinical terminology, so image captioning models trained on general-domain data often fail to capture important semantic information in brain images. This study therefore proposes an approach that combines hierarchical classification with image captioning. Images are classified by orientation (plane), modality, abnormality type, and abnormality subtype (tumor type) to produce keywords that represent the semantic features of the image. These keywords are then integrated into a VisionEncoderDecoder model with ViT as the encoder and BioMedBERT as the decoder; BioMedBERT was chosen because it is pretrained specifically on medical text. The model was evaluated on 313 test images under two inference scenarios: without and with the classification keywords. Integrating the keywords raised the BLEU score from 0.2979 to 0.3348, which the author interprets as more informative descriptions. The METEOR score, however, decreased slightly from 0.1865 to 0.1614, which is attributed to the metric's sensitivity to some final captions that were not yet well structured. These findings indicate that hierarchical classification and a pretrained medical language model help enrich the visual context and improve how well the generated descriptions match the image content.
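
The abstract does not say how the classification stages are chained or how the keywords reach the captioning model. The following is a minimal Python sketch of one possible reading, not the thesis implementation. It assumes each stage is a callable that returns a label, that the subtype stage runs only when a subtype classifier exists for the predicted abnormality type, and that the keywords are joined into a comma-separated text prefix. The class name, function names, and all labels ("axial", "MRI", "tumor", "glioma", "normal") are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: a classifier is any callable that maps an image to a label string.
Classifier = Callable[[object], str]


@dataclass
class HierarchicalKeywordExtractor:
    """Hypothetical cascade: plane -> modality -> type -> optional subtype."""

    plane: Classifier
    modality: Classifier
    abnormality_type: Classifier
    # Hypothetical routing table: one subtype classifier per abnormality type.
    subtype_by_type: Dict[str, Classifier]

    def keywords(self, image: object) -> List[str]:
        found = [self.plane(image), self.modality(image)]
        kind = self.abnormality_type(image)
        found.append(kind)
        # Run the subtype stage only if this type has a subtype classifier.
        subtype_classifier = self.subtype_by_type.get(kind)
        if subtype_classifier is not None:
            found.append(subtype_classifier(image))
        return found


def build_prefix(keywords: List[str]) -> str:
    """Join keywords into one string (an assumed way to pass them on as text)."""
    return ", ".join(keywords)


# Stub classifiers with placeholder labels stand in for trained models.
extractor = HierarchicalKeywordExtractor(
    plane=lambda image: "axial",
    modality=lambda image: "MRI",
    abnormality_type=lambda image: "tumor",
    subtype_by_type={"tumor": lambda image: "glioma"},
)

# Same cascade, but the type stage predicts a label with no subtype classifier.
normal_extractor = HierarchicalKeywordExtractor(
    plane=lambda image: "axial",
    modality=lambda image: "MRI",
    abnormality_type=lambda image: "normal",
    subtype_by_type={"tumor": lambda image: "glioma"},
)

prefix = build_prefix(extractor.keywords(image=None))
normal_prefix = build_prefix(normal_extractor.keywords(image=None))
```

In this sketch the subtype keyword is simply omitted when the predicted type has no subtype classifier, so the prefix never contains a tumor type for an image that the type stage did not label as a tumor.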

Item Type:	Thesis (Other)
Uncontrolled Keywords:	Brain MRI, Hierarchical Classification, Image Captioning, Medical Imaging, Vision Transformer (ViT).
Subjects:	Q Science > QA Mathematics > QA336 Artificial Intelligence; R Medicine > R Medicine (General) > R858 Deep Learning; T Technology > T Technology (General) > T57.5 Data Processing; T Technology > TA Engineering (General). Civil engineering (General) > TA1637 Image processing--Digital techniques. Image analysis--Data processing.
Divisions:	Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering > 55201-(S1) Undergraduate Thesis
Depositing User:	Wan Sabrina Mayzura
Date Deposited:	14 Jul 2025 04:23
Last Modified:	14 Jul 2025 04:23
URI:	http://repository.its.ac.id/id/eprint/119601
