Indonesian-Language Image Caption Generation Using the BLIP (Bootstrapping Language-Image Pre-training) Method

Azhar, Anas (2025) Pembangkitan Deskripsi Gambar Berbahasa Indonesia Menggunakan Metode BLIP (Bootstrapping Language-Image Pre-training). Other thesis, Institut Teknologi Sepuluh Nopember.

Text
5025211043-Undergraduate_Thesis.pdf - Accepted Version
Restricted to Repository staff only

Abstract

Images are an important medium for conveying visual information quickly and efficiently. However, blind individuals face challenges in understanding visual information directly, so assistive technology is needed to help them. One rapidly developing technology is image captioning, which translates an image into a textual description. This study aims to help blind people understand visual information by developing an image captioning model that generates Indonesian-language descriptions from images, which can later be converted to audio using text-to-speech technology. The study uses Bootstrapping Language-Image Pre-training (BLIP), a flexible Vision-Language Pre-training (VLP) framework for both understanding tasks and generating text from images. The BLIP model was developed using an original dataset collected independently on sidewalks and around the campus of Institut Teknologi Sepuluh Nopember. Evaluation used the BLEU and ROUGE-L metrics to measure how closely the descriptions generated by the model match the reference descriptions. The best results were obtained from a fully fine-tuned BLIP model with the Adafactor optimizer, the CosineAnnealingLR scheduler, a training batch size of 2, a validation batch size of 4, and a learning rate of $1 \times 10^{-5}$. This configuration achieved BLEU-1, BLEU-2, BLEU-3, and BLEU-4 scores of 33.82%, 23.81%, 17.49%, and 13.00%, respectively, and a ROUGE-L score of 37.02%.
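
To illustrate what the BLEU-n and ROUGE-L scores in the abstract measure, the following is a minimal Python sketch of both metrics for a single caption pair. It is not the evaluation code used in the thesis: the record does not say which implementation, tokenization, or number of references was used, and reported scores are usually aggregated over a whole test set. The sketch assumes whitespace tokenization, one reference per caption, no smoothing, and an F-measure with `beta=1.0` for ROUGE-L. The function names and the two example captions are made up for illustration.

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU up to max_n: uniform weights, one reference, no smoothing."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clipped counts: an n-gram is credited at most as often as it occurs in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, one empty precision zeroes the score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates that are shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """ROUGE-L F-measure from LCS-based precision and recall (beta=1.0 is an assumption)."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)


if __name__ == "__main__":
    # Hypothetical caption pair, not taken from the thesis dataset.
    generated = "seorang pria berjalan di trotoar kampus"
    reference = "seorang pria sedang berjalan di trotoar"
    for n in range(1, 5):
        print(f"BLEU-{n}: {bleu(generated, reference, max_n=n):.4f}")
    print(f"ROUGE-L: {rouge_l(generated, reference):.4f}")
```

For this pair, five of the six generated words appear in the reference in the same order, so both BLEU-1 and ROUGE-L come out at 5/6, while BLEU-4 drops to zero because no four-word sequence matches. The same pattern explains why the thesis scores fall from BLEU-1 to BLEU-4: longer n-grams are harder to match exactly.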

Item Type:	Thesis (Other)
Uncontrolled Keywords:	BLIP, Image Captioning, BLEU, ROUGE-L
Subjects:	T Technology > T Technology (General)
Divisions:	Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering > 55201-(S1) Undergraduate Thesis
Depositing User:	Anaz Azhar
Date Deposited:	28 Jul 2025 10:34
Last Modified:	28 Jul 2025 10:34
URI:	http://repository.its.ac.id/id/eprint/122320