Eksplorasi Image Caption Generation Pada Dataset Fashion Menggunakan Model BLIP-2 [Exploring Image Caption Generation on a Fashion Dataset Using the BLIP-2 Model]

Priyanto, Akbar Putra Asenti and Pandya, Duevano Fairuz (2024) Eksplorasi Image Caption Generation Pada Dataset Fashion Menggunakan Model BLIP-2. Project Report. [s.n]. (Unpublished)

5025211004_5025211052-Project_Report.pdf - Accepted Version
Restricted to Repository staff only


Abstract

The fashion industry attracts very strong consumer interest, yet sellers often struggle to help customers choose suitable clothing from their catalogs. One solution is to describe each garment with a text caption. Writing captions by hand, however, takes considerable time and effort, so automation is needed. This internship project (Kerja Praktik) explored how well the BLIP-2 model generates captions for FACAD, a clothing dataset. The distinguishing preprocessing steps included image padding and balancing. In testing, the captions produced by BLIP-2 met five of the six predefined evaluation criteria, and the best model achieved a BLEU score of 0.36 and a ROUGE-L score of 0.4.
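
For readers unfamiliar with the ROUGE-L metric reported above, the following is a minimal sketch of how such a score can be computed for one reference caption and one generated caption. It is not the report's evaluation code, which is not available here. The function names `lcs_length` and `rouge_l_f1`, the lowercase whitespace tokenization, and the use of the balanced F1 form are all assumptions made for illustration, and the example captions are invented rather than taken from FACAD.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Tokens match: extend the subsequence found so far.
                curr.append(prev[j - 1] + 1)
            else:
                # No match: carry over the best length seen so far.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L as an F1 score over whitespace tokens (assumed tokenization)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of generated tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented captions, used only to illustrate the calculation.
    reference = "a red floral dress with short sleeves"
    candidate = "a red dress with long sleeves"
    print(round(rouge_l_f1(reference, candidate), 3))
```

In this invented example the longest common subsequence has five tokens ("a red dress with sleeves"), which gives a precision of 5/6, a recall of 5/7, and an F1 of 10/13. Published ROUGE-L figures usually come from an established library and are averaged over a whole test set, so this sketch should not be expected to reproduce the reported 0.4.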

Item Type: Monograph (Project Report)
Uncontrolled Keywords: image caption, fashion, BLIP-2, image description generation, clothing.
Subjects: T Technology > T Technology (General) > T57.5 Data Processing
T Technology > T Technology (General) > T59.7 Human-machine systems.
Divisions: Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering > 55201-(S1) Undergraduate Thesis
Depositing User: Akbar Putra Asenti Priyanto
Date Deposited: 23 Dec 2024 07:26
Last Modified: 23 Dec 2024 07:26
URI: http://repository.its.ac.id/id/eprint/116039
