Comparison of Fine-Tuning Techniques on Stable Diffusion Performance for Book Illustration through Dataset Adjustment and Model Training

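Sulistyo, Rano Noumi (2026) Perbandingan Teknik Fine-Tuning terhadap Performa Stable Diffusion untuk Ilustrasi Buku melalui Penyesuaian Dataset dan Pelatihan Model [Comparison of Fine-Tuning Techniques on Stable Diffusion Performance for Book Illustration through Dataset Adjustment and Model Training]. Other thesis, Institut Teknologi Sepuluh Nopember.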

5025211185-Undergraduate_Thesis.pdf - Accepted Version
Restricted to Repository staff only

Abstract

Book illustrations play an important role in raising reader interest and clarifying concepts, but producing them conventionally is often slow and requires specialized artistic skill. The generative model Stable Diffusion offers a faster and cheaper alternative for illustration production. This study compares the performance of Stable Diffusion in generating book illustrations under different dataset choices and after retraining the model on task-specific data. Fine-tuning is applied to Stable Diffusion using three approaches: LoRA (Low-Rank Adaptation), DreamBooth, and Textual Inversion. Each method is tested on two image datasets: a heterogeneous dataset covering diverse visual styles, and a homogeneous dataset consisting solely of book illustrations. Model performance is evaluated with the Fréchet Inception Distance (FID, lower is better) to assess visual realism and the Peak Signal-to-Noise Ratio (PSNR, higher is better) to measure reconstruction quality. The experimental results indicate that all three fine-tuning methods are effective on Stable Diffusion, but with different tendencies depending on dataset characteristics. On the heterogeneous dataset, LoRA achieved the best performance, with an FID of 181.65 and a PSNR of 8.50 dB, indicating more stable adaptation to varied illustration styles. On the homogeneous dataset, DreamBooth produced the best results, with an FID of 324.31 and a PSNR of 7.37 dB, indicating better visual consistency on data with a uniform illustration style. Textual Inversion showed the lowest performance on both metrics, with an FID of 203.17 and a PSNR of 7.59 dB on the heterogeneous dataset, and an FID of 354.13 and a PSNR of 5.91 dB on the homogeneous dataset.
These findings indicate that the choice of fine-tuning method should be matched to the characteristics of the book illustration dataset to obtain the best results.
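
To make the PSNR figures in the abstract easier to interpret, the following is a minimal sketch of the standard PSNR definition, 10 * log10(MAX^2 / MSE). It is not the thesis's evaluation code. The function names `psnr` and `rmse_from_psnr`, the flat-list image representation, and the 8-bit peak value of 255 are all assumptions made for illustration; the record does not state how images were scaled or preprocessed.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized images.

    Images are given as flat sequences of pixel intensities (an
    illustrative simplification). max_value=255.0 assumes 8-bit pixels.
    """
    if len(reference) != len(generated) or not reference:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixel values.
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def rmse_from_psnr(psnr_db, max_value=255.0):
    """Invert the PSNR formula: the RMS pixel error implied by a PSNR value."""
    return max_value / (10.0 ** (psnr_db / 20.0))


if __name__ == "__main__":
    print(psnr([10, 20], [12, 22]))
    print(rmse_from_psnr(8.50))
```

Under the 8-bit assumption, a PSNR of 8.50 dB corresponds to a root-mean-square pixel error of roughly 96 intensity levels out of 255. This is a large pixel-wise difference, as would be expected when a generated illustration is compared with a reference image it was not trained to reproduce exactly.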

Item Type: Thesis (Other)
Uncontrolled Keywords: Generative AI, Stable Diffusion, book illustration, model tuning, FID, PSNR, LoRA, DreamBooth, Textual Inversion
Subjects: T Technology > T Technology (General) > T11 Technical writing. Scientific Writing
T Technology > T Technology (General) > T385 Visualization--Technique
T Technology > T Technology (General) > T57.84 Heuristic algorithms.
T Technology > T Technology (General) > T58.8 Productivity. Efficiency
T Technology > T Technology (General) > T59.7 Human-machine systems.
Divisions: Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering > 55201-(S1) Undergraduate Thesis
Depositing User: Rano Noumi Sulistyo
Date Deposited: 29 Jan 2026 04:10
Last Modified: 29 Jan 2026 04:10
URI: http://repository.its.ac.id/id/eprint/130878
