Microorganism Video Generation with Enhanced Visual Representation Based on the Guided Latent Video Diffusion Model

Gomex, Fransco (2025) Microorganism Video Generation with Enhanced Visual Representation Based on the Guided Latent Video Diffusion Model. Other thesis, Institut Teknologi Sepuluh Nopember.


Full text: 5002211048-Undergraduate_Thesis.pdf (Accepted Version, 37MB). Restricted to repository staff only.

Abstract

Advances in visual technology have made it possible to present many biological processes through video in a more engaging and informative way. However, media that focus specifically on microorganisms remain scarce. Existing material is generally limited to static microscopic images or raw microscope recordings, which are often hard to observe and visually uninformative. To address this gap, this undergraduate thesis proposes Guided Latent Video Diffusion (GLaVD), a model for generating synthetic microorganism videos whose visual representation is more informative and easier to recognize. GLaVD learns the video generation process and the enhancement of visual representation through a compositional learning approach. It leverages spatio-temporal features extracted from microorganism videos and spatial features extracted from more representative illustrative images of microorganisms. The data were collected by web scraping and used to evaluate GLaVD. GLaVD consists of two main modules. The first is a Latent Video Diffusion Model (LVDM), which generates the videos. The second is Query-Selected Attention for Contrastive Learning in Image-to-Image Translation (QS-Attn), which helps the LVDM improve visual representation. QS-Attn is trained separately with a contrastive objective and then integrated into the LVDM at the spatial extraction stage. Its translated output is used in two ways: (1) it is concatenated with the input before U-Net denoising, and (2) it is injected into the Spatial Transformer blocks of the U-Net as spatial semantic information. Experimental results show that GLaVD outperforms DynamiCrafter in Visual Quality (55.80% vs. 44.20%), indicating that it generates videos with a better visual representation.
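The two integration paths described in the abstract can be illustrated with a minimal, dependency-free Python sketch. This is not the GLaVD implementation. It assumes several simplifications:

* Latents are stored as lists of frames, where each frame is a list of spatial tokens and each token has C channels.
* A single guidance frame (standing in for the QS-Attn output) is shared by all frames.
* The learned query/key/value projections of a Spatial Transformer are replaced by identity maps.

The function names `concat_guidance` and `inject_guidance` are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Frame = List[Vector]   # H*W spatial tokens, each with C channels
Video = List[Frame]    # T frames


def concat_guidance(noisy_video: Video, guide: Frame) -> Video:
    """Path (1), sketched: channel-wise concatenation before denoising.

    The single guidance frame is broadcast over time, so every token grows
    from C to 2C channels.
    """
    for frame in noisy_video:
        assert len(frame) == len(guide), "spatial sizes must match"
    return [[tok + g for tok, g in zip(frame, guide)] for frame in noisy_video]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def inject_guidance(frame: Frame, guide: Frame) -> Frame:
    """Path (2), sketched: cross-attention from frame tokens to guidance tokens.

    Identity projections stand in for learned W_q, W_k, W_v (an assumption).
    The attended guidance is added back residually, as in a transformer block.
    """
    d = len(guide[0])
    out = []
    for q in frame:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in guide]
        weights = softmax(scores)
        attended = [sum(w * v[c] for w, v in zip(weights, guide)) for c in range(d)]
        out.append([x + a for x, a in zip(q, attended)])
    return out


# Toy example: T=2 frames, 3 spatial tokens, C=2 channels.
noisy = [[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
         [[0.0, 0.1], [0.2, 0.3], [0.4, 0.5]]]
guide = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

x = concat_guidance(noisy, guide)
print(len(x), len(x[0]), len(x[0][0]))  # 2 3 4

# Each path is shown in isolation: injection here acts on the C-channel frames.
y = [inject_guidance(f, guide) for f in noisy]
```

In an actual latent video diffusion U-Net, the concatenated 2C-channel input would typically be projected back to the model width by a convolution. The cross-attention would also act on intermediate U-Net features using learned projections. The sketch keeps the two paths separate so that only the data flow of each one is visible.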

Item Type:	Thesis (Other)
Uncontrolled Keywords:	Microorganism Videos, Image-to-Video, Unpaired Image Translation, Latent Video Diffusion Model
Subjects:	Q Science > QA Mathematics > QA336 Artificial Intelligence; Q Science > QA Mathematics > QA76.87 Neural networks (Computer Science); Q Science > QH Biology > QH301 Biology; Q Science > QR Microbiology
Divisions:	Faculty of Science and Data Analytics (SCIENTICS) > Mathematics > 44201-(S1) Undergraduate Thesis
Depositing User:	Fransco Gomex
Date Deposited:	01 Aug 2025 06:20
Last Modified:	01 Aug 2025 06:20
URI:	http://repository.its.ac.id/id/eprint/125666
