WaveLLDM: Design and Development of a Lightweight Latent Diffusion Model for Speech Quality Enhancement and Restoration

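Putra Santoso, Kevin (2025) WaveLLDM : Desain dan Pengembangan Model Difusi Laten Ringan dalam Peningkatan Kualitas dan Restorasi Suara Ucapan [WaveLLDM: Design and Development of a Lightweight Latent Diffusion Model for Speech Quality Enhancement and Restoration]. Other thesis, Institut Teknologi Sepuluh Nopember.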

5027211030_Undergraduate_Thesis.pdf - Accepted Version
Restricted to Repository staff only

Abstract

High-quality audio is crucial in many applications, including online communication, virtual assistants, and the multimedia industry. However, degradation caused by noise, compression, and transmission interference remains a significant challenge. Although diffusion models have proven effective for audio restoration, they typically require substantial computational resources and struggle with longer missing segments. One diffusion-based approach proposed to address these challenges is the Schrödinger Bridge, which has demonstrated state-of-the-art performance; however, its performance still leaves room for improvement through architectural optimization. This study proposes WaveLLDM (Wave Lightweight Latent Diffusion Model), an architecture that integrates an efficient neural audio codec with latent diffusion for audio restoration and denoising. Unlike conventional approaches that operate in the time or spectral domain, WaveLLDM processes audio in a compressed latent space, reducing computational complexity while preserving reconstruction quality. Empirical evaluation on the Voicebank+DEMAND test set shows that WaveLLDM reconstructs audio spectra with low Log-Spectral Distance (LSD) values (0.48–0.60) and adapts to new data, with WB-PESQ scores of 1.62–1.71 and STOI scores of 0.76–0.78.
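For readers unfamiliar with the Log-Spectral Distance reported in the abstract, the following is a minimal sketch of one common formulation: the root-mean-square difference between the log power spectra of a reference and an estimated signal, averaged over frames. The abstract does not state the thesis's exact definition, so the frame length, hop size, Hann window, base-10 logarithm, and the names `power_spectrum` and `log_spectral_distance` are all illustrative assumptions, and values computed this way are not directly comparable to the reported 0.48–0.60 range.

```python
import cmath
import math


def power_spectrum(frame):
    """Power spectrum of one real-valued frame via a naive DFT (bins 0..N/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def log_spectral_distance(reference, estimate, frame_len=64, hop=32, eps=1e-10):
    """Mean over frames of the RMS difference between log10 power spectra.

    frame_len, hop, and eps are assumed values chosen for illustration only.
    """
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    if len(reference) < frame_len:
        raise ValueError("signals must contain at least one full frame")

    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len) for i in range(frame_len)]

    distances = []
    for start in range(0, len(reference) - frame_len + 1, hop):
        ref_frame = [reference[start + i] * window[i] for i in range(frame_len)]
        est_frame = [estimate[start + i] * window[i] for i in range(frame_len)]
        p_ref = power_spectrum(ref_frame)
        p_est = power_spectrum(est_frame)
        # Squared log-spectral difference per frequency bin; eps avoids log(0).
        squared = [(math.log10(a + eps) - math.log10(b + eps)) ** 2
                   for a, b in zip(p_ref, p_est)]
        distances.append(math.sqrt(sum(squared) / len(squared)))
    return sum(distances) / len(distances)


# Toy signals (not from the thesis): a sine wave and an attenuated copy of it.
clean = [math.sin(2 * math.pi * 5 * t / 256) for t in range(256)]
degraded = [0.5 * s for s in clean]

lsd_identical = log_spectral_distance(clean, clean)
lsd_degraded = log_spectral_distance(clean, degraded)
```

Identical signals give a distance of zero, and any spectral mismatch increases it, which is why lower LSD values indicate a closer spectral reconstruction.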

Item Type: Thesis (Other)
Uncontrolled Keywords: Generative models, latent diffusion models, audio synthesis, audio denoising, WaveLLDM, deep learning, computational efficiency, audio processing
Subjects: Q Science > QA Mathematics > QA274.7 Markov processes--Mathematical models.
Q Science > QA Mathematics > QA336 Artificial Intelligence
Q Science > QA Mathematics > QA76.87 Neural networks (Computer Science)
T Technology > T Technology (General) > T57.5 Data Processing
Divisions: Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Information Technology > 59201-(S1) Undergraduate Thesis
Depositing User: Kevin Putra Santoso
Date Deposited: 30 Jul 2025 05:36
Last Modified: 30 Jul 2025 05:36
URI: http://repository.its.ac.id/id/eprint/123163
