Ricardo, Glenn (2025) Analisis Pengkondisian Fitur log-f0 CWT Emosi Netral ke Marah dan Sedih Menggunakan Metode Convolutional Autoencoder dan Seq2Seq. Masters thesis, Institut Teknologi Sepuluh November.
Text
6022231025-Master_Thesis.pdf - Accepted Version (19MB). Restricted to Repository staff only until 1 April 2027.
Abstract
This thesis designs a deep learning architecture that conditions neutral speech intonation toward angry and sad intonation. The architecture is trained on the Emotional Speech Database (ESD), which provides the labeled raw voice recordings the task requires. From these recordings, the author extracts log-f0 Continuous Wavelet Transform (CWT) features as a representation of intonation. The extracted features can be treated either as sequence data or, after modification, as image data. Based on this property, the author proposes two architectures for training: Seq2Seq and Convolutional Autoencoder. The author also trains a Speech Emotion Recognition (SER) model as a measuring tool for the performance of the emotion conditioning. All trained architectures are then tested on three datasets, ESD, RAVDESS, and CREMA-D, to assess their performance and their ability to generalize. The test results show that the Seq2Seq architecture outperforms the Convolutional Autoencoder in both performance and generalization.
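The abstract does not give the extraction settings, so the following is only a minimal sketch of how a log-f0 CWT feature could be computed, not the thesis's implementation. It assumes an F0 contour already estimated by some pitch tracker (with 0 marking unvoiced frames), linear interpolation across unvoiced frames, zero-mean unit-variance normalization, a Mexican hat mother wavelet, and ten dyadic scales. All function names and parameter values here are hypothetical.

```python
import math


def interpolate_log_f0(f0):
    """Log-transform voiced frames and linearly fill unvoiced (f0 <= 0) frames."""
    voiced = [(i, math.log(v)) for i, v in enumerate(f0) if v > 0]
    if not voiced:
        raise ValueError("contour has no voiced frames")
    out, k = [], 0
    for i in range(len(f0)):
        # Advance to the last voiced frame at or before frame i.
        while k + 1 < len(voiced) and voiced[k + 1][0] <= i:
            k += 1
        i0, y0 = voiced[k]
        if i <= i0 or k + 1 == len(voiced):
            out.append(y0)  # hold the edge value outside the voiced range
        else:
            i1, y1 = voiced[k + 1]
            out.append(y0 + (y1 - y0) * (i - i0) / (i1 - i0))
    return out


def normalize(x):
    """Scale a contour to zero mean and unit variance."""
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    std = std if std > 1e-8 else 1.0  # guard against a flat contour
    return [(v - mean) / std for v in x]


def mexican_hat(t):
    """Mexican hat (Ricker) mother wavelet, an assumed choice."""
    norm = 2.0 / (math.sqrt(3.0) * math.pi ** 0.25)
    return norm * (1.0 - t * t) * math.exp(-0.5 * t * t)


def cwt(signal, scales):
    """Direct-sum CWT; returns one row of coefficients per scale."""
    n = len(signal)
    rows = []
    for s in scales:
        half = int(5 * s)  # the wavelet is negligible beyond 5 scale units
        row = []
        for t in range(n):
            lo, hi = max(0, t - half), min(n, t + half + 1)
            acc = sum(signal[u] * mexican_hat((u - t) / s) for u in range(lo, hi))
            row.append(acc / math.sqrt(s))
        rows.append(row)
    return rows


def log_f0_cwt(f0, num_scales=10):
    """Assumed pipeline: interpolate, normalize, then CWT at dyadic scales (in frames)."""
    contour = normalize(interpolate_log_f0(f0))
    scales = [2.0 ** (i + 1) for i in range(num_scales)]
    return cwt(contour, scales)
```

Under these assumptions the result is a grid of `num_scales` rows by one column per frame. Reading it column by column gives a sequence of multi-scale vectors, which would suit a Seq2Seq model, while keeping it as a 2-D grid gives an image-like input of the kind a convolutional autoencoder expects.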
Item Type: | Thesis (Masters) |
---|---|
Uncontrolled Keywords: | log-f0 CWT, Seq2Seq, Convolutional Autoencoder, SER |
Subjects: | T Technology > TK Electrical engineering. Electronics Nuclear engineering > TK7882.P3 Pattern recognition systems; T Technology > TK Electrical engineering. Electronics Nuclear engineering > TK7882.S65 Automatic speech recognition; T Technology > TK Electrical engineering. Electronics Nuclear engineering > TK7895.S65 Speech recognition systems |
Divisions: | Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Electrical Engineering > 20101-(S2) Master Thesis |
Depositing User: | Glenn Ricardo |
Date Deposited: | 23 Jan 2025 01:53 |
Last Modified: | 23 Jan 2025 01:53 |
URI: | http://repository.its.ac.id/id/eprint/116680 |