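Average Voice Model (AVM) Sintesa Alamiah Bahasa Indonesia Berbasis Hidden Markov Models (HMM) [Average Voice Model (AVM) for Natural Indonesian Speech Synthesis Based on Hidden Markov Models (HMM)]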

Lestari, Dwi Mardika (2018) Average Voice Model (AVM) Sintesa Alamiah Bahasa Indonesia Berbasis Hidden Markov Models (HMM). Undergraduate thesis, Institut Teknologi Sepuluh Nopember.

Text
2413100065_Undergraduate-Theses.pdf - Accepted Version

Abstract

Speech synthesis is the process of generating speech from a given text. Speech synthesis for Bahasa Indonesia has advanced from building research databases, to synthesizing speech from utterance fragments, to synthesizing speech with simple intonation patterns such as news reading and questions. However, the resulting models cover only a single speaker, so a model that combines databases from more than one speaker is needed. Such a model is called an average voice model (AVM). This research produces synthesized speech that combines two speakers. In objective testing with the root mean square error (RMSE) of the fundamental frequency (F0), the synthesized speech resembles one of the databases, with an RMSE of 77.2. Using STRAIGHT feature extraction increases the RMSE. The mel-cepstral distortion (MCD) metric indicates that the synthesized speech is a combination of all the databases, with objective MCD values of 13.1 and 13.4.
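
The two objective measures named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used definitions: F0 RMSE taken only over frames that are voiced in both contours, and MCD in dB averaged over time-aligned frames with the 0th (energy) coefficient excluded. The abstract does not state the thesis's exact conventions (units, frame alignment, voicing handling), so the function names `f0_rmse` and `mel_cepstral_distortion` and these choices are illustrative assumptions, not the author's implementation.

```python
import math


def f0_rmse(ref_f0, syn_f0):
    """RMSE between two F0 contours, using only frames voiced in both.

    Assumption: unvoiced frames are marked with F0 <= 0 and the contours
    are already time-aligned frame by frame.
    """
    pairs = [(r, s) for r, s in zip(ref_f0, syn_f0) if r > 0 and s > 0]
    if not pairs:
        raise ValueError("no frames are voiced in both contours")
    return math.sqrt(sum((r - s) ** 2 for r, s in pairs) / len(pairs))


def mel_cepstral_distortion(ref_mcep, syn_mcep):
    """Mean mel-cepstral distortion in dB over time-aligned frames.

    Each frame is a sequence of mel-cepstral coefficients; coefficient 0
    (energy) is skipped, as is common practice.
    """
    if not ref_mcep or len(ref_mcep) != len(syn_mcep):
        raise ValueError("inputs must be non-empty and equally long")
    scale = (10.0 / math.log(10.0)) * math.sqrt(2.0)
    total = 0.0
    for ref_frame, syn_frame in zip(ref_mcep, syn_mcep):
        sq_dist = sum((a - b) ** 2 for a, b in zip(ref_frame[1:], syn_frame[1:]))
        total += scale * math.sqrt(sq_dist)
    return total / len(ref_mcep)


if __name__ == "__main__":
    # Toy values for illustration only; they are not data from the thesis.
    print(f0_rmse([100.0, 0.0, 200.0], [110.0, 120.0, 190.0]))
    print(mel_cepstral_distortion([[0.0, 1.0, 0.0]], [[5.0, 0.0, 0.0]]))
```

Under these definitions, a lower value on either measure means the synthesized speech is closer to the reference recording.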

Item Type:	Thesis (Undergraduate)
Additional Information:	RSF 621.399 Les a
Uncontrolled Keywords:	Average Voice Model, MCD, STRAIGHT
Subjects:	Q Science > QC Physics > QC174.17.M33 Markov processes; T Technology > TK Electrical engineering. Electronics Nuclear engineering > TK5105.546 Computer algorithms
Divisions:	Faculty of Industrial Technology > Physics Engineering > 30201-(S1) Undergraduate Thesis
Depositing User:	Dwi Mardika Lestari
Date Deposited:	30 Apr 2018 04:57
Last Modified:	21 Jul 2020 07:19
URI:	http://repository.its.ac.id/id/eprint/51021
