Speaker Diarization System Using Visual Geometry Group Speaker Recognition Feature Extraction with Voice Clustering Using Spectral Clustering and Agglomerative Hierarchical Clustering

Purba, Azka Muhammad Radinka (2022) Sistem Diarisasi Pembicara Menggunakan Ekstraksi Fitur Visual Geometry Group Speaker Recognition Dengan Pengelompokkan Suara Menggunakan Spectral Clustering Dan Agglomerative Hierarchical Clustering. Other thesis, Institut Teknologi Sepuluh Nopember.

Abstract

Recorded speech carries a great deal of information, such as the identity of each speaker, the conversation between speakers, and the times at which each speaker talks. Extracting this information manually from long recordings takes considerable time and effort. A speaker diarization system addresses this problem by grouping segments of a recording by speaker and determining when each speaker talks. In this final-year project, a speaker diarization system was built by segmenting the audio, extracting voice features, and clustering the extracted features, and two clustering methods were compared: agglomerative hierarchical clustering and spectral clustering. Voice features were extracted with a convolutional neural network model, Visual Geometry Group Speaker Recognition, trained on VoxCeleb2 data. The system was evaluated on 600 conversations containing 2 to 7 speakers, assembled from voice snippets in the VoxCeleb1 data. The best clustering was selected using the silhouette coefficient. A chunk filtering scheme was added to discard poor voice samples before clustering; it was found to raise clustering accuracy and lower the diarization error rate. Agglomerative hierarchical clustering was the best-performing clustering method: with chunk filtering, it achieved an average clustering accuracy of 89.67% and a diarization error rate of 3.79%.
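
The abstract states that the best clustering is chosen with the silhouette coefficient. As a rough illustration of that idea only, the sketch below runs a small agglomerative hierarchical clustering for each candidate number of speakers from 2 to 7 and keeps the count with the highest mean silhouette score. It is not the thesis implementation: the 2-D toy points stand in for speaker embeddings, average linkage with Euclidean distance is an assumption, and the names `agglomerative`, `silhouette`, `best_k`, and `embeddings` are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative(points, k):
    """Average-linkage agglomerative clustering (linkage is an assumption).

    Starts with one cluster per point and repeatedly merges the two
    clusters with the smallest average pairwise distance until k remain.
    Returns one integer label per point.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = sum(euclidean(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # a < b, so index a is unaffected
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient, s = (b - a) / max(a, b) per point."""
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, []).append(i)
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in groups[lab] if j != i]
        if not own:
            continue  # a singleton cluster contributes s = 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in groups.items() if other != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


def best_k(points, k_min=2, k_max=7):
    """Return the cluster count with the highest mean silhouette, plus all scores."""
    scores = {k: silhouette(points, agglomerative(points, k))
              for k in range(k_min, k_max + 1)}
    return max(scores, key=scores.get), scores


# Toy stand-in for speaker embeddings: three well-separated groups.
embeddings = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
    (5.0, 5.0), (5.2, 5.1), (5.1, 4.8), (4.9, 5.2),
    (10.0, 0.0), (10.1, 0.3), (9.8, 0.1), (10.2, -0.1),
]
k, scores = best_k(embeddings)
print(k, round(scores[k], 3))
```

On this toy data the highest score falls at three clusters, matching the three groups. Real speaker embeddings are high-dimensional and far less cleanly separated, which is presumably where a step such as the chunk filtering described above matters.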

Item Type:	Thesis (Other)
Additional Information:	RSSt 519.53 Pur s-1 2022
Uncontrolled Keywords:	Agglomerative Hierarchical Clustering; Spectral Clustering; Visual Geometry Group Speaker Recognition
Subjects:	H Social Sciences > HA Statistics
Divisions:	Faculty of Science and Data Analytics (SCIENTICS) > Statistics > 49201-(S1) Undergraduate Thesis
Depositing User:	Mr. Marsudiyana -
Date Deposited:	11 Jun 2026 01:35
Last Modified:	11 Jun 2026 01:35
URI:	http://repository.its.ac.id/id/eprint/133713