Detection of Filled-Pause Disfluency in Indonesian Public Speaking Using a Convolutional Neural Network (CNN) Based on Transfer Learning

Satru, Muhammad Anjotho (2022) Deteksi Disfluency Tipe Filled Paused pada Public Speaking Bahasa Indonesia Menggunakan Convolutional Neural Network (CNN) Berbasis Transfer Learning. Other thesis, Institut Teknologi Sepuluh Nopember.

07211840000014-Undergraduate_Thesis.pdf
Restricted to Repository staff only

Abstract

Public speaking is a skill that everyone should have. One factor that determines public speaking ability is self-confidence. A person who lacks self-confidence tends to experience negative feelings and fear (Raja, 2017). Fear of public speaking can affect a career, for example through salary cuts and cancelled promotions. For students, confidence in public speaking also contributes to higher academic scores. Initial research took the form of surveys and interviews with university students in East Java about the problems they encounter during public speaking. Of 133 respondents, 81% stated that the most common problem is nervousness. When nervous, a speaker tends to produce disfluencies. This can be overcome through regular practice or by taking a public speaking course, but the time and cost of such a course are considered substantial. Therefore, this research builds a CNN machine learning model that detects disfluency in Indonesian public speaking, to support independent public speaking practice. The methodology consists of dataset acquisition, dataset preprocessing, dataset labeling, dataset splitting, feature extraction, model training, model testing, and performance analysis. Feature extraction uses Mel-frequency cepstral coefficients (MFCC), while model training uses transfer learning with four architectures: VGG16, VGG19, EfficientNetB0, and MobileNetV2. Two tests were carried out on the models: word-level testing and sentence-level testing. Across the two tests, the model with the best classification performance was VGG16, with an F1-Score of 0.701 and an average accuracy of 0.911.
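The abstract reports an F1-Score (0.701) that is noticeably lower than the accuracy (0.911). A gap of this kind commonly appears when the class being detected is much rarer than the other class, which may be the case for filled pauses in speech. The Python sketch below shows how both metrics are computed from confusion-matrix counts. The function name `classification_metrics` and the counts are hypothetical illustrations, not data from the thesis.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives for the positive class (e.g. "filled pause").
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical, imbalanced example: 100 positive segments out of 1000.
metrics = classification_metrics(tp=70, fp=30, fn=30, tn=870)
print(f"accuracy = {metrics['accuracy']:.3f}")  # accuracy = 0.940
print(f"f1       = {metrics['f1']:.3f}")        # f1       = 0.700
```

With these illustrative counts, the many correctly rejected negative segments raise accuracy, while F1 ignores true negatives and reflects only how well the positive class is found.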

Item Type: Thesis (Other)
Additional Information: RSKom 006.312 Sat d-1 2022
Uncontrolled Keywords: Public Speaking, Disfluency, Machine Learning.
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science. EDP
Divisions: Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Computer Engineering > 90243-(S1) Undergraduate Thesis
Depositing User: Mr. Marsudiyana
Date Deposited: 15 Jun 2026 08:34
Last Modified: 15 Jun 2026 08:34
URI: http://repository.its.ac.id/id/eprint/133823
