Pengembangan Aplikasi Speech To Text Untuk Notulensi Seminar Berbasis Website Dengan Pendekatan Deep Learning

Laksana, Shavica Ihya Qurata Ayun (2024) Pengembangan Aplikasi Speech To Text Untuk Notulensi Seminar Berbasis Website Dengan Pendekatan Deep Learning. Other thesis, Institut Teknologi Sepuluh Nopember.

05311940000013-Undergraduate_Thesis.pdf - Accepted Version
Restricted to Repository staff only until 1 April 2026.


Abstract

Speech recognition is a technology that automatically converts human speech into text. It underpins a range of applications built on voice interaction, such as virtual assistants and voice navigation systems, and it is applied across many industries and fields. By recognizing and interpreting a user's voice commands, a virtual assistant can provide information, carry out tasks, and offer a more interactive user experience. Beyond business and technology, speech recognition also brings substantial benefits for accessibility. For people with hearing loss, or with disabilities that limit their ability to speak, it opens access to services and opportunities that were previously hard to reach. Speech recognition still faces challenges, however. Chief among them are recognizing variation in pronunciation and accent and coping with background noise, both of which can degrade recognition quality. This research develops a web-based speech-to-text application for taking seminar minutes, following a deep learning approach that uses the Wav2Vec2 model and the Mozilla Common Voice 13 dataset. The application is intended to help people take minutes at seminars and to help people who are deaf or hard of hearing access information delivered through audio. In testing on 10 test sentences containing 243 words in total, the model transcribed 232 words correctly and 11 words incorrectly, that is, 95.47% of words correct and 4.53% incorrect. These results indicate that the model performs reasonably well at transcribing speech in Indonesian.
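
The reported percentages follow directly from the word counts given in the abstract. The sketch below reproduces that arithmetic in Python, assuming each percentage is simply the number of words in a category divided by the total number of words tested. The abstract does not describe the scoring procedure in more detail, so the function name `word_percentages` and this per-word counting scheme are illustrative assumptions rather than the author's evaluation code.

```python
def word_percentages(correct: int, incorrect: int) -> tuple[float, float]:
    """Return (percent correct, percent incorrect), rounded to two decimals.

    Assumption: every tested word is counted as either correct or incorrect,
    so the total is the sum of the two counts.
    """
    total = correct + incorrect
    if total == 0:
        raise ValueError("no words were evaluated")
    return round(100 * correct / total, 2), round(100 * incorrect / total, 2)


# Counts reported in the abstract: 232 correct and 11 incorrect words (243 total).
pct_correct, pct_incorrect = word_percentages(232, 11)
print(f"correct: {pct_correct}%, incorrect: {pct_incorrect}%")
```

With the counts from the abstract, this yields 95.47% correct and 4.53% incorrect, matching the reported figures.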

Item Type: Thesis (Other)
Uncontrolled Keywords: Notulensi, Speech-to-text, Wav2Vec2, Website, Notes
Subjects: T Technology > T Technology (General) > T57.74 Linear programming
Divisions: Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Information Technology > 59201-(S1) Undergraduate Thesis
Depositing User: Shavica Ihya Qurata Ayun Laksana
Date Deposited: 20 Feb 2024 07:03
Last Modified: 20 Feb 2024 07:03
URI: http://repository.its.ac.id/id/eprint/107631
