Pengenalan Kata Berbasis Gerak Bibir Dengan Deep Learning (Word Recognition Based on Lip Movement Using Deep Learning)

Alwali, Muhammad (2026) Pengenalan Kata Berbasis Gerak Bibir Dengan Deep Learning. Masters thesis, Institut Teknologi Sepuluh Nopember.

Full text: 6022241013-Master-Thesis.pdf (Accepted Version, 24 MB), restricted to repository staff only.

Abstract

Word recognition based on lip movement is a rapidly growing research field, driven by advances in artificial intelligence and deep learning. The main open challenges are variation in lip shape between individuals, camera angle, lighting, and the visual similarity of certain words, which often leads to misclassification. This study proposes a new approach based on the inner lip contour, the boundary of the inner lip region, which captures articulation dynamics in finer and more informative detail. The dataset consists of 7,000 videos covering 14 word classes, each 1.5 seconds long. Each video is split into 45 frames, and five geometric features of the inner lip contour are extracted per frame: area, aspect ratio, circularity, eccentricity, and extent. Four deep learning models were then built: CNN + LSTM, CNN + BiLSTM, CNN + GRU, and CNN + BiGRU. In each, the CNN extracts spatial features from every frame, while the LSTM or GRU layer models temporal dependencies between frames. CNN + BiGRU performed best, reaching a training accuracy of 99.50%, a validation accuracy of 88.43%, and the lowest validation loss of the four models. Its bidirectional architecture processes the frame sequence in both the forward and backward directions, which helps it identify lip articulation patterns more effectively. Overall, the study demonstrates the effectiveness of the inner lip contour as a more specific and detailed feature for lip-movement-based word recognition with deep learning.
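The abstract names the five per-frame geometric features but not how they are computed. The Python sketch below assumes common shape-descriptor definitions: area by the shoelace formula, aspect ratio as bounding-box width divided by height, circularity as 4πA/P², eccentricity of the ellipse with the same second moments as the contour region, and extent as area divided by bounding-box area. The contour format (an ordered list of (x, y) points, for example from a facial landmark detector) and the function names are hypothetical; the thesis may define or compute these features differently. Applying the function to each of the 45 frames of a 1.5 s clip (30 frames per second) yields a 45 x 5 feature sequence.

```python
import math

def inner_lip_features(contour):
    """Hypothetical helper: [area, aspect_ratio, circularity, eccentricity, extent].

    contour: ordered (x, y) vertices of one frame's inner-lip polygon,
    clockwise or counter-clockwise. Definitions are assumed, not from the thesis.
    """
    n = len(contour)
    a = mx = my = mxx = myy = mxy = perim = 0.0
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]
        c = x0 * y1 - x1 * y0                  # shoelace cross term
        a += c / 2.0                           # signed area
        mx += (x0 + x1) * c / 6.0              # integral of x over the region
        my += (y0 + y1) * c / 6.0              # integral of y
        mxx += (x0 * x0 + x0 * x1 + x1 * x1) * c / 12.0
        myy += (y0 * y0 + y0 * y1 + y1 * y1) * c / 12.0
        mxy += (x0 * y1 + 2 * x0 * y0 + 2 * x1 * y1 + x1 * y0) * c / 24.0
        perim += math.dist((x0, y0), (x1, y1))
    if a < 0:  # clockwise contour: flip signs so area and moments are positive
        a, mx, my, mxx, myy, mxy = -a, -mx, -my, -mxx, -myy, -mxy
    if a <= 1e-9:
        # Assumption: a closed mouth (zero area) is encoded as all zeros
        return [0.0, 0.0, 0.0, 0.0, 0.0]
    xs = [p[0] for p in contour]
    ys = [p[1] for p in contour]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    # Central second moments (normalized by area) give the equivalent ellipse
    cx, cy = mx / a, my / a
    mu20 = mxx / a - cx * cx
    mu02 = myy / a - cy * cy
    mu11 = mxy / a - cx * cy
    half_diff = math.hypot((mu20 - mu02) / 2.0, mu11)
    l_major = (mu20 + mu02) / 2.0 + half_diff
    l_minor = (mu20 + mu02) / 2.0 - half_diff
    ecc = math.sqrt(max(0.0, 1.0 - l_minor / l_major))
    return [a, w / h, 4.0 * math.pi * a / perim ** 2, ecc, a / (w * h)]

def sequence_features(contours):
    # One row of five features per frame, e.g. 45 rows for a 1.5 s clip
    return [inner_lip_features(c) for c in contours]
```

For a 4 x 2 rectangle the sketch gives area 8, aspect ratio 2, circularity 2π/9, eccentricity √0.75, and extent 1, which matches the closed-form values for that shape.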
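The CNN + BiGRU combination can be illustrated with a NumPy forward pass over one clip's 45 x 5 feature sequence. The abstract gives neither layer sizes nor how the convolution is applied to five scalar features per frame, so this sketch assumes a 1D convolution along the time axis (kernel 3, 16 filters, ReLU), one GRU reading the frames forward and a second reading them backward, concatenation of their final hidden states (32 units each), and a softmax over the 14 word classes. All sizes and names are hypothetical and the weights are random, so the sketch only shows how data flows through such an architecture and reproduces none of the reported accuracies.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_same(x, w, b):
    # x: (T, F), w: (k, F, C), b: (C,) -> (T, C); zero padding keeps length T
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.stack([np.einsum("kf,kfc->c", xp[t:t + k], w) for t in range(x.shape[0])])
    return np.maximum(out + b, 0.0)  # ReLU

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def init_gru(n_in, n_hidden):
    # Weights for update (z), reset (r) and candidate (n) gates
    p = {}
    for g in "zrn":
        p["W" + g] = rng.normal(0.0, 0.1, (n_in, n_hidden))
        p["U" + g] = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        p["b" + g] = np.zeros(n_hidden)
    return p

def gru_last_state(seq, p):
    # Standard GRU recurrence; returns the hidden state after the last step
    h = np.zeros(p["Uz"].shape[0])
    for x in seq:
        z = sigmoid(x @ p["Wz"] + h @ p["Uz"] + p["bz"])
        r = sigmoid(x @ p["Wr"] + h @ p["Ur"] + p["br"])
        n = np.tanh(x @ p["Wn"] + (r * h) @ p["Un"] + p["bn"])
        h = (1.0 - z) * n + z * h
    return h

def init_params(n_feat=5, n_filters=16, kernel=3, n_hidden=32, n_classes=14):
    # Hypothetical sizes, not taken from the thesis (except 5 features, 14 classes)
    return {
        "conv_w": rng.normal(0.0, 0.1, (kernel, n_feat, n_filters)),
        "conv_b": np.zeros(n_filters),
        "gru_fwd": init_gru(n_filters, n_hidden),
        "gru_bwd": init_gru(n_filters, n_hidden),
        "dense_w": rng.normal(0.0, 0.1, (2 * n_hidden, n_classes)),
        "dense_b": np.zeros(n_classes),
    }

def cnn_bigru_forward(x, params):
    feats = conv1d_same(x, params["conv_w"], params["conv_b"])  # (T, filters)
    h_fwd = gru_last_state(feats, params["gru_fwd"])            # first to last frame
    h_bwd = gru_last_state(feats[::-1], params["gru_bwd"])      # last to first frame
    logits = np.concatenate([h_fwd, h_bwd]) @ params["dense_w"] + params["dense_b"]
    e = np.exp(logits - logits.max())
    return e / e.sum()  # class probabilities

params = init_params()
clip = rng.normal(size=(45, 5))  # stand-in for one clip's 45 x 5 features
probs = cnn_bigru_forward(clip, params)  # probs.shape == (14,)
```

With trained weights, the predicted word would be the index of the largest probability. The forward and backward GRUs have separate parameters, so each learns its own summary of the articulation sequence, and concatenating them gives the classifier context from both ends of the clip.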

Item Type: Thesis (Masters)
Uncontrolled Keywords: Lip Movement, Inner Lip Contour, Deep Learning, CNN, RNN
Subjects: Q Science > Q Science (General) > Q325.5 Machine learning. Support vector machines.
Q Science > QA Mathematics > QA336 Artificial Intelligence
Q Science > QA Mathematics > QA76.87 Neural networks (Computer Science)
R Medicine > R Medicine (General) > R858 Deep Learning
T Technology > TK Electrical engineering. Electronics. Nuclear engineering > TK7882.S65 Automatic speech recognition.
Divisions: Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Electrical Engineering > 20101-(S2) Master Thesis
Depositing User: Muhammad Alwali
Date Deposited: 15 Jan 2026 05:34
Last Modified: 15 Jan 2026 05:34
URI: http://repository.its.ac.id/id/eprint/129645