Intent Classification in Voice Assistant Conversations Using the IndoBERT Model

Laras, Rosyd Panjie (2024) Klasifikasi Intent pada Percakapan Voice Assistant dengan Menggunakan Model IndoBERT. Other thesis, Institut Teknologi Sepuluh Nopember.

5002201112-Undergraduate_Thesis.pdf - Accepted Version
Restricted to Repository staff only until 1 October 2026.


Abstract

Intent classification and slot filling are two central tasks in Natural Language Understanding (NLU). Both are often limited by small, manually labeled training sets, which leads to poor generalization, especially for words that rarely appear in the dataset. This study therefore applies IndoBERT, the Indonesian version of BERT (Bidirectional Encoder Representations from Transformers), whose attention mechanism lets the model capture word context in depth, including subtle variations in meaning and complex relationships between words in a sentence. IndoBERT has been trained on Indo4B, a large-scale Indonesian-language dataset, to handle the characteristics of Indonesian, including its complex sentence structures and rich linguistic variation. The data used in this study are text conversations between users and an Indonesian-language voice assistant, sourced from Amazon Science. Model performance was evaluated using accuracy, Precision (P), Recall (R), and F1-Score (F1). The results show that the IndoBERT-base model outperforms IndoBERT-lite-base, mBERT, DistilBERT, and GPT-2 on intent classification for Indonesian-language voice assistant conversations. IndoBERT achieved an accuracy of 86.35%, precision of 86.25%, recall of 86.36%, and an F1-Score of 85.94%. With IndoBERT, the study addresses the challenges of intent classification more effectively and obtains more accurate word representations for Indonesian.
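The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not state how per-class scores were averaged, so the support-weighted averaging used here is an assumption, and the function name `weighted_metrics` and the intent labels in the example are hypothetical.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1.

    Assumption: per-class scores are averaged by class support
    (the number of true examples of each class). The thesis abstract
    does not say which averaging scheme was used.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    total = len(y_true)
    support = Counter(y_true)      # true examples per class
    predicted = Counter(y_pred)    # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label in set(y_true) | set(y_pred):
        tp = correct[label]
        p = tp / predicted[label] if predicted[label] else 0.0
        r = tp / support[label] if support[label] else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support[label] / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    return {
        "accuracy": sum(correct.values()) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical intent labels, for illustration only.
    gold = ["alarm_set", "alarm_set", "weather_query", "play_music"]
    pred = ["alarm_set", "weather_query", "weather_query", "play_music"]
    print(weighted_metrics(gold, pred))
```

With support weighting, recall always equals accuracy, because each class's recall is weighted by exactly its share of the true labels. Macro averaging, which weights every class equally, would instead give rare intents as much influence as frequent ones.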

Item Type: Thesis (Other)
Uncontrolled Keywords: BERT, IndoBERT, Attention, Intent
Subjects: Q Science > QA Mathematics > QA336 Artificial Intelligence
Q Science > QA Mathematics > QA76.6 Computer programming.
Q Science > QA Mathematics > QA76.87 Neural networks (Computer Science)
Divisions: Faculty of Mathematics and Science > Mathematics > 44201-(S1) Undergraduate Thesis
Depositing User: Rosyd Panjie Laras
Date Deposited: 06 Aug 2024 04:53
Last Modified: 06 Aug 2024 04:53
URI: http://repository.its.ac.id/id/eprint/113622
