Bidirectional Joint Representation Learning for Emotion Recognition on Audio-Text Data

Samuel, Kevin Davi (2024) Bidirectional Joint Representation Learning untuk Pengenalan Emosi pada Data Audio-Teks. Other thesis, Institut Teknologi Sepuluh Nopember.

Text
05111940000157-Undergraduate_Thesis.pdf - Accepted Version
Restricted to Repository staff only until 1 April 2026.

Abstract

Sentiment analysis is a technique for identifying how sentiment is expressed in text and for categorizing that sentiment as positive or negative. It allows public sentiment about specific issues or topics to be gathered and analyzed from data available on the internet. Research in sentiment analysis has developed rapidly and has attracted wide attention from both academia and industry, with applications in almost every field that involves human interaction. However, current research often focuses on a single modality, even though studies that use two or more modalities (multimodal) can yield more accurate results because they draw on more than one determining factor. The datasets used in this study are the Crowd-sourced Emotional Multimodal Actors Dataset (CREMA-D), Interactive Emotional Dyadic Motion Capture (IEMOCAP), and CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI). Because humans express emotion through both voice and written text, sentiment analysis can use these modalities jointly or individually. Real-world information, however, usually arrives as separate modalities, so an approach is needed that can handle more than one modality at once, commonly referred to as a multimodal approach. To address this problem, this study uses a Bidirectional Deep Neural Network (Bi-DNN) model, which can be applied to different modalities, making it easier for users to work with. A classifier then processes the Bi-DNN output, categorizing emotions from the multimodal features to produce the final sentiment analysis model.
The proposed method is evaluated against several existing methods using accuracy, precision, recall, and F1-score. The proposed model performs best on the CMU-MOSEI dataset with ADASYN oversampling and a Multi-Layer Perceptron classifier, reaching an average accuracy of 64.00%, precision of 52.00%, recall of 64.00%, and F1-score of 55.00%. This result surpasses the ensemble method on all evaluation metrics, with a margin of 9.00% in accuracy and 22.80% in F1-score. It also surpasses the comparison method M-ELMo + NN on accuracy and recall, with margins of 11.22% and 22.60%, respectively.
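The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes support-weighted averaging across emotion classes; the abstract does not state which averaging scheme the thesis used, although a reported recall equal to the reported accuracy is what weighted averaging always produces. The function name `weighted_metrics` and the toy labels are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy plus support-weighted precision, recall, and F1.

    Assumption: per-class scores are averaged with weights proportional
    to the number of true samples in each class.
    """
    n = len(y_true)
    support = Counter(y_true)      # true samples per class
    predicted = Counter(y_pred)    # predictions per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    accuracy = sum(hits.values()) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = hits[label]
        # Precision is taken as 0 when a class is never predicted.
        p = tp / predicted[label] if predicted[label] else 0.0
        r = tp / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration only; these are not results from the thesis.
toy_true = ["happy", "happy", "happy", "sad", "sad", "angry"]
toy_pred = ["happy", "happy", "sad", "sad", "happy", "happy"]
m = weighted_metrics(toy_true, toy_pred)
print(m)
```

Under weighted averaging, each class recall is multiplied by that class's share of the samples, so the sum reduces to the total number of correct predictions divided by the total number of samples, which is the accuracy. Precision and F1-score do not collapse in this way, so they can differ from accuracy as they do in the reported results.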

Item Type:	Thesis (Other)
Uncontrolled Keywords:	Audio Recognition, Deep Learning, Multimodal, Sentiment Analysis, Voice Recognition
Subjects:	T Technology > T Technology (General) > T11 Technical writing. Scientific Writing
Divisions:	Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering > 55201-(S1) Undergraduate Thesis
Depositing User:	Kevin Davi Samuel
Date Deposited:	19 Feb 2024 03:58
Last Modified:	19 Feb 2024 03:58
URI:	http://repository.its.ac.id/id/eprint/107153
