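Gumelar, Agustinus Bimo (2025) Speech Emotion Classification in Embedded Machine Learning Using the Spikeless SpiNNaker Platform [Klasifikasi Emosi Wicara Pada Pembelajaran Mesin Tertanam Menggunakan Platform Spikeless SpiNNaker]. Doctoral thesis, Institut Teknologi Sepuluh Nopember.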
Full text (PDF): 07111960010002-Dissertation.pdf, Accepted Version, 15MB. Restricted to repository staff only until 1 April 2027.
Abstract
Emotions play a crucial role in human communication and interaction, influencing actions, decisions, and social relationships. Speech-based emotion recognition technology enables systems to understand users' emotional states automatically. However, most current emotion recognition systems still rely on von Neumann computing architectures running on traditional CPUs and GPUs, whose processing times are excessively long, in some cases spanning hours. This research addresses these long processing times by developing a speech emotion recognition system that uses embedded machine learning on a neuromorphic platform.
The research approach applies Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and k-Nearest Neighbors (k-NN) models to identify acoustic patterns associated with emotions. The analysis covers acoustic features of speech, including pitch, intonation, and speech rate, together with further features such as Mel-Frequency Cepstral Coefficients (MFCC), spectral features, the Fast Fourier Transform (FFT), spectrograms, and cochleagrams, grounded in the emotion theories of Paul Ekman and Robert Plutchik. The algorithms were implemented on the neuromorphic SpiNNaker platform, which is designed for large-scale parallel processing with high computational speed. The study compares the performance of the SpiNNaker platform with that of the von Neumann architecture. SpiNNaker was fed speech emotion features directly rather than spike data, hence the term Spikeless SpiNNaker.
The results show that embedded machine learning on the SpiNNaker neuromorphic platform significantly overcomes the processing-time limitations of von Neumann-based systems. On the von Neumann architecture, the highest accuracy was 90.97%, obtained with 1,000 speech emotion samples, 8 emotion classes, and hyperparameter optimization through Random Forest and Bayesian methods, with a fastest runtime of 33 hours and 20 minutes. In contrast, experiments on the SpiNNaker platform using 150 samples, 3 emotion classes, and 3 features (MFCC, Fundamental Frequency, and Linear Predictive Coding) achieved 87.5% accuracy in 2 seconds, using 6 cores of a single chip working in parallel and a maximum k hyperparameter value of 12.
Keywords: Emotion Classification, Human Speech Data, Neuromorphic Platform, k-Nearest Neighbors, SpiNNaker
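The abstract states that the Spikeless SpiNNaker variant feeds extracted speech features directly into a k-NN classifier whose work is spread over 6 cores of one chip. The Python sketch below illustrates why such a split can be exact: each core finds the k nearest neighbours in its own slice of the training data, and a single merge step keeps the k closest of the pooled candidates. It is a minimal, hypothetical illustration, not the author's SpiNNaker code: the function names (`euclidean`, `shard`, `local_top_k`, `knn_predict`), the Euclidean metric, the round-robin partitioning, and the toy data are all assumptions, and the cores are simulated by a plain loop rather than run on neuromorphic hardware.

```python
import heapq
import math
from collections import Counter

# Hypothetical sketch, not the thesis implementation. Each training sample is a
# (feature_vector, emotion_label) pair; the vector is assumed to hold already
# extracted features (e.g. MFCC, F0 and LPC values concatenated), not spikes.


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def shard(data, n_shards):
    """Split the training set round-robin into n_shards partitions, one per core."""
    return [data[i::n_shards] for i in range(n_shards)]


def local_top_k(partition, query, k):
    """Work done by one core: the k nearest (distance, label) pairs in its partition."""
    return heapq.nsmallest(k, ((euclidean(x, query), y) for x, y in partition))


def knn_predict(train, query, k=3, n_cores=6):
    """Classify query by majority vote among its k nearest training samples."""
    candidates = []
    for partition in shard(train, n_cores):  # each iteration stands in for one core
        candidates.extend(local_top_k(partition, query, k))
    # Merge step: every global neighbour is in its own partition's local top k,
    # so the k smallest of the pooled candidates equal the exact global k-NN.
    nearest = heapq.nsmallest(k, candidates)
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy 2-D vectors with three emotion classes, for illustration only.
train = [
    ((0.0, 0.1), "neutral"), ((0.2, 0.0), "neutral"), ((0.1, 0.2), "neutral"),
    ((5.0, 5.1), "happy"), ((5.2, 4.9), "happy"), ((4.8, 5.0), "happy"),
    ((9.0, 0.2), "angry"), ((9.1, 0.0), "angry"), ((8.9, 0.1), "angry"),
]

print(knn_predict(train, (5.0, 5.0)))  # → happy
```

Because each partition returns at most k candidates, the merge step handles at most `n_cores * k` pairs regardless of training-set size; for example, 6 cores with the reported maximum k = 12 would pass at most 72 candidates to the merge.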
| Field | Value |
|---|---|
| Item Type | Thesis (Doctoral) |
| Uncontrolled Keywords | Emotion Classification, Human Speech Data, Neuromorphic Platform, k-Nearest Neighbors, SpiNNaker |
| Subjects | T Technology > TK Electrical engineering. Electronics. Nuclear engineering > TK7895.S65 Speech recognition systems |
| Divisions | Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Electrical Engineering > 20001-(S3) PhD Thesis |
| Depositing User | Agustinus Bimo Gumelar |
| Date Deposited | 27 Jan 2025 05:51 |
| Last Modified | 27 Jan 2025 05:51 |
| URI | http://repository.its.ac.id/id/eprint/116960 |