Sugiyanto, Sugiyanto (2025) Deteksi Depresi Berbasis Multimodal Fitur Menggunakan CNN-WoPL [Depression Detection Based on Multimodal Features Using CNN-WoPL]. Doctoral thesis, Institut Teknologi Sepuluh Nopember.
Text: 07111960010006-Disertasi.pdf - Accepted Version (6MB). Restricted to repository staff only until 1 April 2027.
Abstract
Depression is a common mental disorder that significantly affects daily life, including work, education, and personal relationships, and increases the risk of suicide fivefold. Conventional diagnosis relies on self-report and clinician judgment, so more objective and faster detection is needed so that ongoing treatment can prevent the depression from worsening. To address this, we propose the CNN-Poolingless framework and the Multi-Input CNN-WoPL framework for objective and faster automatic depression detection. The CNN-Poolingless framework detects depression from a single feature type, Action Unit (AU) intensity, extracted from 14 facial AUs. This CNN-based method without pooling layers, tuned with GridSearch to the best hyperparameters (filters = 512, batch size = 64, optimizer = SGD, learning rate = 0.001), showed promising results on the DAIC-WOZ dataset, with robustness tests on the CASME II dataset. It achieved an accuracy of 0.988, a loss of 0.031, and an F1-score of 0.991, outperforming other models and variants that use pooling layers. The Multi-Input CNN-WoPL framework improves on the CNN-Poolingless method by adding one 2D convolution layer. It uses multi-input features, AU and eye gaze, to improve the accuracy and robustness of automatic depression detection. The model was tuned with the GridSearch algorithm to obtain the best hyperparameter values: number of filters (512), batch size (64), optimizer (Adam), and learning rate (0.0001). Model robustness was tested on the DAIC-WOZ dataset with 16 data variations, noisy data, augmented data (shift and scale techniques), and missing data. Across the experiments, model tests, and comparisons with reference methods, the Multi-Input CNN-WoPL framework showed the best performance, with an accuracy of 0.994 and an F1-score of 0.993. An F1-score this close to 1.0 indicates that the proposed method achieves both high precision and high recall.
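The link between F1-score and precision/recall in the last sentence can be illustrated with a minimal sketch. F1 is the harmonic mean of precision and recall, so a reported F1 bounds both quantities from below. The confusion-matrix counts used here are hypothetical and are not taken from the thesis; only the value 0.993 comes from the abstract, and the function names are illustrative.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts.

    Assumes tp + fp > 0 and tp + fn > 0 (no guard against empty classes).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def min_component_for_f1(f1):
    """Smallest precision (or recall) compatible with a given F1.

    From F1 = 2PR / (P + R): the minimum of P is reached when R = 1,
    which gives P = F1 / (2 - F1). The same bound holds for R.
    """
    return f1 / (2 - f1)


# Hypothetical counts, chosen only to show that F1 sits between
# precision and recall and is pulled toward the smaller of the two.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")

# Lower bound on precision and recall implied by the reported F1 of 0.993.
bound = min_component_for_f1(0.993)
print(f"precision and recall are each at least {bound:.4f}")
```

Under these definitions, an F1-score of 0.993 cannot be reached unless precision and recall are both roughly 0.986 or higher, which is why a high F1 is informative about both at once.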
Item Type: | Thesis (Doctoral) |
---|---|
Uncontrolled Keywords: | Depression, Multi-Input, CNN, Pooling Layer, Action Unit, eye gaze, optimization |
Subjects: | T Technology > T Technology (General) > T57.5 Data Processing; T Technology > T Technology (General) > T58.5 Information technology. IT--Auditing |
Divisions: | Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Electrical Engineering > 20001-(S3) PhD Thesis |
Depositing User: | Sugiyanto Sugiyanto |
Date Deposited: | 10 Jan 2025 03:23 |
Last Modified: | 10 Jan 2025 03:23 |
URI: | http://repository.its.ac.id/id/eprint/116236 |