Personality Prediction Based on Visual Features Using the Swin Transformer Architecture and Multi-Frame Aggregation

Hasyim, Wildan Fauzy Maulana (2026) Prediksi Kepribadian Berbasis Fitur Visual Menggunakan Arsitektur Swin Transformer Dan Multi-Frame Aggregation. Other thesis, Institut Teknologi Sepuluh Nopember.

Text
5025221044-Undergraduate_Thesis.pdf
Restricted to Repository staff only

Abstract

This research develops a personality prediction model based on facial visual features, using the Swin Transformer architecture and a Multi-Frame Aggregation strategy on the ChaLearn First Impression V2 dataset. It addresses the limitations of previous approaches that relied on single-frame input and conventional CNN backbones such as ResNet18. The proposed model draws on several representative frames from each video to produce more stable and robust feature representations. The research also integrates emotion features, frame attention, a temporal transformer, a quality-weighted loss, and ArcFace-based face embeddings to improve the quality of the facial representation. Classification covers the five Big Five personality dimensions using a multi-output classification approach. Experiments proceeded through several stages of model development and included a reformulation of the labels from four classes to three to reduce ambiguity in the data distribution. The results show that the use of Swin Transformer and Multi-Frame Aggregation improves prediction stability and model performance compared with single-frame approaches. The attention mechanisms and the additional integrated features also contribute to better facial visual representations for personality prediction.
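
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind attention-weighted multi-frame aggregation followed by multi-output classification. It is not the thesis's model: the function names, the toy two-dimensional features, the linear heads, and the trait keys are all hypothetical, and the per-frame feature vectors stand in for what a backbone such as Swin Transformer would produce.

```python
import math

# Hypothetical trait keys for the five Big Five dimensions.
TRAITS = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_frames(frame_features, frame_scores):
    """Combine per-frame feature vectors into one video-level vector.

    frame_scores are assumed attention scores (one per frame); frames
    with higher scores contribute more to the weighted mean.
    """
    weights = softmax(frame_scores)
    dim = len(frame_features[0])
    return [
        sum(w * feat[d] for w, feat in zip(weights, frame_features))
        for d in range(dim)
    ]


def predict_traits(video_feature, heads):
    """Multi-output classification: one linear head per trait.

    heads maps a trait name to a list of per-class weight vectors;
    the predicted class is the index of the largest logit.
    """
    prediction = {}
    for trait, class_weights in heads.items():
        logits = [
            sum(w * x for w, x in zip(row, video_feature))
            for row in class_weights
        ]
        prediction[trait] = max(range(len(logits)), key=logits.__getitem__)
    return prediction


# Toy example: three frames, 2-D features, equal attention scores.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.0, 0.0, 0.0]
video_feature = aggregate_frames(frames, scores)

# Three classes per trait, mirroring the three-class label formulation.
toy_heads = {t: [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]] for t in TRAITS}
predicted = predict_traits(video_feature, toy_heads)
```

With equal scores the aggregation reduces to a plain mean over frames; in a trained model the scores would be learned, so that more informative frames dominate the video-level representation.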

Item Type:	Thesis (Other)
Uncontrolled Keywords:	Big Five Personality Traits, Deep Learning, Facial Analysis, Multi-Frame Aggregation, Personality Prediction, Swin Transformer.
Subjects:	Q Science > QA Mathematics > QA336 Artificial Intelligence
Divisions:	Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering > 55201-(S1) Undergraduate Thesis
Depositing User:	WILDAN FAUZY MAULANA HASYIM
Date Deposited:	29 Jun 2026 06:54
Last Modified:	29 Jun 2026 06:54
URI:	http://repository.its.ac.id/id/eprint/134148
