Personality Classification Using a Dynamic Deep Graph Convolutional Network with Large Language Model-Based Data Augmentation

Alghifari, Ahmad Fauzan (2025) Klasifikasi Kepribadian Menggunakan Dynamic Deep Graph Convolutional Network Dengan Augmentasi Data Berbasis Large Language Model. Other thesis, Institut Teknologi Sepuluh Nopember.

5025211091-Undergraduate_Thesis.pdf - Accepted Version
Restricted to Repository staff only

Abstract

Personality, a crucial aspect of understanding individual characteristics in social and professional contexts, is often categorized with the Myers-Briggs Type Indicator (MBTI), which defines 16 personality types across four main dimensions. Because text data is abundant on online platforms such as social media and forums, automatic personality detection with machine learning and deep learning has become an active research area; this undergraduate thesis implements the Dynamic Deep Graph Convolutional Network (D-DGCN) for the task. The main focus is the data imbalance common in text-based personality detection, where some personality types have only a few data samples, which can hinder the model's ability to generalize. To address this, the thesis examines the effect of synthetic data augmentation, in particular using a Large Language Model (LLM) with a prompt designed to paraphrase text from the underrepresented classes. This paraphrase-based LLM augmentation is compared with other augmentation approaches, including LLMs with different prompt-manipulation strategies and augmentation performed with the TextAttack library. Initial experimental results show that the paraphrase-prompt LLM augmentation improved the generalization of D-DGCN on every MBTI dimension, each predicted separately (Introversion vs. Extroversion, Sensing vs. Intuition, Thinking vs. Feeling, and Perception vs. Judging), reaching an average F1-score of 0.91778, an average recall of 0.92229, and an average accuracy of 0.92449.
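The abstract relies on three mechanics: each MBTI type is split into four binary tasks, minority classes are topped up with paraphrased copies, and results are reported as averaged F1-score, recall, and accuracy. The Python sketch below illustrates those steps under explicit assumptions. The `augment` callable is a hypothetical stand-in for the LLM paraphrasing prompt (no specific LLM API, nor the thesis's actual prompt, is implied); oversampling each minority class up to the majority-class count is an assumed balancing target; and macro-averaging over the two classes of each dimension is an assumption, since the abstract does not say how the averages were computed. All function names are illustrative, and none of this is the D-DGCN model itself.

```python
import random
from collections import Counter
from typing import Callable, List, Tuple

# Letter pairs for the four MBTI dimensions, in the order they appear in a
# type string such as "INTJ". Each dimension is treated as its own binary task.
DIMENSIONS = ["IE", "SN", "TF", "PJ"]


def split_type(mbti: str) -> List[int]:
    """Map a 4-letter MBTI type to four binary labels (hypothetical helper).

    Label 1 means the first letter of the pair (I, S, T, P) and label 0 the
    second (E, N, F, J); this encoding is an arbitrary choice for illustration.
    """
    mbti = mbti.upper()
    if len(mbti) != 4:
        raise ValueError(f"expected a 4-letter MBTI type, got {mbti!r}")
    labels = []
    for letter, pair in zip(mbti, DIMENSIONS):
        if letter not in pair:
            raise ValueError(f"{letter!r} is not one of {pair!r}")
        labels.append(1 if letter == pair[0] else 0)
    return labels


def balance_by_augmentation(
    texts: List[str],
    labels: List[int],
    augment: Callable[[str], str],
    seed: int = 0,
) -> Tuple[List[str], List[int]]:
    """Add augmented copies to every minority class until it matches the majority.

    `augment` is a placeholder for the paraphrasing step (assumed, e.g. an LLM
    prompt or a TextAttack augmenter). Apply this to the training split only so
    synthetic samples never leak into the evaluation data.
    """
    counts = Counter(labels)
    target = max(counts.values())
    rng = random.Random(seed)  # seeded so the chosen source texts are reproducible
    out_texts, out_labels = list(texts), list(labels)
    for label, count in counts.items():
        pool = [t for t, y in zip(texts, labels) if y == label]
        for _ in range(target - count):
            out_texts.append(augment(rng.choice(pool)))
            out_labels.append(label)
    return out_texts, out_labels


def macro_scores(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float]:
    """Return (macro F1, macro recall, accuracy) for one binary dimension."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s, recalls = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1s.append(f1)
        recalls.append(recall)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return sum(f1s) / len(f1s), sum(recalls) / len(recalls), accuracy


def average_over_dimensions(
    per_dimension: List[Tuple[float, float, float]],
) -> Tuple[float, float, float]:
    """Average (F1, recall, accuracy) tuples across the separately predicted dimensions."""
    n = len(per_dimension)
    return tuple(sum(scores[i] for scores in per_dimension) / n for i in range(3))


if __name__ == "__main__":
    posts = ["post a", "post b", "post c", "post d"]
    ie_labels = [split_type(t)[0] for t in ["INTJ", "INFP", "ENTP", "INFJ"]]  # [1, 1, 0, 1]
    # A dummy augmenter stands in for the LLM paraphrasing call.
    texts, labels = balance_by_augmentation(posts, ie_labels, lambda t: t + " (paraphrased)")
    print(sorted(Counter(labels).items()))  # [(0, 3), (1, 3)]
    print(macro_scores([1, 0, 1, 1], [1, 0, 0, 1]))
```

Balancing to the majority count is the simplest target; a real pipeline might instead cap the number of synthetic samples per class or deduplicate near-identical paraphrases, since repeatedly paraphrasing a small pool can produce redundant training text.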

Item Type: Thesis (Other)
Uncontrolled Keywords: Deep Learning, MBTI, Classification, NLP, Deep Graph Convolutional Network, Data Augmentation, LLM, Data Imbalance
Subjects: R Medicine > R Medicine (General) > R858 Deep Learning
Divisions: Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering > 55201-(S1) Undergraduate Thesis
Depositing User: Ahmad Fauzan Alghifari
Date Deposited: 26 Jul 2025 13:44
Last Modified: 26 Jul 2025 13:44
URI: http://repository.its.ac.id/id/eprint/121665