Maheswari, Clarissa (2025) Pemodelan Topik Percakapan Konseling Kesehatan Mental Menggunakan Metode BERTopic [Topic Modeling of Mental Health Counseling Conversations Using the BERTopic Method]. Other thesis, Institut Teknologi Sepuluh Nopember.
Full text (PDF): 5025211003-Undergraduate_Thesis.pdf, Accepted Version, 9MB. Restricted to repository staff only; a copy can be requested.
Abstract
Mental health issues are increasingly disclosed through digital platforms, including online counseling conversations. These conversations contain important information, but because they are unstructured and emotional they are difficult to analyze systematically. This research applies the BERTopic method to identify the main psychological themes in counseling conversations. The method supports unsupervised thematic exploration without relying on manual labels. The analysis focuses on semantic topic representation and on evaluating the quality of the resulting topics. The implementation consists of preprocessing, document embedding with SentenceTransformer, dimensionality reduction with UMAP, clustering with HDBSCAN, and topic extraction with c-TF-IDF. Two preprocessing configurations were tested: Lematisasi Aktif (lemmatization enabled) and Lematisasi Nonaktif (lemmatization disabled). Evaluation combined quantitative metrics (c_V, NPMI, topic diversity, and outlier ratio) with qualitative validation through manual and spatial analysis. As a baseline, LDA was tested with 5, 10, and 15 topics and evaluated by coherence. The results show that BERTopic trades topic granularity against clustering stability, depending on the preprocessing strategy. Lematisasi Aktif produced 22 topics with an even distribution of documents, whereas Lematisasi Nonaktif produced 14 topics dominated by one large cluster (328 documents). The Lematisasi Nonaktif configuration showed more stable technical performance, with a coherence of 0.4378, an NPMI of -0.0586, a topic diversity of 0.8455, and an outlier ratio of 29.28%. Although LDA recorded the highest c_V score (0.5222), its topics consisted of overly generic words and lacked thematic focus.
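The c-TF-IDF topic-extraction step and two of the evaluation metrics named above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's implementation or the BERTopic library API: it assumes whitespace tokenization of already preprocessed text, treats label -1 as the HDBSCAN outlier cluster (as BERTopic does), defines topic diversity as the share of unique words among all topics' top words, and replaces the SentenceTransformer, UMAP, and HDBSCAN stages with hypothetical toy documents and cluster labels. The function names are hypothetical. The weighting follows the class-based TF-IDF form W(t, c) = tf(t, c) · log(1 + A / f_t), where f_t is the frequency of term t across all clusters and A is the average number of words per cluster.

```python
import math
from collections import Counter, defaultdict


def c_tf_idf_top_words(docs, labels, top_n=3):
    """Return the top_n c-TF-IDF words per topic, skipping the outlier label -1."""
    # Merge all documents of one cluster into a single "class document" (word counts).
    class_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        if label != -1:
            class_counts[label].update(doc.lower().split())

    # f_t: frequency of term t summed over all classes.
    term_totals = Counter()
    for counts in class_counts.values():
        term_totals.update(counts)
    # A: average number of words per class.
    avg_words = sum(term_totals.values()) / len(class_counts)

    top_words = {}
    for label, counts in class_counts.items():
        class_size = sum(counts.values())
        # W(t, c) = tf(t, c) * log(1 + A / f_t); tf is normalized by class size.
        scores = {
            term: (count / class_size) * math.log(1 + avg_words / term_totals[term])
            for term, count in counts.items()
        }
        # Highest weight first; ties broken alphabetically for reproducibility.
        top_words[label] = sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
    return top_words


def topic_diversity(top_words):
    """Share of unique words among all topics' top words (1.0 means no overlap)."""
    words = [w for ws in top_words.values() for w in ws]
    return len(set(words)) / len(words)


def outlier_ratio(labels):
    """Fraction of documents left unassigned by the clusterer (label -1)."""
    return sum(1 for label in labels if label == -1) / len(labels)


# Hypothetical toy data standing in for preprocessed conversations and HDBSCAN labels.
docs = [
    "cannot sleep night anxiety",
    "anxiety attack night stress",
    "exam stress deadline",
    "deadline stress work",
    "hello thanks",
]
labels = [0, 0, 1, 1, -1]

top = c_tf_idf_top_words(docs, labels)
print(top)                    # {0: ['anxiety', 'night', 'attack'], 1: ['deadline', 'stress', 'exam']}
print(topic_diversity(top))   # 1.0
print(outlier_ratio(labels))  # 0.2
```

In topic 1, "stress" has the same in-topic count as "deadline" but ranks lower because it also occurs in topic 0; the log(1 + A / f_t) factor is what penalizes words shared across clusters.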
The study concludes that semantic embedding-based approaches such as BERTopic yield more effective topics for exploring the thematic structure of emotional, unstructured, and context-dependent counseling data, and that statistical metrics such as c_V do not always reflect actual semantic quality.
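Coherence metrics such as NPMI score a topic purely from how often its top words co-occur in a reference corpus, which is one plausible reason a statistical score can diverge from human judgments of semantic quality. The sketch below is a simplified, hypothetical illustration: it counts co-occurrence over whole documents, whereas libraries such as gensim compute NPMI and c_V over sliding windows (c_V additionally applies an indirect cosine measure), so its numbers are not comparable to the values reported above. The function name and toy corpus are assumptions.

```python
import math
from itertools import combinations


def npmi_coherence(topic_words, documents):
    """Mean NPMI over all word pairs of a topic, using whole-document co-occurrence."""
    doc_sets = [set(doc.lower().split()) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every given word.
        return sum(1 for d in doc_sets if all(w in d for w in words)) / n_docs

    scores = []
    for w_i, w_j in combinations(topic_words, 2):
        p_ij = prob(w_i, w_j)
        if p_ij == 0:
            scores.append(-1.0)  # the pair never co-occurs: NPMI lower bound
        elif p_ij == 1:
            scores.append(1.0)   # both words occur in every document: upper bound
        else:
            # NPMI = log(P(wi, wj) / (P(wi) P(wj))) / -log P(wi, wj)
            pmi = math.log(p_ij / (prob(w_i) * prob(w_j)))
            scores.append(pmi / -math.log(p_ij))
    return sum(scores) / len(scores)


# Hypothetical reference corpus and candidate topics.
documents = [
    "cannot sleep night anxiety",
    "anxiety attack night",
    "exam stress deadline",
    "deadline stress work",
]
print(npmi_coherence(["anxiety", "night"], documents))              # 1.0
print(npmi_coherence(["anxiety", "night", "deadline"], documents))  # about -0.333
```

Adding the unrelated word "deadline" pulls the topic's score down because it never co-occurs with the other two words; the metric sees only co-occurrence counts, not meaning.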
Item Type: | Thesis (Other) |
---|---|
Uncontrolled Keywords: | BERTopic, Mental Health, Online Counseling, Semantic Embedding, Topic Modelling |
Subjects: | B Philosophy. Psychology. Religion > BF Psychology; P Language and Literature > P Philology. Linguistics; Q Science > QA Mathematics > QA278.55 Cluster analysis; Q Science > QA Mathematics > QA76.87 Neural networks (Computer Science); Q Science > QA Mathematics > QA76.9.I52 Information visualization |
Divisions: | Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering > 55201-(S1) Undergraduate Thesis |
Depositing User: | Clarissa Luna Maheswari |
Date Deposited: | 30 Jul 2025 10:03 |
Last Modified: | 30 Jul 2025 10:03 |
URI: | http://repository.its.ac.id/id/eprint/123358 |