Comparative Analysis of Topic Modeling Methods Based on Twitter Data Related to Long COVID

Ganata, Nathanael Putra (2025) Analisis Komparatif Metode Pemodelan Topik Berdasarkan Data Twitter Terkait Long Covid. Other thesis, Institut Teknologi Sepuluh Nopember.

20250703-Undergraduate_Thesis.pdf - Accepted Version
Restricted to Repository staff only

Abstract

Public discourse about Long COVID on social media offers important insights for health communication strategy and research priorities. This study compares five topic modeling methods for extracting meaningful themes from an English-language Twitter dataset of approximately 500,000 tweets posted throughout 2022. The methods evaluated are Latent Dirichlet Allocation (LDA), Non-negative Matrix Factorization (NMF), Latent Semantic Analysis (LSA), Top2Vec, and BERTopic. Each method was applied to a systematically preprocessed corpus, with its hyperparameters tuned. Evaluation combined quantitative and qualitative criteria: semantic coherence, lexical diversity, and interpretability and relevance as judged by human assessment and Large Language Model (LLM) evaluation. For practical reasons, the human and LLM evaluations covered only a sample of topics from each model, which may bias the results toward certain methods. The results show that BERTopic excels at identifying specific, detailed subtopics, while LDA produces the most easily interpretable thematic summaries. The other methods performed variably, illustrating a fundamental trade-off between thematic detail and ease of interpretation. The findings indicate that no method is universally optimal, so the choice should be tailored to the analytical objectives and the characteristics of the dataset. LDA suits research that needs quick, easily communicated interpretation, whereas BERTopic is better suited to in-depth analysis of large, complex text datasets. The study provides empirical guidance for selecting a topic modeling method for social media analysis.
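
The abstract does not state how lexical diversity was computed. As a minimal sketch, one common definition (assumed here, not confirmed by the thesis) is the fraction of unique words among the top words of all topics, where 1.0 means no word is shared between topics. The function name `topic_diversity`, the `top_k` parameter, and the example words are hypothetical and serve only as illustration.

```python
def topic_diversity(topics, top_k=10):
    """Return the fraction of unique words among the top_k words of every topic.

    `topics` is a list of topics, each a list of words ordered by importance.
    This is an assumed definition, not necessarily the one used in the thesis.
    """
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        return 0.0
    return len(set(top_words)) / len(top_words)


# Hypothetical top words for three topics (illustrative, not thesis results).
example_topics = [
    ["fatigue", "brain", "fog", "symptom"],
    ["vaccine", "booster", "dose", "symptom"],
    ["clinic", "research", "funding", "patient"],
]

score = topic_diversity(example_topics, top_k=4)
print(f"topic diversity: {score:.3f}")
```

Because the score depends only on each model's top-word lists, the same function could in principle be applied to the output of any of the five methods, which is what makes a measure of this kind usable for cross-model comparison.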

Item Type: Thesis (Other)
Uncontrolled Keywords: Long COVID, topic modelling, BERTopic, Latent Dirichlet Allocation, social media analysis
Subjects: T Technology > T Technology (General) > T57.5 Data Processing
Divisions: Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Information System > 57201-(S1) Undergraduate Thesis
Depositing User: Nathanael Putra Ganata
Date Deposited: 04 Jul 2025 05:12
Last Modified: 08 Jul 2025 03:48
URI: http://repository.its.ac.id/id/eprint/119363
