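Rahmawati, Yunianita (2026) Segmentasi Teks Menggunakan Multi-Layer Perceptron Berbasis Penyematan Kalimat Dan Fitur Kalimat Untuk Data Jawaban Dokter Di Sistem Konsultasi Kesehatan Online [Text Segmentation Using a Multi-Layer Perceptron Based on Sentence Embeddings and Sentence Features for Doctors' Answer Data in Online Health Consultation Systems]. Doctoral thesis, Institut Teknologi Sepuluh Nopember.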
Text: 7025211024-Doctoral.pdf (4MB), restricted to repository staff only.
Abstract
Doctors' answers in Online Health Consultation (OHC) often repeat themes and shift topics within a single paragraph without clear transition markers, which can hinder comprehension. It is therefore important to identify theme boundaries clearly, based on the six aspects of doctor-patient communication, so that theme changes can be detected and readers can focus on the main points. This can be done by dividing the doctor's answer text into smaller segments. This study proposes a new approach to text segmentation that focuses on communication aspects, in contrast to classical segmentation methods that rely on topic similarity. The developed model, an MLP based on sentence embeddings and sentence features, concatenates sentence features with sentence embedding vectors and feeds the result into an MLP. Sentence features are built by combining the Likelihood Ratio (LR) algorithm, which filters words relevant to a specific label, with Information Gain (InfoGain), which selects the most informative features. The best model combination, MLPSentFeat+DS2+Features3, was optimized to achieve the lowest segmentation error rate, 8.2%. Experimental results show that integrating these sentence features significantly increases the model's sensitivity to variations in linguistic structure, particularly in recognizing non-standard sentences such as questions, which frequently appear in the "providing information" aspect of medical communication. Furthermore, the model proves effective on data with a highly imbalanced label distribution, without requiring complex balancing techniques or synthetic data.
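As a rough illustration of the feature pipeline the abstract describes, the sketch below scores words by Information Gain and appends binary word-presence features to a sentence embedding. It is not the thesis implementation: the function names, the toy sentences and labels, and the placeholder embedding are all assumptions made for this example, and the Likelihood Ratio filtering step and the MLP itself are omitted.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(sentences, labels, word):
    """Entropy reduction from splitting the sentences on presence of `word`."""
    with_word = [y for s, y in zip(sentences, labels) if word in s.split()]
    without = [y for s, y in zip(sentences, labels) if word not in s.split()]
    total = len(labels)
    # Weighted entropy of the two partitions; empty partitions contribute nothing.
    remainder = sum(len(part) / total * entropy(part)
                    for part in (with_word, without) if part)
    return entropy(labels) - remainder

def sentence_vector(embedding, sentence, selected_words):
    """Concatenate a sentence embedding with binary word-presence features."""
    tokens = set(sentence.split())
    return list(embedding) + [1.0 if w in tokens else 0.0 for w in selected_words]

# Hypothetical toy data: two label names invented for illustration only.
sentences = ["do you have fever", "take this medicine",
             "do you cough often", "rest well tonight"]
labels = ["question", "info", "question", "info"]

ig_do = information_gain(sentences, labels, "do")
ig_medicine = information_gain(sentences, labels, "medicine")

# Placeholder two-dimensional embedding standing in for a real sentence encoder.
vector = sentence_vector([0.1, 0.2], sentences[0], ["do", "medicine"])
```

In this toy setup the word "do" separates the two labels perfectly, so it receives a higher score than "medicine" and would be kept as a feature; the combined vector is what a classifier such as an MLP would take as input.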
| Item Type: | Thesis (Doctoral) |
|---|---|
| Uncontrolled Keywords: | Aspects of Doctor-Patient Communication, OHC, MLP based on Sentence Embedding and Sentence Features, Sentence Features, Text Segmentation. |
| Subjects: | Q Science > Q Science (General) > Q325 GMDH algorithms. |
| Divisions: | Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering > 55001-(S3) PhD Thesis (Comp Science) |
| Depositing User: | Mrs. Yunianita Rahmawati |
| Date Deposited: | 22 Jan 2026 02:31 |
| Last Modified: | 22 Jan 2026 02:31 |
| URI: | http://repository.its.ac.id/id/eprint/130029 |