Medical Text Classification Based on BioMedBERT with BiLSTM Cross-Attention Enhancement

Buntoro, Ghulam Asrofi (2025) Medical Text Classification Based on BioMedBERT with BiLSTM Cross-Attention Enhancement. Doctoral thesis, Institut Teknologi Sepuluh Nopember.

Text
7022201009-Doctoral.pdf - Accepted Version (9MB)
Restricted to Repository staff only

Abstract

The rapid growth of medical literature poses challenges for the automatic management and classification of medical abstracts. Key challenges in medical Natural Language Processing (NLP) include the complexity of domain semantics, limited model generalization, and imbalanced class distributions. This research aims to develop an accurate and robust NLP framework for classifying medical abstracts on imbalanced data. The study proposes an integrated approach that combines domain-specific transformers, a hybrid architecture, and a data-imbalance handling strategy. BioMedBERT serves as the medical-domain base model, combined with Cross-Attention and BiLSTM mechanisms to capture global context and sequential dependencies in medical text. In addition, hybrid balancing and super-ensemble learning strategies are applied to improve sensitivity to minority classes and prediction stability. Performance is evaluated with metrics appropriate for imbalanced data: Accuracy, Macro-F1, Recall, and the Matthews Correlation Coefficient (MCC). Experimental results show that the hybrid BioMedBERT architecture with Cross-Attention and BiLSTM achieves an F1-score of 63.82%, outperforming linear-transformer, multilayer-perceptron, and zero-shot approaches. Applying hybrid balancing and the super-ensemble yields the best performance, with 64.25% Accuracy, 64.12% Macro-F1, and 71.09% Recall, and demonstrates higher performance stability across different data-split scenarios. These findings show that integrating domain-specific modeling, a hybrid architecture, and ensemble learning significantly improves the accuracy and robustness of medical text classification models.
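The abstract describes a head on top of BioMedBERT that combines Cross-Attention (for global context) with a BiLSTM (for sequential dependencies) before classification. The following is a minimal PyTorch sketch of one plausible reading of that design; the layer sizes, the use of the [CLS] token as the attention query, and the mean-pooling of BiLSTM states are illustrative assumptions, not the thesis configuration.

```python
import torch
import torch.nn as nn

class HybridHead(nn.Module):
    """Illustrative hybrid head: cross-attention over encoder tokens plus a
    BiLSTM, with pooled features concatenated for classification.
    Dimensions here are assumptions for the sketch."""
    def __init__(self, hidden=768, num_classes=5, lstm_hidden=256, heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.bilstm = nn.LSTM(hidden, lstm_hidden,
                              batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(hidden + 2 * lstm_hidden, num_classes)

    def forward(self, token_states):
        # token_states: (batch, seq_len, hidden), e.g. BioMedBERT outputs.
        cls = token_states[:, :1, :]                   # [CLS] as the query
        attn_out, _ = self.cross_attn(cls, token_states, token_states)
        lstm_out, _ = self.bilstm(token_states)        # (batch, seq, 2*lstm_hidden)
        pooled = lstm_out.mean(dim=1)                  # mean-pool BiLSTM states
        feats = torch.cat([attn_out.squeeze(1), pooled], dim=-1)
        return self.classifier(feats)                  # (batch, num_classes)

# In practice token_states would come from a BioMedBERT encoder (e.g. via the
# transformers library); random tensors stand in here for a shape check.
logits = HybridHead()(torch.randn(2, 16, 768))
print(tuple(logits.shape))  # (2, 5)
```

The concatenation lets the classifier see both an attention-weighted global summary and a recurrent summary of token order, which is one common way to realize the "global context plus sequential dependency" combination the abstract mentions.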
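The four reported metrics (Accuracy, Macro-F1, Recall, MCC) can all be derived from per-class counts. As a reference for how they are computed on imbalanced multiclass data, here is a self-contained pure-Python sketch using the generalized multiclass form of the Matthews Correlation Coefficient; it is an illustration, not the thesis's evaluation code.

```python
from collections import Counter
from math import sqrt

def classification_metrics(y_true, y_pred):
    """Accuracy, Macro-F1, Macro-Recall, and multiclass MCC."""
    labels = sorted(set(y_true) | set(y_pred))
    s = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    t_cnt, p_cnt = Counter(y_true), Counter(y_pred)

    recalls, f1s = [], []
    for k in labels:
        tp = sum(t == p == k for t, p in zip(y_true, y_pred))
        rec = tp / t_cnt[k] if t_cnt[k] else 0.0
        prec = tp / p_cnt[k] if p_cnt[k] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        recalls.append(rec)
        f1s.append(f1)

    # Generalized (multiclass) Matthews Correlation Coefficient.
    num = correct * s - sum(p_cnt[k] * t_cnt[k] for k in labels)
    den = sqrt((s**2 - sum(p_cnt[k]**2 for k in labels))
               * (s**2 - sum(t_cnt[k]**2 for k in labels)))
    mcc = num / den if den else 0.0

    return {"accuracy": correct / s,
            "macro_f1": sum(f1s) / len(labels),
            "macro_recall": sum(recalls) / len(labels),
            "mcc": mcc}

m = classification_metrics([0, 1, 1, 2], [0, 1, 2, 2])
print(m)  # accuracy 0.75, macro_f1 ≈ 0.778, macro_recall ≈ 0.833, mcc 0.7
```

Macro averaging weights every class equally regardless of its frequency, which is why Macro-F1 and MCC are preferred over plain accuracy when, as here, the class distribution is imbalanced.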

Item Type: Thesis (Doctoral)
Uncontrolled Keywords: BioMedBERT, BiLSTM, Cross-Attention, Class Imbalance, Ensemble Learning, Medical Text Classification.
Subjects: T Technology > T Technology (General)
T Technology > T Technology (General) > T385 Visualization--Technique
T Technology > T Technology (General) > T57.5 Data Processing
T Technology > T Technology (General) > T58.5 Information technology. IT--Auditing
T Technology > TK Electrical engineering. Electronics Nuclear engineering > TK5105.546 Computer algorithms
Divisions: Faculty of Electrical Technology > Electrical Engineering > 20001-(S3) PhD Thesis
Depositing User: Ghulam Asrofi Buntoro
Date Deposited: 06 Jan 2026 06:02
Last Modified: 06 Jan 2026 06:02
URI: http://repository.its.ac.id/id/eprint/129288
