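Putra, Marsyavero Charisyah (2024) Optimasi Model Klasifikasi Multi-Label Berbahasa Indonesia: Penerapan dan Perbandingan Metode Bi-LSTM dan BERT pada Teks Berita Berbahasa Indonesia [Optimization of an Indonesian-Language Multi-Label Classification Model: Application and Comparison of Bi-LSTM and BERT Methods on Indonesian-Language News Text]. Other thesis, Institut Teknologi Sepuluh Nopember.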
5025201122-Undergraduate_Thesis.pdf - Accepted Version. Restricted to repository staff only until 1 October 2026.
Abstract
Pengguna internet di Indonesia terus meningkat setiap tahunnya. Pada tahun 2023, survei APJII mencatat 78,19% populasi Indonesia menggunakan internet, meningkat 2,67% dari tahun sebelumnya. Peningkatan ini mendorong pertumbuhan media online dan mengubah cara masyarakat mengakses berita dari media cetak ke online. Dengan bertambahnya pengguna internet, volume berita online juga meningkat signifikan. Diperlukan teknik efisien untuk mengelola dan mengklasifikasikan berita agar memudahkan pengguna memahami informasi yang relevan. Penelitian ini mengembangkan model multilabel classification menggunakan Bi-LSTM dan BERT serta membandingkan performanya dengan model dari penelitian sebelumnya. Dataset berasal dari scraping teks berita dari detik.com dan Petakabar, dijadikan satu dataset bernama Topic Mining. Dataset dilabeli secara manual dengan 39 label. Empat model dikembangkan: Bi-LSTM, Bi-LSTM dengan Attention, BERT, dan Hybrid (kombinasi BERT dan Bi-LSTM dengan Attention). Evaluasi kinerja model menggunakan metrik F1-Score, akurasi, recall, dan hamming loss. Model Hybrid menunjukkan performa terbaik dengan mean recall 0,864, mean F1-score 0,893, mean accuracy 0,981, dan mean Hamming Loss 0,019. Label yang berperforma terbaik adalah "bencana" dengan F1-score 0,973, sedangkan label yang berperforma terburuk adalah "politik" dengan F1-score 0,279.
==============================================================================================================================
The number of internet users in Indonesia continues to increase every year. In 2023, a survey by APJII recorded that 78.19% of Indonesia's population used the internet, an increase of 2.67% from the previous year. This growth has driven the expansion of online media and shifted the way people access news from print to online. As the number of internet users has grown, the volume of online news has also increased significantly. Efficient techniques are therefore needed to manage and classify news so that users can find the information relevant to them. This study developed multi-label classification models using Bi-LSTM and BERT and compared their performance with a model from previous research. The dataset, named Topic Mining, was created by scraping news texts from detik.com and Petakabar and combining them into a single dataset, which was then manually annotated with 39 labels. Four models were developed: Bi-LSTM, Bi-LSTM with Attention, BERT, and Hybrid (a combination of BERT and Bi-LSTM with Attention). Model performance was evaluated using F1-score, accuracy, recall, and Hamming loss. The Hybrid model showed the best performance, with a mean recall of 0.864, a mean F1-score of 0.893, a mean accuracy of 0.981, and a mean Hamming loss of 0.019. The best-performing label was "bencana" (disaster), with an F1-score of 0.973, while the worst-performing label was "politik" (politics), with an F1-score of 0.279.
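The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not state how the per-label scores were averaged, so macro averaging over labels and the reading of "mean accuracy" as per-label accuracy are assumptions here, as are the function name `multilabel_metrics` and the toy data. Under that reading, mean accuracy and Hamming loss sum to 1, which is consistent with the reported 0.981 and 0.019.

```python
from typing import Dict, List


def multilabel_metrics(y_true: List[List[int]], y_pred: List[List[int]]) -> Dict[str, float]:
    """Compute macro recall, macro F1, mean per-label accuracy and Hamming loss.

    Inputs are binary indicator matrices of shape (n_samples, n_labels).
    Macro averaging is an assumption; the thesis abstract does not specify it.
    """
    n_samples = len(y_true)
    n_labels = len(y_true[0])
    recalls, f1s, accuracies = [], [], []
    wrong_cells = 0  # label slots where prediction and truth disagree

    for j in range(n_labels):
        tp = fp = fn = tn = 0
        for i in range(n_samples):
            t, p = y_true[i][j], y_pred[i][j]
            if t and p:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
        wrong_cells += fp + fn
        # Convention: a label with no positives scores 0.0 instead of dividing by zero.
        recalls.append(tp / (tp + fn) if (tp + fn) else 0.0)
        f1s.append(2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0)
        accuracies.append((tp + tn) / n_samples)

    return {
        "mean_recall": sum(recalls) / n_labels,
        "mean_f1": sum(f1s) / n_labels,
        "mean_accuracy": sum(accuracies) / n_labels,
        "hamming_loss": wrong_cells / (n_samples * n_labels),
    }


# Toy example (hypothetical data): 4 articles, 3 labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 1]]
scores = multilabel_metrics(y_true, y_pred)
print(scores)
```

With many labels and few positives per article, most label slots are true negatives, so per-label accuracy and Hamming loss can look strong even when a single label such as "politik" has a low F1-score. This is one reason to report F1-score and recall alongside them.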
Item Type: | Thesis (Other) |
---|---|
Uncontrolled Keywords: | Berita Berbahasa Indonesia, BERT, BiLSTM, Klasifikasi Multilabel, Indonesian News Text, Multilabel Classification |
Subjects: | T Technology > T Technology (General) > T57.5 Data Processing; T Technology > T Technology (General) > T57.8 Nonlinear programming. Support vector machine. Wavelets. Hidden Markov models. |
Divisions: | Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering > 55201-(S1) Undergraduate Thesis |
Depositing User: | Marsyavero Charisyah Putra |
Date Deposited: | 03 Sep 2024 04:05 |
Last Modified: | 03 Sep 2024 04:05 |
URI: | http://repository.its.ac.id/id/eprint/111093 |