Development of Semantic Role Labeling for Critical Event Detection in Indonesian-Language Social Media Data

Ariyanto, Amelia Devi Putri (2025) Pengembangan Semantic Role Labeling Untuk Deteksi Kejadian Penting pada Data Sosial Media Berbahasa Indonesia. Doctoral thesis, INSTITUT TEKNOLOGI SEPULUH NOPEMBER.


Full text: 7025221021-Dissertation.pdf (Accepted Version, 23MB), restricted to repository staff only.

Abstract

An event is an occurrence that takes place at a specific time and place, involves one or more people, and describes a change of state. Fires, floods, earthquakes, and accidents meet these criteria and are classified as critical events because they have a significant impact on human life and on economic, social, and psychological conditions. Social media platforms such as Twitter offer a new approach to disaster mitigation: people actively use Twitter to spread information, ask for help, and seek the latest news about disasters around them. Authorities can therefore use Twitter text to support rescue operations, since its response speed is higher than that of other platforms such as Facebook. A real-time critical event detection system built on Twitter data is thus needed as a low-cost alternative for incident detection that does not rely on sensors. Such a system must extract key information from Twitter text (the incident location and the people and objects affected by the disaster) and then determine the incident's severity. Using Twitter for rescue operations is challenging, particularly when identifying messages that request help, because Twitter text is unstructured and highly inconsistent in writing style, syntax, and structure. Information extraction technologies such as Named Entity Recognition (NER) and Semantic Role Labeling (SRL) have emerged to transform Twitter text into a structured form. NER identifies location entities only at the toponym level (place names, street names), whereas SRL identifies the location where the event actually occurred, based on the semantic roles in the sentence. Semantic roles are the roles that participants play in an event. SRL can reduce textual ambiguity by identifying the agents and impacts of an event, which makes disaster severity easier to assess from a more systematic text structure. This study aims to develop an SRL method that extracts information from Indonesian-language social media data and detects severity as part of the critical event detection process.
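To illustrate how SRL output turns an unstructured tweet into a structured record, the sketch below groups token-level role tags into labeled spans. It assumes a BIO tagging scheme and uses an invented tweet with hand-assigned tags. DEATHVICTIM-ARG and AFFECTEDOBJECTS-ARG come from this abstract. EVENT and LOCATION-ARG, along with the functions `decode_roles` and `to_record`, are hypothetical names and do not necessarily match the thesis's label set or code.

```python
from typing import Dict, List, Optional, Tuple


def decode_roles(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (role, text) spans."""
    spans: List[Tuple[str, str]] = []
    role: Optional[str] = None
    words: List[str] = []
    for token, tag in zip(tokens, tags):
        # Continue the open span only on an I- tag with the same role.
        if tag.startswith("I-") and tag[2:] == role:
            words.append(token)
            continue
        # Any other tag closes the open span (if there is one).
        if role is not None:
            spans.append((role, " ".join(words)))
        # B- (or a stray I-) opens a new span; O leaves no span open.
        if tag.startswith(("B-", "I-")):
            role, words = tag[2:], [token]
        else:
            role, words = None, []
    if role is not None:
        spans.append((role, " ".join(words)))
    return spans


def to_record(spans: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect spans into a role -> list-of-texts record."""
    record: Dict[str, List[str]] = {}
    for role, text in spans:
        record.setdefault(role, []).append(text)
    return record


# Invented tweet: "Flood in Kampung Melayu, 2 people died and 50 houses submerged".
tokens = ["Banjir", "di", "Kampung", "Melayu", ",", "2", "orang",
          "meninggal", "dan", "50", "rumah", "terendam"]
# Hand-assigned tags; EVENT and LOCATION-ARG are hypothetical role names.
tags = ["B-EVENT", "O", "B-LOCATION-ARG", "I-LOCATION-ARG", "O",
        "B-DEATHVICTIM-ARG", "I-DEATHVICTIM-ARG", "O", "O",
        "B-AFFECTEDOBJECTS-ARG", "I-AFFECTEDOBJECTS-ARG", "O"]

spans = decode_roles(tokens, tags)
record = to_record(spans)
print(record)
```

The resulting record ties "Kampung Melayu" to the flood as the place where it happened. A toponym-only NER tag would not capture this relationship, and the victim and object spans can feed later severity analysis.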
A transformer-based deep learning approach is used to develop the SRL method for Indonesian-language Twitter text because its self-attention mechanism is well suited to computing token representations that account for each token's position in the sequence, which yields good performance. After SRL recognizes the event type, the severity of the event is detected. A fuzzy approach is used to determine disaster severity because it can model and handle the vagueness of Twitter text. This study built a dataset and developed an SRL-based information extraction system for Twitter text that uses semi-supervised learning to identify semantic roles in critical events such as floods, fires, earthquakes, and accidents. Four transformer models were evaluated: IndoBERT (based on BERT), IndoRoBERTa (based on RoBERTa), GPT2-Indonesian (based on GPT-2), and Komodo (based on Llama-2). IndoBERT with a confidence threshold of 0.9 achieved the highest performance, with an F1-score of 0.863. Raising the threshold filtered out low-confidence predictions, which improved the quality of the pseudo-labels and reduced classification errors. The semantic role information from SRL was then used in an incident severity detection system based on Mamdani fuzzy logic. This system uses three main input variables: the number of fatalities (DEATHVICTIM-ARG), the number of injured victims (WOUNDVICTIM-ARG), and the number of affected objects (AFFECTEDOBJECTS-ARG). The evaluation used a test set of 1,003 tweets, of which 992 were labeled low, 9 medium, and 2 high. Compared with four machine learning models (Support Vector Machine, K-Nearest Neighbors, Decision Tree, and Naïve Bayes), the fuzzy model achieved the best overall performance, with a macro-averaged precision of 1.00, recall of 0.81, and F1-score of 0.87.
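The effect of the confidence threshold in semi-supervised training can be sketched as a self-training filter. A model trained on labeled tweets predicts role tags for unlabeled tweets, and only predictions that reach the threshold (0.9 for the best IndoBERT run reported above) are kept as pseudo-labels. This is a minimal sketch under stated assumptions. It takes per-token softmax probabilities as input and scores each tweet by its least confident token. That aggregation rule, the probabilities, and the function name `select_pseudo_labels` are hypothetical, not the thesis's exact procedure.

```python
from typing import List, Sequence, Tuple


def select_pseudo_labels(
    batch_probs: Sequence[Sequence[Sequence[float]]],
    labels: Sequence[str],
    threshold: float = 0.9,
) -> List[Tuple[int, List[str]]]:
    """Return (tweet index, predicted tags) for tweets confident enough to keep.

    batch_probs[i][j][k] is the predicted probability of label k for token j
    of unlabeled tweet i (e.g. a softmax over a token-classification head).
    Assumption: a tweet is kept only if its least confident token reaches
    `threshold`.
    """
    selected: List[Tuple[int, List[str]]] = []
    for i, token_probs in enumerate(batch_probs):
        tags: List[str] = []
        confidences: List[float] = []
        for probs in token_probs:
            best = max(range(len(probs)), key=probs.__getitem__)  # argmax label
            tags.append(labels[best])
            confidences.append(probs[best])
        if confidences and min(confidences) >= threshold:
            selected.append((i, tags))
    return selected


labels = ["O", "B-DEATHVICTIM-ARG", "I-DEATHVICTIM-ARG"]
batch_probs = [
    # Tweet 0: every token is predicted confidently.
    [[0.97, 0.02, 0.01], [0.03, 0.95, 0.02], [0.02, 0.03, 0.95]],
    # Tweet 1: the first token is uncertain (0.60).
    [[0.60, 0.30, 0.10], [0.05, 0.92, 0.03]],
]

print(select_pseudo_labels(batch_probs, labels, threshold=0.9))
# [(0, ['O', 'B-DEATHVICTIM-ARG', 'I-DEATHVICTIM-ARG'])]
print(select_pseudo_labels(batch_probs, labels, threshold=0.5))
```

Raising `threshold` from 0.5 to 0.9 drops tweet 1. This trades pseudo-label quantity for quality, the trade-off the abstract reports. The kept tweets would then join the labeled set for the next training round.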
The resulting fuzzy system copes with the distinctive characteristics of Twitter text and offers several notable advantages. First, it handles data uncertainty by processing ambiguous descriptions such as "dozens of victims" or "thousands of houses". Second, fuzzy rules expressed in linguistic terms let the system cover many scenarios without requiring a large training dataset. Third, the system can quickly process large volumes of tweets to determine incident severity.
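The Mamdani pipeline described above can be sketched in plain Python. It fuzzifies the three SRL-derived counts, fires linguistic rules, aggregates the clipped outputs, and defuzzifies by centroid. Everything numeric here is hypothetical: the membership-function breakpoints, the three-rule base, and the 0 to 100 severity scale. The counts assigned to vague quantifiers such as "puluhan" (dozens) and "ribuan" (thousands) are also illustrative choices, not the thesis's configuration. The names `parse_count` and `mamdani_severity` are invented.

```python
import math
from typing import Callable, Dict, Tuple

MF = Callable[[float], float]


def trimf(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership: 0 at a, peak 1 at b, 0 at c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def trapmf(x: float, a: float, b: float, c: float, d: float) -> float:
    """Trapezoidal membership; a == b or c == d == math.inf give shoulders."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)


# Hypothetical fuzzy sets for the three SRL-derived inputs.
INPUT_SETS: Dict[str, Dict[str, MF]] = {
    "deaths": {"low": lambda x: trapmf(x, 0, 0, 0, 2),
               "medium": lambda x: trimf(x, 1, 5, 10),
               "high": lambda x: trapmf(x, 5, 10, math.inf, math.inf)},
    "injured": {"low": lambda x: trapmf(x, 0, 0, 2, 10),
                "medium": lambda x: trimf(x, 5, 20, 50),
                "high": lambda x: trapmf(x, 30, 60, math.inf, math.inf)},
    "objects": {"low": lambda x: trapmf(x, 0, 0, 10, 50),
                "medium": lambda x: trimf(x, 20, 100, 300),
                "high": lambda x: trapmf(x, 200, 500, math.inf, math.inf)},
}
# Hypothetical output sets on a 0-100 severity scale.
OUTPUT_SETS: Dict[str, MF] = {
    "low": lambda y: trapmf(y, 0, 0, 20, 40),
    "medium": lambda y: trimf(y, 30, 50, 70),
    "high": lambda y: trapmf(y, 60, 80, 100, 100),
}
# Hypothetical mapping from Indonesian vague quantifiers to representative counts.
VAGUE_COUNTS: Dict[str, float] = {"puluhan": 20.0, "ratusan": 200.0, "ribuan": 2000.0}


def parse_count(span: str) -> float:
    """Turn an SRL argument such as '2 orang' or 'ribuan rumah' into a number."""
    words = span.lower().split()
    if not words:
        return 0.0
    return float(words[0]) if words[0].isdigit() else VAGUE_COUNTS.get(words[0], 0.0)


def mamdani_severity(deaths: float, injured: float, objects: float) -> Tuple[str, float]:
    values = {"deaths": deaths, "injured": injured, "objects": objects}
    # Fuzzification: membership degree of each input in each linguistic term.
    mu = {var: {term: f(values[var]) for term, f in sets.items()}
          for var, sets in INPUT_SETS.items()}
    # Hypothetical rule base (OR = max, AND = min):
    #   any input high -> high; any input medium -> medium; all inputs low -> low.
    strength = {
        "high": max(mu[v]["high"] for v in mu),
        "medium": max(mu[v]["medium"] for v in mu),
        "low": min(mu[v]["low"] for v in mu),
    }
    # Mamdani implication (min clipping), max aggregation, centroid defuzzification.
    ys = [i * 0.5 for i in range(201)]  # 0.0, 0.5, ..., 100.0
    agg = [max(min(strength[t], OUTPUT_SETS[t](y)) for t in OUTPUT_SETS) for y in ys]
    if sum(agg) == 0:
        return "low", 0.0  # no rule fired; defaulting to low is an assumption
    crisp = sum(y * m for y, m in zip(ys, agg)) / sum(agg)
    # Label the crisp score with the output term in which it has the highest membership.
    label = max(OUTPUT_SETS, key=lambda t: OUTPUT_SETS[t](crisp))
    return label, crisp


print(mamdani_severity(0, 1, 3))    # few casualties, few affected objects
print(mamdani_severity(3, 10, 50))  # moderate counts
print(mamdani_severity(parse_count("puluhan orang"), 0, parse_count("ribuan rumah")))
```

The rules are written over linguistic terms rather than learned from data. Covering a new scenario therefore means adding or editing a rule, which matches the abstract's point that the approach does not depend on a large training set.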

Item Type:	Thesis (Doctoral)
Uncontrolled Keywords:	Event Detection, Severity Detection, Named Entity Recognition, Semantic Role Labeling, Fuzzy Logic
Subjects:	T Technology > T Technology (General); T Technology > T Technology (General) > T57.5 Data Processing
Divisions:	Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Computer Engineering
Depositing User:	Amelia Devi Putri Ariyanto
Date Deposited:	01 Aug 2025 04:40
Last Modified:	01 Aug 2025 07:03
URI:	http://repository.its.ac.id/id/eprint/125674
