Development of a Hybrid Pipeline for Converting Narrative News Text into Procedural Format Using a Constraint-Based LLM Approach

Sahda, Lathifah (2026) Pengembangan Hybrid Pipeline untuk Konversi Teks Naratif Berita Menjadi Format Prosedural dengan Pendekatan Constraint-Based LLM. Other thesis, Institut Teknologi Sepuluh Nopember.

Full text: 5025221159-Undergraduate_Thesis.pdf (Accepted Version, 8 MB), restricted to repository staff only.

Abstract

In the digital era, news is still consumed largely in long narrative formats that demand considerable cognitive effort from highly mobile readers. Converting narrative news texts into a procedural format can make the information easier to grasp, but the conversion is challenging because facts and chronological structure must be preserved accurately. Approaches based purely on Large Language Models (LLMs) are prone to hallucinations and format inconsistencies, which motivates combining them with rule-based components. This research proposes a hybrid pipeline that combines rule-based pre-processing, constrained LLM conversion, and rule-based post-processing. In the pipeline, linguistic features extracted through Named Entity Recognition (NER) and temporal analysis form constraints that guide generation by three LLMs: Groq (Llama 3.3 70B), GPT (GPT-4o-mini), and Claude (Claude Haiku). Evaluation on 1,440 data points across 16 domain categories shows that all three models meet the BERTScore F1 target of ≥ 0.60, with overall pass rates of 98.5–100%. Human evaluation shows that GPT and Claude meet the target average score of > 3.5.
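The three stages named in the abstract (rule-based pre-processing, constrained LLM conversion, rule-based post-processing) can be illustrated with a minimal Python sketch. It is not the thesis implementation and rests on several assumptions: a simple English date regex stands in for the temporal analysis, entity strings are passed in as a list instead of coming from an NER model, the LLM call itself is omitted (any model's text output can be handed to `post_process`), and "pass" is assumed to mean a precomputed BERTScore F1 of at least 0.60. All function and class names are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in for temporal analysis: matches dates such as "12 March 2024".
DATE_PATTERN = re.compile(
    r"\b\d{1,2} (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]* \d{4}\b"
)
# A procedural step is assumed to look like "1. ..." or "1) ...".
STEP_PATTERN = re.compile(r"^\s*(\d+)[.)]\s+(.+)$")


@dataclass
class Constraints:
    entities: list[str]  # entities that must survive the conversion
    dates: list[str]     # dates in order of appearance in the source text


def build_constraints(article: str, entities: list[str]) -> Constraints:
    """Rule-based pre-processing: keep entities (assumed to come from an NER step)
    that occur in the article and collect date expressions in order."""
    present = [e for e in entities if e in article]
    return Constraints(entities=present, dates=DATE_PATTERN.findall(article))


def build_prompt(article: str, c: Constraints) -> str:
    """Constrained conversion: embed the extracted constraints in the LLM instruction."""
    return (
        "Rewrite the news text below as numbered steps in chronological order.\n"
        f"Mention every one of these entities: {', '.join(c.entities)}.\n"
        f"Keep these dates exactly as written: {', '.join(c.dates)}.\n"
        "Do not add facts that are not in the text.\n\n"
        f"{article}"
    )


def post_process(llm_output: str, c: Constraints) -> tuple[list[str], list[str]]:
    """Rule-based post-processing: renumber the steps and report constraint violations."""
    steps = [
        m.group(2).strip()
        for line in llm_output.splitlines()
        if (m := STEP_PATTERN.match(line))
    ]
    text = " ".join(steps)
    problems = [f"missing entity: {e}" for e in c.entities if e not in text]
    problems += [f"missing date: {d}" for d in c.dates if d not in text]
    if not steps:
        problems.append("no numbered steps found")
    return [f"{i}. {s}" for i, s in enumerate(steps, start=1)], problems


def pass_rate(f1_scores: list[float], threshold: float = 0.60) -> float:
    """Share of outputs whose (precomputed) BERTScore F1 meets the threshold."""
    return sum(f >= threshold for f in f1_scores) / len(f1_scores)


if __name__ == "__main__":
    article = ("Police closed Jalan Sudirman on 12 March 2024. "
               "The Jakarta governor reopened it on 14 March 2024.")
    c = build_constraints(article, ["Jalan Sudirman", "Jakarta", "Surabaya"])
    # Simulated model output; a real pipeline would send build_prompt(...) to an LLM.
    fake_output = ("1) Police close Jalan Sudirman on 12 March 2024.\n"
                   "2. The Jakarta governor reopens it on 14 March 2024.")
    steps, problems = post_process(fake_output, c)
    print(steps, problems)
    print(pass_rate([0.72, 0.58, 0.91, 0.66]))  # 0.75
```

Keeping the constraint checks outside the model is the point of the hybrid design as the abstract describes it: the LLM only rewrites, while deterministic rules decide whether entities and dates survived. Indonesian-language articles would need their own month names in the date pattern.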

Item Type:	Thesis (Other)
Uncontrolled Keywords:	Hybrid Pipeline, Large Language Model, Narrative News, Natural Language Processing, Procedural Format, Text Conversion
Subjects:	Q Science > QA Mathematics > QA76 Computer software
Divisions:	Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering > 55201-(S1) Undergraduate Thesis
Depositing User:	Lathifah Sahda
Date Deposited:	25 Jun 2026 08:47
Last Modified:	25 Jun 2026 08:47
URI:	http://repository.its.ac.id/id/eprint/134080
