Sentence Summarization Using Multi-Stage Cluster Chunking Bidirectional Autoregressive Transformer

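Aidha, Febrina Nur (2025) Peringkasan Kalimat Menggunakan Multi-Stage Cluster Chunking Bidirectional Autoregressive Transformer [Sentence Summarization Using Multi-Stage Cluster Chunking Bidirectional Autoregressive Transformer]. Other thesis, Institut Teknologi Sepuluh Nopember.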

Text
5002211100-Undergraduate_Thesis.pdf - Accepted Version
Restricted to Repository staff only

Abstract

Sentence summarization helps readers grasp the topics and insights contained in a text. Large Language Models such as the Bidirectional Autoregressive Transformer (BART) are claimed to summarize better than conventional models. In some cases, however, the generated summaries fail to capture important information or contain information that is inconsistent with the source text (hallucination). This undergraduate thesis therefore uses a Multi-Stage Cluster Chunking Bidirectional Autoregressive Transformer (MSCC-BART) model, which integrates a deep clustering method with the BART summarization model. The deep clustering method serves as the chunking technique. The goal of the thesis is to produce summaries that capture more of the important information and therefore agree more closely with the ground truth. The MSCC-BART model is evaluated on QMSum, a public English-language meeting-transcript summarization dataset that provides ground-truth summaries. The experimental results show that MSCC-BART generates summaries that capture more important information and agree more closely with the ground truth than the chunking-based summarization model SUMMN. MSCC-BART obtained higher scores, with increases of 7.89% in ROUGE-1, 1.98% in ROUGE-2, 4.49% in ROUGE-L, and 2.26% in BERTScore compared with the SUMMN-based model.
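
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how a ROUGE-1 F1 score can be computed from unigram overlap between a generated summary and a ground-truth summary. It is an illustration only: the function name rouge1_f1 and the whitespace tokenization are assumptions made here, and the thesis may have used a different ROUGE implementation with stemming or other preprocessing.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between two texts.

    Assumption: lowercase whitespace tokenization, no stemming or
    stopword handling, so scores may differ from standard ROUGE packages.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    generated = "the team agreed"
    ground_truth = "the team agreed on budget"
    print(round(rouge1_f1(generated, ground_truth), 4))
```

ROUGE-2 follows the same pattern with bigrams in place of unigrams, while ROUGE-L is based on the longest common subsequence rather than n-gram counts.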

Item Type:	Thesis (Other)
Uncontrolled Keywords:	Bidirectional Autoregressive Transformer, Clustering, Chunk, Multi-Stage Summarization, Sentence Summarization
Subjects:	Q Science > QA Mathematics > QA336 Artificial Intelligence
Divisions:	Faculty of Science and Data Analytics (SCIENTICS) > Mathematics > 44201-(S1) Undergraduate Thesis
Depositing User:	Febrina Nur Aidha
Date Deposited:	01 Aug 2025 08:09
Last Modified:	01 Aug 2025 08:09
URI:	http://repository.its.ac.id/id/eprint/125881
