Comparison of Text Mining Methods and Large Language Models for Text Summarization

Ronaldyanto (2024) Perbandingan Metode Text Mining dan Large Language Models Pada Kasus Pemeringkasan Teks [Comparison of Text Mining Methods and Large Language Models for Text Summarization]. Other thesis, Institut Teknologi Sepuluh Nopember.

5002201139_Undergraduate_Thesis.pdf - Accepted Version
Restricted to Repository staff only until 1 October 2026.


Abstract

To quickly gain an understanding of an article, readers often need a summary of it; producing one automatically is known as text summarization. Several methods exist for text summarization, including text mining and large language models (LLMs), each with its own strengths and weaknesses. This undergraduate thesis therefore explores the performance of text mining methods and LLMs on text summarization. Two text mining methods are implemented, term frequency-inverse document frequency (TF-IDF) and generate similar (Gensim), together with three LLMs: generative pre-trained transformer 2 (GPT-2), text-to-text transfer transformer (T5), and text-to-text transfer retentive (T4R). The five methods were tested on three public text summarization datasets in English and Indonesian. The experimental results show that GPT-2 obtains good ROUGE scores because it generates sentences that are almost identical to the originals, while T5 achieves the highest BLEU score and appears to be the most capable of summarizing sentences compared with the other methods.
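For illustration only, the minimal Python sketch below shows how the two families of methods compared in the thesis can be prototyped: an extractive summarizer that ranks sentences by summed TF-IDF weight, an abstractive summary from a pretrained T5 checkpoint, and ROUGE scoring of both against a reference. This is not the author's actual pipeline; the example text, the naive sentence splitter, the t5-small checkpoint, and the scikit-learn / transformers / rouge-score dependencies are assumptions made for the sketch.

# Minimal sketch (not the thesis' exact pipeline): extractive TF-IDF summary,
# abstractive T5 summary, and ROUGE evaluation of both against a reference.
# Assumes scikit-learn, transformers (with a torch backend), and rouge-score are installed.
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from transformers import pipeline
from rouge_score import rouge_scorer

# Toy article and reference summary, invented for illustration.
ARTICLE = (
    "Large language models have changed how text summarization is done. "
    "Classical text mining methods rank sentences using statistics such as TF-IDF. "
    "Neural models such as GPT-2 and T5 instead generate new sentences. "
    "Both families of methods are evaluated with ROUGE and BLEU."
)
REFERENCE = "TF-IDF ranks sentences statistically, while GPT-2 and T5 generate summaries."

def tfidf_extractive_summary(text: str, n_sentences: int = 2) -> str:
    """Pick the n sentences with the highest total TF-IDF weight, in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tfidf = TfidfVectorizer().fit_transform(sentences)   # one row per sentence
    scores = tfidf.sum(axis=1).A1                        # summed weight per sentence
    top = sorted(sorted(range(len(sentences)), key=lambda i: -scores[i])[:n_sentences])
    return " ".join(sentences[i] for i in top)

def t5_abstractive_summary(text: str) -> str:
    """Abstractive summary from a small pretrained T5 checkpoint (illustrative choice)."""
    summarizer = pipeline("summarization", model="t5-small")
    return summarizer(text, max_length=40, min_length=10, do_sample=False)[0]["summary_text"]

if __name__ == "__main__":
    scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
    for name, summary in [
        ("TF-IDF (extractive)", tfidf_extractive_summary(ARTICLE)),
        ("T5 (abstractive)", t5_abstractive_summary(ARTICLE)),
    ]:
        rouge = scorer.score(REFERENCE, summary)
        print(f"{name}: {summary}")
        print(f"  ROUGE-1 F1 = {rouge['rouge1'].fmeasure:.3f}, "
              f"ROUGE-L F1 = {rouge['rougeL'].fmeasure:.3f}")

The extractive and abstractive functions are deliberately symmetric (text in, summary string out) so that any number of methods can be dropped into the same evaluation loop, which mirrors how the thesis compares five methods under common ROUGE and BLEU metrics.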

Item Type: Thesis (Other)
Uncontrolled Keywords: Article summarization, text mining, large language models, text summarization.
Subjects: Q Science > QA Mathematics > QA336 Artificial Intelligence
Q Science > QA Mathematics > QA76.87 Neural networks (Computer Science)
Divisions: Faculty of Science and Data Analytics (SCIENTICS) > Mathematics > 44201-(S1) Undergraduate Thesis
Depositing User: Ronaldyanto
Date Deposited: 06 Aug 2024 07:13
Last Modified: 06 Aug 2024 07:13
URI: http://repository.its.ac.id/id/eprint/113877
