Pembobotan Kalimat berdasarkan Fitur Berita, Informasi Gramatikal dan Relevansi Kalimat terhadap Judul untuk Peringkasan Multi-dokumen Berita

Abdullah, Moch Zawaruddin (2018) Pembobotan Kalimat berdasarkan Fitur Berita, Informasi Gramatikal dan Relevansi Kalimat terhadap Judul untuk Peringkasan Multi-dokumen Berita. Masters thesis, Institut Teknologi Sepuluh Nopember.

5116201027-Masters_Thesis.pdf - Published Version
Restricted to Repository staff only

Abstract

Sentence weighting is a commonly used stage in document summarization, including the summarization of news documents. In news summarization, sentence weighting methods for selecting representative sentences mostly rely on features of the news itself, such as word frequency, Term Frequency-Inverse Document Frequency (TF-IDF), sentence position, and the similarity of a sentence to the title. These methods are able to select representative sentences. However, news features alone are not sufficient, because they ignore the informative words within a sentence and measure the relevance of a sentence to the title only through shared words.
This research aims to summarize multiple news documents using a sentence weighting method that combines important news features with grammatical information and the relevance of each sentence to the title. Grammatical information is used to indicate the informative words in a sentence. Sentence-to-title relevance measures how closely a sentence is connected to the title, both in terms of shared words and in terms of similarity of word meaning. Weighting sentences by this combination is expected to select representative sentences more effectively and to improve the quality of the resulting summary. The method produces a multi-document news summary in four stages: news selection, text preprocessing, sentence scoring, and summary composition.
Summaries are evaluated with Recall-Oriented Understudy for Gisting Evaluation (ROUGE) using four variants: ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-SU4. In experiments on 11 groups of Indonesian news documents, the proposed method is compared with a sentence weighting method based on news features with a trending issue approach (NeFTIS). The proposed method achieves better results than NeFTIS, with improvements in ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-SU4 of 58%, 99.32%, 13.53%, and 82.65%, respectively.
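
For readers unfamiliar with the evaluation measure, the following is a minimal sketch of how ROUGE-N recall (the basis of ROUGE-1 and ROUGE-2) is commonly computed: the number of n-grams shared between a candidate summary and a reference summary, divided by the number of n-grams in the reference. It is an illustration only and not the evaluation code used in the thesis. The function names `ngrams` and `rouge_n_recall`, the whitespace tokenization, and the example sentences are all assumptions made for this sketch.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: clipped n-gram overlap divided by reference n-gram count."""
    # Assumption: naive lowercase + whitespace tokenization, for illustration only.
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared n-gram,
    # so a repeated n-gram is never credited more often than it occurs.
    overlap = sum((cand & ref).values())
    return overlap / total


# Hypothetical example sentences, not taken from the thesis data.
reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
r1 = rouge_n_recall(candidate, reference, n=1)  # 5 of 6 reference unigrams matched
r2 = rouge_n_recall(candidate, reference, n=2)  # 3 of 5 reference bigrams matched
print(f"ROUGE-1 recall: {r1:.4f}, ROUGE-2 recall: {r2:.4f}")
```

ROUGE-L (longest common subsequence) and ROUGE-SU4 (skip-bigrams plus unigrams) use different matching units and are not covered by this sketch.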

Item Type: Thesis (Masters)
Uncontrolled Keywords: multi-document summarization, news documents, sentence weighting, grammatical information, sentence relevance
Subjects: Q Science > QA Mathematics > QA76.9.D343 Data mining
Z Bibliography. Library Science. Information Resources > ZA Information resources > Z699.5 Information storage and retrieval systems
Divisions: Faculty of Information and Communication Technology > Informatics > (S2) Master Theses
Depositing User: Abdullah Moch Zawaruddin
Date Deposited: 03 Aug 2018 03:13
Last Modified: 03 Aug 2018 03:13
URI: http://repository.its.ac.id/id/eprint/57828
