Position Text Graph Dan Peran Semantik Kata Dalam Pemilihan Kalimat Representatif Cluster Pada Peringkasan Multi-Dokumen

Syaifuddiin, Gus Nanang (2015) Position Text Graph Dan Peran Semantik Kata Dalam Pemilihan Kalimat Representatif Cluster Pada Peringkasan Multi-Dokumen. Masters thesis, Institut Teknologi Sepuluh Nopember.

5113201040-Master Thesis.pdf - Published Version

Abstract

Coverage and salience are the main concerns of researchers in document summarization. A sentence clustering approach gives good coverage of all topics, but it does not identify the information that can represent other sentences (sentence salience). Salience can be explored by examining the relationships between sentences, which are built with the position text graph approach. However, a position text graph captures only the relationships between sentences, without considering the semantic roles of words ("who" did "what" to "whom", "where", "when", and "how") in the sentences being compared. In this thesis, we propose a new strategy for selecting the representative sentence of a cluster, named SSID (Semantic Sentence Information Density), which combines the position text graph approach with the semantic roles of words in multi-document summarization. The study consists of several stages: text preprocessing, sentence clustering with histogram-based similarity clustering, cluster ordering, selection of the representative sentence of each cluster, and composition of the summary. Experiments were carried out on the Document Understanding Conference (DUC) 2004 Task 2 dataset. The results show that SSID overcomes the weakness of the position text graph and achieves the highest ROUGE-1 and ROUGE-2 scores of the methods compared. The ROUGE-1 score of SSID is 0.85% higher than that of LIGI and 2.42% higher than that of SIDeKiCK. The ROUGE-2 score of SSID is 10.33% higher than that of LIGI and 9.73% higher than that of SIDeKiCK.
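
The ROUGE-1 and ROUGE-2 scores mentioned in the abstract measure unigram and bigram overlap between a generated summary and human-written reference summaries. The sketch below illustrates the basic ROUGE-N recall computation only. It is not the evaluation setup used in the thesis, which this record does not describe. The function names `ngrams` and `rouge_n_recall`, the whitespace tokenization, and the lowercasing are assumptions made for illustration; the official ROUGE toolkit offers further options such as stemming and stop-word removal.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter (multiset) of the n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=1):
    """Clipped n-gram recall of a candidate summary against reference summaries.

    Assumption: texts are tokenized by lowercasing and splitting on whitespace.
    """
    cand_counts = ngrams(candidate.lower().split(), n)
    matched = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        # Denominator: every n-gram occurrence in the reference summaries.
        total += sum(ref_counts.values())
        # Numerator: reference n-grams also found in the candidate,
        # clipped so a candidate n-gram is not credited more often than it occurs.
        matched += sum(min(count, cand_counts[gram])
                       for gram, count in ref_counts.items())
    return matched / total if total else 0.0


if __name__ == "__main__":
    candidate = "the cat sat on the mat"
    references = ["the cat lay on the mat"]
    print(rouge_n_recall(candidate, references, n=1))
    print(rouge_n_recall(candidate, references, n=2))
```

In this toy example, five of the six reference unigrams and three of the five reference bigrams also occur in the candidate, so the ROUGE-2 value is lower than the ROUGE-1 value.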

Item Type: Thesis (Masters)
Additional Information: RTIf 005.74 Sya p
Uncontrolled Keywords: multi-document summarization, position text graph, semantic role labeling, salience and coverage
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science. EDP
Q Science > QA Mathematics > QA278 Cluster Analysis. Multivariate analysis
Divisions: Faculty of Information Technology > Informatics Engineering > (S2) Master Theses
Depositing User: Mr. Tondo Indra Nyata
Date Deposited: 06 Oct 2017 02:15
Last Modified: 24 Aug 2018 07:06
URI: http://repository.its.ac.id/id/eprint/48927
