Syaifuddiin, Gus Nanang (2015) Position Text Graph Dan Peran Semantik Kata Dalam Pemilihan Kalimat Representatif Cluster Pada Peringkasan Multi-Dokumen. Masters thesis, Institut Teknologi Sepuluh Nopember.
Abstract
Coverage and salience are the main problems that researchers address in document summarization. A clustering approach gives good coverage of all topics, but it does not identify the information that can represent other sentences (sentence salience).
Salience can be explored by examining the relationships between one sentence and another, built with a position text graph approach. However, a position text graph only explores relationships between sentences, without considering the semantic roles of words ("who" did "what" to "whom", "where", "when", and "how") in the sentences being compared.
In this thesis, we propose a new strategy for selecting cluster-representative sentences, named SSID (Semantic Sentence Information Density), which combines a position text graph with the semantic roles of words for multi-document summarization. The stages of this study are text preprocessing, sentence clustering with histogram-based similarity clustering, cluster ordering, selection of cluster-representative sentences, and composition of the summary.
Experiments were carried out on the Document Understanding Conference (DUC) 2004 Task 2 dataset. The results show that SSID overcomes the weakness of the position text graph and improves the ROUGE-1 and ROUGE-2 correlation values, giving SSID the highest values among the methods compared. The ROUGE-1 value for SSID increased by 0.85% compared with LIGI and by 2.42% compared with SIDeKiCK. The ROUGE-2 value for SSID increased by 10.33% compared with LIGI and by 9.73% compared with SIDeKiCK.
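For readers unfamiliar with the evaluation metric, ROUGE-N measures n-gram overlap between a generated summary and one or more reference summaries. The following is a minimal sketch of ROUGE-N recall only, assuming lowercased whitespace tokenization with no stemming or stop-word removal. It is not the official ROUGE toolkit and not the evaluation code used in the thesis, and the function names `ngrams` and `rouge_n_recall` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n):
    """Simplified ROUGE-N recall: matched reference n-grams / total reference n-grams.

    Assumption: tokens are lowercased whitespace-separated words.
    Matches are clipped by the candidate's n-gram counts.
    """
    cand_counts = ngrams(candidate.lower().split(), n)
    matched = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        matched += sum(min(count, cand_counts[gram]) for gram, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


if __name__ == "__main__":
    summary = "the cat sat on the mat"
    refs = ["the cat lay on the mat"]
    print(rouge_n_recall(summary, refs, 1))  # unigram recall (ROUGE-1)
    print(rouge_n_recall(summary, refs, 2))  # bigram recall (ROUGE-2)
```

ROUGE-2 is stricter than ROUGE-1 because it rewards only matching word pairs in order, so the two scores can change by quite different amounts for the same summarizer.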
Item Type: | Thesis (Masters) |
---|---|
Additional Information: | RTIf 005.74 Sya p |
Uncontrolled Keywords: | multi-document summarization, position text graph, semantic role labeling, salience and coverage |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science. EDP; Q Science > QA Mathematics > QA278 Cluster Analysis. Multivariate analysis. Correspondence analysis (Statistics) |
Divisions: | Faculty of Information Technology > Informatics Engineering > 55101-(S2) Master Thesis |
Depositing User: | Mr. Tondo Indra Nyata |
Date Deposited: | 06 Oct 2017 02:15 |
Last Modified: | 24 Aug 2018 07:06 |
URI: | http://repository.its.ac.id/id/eprint/48927 |