Identifikasi Kesamaan Pola Dokumen Teks Berdasarkan Kemunculan Term Dalam Kalimat [Identifying Pattern Similarity of Text Documents Based on Term Occurrence in Sentences]

Soehardjoepri (2017) Identifikasi Kesamaan Pola Dokumen Teks Berdasarkan Kemunculan Term Dalam Kalimat. Doctoral thesis, Institut Teknologi Sepuluh Nopember.

Full text: 1310301002-Disertation.pdf (Published Version, 2MB)

Abstract

This dissertation aims to develop a tool for detecting similarity between text
document patterns based on the order in which terms appear in each sentence of a
text document. Three scenarios of term occurrence patterns are examined: the
pattern of the first term, the pattern of the first two terms, and the pattern of
the first three terms in each sentence of the document. The results are methods
for identifying text document patterns and measuring their similarity: for the
first term, the Kolmogorov-Smirnov (K-S) test is used to distinguish patterns;
for the first two terms, the Euclidean distance between the term pairs of the two
text documents serves as the distinguishing tool; and for the first three terms,
patterns are distinguished using a Bayesian Network (BN) approach together with a
likelihood ratio test.
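
The first-term scenario can be illustrated with a minimal sketch of a two-sample Kolmogorov-Smirnov comparison. It assumes a hypothetical tokenizer (sentences split on ".", terms split on whitespace) and orders terms alphabetically to define the empirical CDFs, because the abstract does not say how first terms are ordered or encoded. The names `first_terms`, `ks_statistic` and `ks_same_pattern` are illustrative, not the dissertation's code; the decision rule uses the standard large-sample critical value c(alpha) * sqrt((n + m) / (n * m)) with c(0.05) of about 1.358.

```python
import math
from collections import Counter


def first_terms(text):
    """First term of each sentence (hypothetical tokenizer: split on '.' then whitespace)."""
    sentences = [s.split() for s in text.lower().split(".")]
    return [words[0] for words in sentences if words]


def ks_statistic(terms_a, terms_b):
    """Two-sample K-S statistic D = max |F_a(x) - F_b(x)|.

    ASSUMPTION: terms are ordered alphabetically to define the empirical CDFs;
    the ordering used in the dissertation is not given in the abstract.
    """
    counts_a, counts_b = Counter(terms_a), Counter(terms_b)
    n, m = len(terms_a), len(terms_b)
    cdf_a = cdf_b = d = 0.0
    for term in sorted(set(terms_a) | set(terms_b)):
        cdf_a += counts_a[term] / n
        cdf_b += counts_b[term] / m
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_same_pattern(terms_a, terms_b, c_alpha=1.358):
    """Accept 'same pattern' if D <= c(alpha) * sqrt((n + m) / (n * m)); c(0.05) ~ 1.358."""
    n, m = len(terms_a), len(terms_b)
    return ks_statistic(terms_a, terms_b) <= c_alpha * math.sqrt((n + m) / (n * m))


doc_a = "The cat sat. A dog ran. The bird flew."
doc_b = "Dogs bark loudly. Cats purr. Birds sing."
print(ks_statistic(first_terms(doc_a), first_terms(doc_b)))
print(ks_same_pattern(first_terms(doc_a), first_terms(doc_a)))
```

With only three sentences per document the critical value is large, so the test rarely rejects; the statistic becomes discriminative only for documents with many sentences.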
For the pattern of the first term, the Kolmogorov-Smirnov test (K-S test) approach
yielded a pattern similarity of 66.67% relative to the test document scenario. For
the pattern of the first term pair, computing the Euclidean distance between the
term pairs of the two text documents yielded a pattern similarity of 93.33%
relative to the test document scenario. For the pattern of the first three terms,
the Bayesian Network (BN) approach with the likelihood ratio test agreed 100% with
the scenario.
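
The Euclidean-distance step for the first-two-term scenario can be sketched as follows. This is an assumption-laden illustration: each document is represented by the relative frequency of each (first term, second term) pair across its sentences, and the distance is taken between those two vectors. The tokenizer and the names `first_term_pairs` and `pair_distance` are hypothetical, and the similarity threshold the dissertation applies is not stated in the abstract, so none is fixed here.

```python
import math
from collections import Counter


def first_term_pairs(text):
    """(first, second) term of each sentence; sentences with fewer than two terms are skipped."""
    pairs = []
    for sentence in text.lower().split("."):
        words = sentence.split()
        if len(words) >= 2:
            pairs.append((words[0], words[1]))
    return pairs


def pair_distance(pairs_a, pairs_b):
    """Euclidean distance between the relative-frequency vectors of term pairs.

    ASSUMPTION: each document is represented by how often each (first, second)
    pair occurs, normalised by its number of sentences.
    """
    freq_a, freq_b = Counter(pairs_a), Counter(pairs_b)
    n, m = len(pairs_a), len(pairs_b)
    keys = set(freq_a) | set(freq_b)
    return math.sqrt(sum((freq_a[k] / n - freq_b[k] / m) ** 2 for k in keys))


doc_a = "The cat sat. The cat ran. A bird flew."
doc_b = "The cat slept. A dog barked. A dog ran."
print(pair_distance(first_term_pairs(doc_a), first_term_pairs(doc_b)))
```

A distance of 0 means both documents open their sentences with the same term pairs in the same proportions; larger values indicate more different patterns.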
This dissertation proposes and demonstrates three main algorithms for the three
scenarios, coupled with the Kolmogorov-Smirnov test, the Euclidean distance, and
the Bayesian Network with the likelihood ratio test, respectively, to identify and
detect differences among several standard test text documents.
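
The Bayesian Network and likelihood ratio test scenario can be sketched under a strong simplifying assumption: a chain-structured network T1 -> T2 -> T3 over the first three terms of each sentence, with maximum-likelihood conditional probabilities. The network structure actually used in the dissertation is not given in the abstract. The statistic G = 2 * (LL_a + LL_b - LL_pooled) compares fitting a separate network to each document against one network for both; under the null hypothesis of a common pattern, G is asymptotically chi-square distributed with degrees of freedom equal to the difference in free parameters, which this sketch does not compute. The names `first_term_triples`, `chain_log_likelihood` and `likelihood_ratio_statistic` are hypothetical.

```python
import math
from collections import Counter


def first_term_triples(text):
    """First three terms of each sentence; shorter sentences are skipped."""
    triples = []
    for sentence in text.lower().split("."):
        words = sentence.split()
        if len(words) >= 3:
            triples.append(tuple(words[:3]))
    return triples


def chain_log_likelihood(triples):
    """Maximised log-likelihood of an ASSUMED chain BN T1 -> T2 -> T3 fitted to `triples`."""
    n = len(triples)
    c1 = Counter(t[0] for t in triples)            # counts of T1
    c12 = Counter((t[0], t[1]) for t in triples)   # counts of (T1, T2)
    c2 = Counter(t[1] for t in triples)            # counts of T2
    c23 = Counter((t[1], t[2]) for t in triples)   # counts of (T2, T3)
    ll = 0.0
    for t1, t2, t3 in triples:
        ll += math.log(c1[t1] / n)                 # P(T1 = t1)
        ll += math.log(c12[(t1, t2)] / c1[t1])     # P(T2 = t2 | T1 = t1)
        ll += math.log(c23[(t2, t3)] / c2[t2])     # P(T3 = t3 | T2 = t2)
    return ll


def likelihood_ratio_statistic(triples_a, triples_b):
    """G = 2 * (LL_a + LL_b - LL_pooled); larger G suggests the documents differ."""
    separate = chain_log_likelihood(triples_a) + chain_log_likelihood(triples_b)
    pooled = chain_log_likelihood(triples_a + triples_b)
    return 2.0 * (separate - pooled)


doc_a = "The cat sat down. The cat ran off."
doc_b = "A dog ran home. A dog ran away."
print(likelihood_ratio_statistic(first_term_triples(doc_a), first_term_triples(doc_b)))
```

Identical documents give G close to 0, because the pooled network then has exactly the same conditional probabilities as each separate one.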

Item Type: Thesis (Doctoral)
Uncontrolled Keywords: text document patterns, term occurrence, Kolmogorov-Smirnov, Euclidean distance, Bayesian Network, likelihood ratio test
Subjects: H Social Sciences > HA Statistics
Q Science > QA Mathematics > QA278.5 Principal components analysis. Factor analysis. Correspondence analysis (Statistics)
Divisions: Faculty of Mathematics and Science > Statistics > 49001-(S3) PhD Thesis
Depositing User: Soehardjoe Soehardjoepri
Date Deposited: 08 Nov 2017 07:50
Last Modified: 05 Mar 2019 07:28
URI: http://repository.its.ac.id/id/eprint/43139
