Identification of Text Document Pattern Similarity Based on Term Occurrence in Sentences

Soehardjoepri (2017) Identifikasi Kesamaan Pola Dokumen Teks Berdasarkan Kemunculan Term Dalam Kalimat [Identification of Text Document Pattern Similarity Based on Term Occurrence in Sentences]. Doctoral thesis, Institut Teknologi Sepuluh Nopember.

Full text: 1310301002-Disertation.pdf (Published Version, 2 MB)

Abstract

This dissertation develops a tool for detecting pattern similarity between text documents based on the terms that occur in each sentence of a document. Three scenarios of term-occurrence patterns are examined: the pattern of the first term, the pattern of the first two terms, and the pattern of the first three terms occurring in each sentence of the text document.
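Each scenario starts from the same raw material: the first one, two, or three terms of every sentence. The sketch below shows one way to extract those patterns. The naive sentence splitter, the lowercase ASCII tokenizer, and the rule of skipping sentences shorter than k terms are all assumptions for illustration; the record does not say how the thesis tokenizes or preprocesses its documents.

```python
import re
from typing import List, Tuple


def split_sentences(text: str) -> List[str]:
    # Naive split after ., ! or ? followed by whitespace (assumed, not the thesis's splitter).
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence: str) -> List[str]:
    # Lowercased alphanumeric tokens; no stemming or stop-word removal (assumption).
    return re.findall(r"[a-z0-9]+", sentence.lower())


def first_terms(text: str, k: int) -> List[Tuple[str, ...]]:
    """Return the first k terms of every sentence that has at least k terms."""
    patterns = []
    for sentence in split_sentences(text):
        terms = tokenize(sentence)
        if len(terms) >= k:  # hypothetical rule: shorter sentences are skipped
            patterns.append(tuple(terms[:k]))
    return patterns


doc = "Data mining finds patterns. Text mining finds terms! Short."
print(first_terms(doc, 1))  # [('data',), ('text',), ('short',)]
print(first_terms(doc, 2))  # [('data', 'mining'), ('text', 'mining')]
```

Calling `first_terms` with k = 1, 2 and 3 yields the inputs for the three scenarios.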
For each scenario, the dissertation obtains a way to identify a document's pattern and to measure the similarity between two documents' patterns. Patterns of the first term are distinguished with the Kolmogorov-Smirnov test (K-S test); patterns of the first two terms are distinguished by computing the Euclidean distance between the term pairs of the two text documents; and patterns of the first three terms are distinguished using a Bayesian Network (BN) together with a likelihood ratio test. Measured against the test-document scenarios, the K-S approach to first-term patterns achieved 66.67% agreement, the Euclidean-distance approach to first-two-term patterns achieved 93.33%, and the BN and likelihood ratio test approach to first-three-term patterns achieved 100%. The dissertation thus proposes and demonstrates three main algorithms, one per scenario, built on the Kolmogorov-Smirnov test, the Euclidean distance, and a Bayesian Network with a likelihood ratio test respectively, and all three proved able to distinguish among the standard text documents used in testing.
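The first scenario compares the first terms of two documents with a Kolmogorov-Smirnov test. A K-S test needs ordered values, so the sketch below assumes a hypothetical encoding: each term is replaced by its rank in a shared, alphabetically sorted vocabulary. It then computes the two-sample statistic D = max |F_A(v) - F_B(v)| and compares it with the usual large-sample critical value c(α)·sqrt((n+m)/(nm)). The encoding and the α = 0.05 default are assumptions, not the thesis's documented choices, and the asymptotic critical value is only approximate for small samples.

```python
import bisect
import math
from typing import List, Sequence, Tuple


def encode(terms_a: Sequence[str], terms_b: Sequence[str]) -> Tuple[List[int], List[int]]:
    # Hypothetical encoding: rank of each term in a shared, sorted vocabulary.
    vocab = {t: i for i, t in enumerate(sorted(set(terms_a) | set(terms_b)))}
    return [vocab[t] for t in terms_a], [vocab[t] for t in terms_b]


def ks_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample K-S statistic D = max |F_x(v) - F_y(v)| over all observed v."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    d = 0.0
    for v in sorted(set(xs) | set(ys)):
        fx = bisect.bisect_right(xs, v) / n  # empirical CDF of x at v (ties included)
        fy = bisect.bisect_right(ys, v) / m  # empirical CDF of y at v
        d = max(d, abs(fx - fy))
    return d


def ks_different(x: Sequence[float], y: Sequence[float], alpha: float = 0.05) -> bool:
    """Reject 'same distribution' when D exceeds the large-sample critical value."""
    n, m = len(x), len(y)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))  # about 1.358 for alpha = 0.05
    d_crit = c_alpha * math.sqrt((n + m) / (n * m))  # asymptotic approximation
    return ks_statistic(x, y) > d_crit


a, b = encode(["data", "data", "text", "model"], ["data", "text", "text", "model"])
print(ks_statistic(a, b))  # 0.25
print(ks_different(a, b))  # False
```

Because the ordering of categorical terms is arbitrary, a different encoding can change D; that sensitivity is one reason to treat this as a sketch.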
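The second scenario measures the Euclidean distance between the term pairs of two documents. One simple reading, assumed here, is to turn each document's list of (first term, second term) pairs into a vector of relative frequencies over the union of observed pairs and take the Euclidean distance between the two vectors. The normalization to relative frequencies is an assumption; the record does not state the thesis's vector construction or its similarity threshold.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple

Pair = Tuple[str, str]


def pair_profile(pairs: Sequence[Pair]) -> Dict[Pair, float]:
    # Relative frequency of each (first term, second term) pair in one document.
    counts = Counter(pairs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {p: c / total for p, c in counts.items()}


def pair_distance(pairs_a: Sequence[Pair], pairs_b: Sequence[Pair]) -> float:
    """Euclidean distance between two documents' pair-frequency vectors."""
    pa, pb = pair_profile(pairs_a), pair_profile(pairs_b)
    keys = set(pa) | set(pb)  # union of pairs; a pair absent from a document counts as 0
    return math.sqrt(sum((pa.get(k, 0.0) - pb.get(k, 0.0)) ** 2 for k in keys))


a = [("data", "mining"), ("text", "mining")]
b = [("data", "mining"), ("data", "mining")]
print(pair_distance(a, a))            # 0.0
print(round(pair_distance(a, b), 4))  # 0.7071
```

A distance of 0 means identical pair distributions; with relative frequencies the distance is at most sqrt(2), reached when the two documents share no pair.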
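The third scenario pairs a Bayesian Network with a likelihood ratio test. The sketch below assumes a chain-structured network T1 → T2 → T3 over the first three terms of each sentence, with conditional probabilities set to their maximum-likelihood (count-based) estimates. It computes G = 2(log L_separate − log L_pooled): the log-likelihood when each document has its own network, minus the log-likelihood when both share one network fitted to the pooled sentences. The chain structure and the pooled-versus-separate hypotheses are assumptions; the abstract does not give the thesis's network structure or decision threshold.

```python
import math
from collections import Counter
from typing import Sequence, Tuple

Triple = Tuple[str, str, str]


def chain_loglik(triples: Sequence[Triple], fit: Sequence[Triple]) -> float:
    """Log-likelihood of `triples` under an assumed chain network T1 -> T2 -> T3
    whose probabilities are maximum-likelihood estimates from `fit`.
    Every triple must also occur in `fit`, otherwise a count would be zero."""
    c1 = Counter(t[0] for t in fit)           # counts of T1
    c12 = Counter((t[0], t[1]) for t in fit)  # counts of (T1, T2)
    c2 = Counter(t[1] for t in fit)           # counts of T2 as the parent of T3
    c23 = Counter((t[1], t[2]) for t in fit)  # counts of (T2, T3)
    n = len(fit)
    ll = 0.0
    for t1, t2, t3 in triples:
        p1 = c1[t1] / n                 # P(T1 = t1)
        p2 = c12[(t1, t2)] / c1[t1]     # P(T2 = t2 | T1 = t1)
        p3 = c23[(t2, t3)] / c2[t2]     # P(T3 = t3 | T2 = t2)
        ll += math.log(p1) + math.log(p2) + math.log(p3)
    return ll


def likelihood_ratio(doc_a: Sequence[Triple], doc_b: Sequence[Triple]) -> float:
    """G = 2 * (log L_separate - log L_pooled); G >= 0, and a larger G means the
    two documents are less compatible with a single shared network."""
    pooled = list(doc_a) + list(doc_b)
    separate = chain_loglik(doc_a, doc_a) + chain_loglik(doc_b, doc_b)
    shared = chain_loglik(doc_a, pooled) + chain_loglik(doc_b, pooled)
    return 2.0 * (separate - shared)


doc_a = [("data", "mining", "finds")] * 2
doc_b = [("text", "mining", "finds")] * 2
print(likelihood_ratio(doc_a, doc_a))            # 0.0
print(round(likelihood_ratio(doc_a, doc_b), 3))  # 5.545
```

Turning G into a same/different decision still needs a reference distribution, for example a chi-square critical value with suitable degrees of freedom or a permutation test; that choice is left open here.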

Item Type: Thesis (Doctoral)
Uncontrolled Keywords: text document patterns, term occurrence, Kolmogorov-Smirnov, Euclidean distance, Bayesian Network, likelihood ratio test
Subjects: H Social Sciences > HA Statistics
Q Science > QA Mathematics > QA278.5 Principal components analysis. Factor analysis.
Divisions: Faculty of Mathematics and Science > Statistics > (S3) PhD Theses
Depositing User: Soehardjoe Soehardjoepri
Date Deposited: 08 Nov 2017 07:50
Last Modified: 05 Mar 2019 07:28
URI: http://repository.its.ac.id/id/eprint/43139
