Anggraini, Syadza (2022) Pengukuran Kemiripan Berbasis Leksikal dan Semantik untuk Perangkingan Dokumen Berbahasa Arab [Lexical and Semantic Similarity Measurement for Ranking Arabic-Language Documents]. Masters thesis, Institut Teknologi Sepuluh Nopember.
Text: 05111850010031-Master_Thesis.pdf - Accepted Version (2MB)
Abstract
Document ranking is a core topic in information retrieval. To return relevant documents, the similarity measurement between the query and each document is a key factor in how the documents are ranked. Similarity can be computed from the term weights of the query and the document. However, a measure based only on term weights can assign different weights to terms that are written differently but have the same meaning. In addition, search results for Arabic text are affected by how well users have mastered Arabic, which varies widely. This research therefore develops a lexical similarity measurement to handle differences in how words are written and a semantic similarity measurement to handle word meaning. The combined measure uses term weights for the lexical component and word embeddings for the semantic component. In an evaluation on 2900 Arabic books, the proposed method achieved the highest average recall, precision, and F-measure among the compared methods: 72.42%, 65.83%, and 64.2% on all queries; 73.2%, 63.15%, and 63.1% on short queries; and 71.31%, 69.86%, and 65.7% on long queries. A short query contains 1-2 words, and a long query contains more than 2 words.
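The combination described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how the lexical and semantic scores are merged, so the linear interpolation with a weight `alpha`, the TF-IDF weighting, the averaged word vectors, and all function names below are assumptions made for illustration, not the thesis's actual method.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def tfidf(tokens, idf):
    """Term-weight vector: raw term frequency times a supplied IDF value."""
    return {t: c * idf.get(t, 0.0) for t, c in Counter(tokens).items()}


def mean_embedding(tokens, emb):
    """Average the word vectors of the tokens found in the embedding table."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return {}
    return {i: sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))}


def hybrid_score(query, doc, idf, emb, alpha=0.5):
    """Assumed combination: alpha * lexical + (1 - alpha) * semantic."""
    lexical = cosine(tfidf(query, idf), tfidf(doc, idf))
    semantic = cosine(mean_embedding(query, emb), mean_embedding(doc, emb))
    return alpha * lexical + (1 - alpha) * semantic


def rank(query, docs, idf, emb, alpha=0.5):
    """Return (document name, score) pairs sorted from most to least similar."""
    scored = [(name, hybrid_score(query, toks, idf, emb, alpha))
              for name, toks in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def query_type(query):
    """Short query: 1-2 words; long query: more than 2 words (per the abstract)."""
    return "short" if len(query) <= 2 else "long"


# Toy data: w1 and w2 are written differently but share an embedding.
idf = {"w1": 1.0, "w2": 1.0, "w3": 1.0}
emb = {"w1": [1.0, 0.0], "w2": [1.0, 0.0], "w3": [0.0, 1.0]}
docs = {"d1": ["w2"], "d2": ["w3"]}
ranking = rank(["w1"], docs, idf, emb)
```

In this toy setup the query shares no term with either document, so the lexical score is zero for both, and only the semantic component separates `d1` from `d2`. That is the situation the abstract describes for words with different written forms but the same meaning.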
Item Type: | Thesis (Masters) |
---|---|
Additional Information: | RTIf 025.524 Ang p-1 2022 |
Uncontrolled Keywords: | lexical similarity, semantic similarity, document ranking, similarity measurement, kemiripan leksikal, kemiripan semantik, perangkingan dokumen, pengukuran kemiripan |
Subjects: | T Technology > TK Electrical engineering. Electronics Nuclear engineering > TK5105.88815 Semantic Web; Z Bibliography. Library Science. Information Resources > ZA Information resources > Z699.5 Information storage and retrieval systems |
Divisions: | Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering > 55201-(S1) Undergraduate Thesis |
Depositing User: | - Davi Wah |
Date Deposited: | 02 Oct 2024 04:21 |
Last Modified: | 02 Oct 2024 04:21 |
URI: | http://repository.its.ac.id/id/eprint/115712 |