Detecting Interrelations Between Hadith Books Using Word Embedding and Ensemble Learning

Ariyanto, Amelia Devi Putri (2022) Deteksi Interelasi Antar Kitab Hadis Menggunakan Word Embedding dan Ensemble Learning. Masters thesis, Institut Teknologi Sepuluh Nopember.

6025201008-Master_Thesis.pdf - Accepted Version
Restricted to Repository staff only until 1 April 2024.


Abstract

Hadith is one of the main sources of Islamic teachings. Text categorization can help Muslims study hadith by categories that suit their needs. Previous research has also used text categorization to address the interrelation between the Qur'an and hadith, relying only on TF-IDF word weighting and machine learning algorithms. However, no research so far has examined interrelations between books of hadith, even though this is also important because some hadiths in one book share the same meaning as hadiths in another. TF-IDF word weighting captures little semantic information, so another approach is needed that can represent the meaning of words based on their distribution in the text, namely the contextual word embedding AraBERT.
This study proposes a method for detecting interrelations between books of hadith through text categorization, which groups hadiths that are related by similarity of meaning into the same category. The proposed text categorization uses AraBERT word embeddings and ensemble learning that integrates several machine learning methods, namely Naïve Bayes, KNN, SVM, and Decision Tree. Weighted majority voting is used to combine the predictions of these methods.
The results show that combining AraBERT word embeddings as the feature extraction method with ensemble learning as the text categorization method addresses the interrelation between books of hadith: the proposed method succeeded in grouping hadiths from several hadith books into the same category based on similarity of meaning. This is supported by its f1-score and accuracy, which were the highest among the combinations of feature extraction and text categorization methods compared. The f1-score and accuracy obtained were 80% and 82%, respectively.
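
As an illustration of the weighted majority voting step named in the abstract, here is a minimal sketch in plain Python. It is not the thesis implementation: the function name, the category labels, and the weights (imagined here as each classifier's validation accuracy) are all assumptions made for the example, and the abstract does not say how the weights were chosen.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Return the label whose supporting classifiers have the largest total weight.

    predictions: dict mapping classifier name -> predicted label
    weights:     dict mapping classifier name -> non-negative weight
    """
    totals = defaultdict(float)
    for name, label in predictions.items():
        totals[label] += weights[name]
    # Iterate over sorted labels so ties resolve deterministically
    # (the alphabetically first label wins a tie).
    return max(sorted(totals), key=lambda label: totals[label])


# Hypothetical predictions for one hadith from the four base classifiers.
predictions = {
    "naive_bayes": "prayer",
    "knn": "fasting",
    "svm": "fasting",
    "decision_tree": "prayer",
}
# Hypothetical weights, e.g. validation accuracy of each classifier.
weights = {
    "naive_bayes": 0.70,
    "knn": 0.75,
    "svm": 0.82,
    "decision_tree": 0.68,
}

print(weighted_majority_vote(predictions, weights))
```

In this made-up case a plain majority vote would tie at two votes each, while the weighted vote favours the label backed by the classifiers with the higher weights.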

Item Type: Thesis (Masters)
Uncontrolled Keywords: Ensemble Learning, Hadith, Interrelation, Text Categorization, Word Embedding
Subjects: Q Science > Q Science (General) > Q325.5 Machine learning.
T Technology > T Technology (General) > T57.5 Data Processing
Divisions: Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering > 55101-(S2) Master Thesis
Depositing User: Amelia Devi Putri Ariyanto
Date Deposited: 04 Feb 2022 08:47
Last Modified: 31 Oct 2022 02:04
URI: http://repository.its.ac.id/id/eprint/92825
