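Pengembangan Sistem Tanya-Jawab Peraturan Teknologi Informasi di Indonesia Menggunakan Graph Retrieval Augmented Generation (Graph-RAG) [Development of a Question-Answering System for Information Technology Regulations in Indonesia Using Graph Retrieval Augmented Generation (Graph-RAG)]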

Mukti, Bayu Siddhi (2025) Pengembangan Sistem Tanya-Jawab Peraturan Teknologi Informasi di Indonesia Menggunakan Graph Retrieval Augmented Generation (Graph-RAG). Other thesis, Institut Teknologi Sepuluh Nopember.

5026211021-Undergraduate_Thesis.pdf - Accepted Version
Restricted to Repository staff only

Abstract

Indonesia, as a state governed by the rule of law, faces challenges in ensuring equal access to legal information for all citizens. Although the government provides the Legal Documentation and Information Network (Jaringan Dokumentasi dan Informasi Hukum, JDIH), public access remains hindered by the complexity of legal terminology and the limitations of existing search systems. This research proposes an Indonesian regulation question-answering system based on Graph Retrieval Augmented Generation (Graph-RAG), which combines the capabilities of a Large Language Model (LLM) with data retrieval from a Neo4j knowledge graph. The system is designed to make legal information, specifically 63 regulations related to information technology, easier to access for non-expert users by letting them ask questions in everyday language, and it is expected to contribute to improving legal literacy in Indonesian society. Graph-RAG combines structured data retrieval through text2cypher with unstructured data retrieval through vector-cypher, built on the LangChain and LangGraph frameworks, with the aim of improving answer accuracy and reducing the risk of LLM hallucination. Testing was conducted to find the best configuration of each component, covering two LLMs of different scales (Llama 3.1 8B Instruct and Claude 3.5 Haiku), two prompting strategies (zero-shot and few-shot), three text embedding models (small, large, and domain-specific), and several values of k, the number of initial nodes retrieved before expansion. Evaluation using the RAGAS framework shows that Graph-RAG with the best configuration for each component (Claude 3.5 Haiku, few-shot prompting, the large text embedding model, and k=7) achieved a ROUGE-L score of 0.3471, answer relevancy of 0.8491, and answer accuracy of 0.7525, higher than the scores obtained by Claude 3.5 Haiku without RAG, which were 0.2060, 0.7068, and 0.3275, respectively. These findings indicate that the Graph-RAG approach in the legal domain significantly improves reasoning accuracy and ensures that answers are grounded in valid data.
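
The role of k described in the abstract (the number of initial nodes retrieved before expansion) can be illustrated with a minimal sketch. This is not the thesis implementation, which uses Neo4j, LangChain, and LangGraph. It is an in-memory stand-in, and the node ids, two-dimensional vectors, edges, and function names are all hypothetical. It assumes seed nodes are chosen by cosine similarity and that expansion follows outgoing edges by one hop.

```python
import math

# Hypothetical toy graph: node id -> embedding vector (illustrative values only).
NODE_VECTORS = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [0.1, 0.9],
}

# Hypothetical outgoing edges, e.g. "article a refers to article c".
EDGES = {"a": ["c"], "b": [], "c": ["d"], "d": []}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k_seeds(query_vec, k):
    """Return the k node ids most similar to the query vector."""
    ranked = sorted(
        NODE_VECTORS,
        key=lambda node: cosine(query_vec, NODE_VECTORS[node]),
        reverse=True,
    )
    return ranked[:k]


def expand_one_hop(seeds):
    """Append the direct neighbours of each seed, keeping order and dropping duplicates."""
    context = list(seeds)
    for seed in seeds:
        for neighbour in EDGES.get(seed, []):
            if neighbour not in context:
                context.append(neighbour)
    return context


def retrieve(query_vec, k):
    """Vector search for k seed nodes, then graph expansion around them."""
    return expand_one_hop(top_k_seeds(query_vec, k))


print(retrieve([1.0, 0.0], k=1))  # ['a', 'c']
print(retrieve([1.0, 0.0], k=2))  # ['a', 'b', 'c']
```

In this toy setup a larger k adds more seed nodes, and therefore more expanded context, to what would be passed to the LLM. The abstract reports k=7 as the best-performing value in the thesis experiments.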

Item Type: Thesis (Other)
Uncontrolled Keywords: Legal Information, Graph-RAG, Question Answering System, Large Language Model, Knowledge Graph
Subjects: K Law > K Law (General)
Q Science > QA Mathematics > QA166 Graph theory
Q Science > QA Mathematics > QA336 Artificial Intelligence
T Technology > T Technology (General) > T57.5 Data Processing
Divisions: Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Information System > 57201-(S1) Undergraduate Thesis
Depositing User: Bayu Siddhi Mukti
Date Deposited: 23 Jul 2025 06:41
Last Modified: 23 Jul 2025 06:41
URI: http://repository.its.ac.id/id/eprint/120894
