Real-Time Detection of Common Vulnerabilities and Exposures with Lambda Architecture and a Vector Database

Sastrowardoyo, Rangga Aldo (2025) Deteksi Real-Time Common Vulnerabilities and Exposures dengan Lambda Architecture dan Vector Database. Other thesis, Institut Teknologi Sepuluh Nopember.

5027211059-Undergraduate_Thesis.pdf - Accepted Version
Restricted to Repository staff only

Abstract

A vulnerability is a flaw or weakness in software that malicious actors can exploit to attack a system. As cyber threats grow more complex and change quickly, information about Common Vulnerabilities and Exposures (CVE) often lags behind, so conventional chatbot applications cannot present accurate, current details. This research develops a chatbot that detects and provides information about CVEs using a Retrieval Augmented Generation (RAG) approach, so that the cybersecurity context is updated in real time. The aim is to overcome a limitation of conventional Large Language Models (LLMs): the knowledge cutoff imposed by their training process prevents them from supplying current, relevant CVE information. Data is collected from the National Vulnerability Database (NVD) and various other sources, using the xmin-based Change Data Capture (CDC) method on PostgreSQL to track every change or update to the data. The text is converted into vector embeddings with the OpenAI API (text-embedding-ada-002) and stored in a vector database. The RAGAS evaluation shows that the author's RAG application significantly outperforms LLMs such as GPT-4o, Claude 3.5, and Gemini Flash 2.0 on the answer correctness metric, scoring 0.79 compared with 0.34 for LLMs without retrieval. This performance gap is attributed to the RAG application's ability to pull the latest information in real time through its integrated vector database. An evaluation by cybersecurity experts yielded a high satisfaction rate of 88.33% (with an average score of 3.53 on a 4-point scale). Cybersecurity practitioners particularly valued the chatbot's ability to provide current CVE information, its help in creating payloads for security testing, and its flexibility in supporting activities such as security research and Capture the Flag (CTF) competitions.
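
The pipeline summarized in the abstract (xmin-based change capture, embedding, storage in a vector database, retrieval for the chatbot) can be illustrated with a minimal, self-contained sketch. It is not the thesis implementation: the in-memory table, the placeholder CVE identifiers and descriptions, the bag-of-words `embed` function (a toy stand-in for text-embedding-ada-002), and the `VectorStore` class are all assumptions made purely for illustration.

```python
import math
from collections import Counter

# Toy stand-in for a PostgreSQL table. Each row carries an "xmin" value,
# mimicking the system column that records the ID of the transaction that
# wrote the current row version. Against a real database the poll might be:
#   SELECT cve_id, description, xmin::text::bigint AS xmin
#   FROM cve WHERE xmin::text::bigint > %s
# (table and column names are hypothetical; IDs and text are placeholders).
TABLE = [
    {"cve_id": "CVE-0000-0001", "description": "buffer overflow in image parser", "xmin": 101},
    {"cve_id": "CVE-0000-0002", "description": "sql injection in login form", "xmin": 102},
]

def poll_changes(table, last_xmin):
    """Return rows written after the last checkpoint, plus the new checkpoint."""
    changed = [row for row in table if row["xmin"] > last_xmin]
    cursor = max((row["xmin"] for row in changed), default=last_xmin)
    return changed, cursor

def embed(text):
    """Toy embedding: a bag-of-words count vector (NOT a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class VectorStore:
    """Minimal in-memory vector store keyed by CVE ID (hypothetical helper)."""
    def __init__(self):
        self._items = {}

    def upsert(self, key, vector, text):
        # Re-embedding an updated row overwrites the stale entry.
        self._items[key] = (vector, text)

    def search(self, query, k=1):
        q = embed(query)
        ranked = sorted(self._items.items(),
                        key=lambda kv: cosine(q, kv[1][0]), reverse=True)
        return [(key, text) for key, (_, text) in ranked[:k]]

def sync(table, store, last_xmin):
    """One incremental pass: capture changes, embed them, upsert them."""
    changed, cursor = poll_changes(table, last_xmin)
    for row in changed:
        store.upsert(row["cve_id"], embed(row["description"]), row["description"])
    return cursor, len(changed)

store = VectorStore()
cursor, first_batch = sync(TABLE, store, last_xmin=0)  # initial load: both rows

# Simulate an UPDATE: the new row version gets a higher xmin.
TABLE[0] = {"cve_id": "CVE-0000-0001",
            "description": "heap buffer overflow in image parser allows remote code execution",
            "xmin": 103}
cursor, second_batch = sync(TABLE, store, cursor)  # picks up only the changed row

# Retrieval step: the best-matching description becomes the LLM's context.
top_id, context = store.search("remote code execution", k=1)[0]
prompt = f"Answer using this context:\n{context}\n\nQuestion: which CVE allows remote code execution?"
```

Tracking a single `xmin` checkpoint keeps each pass incremental, so only new or updated rows are re-embedded. A production setup would also need to handle transaction ID wraparound and deleted rows, which this sketch ignores.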

Item Type: Thesis (Other)
Uncontrolled Keywords: CVE, Lambda Architecture, Retrieval Augmented Generation, Vector Database, RAGAS
Subjects: T Technology > T Technology (General)
Divisions: Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Information Technology > 59201-(S1) Undergraduate Thesis
Depositing User: Rangga Aldo Sastrowardoyo
Date Deposited: 31 Jul 2025 07:16
Last Modified: 31 Jul 2025 07:16
URI: http://repository.its.ac.id/id/eprint/124149
