Deteksi Aksara Jawa Menggunakan YOLO untuk Transliterasi Berbasis LSTM pada Manuskrip Jawa Kuno [Javanese Script Detection Using YOLO for LSTM-Based Transliteration of Old Javanese Manuscripts]

Faizin, Muhammad Arif (2023) Deteksi Aksara Jawa Menggunakan YOLO untuk Transliterasi Berbasis LSTM pada Manuskrip Jawa Kuno. Other thesis, Institut Teknologi Sepuluh Nopember.

05111940000060-Undergraduate_Thesis.pdf - Accepted Version
Restricted to Repository staff only until 1 October 2025.

Abstract

Manuscripts in Indonesia must be preserved because they reveal the mindset of the people of their time. Of the 121,668 manuscript titles spread across the world, only about 19,000 have been digitized. Most manuscripts are damaged, and processing them requires experts, a long time, and high cost. Meanwhile, research on digital image processing continues to develop. Research applying the YOLO detection method to Javanese script reports good and fast model performance, whereas research on the transliteration of Javanese script is still scarce; the LSTM method is considered relevant for improving that transliteration process. The dataset built in this study is a Javanese script manuscript dataset based on the Serat Sewaka manuscript, consisting of 60 images with their annotations and named Handwritten Javanese Character on Sewaka Manuscripts Detection (HJCS_DETC). To obtain training data, the dataset is processed with data augmentation and image splitting, and the result is named HJCS_DETC_SPLIT. The training data is used to train a YOLOv5 model, which produces the bounding boxes used in the transliteration process. The transliteration process consists of detecting midlines, ordering the characters, recognizing syllables, breaking up pasangan characters, and merging the result with the transliteration dataset. The data is then processed by an LSTM as the transliteration model. Evaluation of the detection model shows a maximum F1 score of 83.2% and an mAP of 81.4% in scenario 9, which uses the augmented dataset and images with a split of 5, across 67 Javanese script classes. Evaluation of the transliteration model yields a CER of 16.7% and a WER of 23.3% for the BiLSTM model with the Adam optimizer and a learning rate of 0.001.
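The abstract reports CER and WER but does not say how they were computed. As a rough illustration only, the sketch below assumes the common definitions: the Levenshtein edit distance between the reference and the predicted transliteration, divided by the reference length, counted in characters for CER and in whitespace-separated words for WER. The function names and the sample strings are hypothetical and do not come from the thesis.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word-level edits / reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    # Hypothetical transliteration pair, used only to exercise the functions.
    reference = "hana caraka"
    hypothesis = "hana carakha"
    print(f"CER = {cer(reference, hypothesis):.3f}")
    print(f"WER = {wer(reference, hypothesis):.3f}")
```

Under these definitions a single inserted character costs one edit at the character level but marks the whole word as wrong at the word level, which is consistent with the reported WER being higher than the reported CER.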

Item Type: Thesis (Other)
Uncontrolled Keywords: Aksara Jawa, Deteksi Karakter, LSTM, Transliterasi, YOLO, Javanese Letter, Character Detection, Transliteration
Subjects: Q Science > Q Science (General) > Q325.5 Machine learning.
Q Science > QA Mathematics > QA336 Artificial Intelligence
Q Science > QA Mathematics > QA76.87 Neural networks (Computer Science)
T Technology > T Technology (General) > T57.5 Data Processing
T Technology > TA Engineering (General). Civil engineering (General) > TA1637 Image processing--Digital techniques
Divisions: Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering > 55201-(S1) Undergraduate Thesis
Depositing User: Muhammad Arif Faizin
Date Deposited: 19 Oct 2023 06:55
Last Modified: 19 Oct 2023 06:55
URI: http://repository.its.ac.id/id/eprint/102757
