Performance Analysis of CNN-RNN for DNA Sequence Classification of Respiratory System Viruses

Sholihah, Imroatus (2023) Analisis Kinerja CNN-RNN untuk Klasifikasi Sekuen DNA dari Virus Sistem Pernapasan. Other thesis, Institut Teknologi Sepuluh Nopember.

06111940000075-Undergraduate_Thesis.pdf - Accepted Version
Restricted to Repository staff only until 1 October 2025.

Abstract

Different respiratory viruses require different treatment, yet they cause similar symptoms such as fever and flu, so specific testing is needed to determine which virus is responsible. Testing can be based on DNA (deoxyribonucleic acid), which is unique to each organism, but the procedures are expensive. A machine learning (ML) approach can help classify virus types from DNA data. DNA sequence data were collected and processed using K-mer encoding. This study evaluates combinations of a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN) for classifying five virus classes (SARS, MERS, RSV, Influenza A, and other viruses). Three types of RNN are used: Simple RNN, Gated Recurrent Unit (GRU), and Long Short-Term Memory (LSTM), giving three hybrid models: CNN-Simple RNN, CNN-LSTM, and CNN-GRU. After testing the three models and analyzing the effect of the K value, all three hybrid models were found to outperform the standalone CNN or RNN models. The CNN-LSTM model with K=2 achieved the best test accuracy, 0.9991, while the CNN-Simple RNN model had the shortest computation time. The high accuracy indicates that the system can be retrained on large-scale DNA datasets.
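The abstract names K-mer encoding as the preprocessing step but does not describe its details. The following is a minimal sketch of one common variant, assuming overlapping K-mers with stride 1 over the alphabet A, C, G, T, each mapped to an integer index, with 0 reserved for K-mers that contain any other symbol. The function names `build_vocab` and `kmer_encode` are hypothetical and are not taken from the thesis.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed nucleotide alphabet


def build_vocab(k):
    """Map every possible k-mer over ACGT to an integer id, starting at 1.

    Id 0 is left free for unknown k-mers (an assumption of this sketch).
    """
    return {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k), start=1)}


def kmer_encode(sequence, k, vocab):
    """Split a sequence into overlapping k-mers (stride 1) and return their ids."""
    seq = sequence.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # K-mers with symbols outside ACGT (for example N) fall back to id 0.
    return [vocab.get(kmer, 0) for kmer in kmers]


vocab = build_vocab(2)                    # 4**2 = 16 possible 2-mers
encoded = kmer_encode("ATGCN", 2, vocab)  # k-mers: AT, TG, GC, CN
print(encoded)                            # [4, 15, 10, 0]
```

With K=2, the setting reported as best for CNN-LSTM, the vocabulary has only 16 entries. Each increment of K multiplies the vocabulary size by four, which is one reason the choice of K can affect both accuracy and computation time.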

Item Type: Thesis (Other)
Uncontrolled Keywords: CNN, Classification, RNN, DNA Sequence
Subjects: Q Science > Q Science (General) > Q325.5 Machine learning.
Q Science > QA Mathematics
Q Science > QA Mathematics > QA336 Artificial Intelligence
Divisions: Faculty of Science and Data Analytics (SCIENTICS) > Mathematics > 44201-(S1) Undergraduate Thesis
Depositing User: Imroatus Sholihah
Date Deposited: 29 Nov 2023 07:54
Last Modified: 29 Nov 2023 07:54
URI: http://repository.its.ac.id/id/eprint/101198
