Virtual Screening Berbasis Machine Learning Untuk Penemuan Kandidat Antivirus H9N2 (Machine Learning-Based Virtual Screening for the Discovery of H9N2 Antiviral Candidates)

Amiroch, Siti (2023) Virtual Screening Berbasis Machine Learning Untuk Penemuan Kandidat Antivirus H9N2. Doctoral thesis, Institut Teknologi Sepuluh Nopember.

06111960010003-Dissertation.pdf - Accepted Version
Restricted to Repository staff only until 1 April 2025.


Abstract

Virtual screening is an essential tool in the early phases of the drug discovery process. It is carried out as an efficient in silico search across millions of compounds. Machine learning is one of the ligand-based (active-compound-based) methods used in virtual screening. Several machine learning methods have been applied to virtual screening on different objects of study, with varying degrees of accuracy. The purpose of this dissertation is to construct the Log-RBF method, an alternative classification method that modifies logistic regression with a radial basis function. In addition, this dissertation applies seven machine learning methods (logistic regression, k-nearest neighbors, support vector machine, multilayer perceptron, random forest, gradient boosting, and XGBoost) to avian influenza A/H9N2 data. The seven methods were applied to determine which one best predicts antiviral compound candidates for avian influenza A/H9N2. For comparison, molecular docking analysis and molecular dynamics simulations were also performed to obtain more valid results. The quality of the machine learning models was measured by accuracy, sensitivity, specificity, balanced accuracy, and ROC score. The avian influenza A/H9N2 virus not only infects the respiratory tract of chickens and replicates in their reproductive tract, it also infects humans. Infection of the reproductive tract damages eggs and reduces egg production by up to 80%. This research is important because no antiviral that inhibits the development of the avian influenza A/H9N2 virus has been found so far.
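The abstract describes Log-RBF only as logistic regression modified with a radial basis function; it does not give the formulation. The sketch below shows one common way to combine the two ideas, assuming Gaussian RBF features followed by an ordinary logistic regression fitted by gradient descent. The kernel choice, the way centers are picked, the value of `gamma`, and the toy data are all assumptions made for illustration and are not taken from the dissertation.

```python
import math
import random


def rbf_features(x, centers, gamma):
    """Gaussian RBF activation of x with respect to each center."""
    return [math.exp(-gamma * sum((a - c) ** 2 for a, c in zip(x, center)))
            for center in centers]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(features, labels, lr=1.0, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for phi, y in zip(features, labels):
            err = sigmoid(sum(wi * p for wi, p in zip(w, phi)) + b) - y
            for j in range(d):
                grad_w[j] += err * phi[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(x, centers, gamma, w, b):
    """Predicted probability that x belongs to the positive ("active") class."""
    phi = rbf_features(x, centers, gamma)
    return sigmoid(sum(wi * p for wi, p in zip(w, phi)) + b)


# Hypothetical toy data: "active" points lie inside the unit disc, so the
# classes are not linearly separable in the raw two-dimensional features.
random.seed(0)
X = [(random.uniform(-2, 2), random.uniform(-2, 2)) for _ in range(150)]
y = [1 if u * u + v * v < 1.0 else 0 for u, v in X]

centers = X[:15]   # assumption: the first 15 training points serve as centers
gamma = 1.0        # assumption: fixed kernel width
Phi = [rbf_features(x, centers, gamma) for x in X]
w, b = fit_logistic(Phi, y)

train_acc = sum((predict_proba(x, centers, gamma, w, b) >= 0.5) == bool(t)
                for x, t in zip(X, y)) / len(X)
p_center = predict_proba((0.0, 0.0), centers, gamma, w, b)
p_corner = predict_proba((2.0, 2.0), centers, gamma, w, b)
print(f"training accuracy: {train_acc:.3f}")
print(f"P(active) at origin: {p_center:.3f}, at corner: {p_corner:.3f}")
```

Plain logistic regression cannot separate a disc from its surroundings with a single linear boundary; the RBF mapping is what lets the same linear model express that kind of region. In practice the centers and kernel width would be chosen by clustering or cross-validation rather than fixed by hand.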
Experiments on synthetic active compound data (the first dataset) showed that the XGBoost method was significantly better than the other methods, with higher test scores: accuracy 0.9686, sensitivity 0.96, specificity 0.9711, balanced accuracy 0.9656, and AUC/ROC score 0.9656. Virtual screening with the XGBoost model on the first dataset identified benzoic acid inhibitor 6 as the compound with a high predicted probability of being active, with a predicted value of 0.9992316. Molecular docking analysis identified laninamivir octanoate as the most promising ligand. However, after molecular dynamics simulation, the ligands predicted by machine learning (XGBoost), especially benzoic acid inhibitor 6, showed better inhibition potential against the H9N2 neuraminidase target protein than the ligands predicted by molecular docking. This means that benzoic acid inhibitor 6, the candidate predicted by machine learning, has greater potential as an antiviral candidate for avian influenza A/H9N2. On the second dataset, the Log-RBF method had lower accuracy and specificity than XGBoost but outperformed it in sensitivity, ROC score, and balanced accuracy (BACC). The Log-RBF method also outperformed both ordinary logistic regression and the RBF-Multiquadric method, with accuracy 0.9518, sensitivity 0.8696, specificity 0.9725, balanced accuracy 0.9210, and AUC/ROC score 0.9210. Prediction with the Log-RBF method on the second dataset yielded 124 compounds at a threshold of 0.992. In the ranking of these 124 compounds, although the differences in ranking between compounds are relatively large, the difference between the XGBoost and Log-RBF predicted scores for each compound is very small.
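The evaluation metrics reported above can all be derived from a confusion matrix. The sketch below uses the standard definitions; the counts are hypothetical, chosen only because they happen to reproduce the reported XGBoost figures, and the actual test-set sizes are not given in the abstract. The final loop checks that each reported balanced accuracy is consistent with the mean of the reported sensitivity and specificity.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
    }


# Hypothetical counts (an assumption, not data from the dissertation).
m = classification_metrics(tp=48, fp=5, tn=168, fn=2)
for name, value in m.items():
    print(f"{name}: {value:.4f}")

# Values as reported in the abstract: (sensitivity, specificity, balanced accuracy).
reported = {
    "XGBoost, first dataset": (0.96, 0.9711, 0.9656),
    "Log-RBF, second dataset": (0.8696, 0.9725, 0.9210),
}
gaps = {}
for model, (sens, spec, bacc) in reported.items():
    gaps[model] = abs((sens + spec) / 2 - bacc)
    print(f"{model}: mean(sens, spec) = {(sens + spec) / 2:.5f}, reported BACC = {bacc}")
```

In both cases the reported AUC/ROC score equals the balanced accuracy. That is what one would expect if the ROC score was computed from hard class labels rather than from predicted probabilities, since a single-threshold ROC curve has area (sensitivity + specificity) / 2. This is an inference from the numbers, not something the abstract states.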
Averaged over the compounds, the prediction scores of XGBoost and Log-RBF differed by about 5.2%, or a similarity of about 94.7% between the predictions of the two methods. This indicates that the Log-RBF method is worth considering as an alternative classification method.
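The shortlisting and comparison steps described above can be sketched as follows. The compound names and scores are hypothetical, the cut-off is assumed to be inclusive (score >= 0.992), and the mean absolute difference is only one plausible way to compare two sets of scores; the abstract does not state how the 5.2% figure was calculated.

```python
def shortlist(scores, threshold):
    """Return (compound, score) pairs at or above the threshold, highest first."""
    kept = [(name, s) for name, s in scores.items() if s >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


def mean_abs_difference(scores_a, scores_b, names):
    """Mean absolute difference between two models' scores over the given compounds."""
    return sum(abs(scores_a[n] - scores_b[n]) for n in names) / len(names)


# Hypothetical predicted probabilities from two models (illustration only).
xgb_scores = {"cmpd_A": 0.9992, "cmpd_B": 0.9950, "cmpd_C": 0.9400, "cmpd_D": 0.9930}
logrbf_scores = {"cmpd_A": 0.9960, "cmpd_B": 0.9985, "cmpd_C": 0.9100, "cmpd_D": 0.9925}

hits = shortlist(logrbf_scores, threshold=0.992)
hit_names = [name for name, _ in hits]
xgb_order = sorted(hit_names, key=lambda n: xgb_scores[n], reverse=True)
diff = mean_abs_difference(xgb_scores, logrbf_scores, hit_names)

print("Log-RBF ranking:", hit_names)
print("XGBoost ranking:", xgb_order)
print(f"mean absolute score difference: {diff:.4f}")
```

In this toy data the two models order the shortlisted compounds differently even though their scores differ only in the third decimal place, which is the kind of situation the abstract describes: near the top of the probability range, small score differences are enough to change ranks.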

Item Type: Thesis (Doctoral)
Uncontrolled Keywords: Antivirus, H9N2, Machine Learning, Virtual Screening
Subjects: Q Science
Q Science > Q Science (General) > Q325.5 Machine learning.
Q Science > QA Mathematics
Q Science > QA Mathematics > QA336 Artificial Intelligence
Divisions: Faculty of Science and Data Analytics (SCIENTICS) > Mathematics
Depositing User: Siti Amiroch
Date Deposited: 01 Mar 2023 08:47
Last Modified: 01 Mar 2023 08:47
URI: http://repository.its.ac.id/id/eprint/97716
