Boosting Support Vector Machine pada Data Microarray yang Imbalance (Boosting Support Vector Machine on Imbalanced Microarray Data)

Pratama, Risky Frasetio Wahyu (2018) Boosting Support Vector Machine pada Data Microarray yang Imbalance. Masters thesis, Institut Teknologi Sepuluh Nopember.

06211650010002-Master_Thesis.pdf - Accepted Version

Abstract

Microarray data play an important role in classifying almost all types of cancer tissue. Two problems often arise when classifying microarray data: high dimensionality and class imbalance. High dimensionality can be addressed with Fast Correlation-Based Filter (FCBF) feature selection. The classifier used in this study is the Support Vector Machine (SVM), chosen for its several advantages; however, SVM is highly sensitive to class imbalance. SMOTE is a preprocessing method for imbalanced data that oversamples the minority class by adding synthetic observations. It often works well but can sometimes lead to overfitting. Boosting is an alternative way to improve classification performance on imbalanced data. It builds a strong final classifier by combining a set of SVMs, used as base classifiers, over successive iterations.

This study examines the performance of SMOTEBoost-SVM compared with AdaBoost-SVM in classifying microarray data, both in a simulation study designed with several levels of imbalance ratio and in an application to public microarray datasets. The public datasets used are the colon cancer and myeloma data.

In the simulation study, the g-mean of every classifier generally decreased as the class imbalance ratio increased, but SMOTEBoost-SVM tended to perform best, and its decline was smaller (more stable) than that of AdaBoost-SVM, SMOTE-SVM, and SVM. On the public datasets, SMOTEBoost-SVM also outperformed the other three methods in terms of g-mean and sensitivity. The effect of feature selection was also examined: SVM classification with the informative features chosen by feature selection performed better than classification with all features.
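The abstract reports results in terms of g-mean and sensitivity without defining them. As a minimal sketch, assuming the usual definitions (sensitivity is the recall on the minority, or positive, class; g-mean is the geometric mean of sensitivity and specificity), the metrics can be computed as below. The function names and the toy labels are illustrative only and are not taken from the thesis.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp


def sensitivity(y_true, y_pred, positive=1):
    """Fraction of positive (minority) samples that are correctly detected."""
    tp, fn, _, _ = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(y_true, y_pred, positive=1):
    """Fraction of negative (majority) samples that are correctly rejected."""
    _, _, tn, fp = confusion_counts(y_true, y_pred, positive)
    return tn / (tn + fp) if (tn + fp) else 0.0


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity."""
    sens = sensitivity(y_true, y_pred, positive)
    spec = specificity(y_true, y_pred, positive)
    return math.sqrt(sens * spec)


# Toy example (made-up labels): 2 minority samples, 8 majority samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_majority = [0] * 10                        # ignores the minority class
some_detection = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]   # finds one minority sample

print(g_mean(y_true, always_majority))  # 0.0, although accuracy is 0.8
print(sensitivity(y_true, some_detection))
print(g_mean(y_true, some_detection))
```

A classifier that always predicts the majority class scores a g-mean of zero however high its accuracy, which is why g-mean is a natural choice for comparing methods across imbalance ratios.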

Item Type: Thesis (Masters)
Uncontrolled Keywords: Microarray Data, Class Imbalance, FCBF, SMOTE, Support Vector Machine, AdaBoost, SMOTEBoost
Subjects: Q Science > QA Mathematics > QA76.9.D343 Data mining. Querying (Computer science)
Q Science > QR Microbiology > QR 201.T84 Tumors. Cancer
T Technology > T Technology (General) > T57.8 Nonlinear programming. Support vector machine. Wavelets. Hidden Markov models.
Divisions: Faculty of Mathematics, Computation, and Data Science > Statistics > 49101-(S2) Master Thesis
Depositing User: Risky Frasetio Wahyu Pratama
Date Deposited: 16 Oct 2020 03:49
Last Modified: 16 Oct 2020 03:49
URI: http://repository.its.ac.id/id/eprint/55389
