Entropy Based Fuzzy Support Vector Machine (EFSVM) untuk Klasifikasi Microarray Imbalanced Data

Ladayya, Faroh (2018) Entropy Based Fuzzy Support Vector Machine (EFSVM) untuk Klasifikasi Microarray Imbalanced Data. Masters thesis, Institut Teknologi Sepuluh Nopember.

Text
1315201202-Master_Thesis.pdf - Accepted Version


Abstract

DNA microarray data contain gene expression measurements with a small sample size but a very large number of features. In addition, class imbalance is a common problem in microarray data. A classification method is therefore needed that can handle both the high dimensionality and the class imbalance. SVM is a classification method that can handle large or small samples, non-linearity, high dimensionality, over-learning, and local minima. SVM has also been widely applied to the classification of DNA microarray data, with results showing that it performs best among machine learning methods. However, imbalanced data are a weakness for SVM: it treats all samples as equally important, which leads to biased results for the minority class. One method that can handle imbalanced data is EFSVM, which can produce the highest AUC when compared with SVM and FSVM. Because DNA microarray data are high dimensional, with a very large number of features, feature selection must be performed first. In this study, imbalanced DNA microarray data are classified using EFSVM after feature selection with FCBF. The classification results show that feature selection improves classification performance. Adding the entropy based fuzzy membership yields the highest performance compared with SVM and FSVM; for data that have undergone feature selection, however, FSVM and EFSVM give nearly the same results.
============================================================================
DNA microarray data contain gene expression measurements with small sample sizes and a large number of features. Furthermore, class imbalance is a common problem in microarray data. It occurs when a dataset is dominated by a majority class that has significantly more instances than the minority classes. A classification method is therefore needed that can handle both high dimensional and imbalanced data. SVM is a classification method capable of handling large or small samples, nonlinearity, high dimensionality, over-learning, and local minimum issues. SVM has been widely applied to DNA microarray data classification, and it has been shown to perform best among machine learning methods. However, imbalanced data are a problem because SVM treats all samples as equally important, so the results are biased for the minority class. To overcome the imbalance, EFSVM is proposed. This method applies a fuzzy membership to each input point and reformulates the SVM so that different input points make different contributions to the classifier. Samples with higher class certainty, as measured by entropy, are assigned larger fuzzy memberships. Minority-class samples, which are the more important ones, are given large fuzzy memberships, and EFSVM pays more attention to samples with larger fuzzy membership. Because DNA microarray data are high dimensional, with a very large number of features, feature selection is performed first using FCBF. Based on the overall results, EFSVM has the highest AUC value compared with SVM and FSVM.
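
The entropy based fuzzy membership described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. It rests on several assumptions that the abstract does not state: class certainty is estimated from the labels of the k nearest neighbours, minority samples (label 1) always receive membership 1.0, and a majority sample receives 1 - beta * H, where H is its neighbourhood entropy scaled to [0, 1]. The function names, `k`, `beta`, and the toy data are hypothetical.

```python
import math

def knn_entropy(X, y, i, k):
    """Binary class entropy (in nats) among the k nearest neighbours of sample i."""
    dists = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    neighbours = [j for _, j in dists[:k]]
    p_pos = sum(1 for j in neighbours if y[j] == 1) / len(neighbours)
    entropy = 0.0
    for p in (p_pos, 1.0 - p_pos):
        if p > 0.0:
            entropy -= p * math.log(p)
    return entropy

def entropy_fuzzy_membership(X, y, k=3, beta=0.5):
    """Assumed rule: minority samples get 1.0; majority samples get
    1 - beta * H, so low-entropy (high-certainty) samples weigh more."""
    memberships = []
    for i, label in enumerate(y):
        if label == 1:
            memberships.append(1.0)
        else:
            h = knn_entropy(X, y, i, k) / math.log(2)  # scale entropy to [0, 1]
            memberships.append(1.0 - beta * h)
    return memberships

# Toy data (hypothetical): four majority points in a tight cluster, two
# minority points, and one majority point lying next to the minority points.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
     (1.0, 1.0), (1.1, 1.0), (0.9, 0.9)]
y = [-1, -1, -1, -1, 1, 1, -1]

memberships = entropy_fuzzy_membership(X, y, k=3, beta=0.5)
for point, label, m in zip(X, y, memberships):
    print(point, label, round(m, 3))
```

In this sketch the majority point near the minority points has mixed neighbours, so its entropy is high and its membership is reduced, while the clustered majority points keep full membership. Memberships of this kind would then scale each sample's penalty in the SVM objective; one way to approximate that, for example, is to pass them as `sample_weight` to scikit-learn's `SVC.fit`.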

Item Type: Thesis (Masters)
Uncontrolled Keywords: DNA Microarray, EFSVM, High Dimensional, Imbalanced Data
Subjects: Q Science > Q Science (General)
Divisions: Faculty of Mathematics and Science > Statistics > 49101-(S2) Master Thesis
Depositing User: Faroh Ladayya
Date Deposited: 14 Apr 2020 07:52
Last Modified: 14 Apr 2020 07:52
URI: http://repository.its.ac.id/id/eprint/50643
