Feature Selection on Combined Text and Non-Text Multi-Label Data Using a Proportional Feature Rough Selector that Considers Inter-Feature Correlation

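Paryoko, Vilat Sasax Mandala Putra (2021) Seleksi Fitur pada Data Gabungan Teks dan Bukan Teks Multi-Label Menggunakan Proportional Feature Rough Selector yang Mempertimbangkan Korelasi Antar Fitur [Feature Selection on Combined Text and Non-Text Multi-Label Data Using a Proportional Feature Rough Selector that Considers Inter-Feature Correlation]. Masters thesis, Institut Teknologi Sepuluh Nopember.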

05211950010026-Master_Thesis.pdf - Accepted Version
Restricted to Repository staff only until 1 October 2023.


Abstract

Proportional Feature Rough Selector (PFRS) is a feature selection method based on Rough Set Theory (RST). It refines the partition of a data set into several important regions: the lower approximation, the upper approximation, and the boundary region. PFRS then uses the boundary region to find smaller regions, namely the Member Section (MS) and the Non-Member Section (NMS). However, PFRS has so far been applied only to feature selection for binary classification of text data. It was also developed without considering the relationships between features, so it can potentially be improved by taking the correlation between features in the data set into account.
This study modifies PFRS so that it can be applied to multi-label classification of mixed data containing both text and non-text features. The modifications concern how the lower and upper approximations are obtained and how the optimal gap value is determined. In addition, the correlation between features is considered during feature selection in order to improve classification accuracy. The modified PFRS was tested on four public data sets, each with multiple class labels: Reuters News Dataset, 515k Hotel Reviews, Netflix TV Shows, and Ted Talks. Four classification methods were used to evaluate the feature selection results: Decision Tree, K-Nearest Neighbor, Naive Bayes, and Support Vector Machine. The experiments showed that the smallest decrease and the largest increase in classification accuracy were obtained when 50% of the features were used, on all data sets and with all classification methods.
Likewise, feature selection that considers the correlation between features achieved the best classification accuracy when 50% of the features were used, on all data sets and with all classifiers used in the experiments.
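The abstract names the three rough-set regions that PFRS starts from. As a rough illustration only, the Python sketch below computes them for a toy data set using the textbook RST definitions (a lower approximation collects equivalence classes fully inside a concept, an upper approximation collects those that overlap it, and the boundary is their difference). It does not implement PFRS itself: the Member Section / Non-Member Section split, the gap value, and the correlation step are not reproduced. All function names, the binary-relevance treatment of multiple labels, the dependency-degree ranking used to keep 50% of the features, and the toy data are assumptions made for illustration.

```python
from collections import defaultdict


def equivalence_classes(rows, attrs):
    """Partition object indices by their values on `attrs` (indiscernibility)."""
    groups = defaultdict(set)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in attrs)].add(i)
    return list(groups.values())


def approximations(rows, attrs, target):
    """Lower, upper and boundary approximations of the object set `target`."""
    lower, upper = set(), set()
    for block in equivalence_classes(rows, attrs):
        if block <= target:   # block lies entirely inside the concept
            lower |= block
        if block & target:    # block overlaps the concept
            upper |= block
    return lower, upper, upper - lower


def dependency(rows, attrs, label_sets):
    """Average over labels of the share of objects whose membership in that
    label is certain (hypothetical binary-relevance view of multi-label data)."""
    n = len(rows)
    scores = []
    for label in set().union(*label_sets):
        inside = {i for i, ls in enumerate(label_sets) if label in ls}
        outside = set(range(n)) - inside
        positive = (approximations(rows, attrs, inside)[0]
                    | approximations(rows, attrs, outside)[0])
        scores.append(len(positive) / n)
    return sum(scores) / len(scores)


def top_half(rows, features, label_sets):
    """Rank single features by dependency and keep the best 50% (ties keep input order)."""
    ranked = sorted(features, key=lambda f: dependency(rows, [f], label_sets),
                    reverse=True)
    return ranked[: max(1, len(ranked) // 2)]


# Toy mixed data: two binary term features (text) and two categorical ones (non-text).
rows = [
    {"election": 1, "market": 0, "length": "short", "region": "x"},
    {"election": 1, "market": 0, "length": "long", "region": "x"},
    {"election": 0, "market": 1, "length": "short", "region": "x"},
    {"election": 0, "market": 1, "length": "long", "region": "y"},
]
labels = [{"politics"}, {"politics"}, {"economy"}, {"economy", "politics"}]

print(approximations(rows, ["election"], {0, 1, 3}))
print(top_half(rows, ["election", "market", "length", "region"], labels))  # → ['election', 'market']
```

In this toy run `election` and `market` induce exactly the same partition, so the retained half is fully redundant. That is the kind of situation in which also weighing the correlation between features, as the thesis does, could plausibly help.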

Item Type: Thesis (Masters)
Uncontrolled Keywords: multi-label classification, Rough Set Theory feature selection, rough set theory, correlation between features (attributes), Proportional Feature Rough Selector
Subjects: Q Science > Q Science (General) > Q325.5 Machine learning.
Q Science > QA Mathematics > QA76.9.D343 Data mining. Querying (Computer science)
T Technology > T Technology (General) > T57.5 Data Processing
Divisions: Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Information System > 59101-(S2) Master Thesis
Depositing User: Vilat Sasax MPP
Date Deposited: 23 Aug 2021 02:00
Last Modified: 23 Aug 2021 02:00
URI: http://repository.its.ac.id/id/eprint/88605
