COMBINATION INSIDE AND OUTSIDE-FOLD OVERSAMPLING IN RANDOM FOREST AND BINARY LOGISTICS REGRESSION TO ANALYSE PT “X”'S IMBALANCE LEADS CLASSIFICATION

MARGIRIZKI, SABILAH (2020) COMBINATION INSIDE AND OUTSIDE-FOLD OVERSAMPLING IN RANDOM FOREST AND BINARY LOGISTICS REGRESSION TO ANALYSE PT “X”'S IMBALANCE LEADS CLASSIFICATION. Undergraduate thesis, Institut Teknologi Sepuluh Nopember.

06211640000113-Undergraduate_Thesis.pdf - Accepted Version
Restricted to Repository staff only

Abstract

Software as a Service (SaaS) is a cloud-based software delivery model. Dynamic competition among start-ups requires managers to manage both existing and prospective customers. This study focuses on PT “X”, a business-to-business company in the SaaS field, and analyzes and predicts the characteristics of PT “X”'s leads in successful and failed transactions. The data are PT “X” leads from 2018 to 2019. The research uses the Random Forest (RF) and Binary Logistic Regression (RL) methods, each combined with oversampling inside the fold (OIF) and outside the fold (OOF). In general, combining OOF with the RF and RL methods yields higher classification performance than the RF-OIF and RL-OIF combinations. The average AUC, g-mean, and sensitivity of the RF-OOF method are 90.63%, 75.59%, and 82.50%, respectively. The RL-OOF method gives higher results, 92.32%, 85.56%, and 88.02%, which are average increases of 16.44%, 47.99%, and 65.86% over the imbalanced (non-oversampled) data. The significant variables in the RL-OOF model are Special Project, Industry, IntroduceMonth, TeamLeader, and Source. The largest odds ratio is for the Industry10 category, whose leads have about 11 times greater odds of a failed transaction than those in Industry1.
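The abstract does not say which oversampling algorithm or fold scheme was used. The sketch below assumes plain random oversampling of the minority class and unstratified k-fold splitting, and every function name in it is hypothetical. Under the usual reading of the terms, OOF oversamples the whole data set and then cuts it into folds, while OIF cuts the original records into folds and oversamples only the training part of each split. The sketch counts how many test rows are copies of records that also appear in training, and includes the sensitivity, g-mean (commonly defined as the square root of sensitivity times specificity) and a rank-based AUC; it does not fit Random Forest or logistic regression models.

```python
import math
import random


def random_oversample(indices, labels, rng):
    """Append random copies of minority-class indices until both classes are equally frequent.

    Assumption: plain random oversampling with 0/1 labels; the thesis may have used another scheme.
    """
    indices = list(indices)
    pos = [i for i in indices if labels[i] == 1]
    neg = [i for i in indices if labels[i] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:  # nothing to copy
        return indices
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return indices + extra


def k_folds(items, k, rng):
    """Shuffle items and deal them into k disjoint folds (unstratified)."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    return [shuffled[f::k] for f in range(k)]


def _train_test(folds, f):
    """Use fold f as the test set and all remaining folds as the training set."""
    train = [i for g, fold in enumerate(folds) if g != f for i in fold]
    return train, folds[f]


def outside_fold_splits(labels, k, rng):
    """OOF: oversample the full data set first, then split the result into folds."""
    balanced = random_oversample(range(len(labels)), labels, rng)
    folds = k_folds(balanced, k, rng)
    for f in range(k):
        yield _train_test(folds, f)


def inside_fold_splits(labels, k, rng):
    """OIF: split the original records into folds, then oversample only the training part."""
    folds = k_folds(range(len(labels)), k, rng)
    for f in range(k):
        train, test = _train_test(folds, f)
        yield random_oversample(train, labels, rng), test


def leaked_test_rows(train, test):
    """Count test rows whose original record index also occurs in the training set."""
    seen = set(train)
    return sum(1 for i in test if i in seen)


def sensitivity_specificity_gmean(y_true, y_pred, positive=1):
    """Sensitivity (TPR), specificity (TNR) and g-mean = sqrt(sensitivity * specificity)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return sens, spec, math.sqrt(sens * spec)


def auc(y_true, scores, positive=1):
    """Rank-based AUC: share of (positive, negative) pairs where the positive scores higher."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [1] * 20 + [0] * 180  # hypothetical leads: 10% minority class
    for name, splitter in (("OOF", outside_fold_splits), ("OIF", inside_fold_splits)):
        leaks = sum(leaked_test_rows(tr, te) for tr, te in splitter(labels, 5, rng))
        print(name, "test rows whose record also appears in training:", leaks)
    print(sensitivity_specificity_gmean([1, 1, 0, 0], [1, 0, 0, 0]))
```

Under OIF the count is always zero, because each test fold is a disjoint slice of the original records. Under OOF, copies of the same minority record can fall on both sides of a split, which is the standard concern with oversampling before cross-validation.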

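As a worked reading of the odds-ratio result, assuming the usual dummy coding with Industry1 as the reference category, a failed transaction as the modelled outcome, and all other variables held fixed, the odds ratio is the exponentiated logistic regression coefficient. The coefficient value is back-calculated from the rounded figure of 11 and is only approximate.

```latex
\mathrm{OR}_{\text{Industry10}}
  = \frac{\Pr(\text{fail}\mid\text{Industry10}) \,/\, \Pr(\text{success}\mid\text{Industry10})}
         {\Pr(\text{fail}\mid\text{Industry1}) \,/\, \Pr(\text{success}\mid\text{Industry1})}
  = \exp\bigl(\hat\beta_{\text{Industry10}}\bigr) \approx 11
\quad\Longrightarrow\quad
\hat\beta_{\text{Industry10}} \approx \ln 11 \approx 2.40
```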
Item Type: Thesis (Undergraduate)
Uncontrolled Keywords: Imbalance, Oversampling, Random Forest, Binary Logistic Regression, SaaS
Subjects: Q Science > QA Mathematics > QA76.9.D343 Data mining
Divisions: Faculty of Science and Data Analytics (SCIENTICS) > Statistics > 49201-(S1) Undergraduate Thesis
Depositing User: Sabilah Margirizki
Date Deposited: 31 Aug 2020 03:25
Last Modified: 31 Aug 2020 03:25
URI: http://repository.its.ac.id/id/eprint/81380
