Analisis Variabel Penentu Pada Prediksi Kelulusan Tepat Waktu Mahasiswa Program Sarjana Berdasarkan Performa Empat Semester Pertama

Heriqbaldi, Hemakesha Ramadhani (2024) Analisis Variabel Penentu Pada Prediksi Kelulusan Tepat Waktu Mahasiswa Program Sarjana Berdasarkan Performa Empat Semester Pertama. Other thesis, Institut Teknologi Sepuluh Nopember.

Abstract

Kelulusan tepat waktu mahasiswa adalah faktor krusial bagi mahasiswa dan universitas. Ketepatan waktu kelulusan merupakan salah satu tolak ukur terpenting bagi universitas serta menjadi indikator penting dalam mengukur efektivitas program universitas. Hal ini mendorong Universitas Y, khususnya di Fakultas X, untuk melakukan evaluasi akademik tiga kali selama delapan semester guna membantu mahasiswa lulus tepat waktu. Namun, ketepatan waktu kelulusan mahasiswa hanya bisa diamati pada evaluasi akhir semester delapan, menyebabkan kesulitan dalam melakukan intervensi dini untuk membantu mahasiswa secara akademik. Penelitian ini berfokus pada prediksi ketepatan waktu kelulusan mahasiswa serta penentuan variabel terbaik bagi model prediksi guna menghemat waktu evaluasi dan membantu penanganan dini akademik bagi mahasiswa yang membutuhkan.
Penelitian ini mengadopsi metodologi CRISP-DM (Cross Industry Standard Process for Data Mining) untuk mendekati masalah secara sistematis. Tiga dataset digunakan sebagai input prediksi kelulusan, yaitu dataset wisuda mahasiswa, dataset IPS mahasiswa, serta dataset absensi mahasiswa. Proses penelitian melibatkan pemahaman bisnis dan data, persiapan data, pemodelan, evaluasi, dan deployment. Untuk seleksi fitur, model XGBoost diinterpretasikan menggunakan nilai SHAP (SHapley Additive exPlanations) untuk mengetahui pengaruh fitur dan skor feature importance untuk mengetahui tingkat kepentingan fitur. Output dari proses ini meliputi grafik nilai SHAP, grafik feature importance, serta performa prediksi XGBoost.
Evaluasi dilakukan dengan membandingkan grafik nilai SHAP dan grafik feature importance sebelum dan sesudah fitur terseleksi. Selanjutnya, performa prediksi XGBoost dengan fitur terpilih berdasarkan nilai SHAP dibandingkan dengan performa prediksi XGBoost dengan fitur terpilih berdasarkan nilai feature importance. Evaluasi ini bertujuan untuk mengidentifikasi metode pemilihan fitur yang paling efektif. Hasil menunjukkan bahwa performa prediksi XGBoost dengan fitur terpilih berdasarkan nilai SHAP memiliki performa tertinggi, dengan kenaikan akurasi sebesar 1.17% dibandingkan dengan prediksi XGBoost tanpa pemilihan fitur. Metode SHAP juga unggul dalam interpretasi fitur dibandingkan metode feature importance, dimana SHAP dapat menjelaskan pengaruh fitur pada setiap case dataset test, sedangkan feature importance hanya menjelaskan kepentingan fitur secara keseluruhan. Dengan demikian, dapat disimpulkan bahwa SHAP merupakan metode pemilihan fitur yang efektif, karena kemampuannya memberikan informasi detail mengenai pengaruh setiap fitur terhadap prediksi model melalui nilai SHAP dan visualisasi plot BeeSwarm, menjadikannya alat yang sangat berguna dalam analisis dan optimasi model prediktif.
===========================================================================================
On-time graduation matters to both students and universities. It is one of the most important benchmarks for a university and a key indicator of the effectiveness of its programs. This has prompted University Y, particularly Faculty X, to conduct three academic evaluations over eight semesters to help students graduate on time. However, whether a student graduates on time can only be observed at the final evaluation in the eighth semester, which makes early academic intervention difficult. This research focuses on predicting on-time graduation and on identifying the best variables for the prediction model, in order to save evaluation time and support early academic intervention for students who need it.
This research adopts the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology to approach the problem systematically. Three datasets are used as input for graduation prediction: a student graduation dataset, a student semester grade point average (IPS) dataset, and a student attendance dataset. The research process covers business and data understanding, data preparation, modeling, evaluation, and deployment. For feature selection, the XGBoost model is interpreted using SHAP (SHapley Additive exPlanations) values, which show the influence of each feature, and feature importance scores, which show how important each feature is. The outputs of this process are SHAP value plots, feature importance plots, and the prediction performance of XGBoost.
The evaluation compares the SHAP value plots and the feature importance plots before and after feature selection. Next, the prediction performance of XGBoost with features selected by SHAP value is compared with its performance with features selected by feature importance score. This evaluation aims to identify the most effective feature selection method. The results show that XGBoost with SHAP-selected features achieves the highest performance, with an accuracy increase of 1.17% compared to XGBoost without feature selection. SHAP is also superior for interpreting features: it can explain the effect of each feature on every case in the test dataset, whereas feature importance only describes the overall importance of each feature. It can therefore be concluded that SHAP is an effective feature selection method, because it provides detailed information about the influence of each feature on the model's predictions through SHAP values and beeswarm plot visualization, which makes it a very useful tool for analyzing and optimizing predictive models.
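
The contrast drawn in the abstract between per-case SHAP values and a single global importance score can be illustrated with a minimal sketch. It is not the thesis's implementation: the scoring function below is a toy stand-in for a trained model (not XGBoost), the feature names and all data values are hypothetical, and the Shapley values are computed exactly by enumerating feature coalitions rather than with the SHAP library.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature names; the record does not list the thesis's actual features.
FEATURES = ["gpa_sem1", "gpa_sem4", "attendance_rate"]

# Assumed background rows, used to average out features that are "unknown".
BACKGROUND = [
    {"gpa_sem1": 2.0, "gpa_sem4": 2.4, "attendance_rate": 0.60},
    {"gpa_sem1": 3.0, "gpa_sem4": 3.1, "attendance_rate": 0.85},
    {"gpa_sem1": 3.6, "gpa_sem4": 3.5, "attendance_rate": 0.95},
]

# Assumed test cases to explain.
TEST_CASES = [
    {"gpa_sem1": 3.8, "gpa_sem4": 3.7, "attendance_rate": 0.98},
    {"gpa_sem1": 2.1, "gpa_sem4": 1.9, "attendance_rate": 0.55},
]


def model(row):
    """Toy score standing in for a trained classifier's output."""
    return (
        0.2 * row["gpa_sem1"]
        + 0.3 * row["gpa_sem4"]
        + 0.5 * row["attendance_rate"]
        + 0.1 * row["gpa_sem4"] * row["attendance_rate"]  # interaction term
    )


def coalition_value(case, known):
    """Mean model output when only the features in `known` take the case's values."""
    total = 0.0
    for bg in BACKGROUND:
        mixed = {f: (case[f] if f in known else bg[f]) for f in FEATURES}
        total += model(mixed)
    return total / len(BACKGROUND)


def shapley_values(case):
    """Exact Shapley value of each feature for one case."""
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        contribution = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_f = coalition_value(case, set(subset) | {f})
                without_f = coalition_value(case, set(subset))
                contribution += weight * (with_f - without_f)
        phi[f] = contribution
    return phi


# Per-case explanations: one value per feature for every test case.
per_case = [shapley_values(case) for case in TEST_CASES]

# Global summary: mean absolute Shapley value per feature, then a ranking.
global_importance = {
    f: sum(abs(p[f]) for p in per_case) / len(per_case) for f in FEATURES
}
ranking = sorted(FEATURES, key=global_importance.get, reverse=True)

for case, phi in zip(TEST_CASES, per_case):
    print(case, phi)
print("ranking by mean |Shapley value|:", ranking)
```

For each case, the per-feature values sum to the difference between that case's score and the average score over the background rows, which is what allows them to be read as individual contributions. A feature selection step in this spirit would keep the top-ranked features and retrain the model; the cutoff used in the thesis is not stated in this record.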

Item Type: Thesis (Other)
Uncontrolled Keywords: Klasifikasi, Kelulusan, Mahasiswa, Classification, Model, Machine Learning, XGBoost, SHAP, Feature Importance, Random Forest, Graduation, Student
Subjects: L Education > L Education (General)
T Technology > T Technology (General) > T57.5 Data Processing
Divisions: Faculty of Information Technology > Informatics Engineering > 55201-(S1) Undergraduate Thesis
Depositing User: Heriqbaldi Hemakesha Ramadhani
Date Deposited: 01 Aug 2024 07:41
Last Modified: 01 Aug 2024 07:41
URI: http://repository.its.ac.id/id/eprint/110568
