Al Azizi, Nurus Shofy (2026) Regresi Cox Adaptive Elastic Net dengan Stratified Cross Validation pada Pemodelan Survival pada Data Berdimensi Tinggi. Other thesis, Institut Teknologi Sepuluh Nopember.
Text: 5003221062-Undergraduate_Thesis.pdf - Accepted Version (5MB). Restricted to repository staff only.
Abstract
Predicting the time until an event occurs (time-to-event analysis) is important in medicine, particularly for estimating how the survival of breast cancer patients responds to changes in condition or to interventions on covariates. Integrating clinical data with gene expression data enables a more comprehensive understanding, but it yields high-dimensional datasets (p≫n) that are prone to overfitting, multicollinearity, and model instability under the conventional Cox proportional hazards (Cox-PH) model. Regularization offers an effective alternative: coefficient shrinkage reduces model complexity and selects relevant predictors, producing a more parsimonious model. In this study, the Cox adaptive elastic net (Cox-AENET) was combined with stratified cross-validation (SCV) to keep the proportions of event and censored observations balanced across folds under censor-imbalanced conditions and to reduce randomization effects during model estimation. The dataset comprised 295 patients and 14,088 variables obtained by integrating clinical and transcriptomic data. Exploration of the clinical data identified six significant variables, while exploration of the gene expression data with K-means clustering yielded three optimal clusters. Cox-AENET with SCV, in both the 5-fold and 10-fold schemes, produced a more stable model structure than conventional cross-validation, as reflected in the consistent selection of the optimal parameters γ and λ across 100 replications. The best model was obtained at γ=0.05 and λ=1.951, with 174 non-zero variables and high predictive performance (C-index = 0.933). Vascular invasion was the only clinical variable that increased mortality risk (hazard ratio = 1.279), while SLC36A1 and N6AMT2 were identified as the gene with the highest risk and the gene with the strongest protective effect, respectively.
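The stratification idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `stratified_folds`, the round-robin assignment, and the toy event count (60 events among 295 observations) are assumptions made for illustration only. The sketch assigns observations to folds separately within the event and censored groups, so each fold keeps roughly the same event proportion as the full dataset.

```python
import random
from collections import defaultdict


def stratified_folds(events, k, seed=0):
    """Assign each observation to one of k folds, stratified by event status.

    events: sequence of 0/1 flags (1 = event observed, 0 = censored).
    Returns a list of fold indices in 0..k-1, one per observation.
    """
    rng = random.Random(seed)

    # Group observation indices by status (censored vs. event).
    by_status = defaultdict(list)
    for i, e in enumerate(events):
        by_status[e].append(i)

    folds = [None] * len(events)
    offset = 0
    for status in sorted(by_status):
        idx = by_status[status]
        rng.shuffle(idx)  # random order within each stratum
        # Deal indices round-robin; the running offset carries over between
        # strata so that total fold sizes also stay balanced.
        for j, i in enumerate(idx):
            folds[i] = (offset + j) % k
        offset += len(idx)
    return folds


# Hypothetical toy data: 295 observations, of which 60 are assumed to be events.
events = [1] * 60 + [0] * 235
folds = stratified_folds(events, k=5)

for f in range(5):
    members = [i for i, g in enumerate(folds) if g == f]
    n_events = sum(events[i] for i in members)
    print(f"fold {f}: n={len(members)}, events={n_events}")
```

With plain random fold assignment the number of events per fold can vary from one split to the next, which matters when events are rare relative to censored observations. Dealing each stratum out separately bounds that variation to at most one observation per fold.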
| Item Type: | Thesis (Other) |
|---|---|
| Uncontrolled Keywords: | Adaptive Elastic Net, Censor-Imbalanced, High-Dimensional Data, Stratified Cross-Validation |
| Subjects: | Q Science > QA Mathematics > QA76.9D338 Data integration; Q Science > QH Biology > QH426 Genetics |
| Divisions: | Faculty of Mathematics, Computation, and Data Science > Statistics > 49201-(S1) Undergraduate Thesis |
| Depositing User: | Nurus Shofy Al Azizi |
| Date Deposited: | 29 Jan 2026 01:10 |
| Last Modified: | 29 Jan 2026 01:10 |
| URI: | http://repository.its.ac.id/id/eprint/131130 |
