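Srimarinda, Reidina Dian (2025) Regresi Cox dengan Adaptive Elastic Net pada Data Berdimensi Tinggi: Studi Kasus Data Observasi Klinis dan Transkriptomik Pasien Adenokarsinoma Paru [Cox Regression with Adaptive Elastic Net on High-Dimensional Data: A Case Study of Clinical Observation and Transcriptomic Data of Lung Adenocarcinoma Patients]. Other thesis, Institut Teknologi Sepuluh Nopember.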
Full text: 5003211039-Undergraduate_Thesis.pdf (Accepted Version, 5 MB), restricted to repository staff only until 1 April 2027.
Abstract
Lung cancer is the leading cause of cancer death worldwide, so understanding patient survival is essential for choosing appropriate treatment. This study used clinical and transcriptomic data from lung adenocarcinoma patients to provide insight at the genomic level. Transcriptomic data are typically collinear and high-dimensional (p ≫ n, far more variables than samples), which makes standard Cox proportional hazards regression suboptimal. This study therefore applied Cox regression with an adaptive elastic net penalty (Cox-AENET), which combines an adaptive ℓ1 (LASSO) penalty with an ℓ2 (ridge) penalty to select variables adaptively and to reduce bias in the coefficients of the selected variables. Preprocessing merged the clinical and transcriptomic variables into a single data frame, retaining 79 of 96 patient samples and 3,006 of 7,915 gene-expression variables. Exploratory analysis of the clinical data showed that tumor stage was the most significant clinical factor, with better survival for stage I patients than for stage III patients. For the transcriptomic data, K-means clustering yielded two optimal clusters. Although Kaplan-Meier analysis and the log-rank test showed no significant difference in survival probability between the clusters, gene expression remains worth analyzing further because it may reflect disease mechanisms, therapy response, and patient characteristics that clinical observation does not capture. Cox-AENET modeling with 3,013 clinical and transcriptomic predictor variables produced a best model with a C-index of 1 and 30 significant variables, among which the clinical variable tumor stage contributed most, with a hazard ratio of 5.345. Among the transcriptomic variables, ILVBL expression had the highest hazard ratio (3.007) and OAZ1 the lowest (0.191). This study is expected to help medical practitioners optimize the treatment of lung adenocarcinoma patients and to provide new insight into the molecular mechanisms of the disease.
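The cluster comparison mentioned in the abstract (Kaplan-Meier curves plus a log-rank test) can be sketched in plain Python. This is a textbook implementation for illustration, not the code used in the thesis: the function names `kaplan_meier` and `log_rank_test` are hypothetical, exactly two groups are assumed, and the p-value uses the χ² distribution with one degree of freedom.

```python
import math
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate S(t) at each distinct event time.

    times: follow-up times; events: 1 if death observed, 0 if censored.
    Returns a list of (t, S(t)) pairs in increasing order of t.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    curve, surv = [], 1.0
    for t in sorted(deaths):
        at_risk = sum(1 for u in times if u >= t)  # still under observation at t
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve


def log_rank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square statistic, p-value), 1 df."""
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("expected exactly two groups")
    ref = labels[0]  # observed-minus-expected deaths are tallied for this group
    o_minus_e, var = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for u in times if u >= t)
        n1 = sum(1 for u, g in zip(times, groups) if u >= t and g == ref)
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d1 = sum(1 for u, e, g in zip(times, events, groups)
                 if u == t and e and g == ref)
        o_minus_e += d1 - d * n1 / n  # observed minus expected under H0
        if n > 1:  # hypergeometric variance of d1
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0.0:
        raise ValueError("zero variance: the groups cannot be compared")
    stat = o_minus_e ** 2 / var
    # Upper tail of chi-square(1): P(Z**2 > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    times = [1, 2, 3, 4, 5, 6]
    events = [1, 1, 1, 1, 1, 1]
    clusters = [0, 0, 0, 1, 1, 1]  # hypothetical two-cluster labels
    print(kaplan_meier(times, events))
    print(log_rank_test(times, events, clusters))
```

Each cluster's curve would come from calling `kaplan_meier` on that cluster's patients only; the log-rank statistic then tests whether the survival curves of the two clusters differ.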
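For readers unfamiliar with the method, the objective that a Cox-AENET fit minimizes can be written in the generic form used in the adaptive elastic net literature. This is a sketch, not the thesis's exact specification: the tuning parameters λ1 and λ2, the exponent γ, and the initial estimator β̂^init used to build the weights are assumptions, since this record does not state them. Here t_i is the observed time, δ_i the event indicator (1 if death was observed), and x_i the covariate vector of patient i; the partial likelihood is written in Breslow's form for tied event times.

```latex
\[
\ell(\boldsymbol{\beta}) = \sum_{i=1}^{n} \delta_i \left[ \mathbf{x}_i^{\top}\boldsymbol{\beta}
  - \log \sum_{k \in R(t_i)} \exp\!\left(\mathbf{x}_k^{\top}\boldsymbol{\beta}\right) \right],
\qquad R(t_i) = \{\, k : t_k \ge t_i \,\}
\]
\[
\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \left\{ -\ell(\boldsymbol{\beta})
  + \lambda_1 \sum_{j=1}^{p} \hat{w}_j \lvert \beta_j \rvert
  + \lambda_2 \sum_{j=1}^{p} \beta_j^{2} \right\},
\qquad \hat{w}_j = \left\lvert \hat{\beta}_j^{\mathrm{init}} \right\rvert^{-\gamma}, \quad \gamma > 0
\]
\[
\mathrm{HR}_j = \exp(\hat{\beta}_j), \qquad
\text{e.g. } \mathrm{HR} = 5.345 \;\Rightarrow\; \hat{\beta} = \ln 5.345 \approx 1.676
\]
```

Variables with small initial estimates receive large weights ŵ_j and are more likely to be shrunk to exactly zero, which is how the adaptive ℓ1 term performs selection, while the ℓ2 term stabilizes the estimates among correlated genes. A hazard ratio above 1 (such as ILVBL's 3.007) indicates a higher hazard per unit increase of the covariate on the scale used in the model, and one below 1 (such as OAZ1's 0.191) a lower hazard.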
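Harrell's concordance index (C-index), the metric behind the reported value of 1, can be sketched in plain Python. This is an illustrative implementation, not the thesis's code: the function name `harrell_c_index` is hypothetical, `risk_scores` would be the fitted linear predictors x_iᵀβ̂ of a Cox model, and for brevity pairs with identical follow-up times are skipped rather than handled by Harrell's full tie rules.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair is comparable when the patient with the shorter follow-up time
    had an observed event; it is concordant when that patient also has the
    higher predicted risk. Ties in risk score count as half-concordant.
    Simplification: pairs with identical follow-up times are skipped.
    """
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        # a = patient with the shorter time, b = the other one
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # tied times, or earlier patient censored: not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    times = [2, 4, 6, 8]
    events = [1, 1, 0, 1]  # third patient censored at t = 6
    print(harrell_c_index(times, events, [0.9, 0.7, 0.1, 0.3]))  # 1.0
    print(harrell_c_index(times, events, [0.9, 0.2, 0.1, 0.3]))  # 0.8
```

A value of 0.5 corresponds to ordering patients at random, and 1 means every comparable pair is ranked correctly, which is what the reported C-index of 1 denotes.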
Item Type: | Thesis (Other) |
---|---|
Uncontrolled Keywords: | Adaptive Elastic Net, Cox Regression, Gene Expression, Lung Adenocarcinoma |
Subjects: | H Social Sciences > HA Statistics > HA31.7 Estimation |
Divisions: | Faculty of Science and Data Analytics (SCIENTICS) > Statistics > 49201-(S1) Undergraduate Thesis |
Depositing User: | Reidina Dian Srimarinda |
Date Deposited: | 27 Jan 2025 22:25 |
Last Modified: | 27 Jan 2025 22:25 |
URI: | http://repository.its.ac.id/id/eprint/116995 |