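Rheisa, Calista Fara (2026) Identifikasi Differentially Expressed Genes (DEGs) Menggunakan limma-voom dan edgeR pada Data Cacahan RNA-Seq (Studi Kasus: Data Ekspresi Gen pada Pasien Kanker Kolorektal) [Identification of Differentially Expressed Genes (DEGs) Using limma-voom and edgeR on RNA-Seq Count Data (Case Study: Gene Expression Data from Colorectal Cancer Patients)]. Other thesis, Institut Teknologi Sepuluh Nopember.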
Text: 5003221020-Undergraduate_Thesis.pdf - Accepted Version (3MB), restricted to repository staff only
Abstract
Statistical methods play a central role in identifying differentially expressed genes (DEGs) from RNA-Sequencing (RNA-Seq) data, that is, genes whose expression changes significantly between biological conditions such as normal and cancer tissue. RNA-Seq data are count data that commonly exhibit overdispersion and variance heterogeneity, so they require an appropriate statistical framework. This study examines and compares the statistical frameworks of two widely used differential expression (DE) methods, limma-voom and edgeR, in identifying DEGs from colorectal cancer RNA-Seq data. The data come from the public dataset GSE104836 in the Gene Expression Omnibus (GEO), which consists of 10 pairs of colorectal cancer tissue and matched normal tissue samples, each pair taken from the same patient. The analysis workflow began with library size normalization using the Trimmed Mean of M-values (TMM) method to obtain effective library sizes, so that gene expression levels could be compared reliably across samples. DE analysis was then carried out separately with limma-voom and edgeR, using a false discovery rate (FDR) threshold of 0.05 and the criterion |log2 fold change| ≥ 1. The limma-voom approach identified 4,202 DEGs by transforming the counts to log-CPM values, modeling the mean-variance relationship through precision weights, and applying moderated t-tests within a linear modeling framework, which yields stable and conservative estimates of expression changes. In contrast, edgeR identified 4,901 DEGs by modeling the counts directly with the negative binomial distribution, estimating dispersion with empirical Bayes methods, and testing with likelihood ratio tests, and it showed higher sensitivity to changes in gene expression. These differences in statistical assumptions and modeling strategies led to differences in the number of DEGs and in their significance levels.
These findings underscore the importance of understanding the statistical approach behind each DE method to ensure robust biological interpretation in RNA-Seq cancer studies.
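As an illustration of two steps named in the abstract, the following Python sketch computes log-CPM values from effective library sizes and applies the DEG selection rule (FDR of 0.05 and |log2 fold change| ≥ 1). It is not the thesis code, which is not available in this record; the function names, the toy inputs, the strict `<` comparison on the FDR, and the use of the Benjamini-Hochberg adjustment are assumptions made for the example. The 0.5 and 1 offsets follow the usual voom log-CPM definition, and the normalization factors are taken as given rather than estimated with TMM.

```python
import math


def log_cpm(counts, lib_sizes, norm_factors):
    """Return log2 counts-per-million for a genes x samples count table.

    Effective library size = raw library size * normalization factor.
    The factors are supplied by the caller (assumed to come from TMM).
    """
    eff = [n * f for n, f in zip(lib_sizes, norm_factors)]
    return [
        [math.log2((c + 0.5) / (e + 1.0) * 1e6) for c, e in zip(row, eff)]
        for row in counts
    ]


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, log2_fc, pvalues, fdr=0.05, lfc=1.0):
    """Keep genes with adjusted p-value < fdr and |log2 fold change| >= lfc."""
    q = bh_adjust(pvalues)
    return [g for g, fc, qv in zip(genes, log2_fc, q)
            if qv < fdr and abs(fc) >= lfc]


# Hypothetical toy input: four genes with made-up statistics.
genes = ["A", "B", "C", "D"]
log2_fc = [2.5, -0.4, -1.8, 3.0]
pvalues = [0.001, 0.02, 0.004, 0.5]
degs = select_degs(genes, log2_fc, pvalues)
```

With these toy values, gene B passes the FDR threshold but fails the fold-change criterion, and gene D has a large fold change but no statistical support, so only the two-sided combination of both criteria decides which genes are reported.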
| Item Type: | Thesis (Other) |
|---|---|
| Uncontrolled Keywords: | Differential Expression (DE) Analysis, Differentially Expressed Genes (DEGs), limma-voom, edgeR, RNA-Sequencing |
| Subjects: | H Social Sciences > HA Statistics |
| Divisions: | Faculty of Mathematics and Science > Statistics > 49201-(S1) Undergraduate Thesis |
| Depositing User: | Calista Fara Rheisa |
| Date Deposited: | 29 Jan 2026 07:56 |
| Last Modified: | 29 Jan 2026 07:56 |
| URI: | http://repository.its.ac.id/id/eprint/131060 |
