Simanjuntak, David Fischer (2025) Perbandingan Kinerja Word Embedding dan Pemodelan Topik Dalam Identifikasi Topik Tugas Akhir Berdasarkan Judul dan Abstrak Penelitian. Other thesis, Institut Teknologi Sepuluh Nopember.
Text
5025201123-Undergraduate_Thesis.pdf (1MB). Restricted to Repository staff only until 1 April 2027.
Abstract
Identifying Final Project topics plays an important role in helping students find research projects that match their interests and academic abilities. This study compares the performance of three Word Embedding methods, namely Word2Vec, GloVe, and FastText, with three topic modeling methods, namely Latent Semantic Analysis (LSA), Non-Negative Matrix Factorization (NMF), and Latent Dirichlet Allocation (LDA), for identifying Final Project topics. The methods are evaluated using the Coherence Score and Cosine Similarity metrics. The data comprise the titles and abstracts of 301 Final Projects by students of the Department of Informatics Engineering at a university in Surabaya from 2016 to 2021. The results show that on the title dataset the combination of GloVe and LDA gives the best performance, with a Coherence Score of 0.74082 and a Cosine Similarity of 0.243196. For the abstract dataset, the combination of GloVe and LDA also gives good results, with a Coherence Score of 0.6322 and a Word Similarity of -0.162095631. The combination of GloVe and LDA on the title dataset is able to produce optimal topics and words that are relevant, diverse, and interpretable.
=================================================================================================================================
The identification of Final Project topics plays a crucial role in helping students find research projects that align with their interests and skills. This study compares the performance of three Word Embedding methods, namely Word2Vec, GloVe, and FastText, with three topic modeling methods, namely Latent Semantic Analysis (LSA), Non-Negative Matrix Factorization (NMF), and Latent Dirichlet Allocation (LDA), to identify Final Project topics. Performance is evaluated using metrics such as Coherence Score and Word Similarity. The data comprise titles and abstracts from 301 Final Projects by students of the Department of Informatics Engineering at a university in Surabaya, spanning 2016 to 2021. The results provide in-depth insights into the strengths and weaknesses of each method in the context of Final Project topic identification. For the title dataset, the combination of GloVe and LDA provides the best performance, with a Coherence Score of 0.74082, a UMass score of -2.19309, and a Cosine Similarity score of 0.243196. For the abstract dataset, the combination of GloVe and LDA also delivers good results, with a Coherence Score of 0.6322 and a Word Similarity score of -0.162095631. The combination of GloVe and LDA on the title dataset generates optimal topics with relevant, diverse, and interpretable words.
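For readers unfamiliar with the kinds of metric named in the abstract, the following is a minimal Python sketch of a cosine similarity between two vectors and a UMass-style topic coherence. It is not the thesis's implementation: the function names, the toy documents, and the toy vectors are all hypothetical, and the coherence is computed here as an unnormalized sum over word pairs, whereas some libraries report an average instead.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (range -1 to 1)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: treat a zero vector as having no similarity
    return dot / (norm_u * norm_v)


def umass_coherence(top_words, documents):
    """UMass-style coherence: sum of log((D(w_i, w_j) + 1) / D(w_j)) over
    ordered pairs of a topic's top words, where D counts the documents
    that contain all of the given words."""
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            d_j = doc_freq(top_words[j])
            if d_j == 0:
                continue  # skip words that never occur in the corpus
            score += math.log((doc_freq(top_words[i], top_words[j]) + 1) / d_j)
    return score


# Hypothetical toy corpus of tokenized titles, for illustration only.
toy_documents = [
    ["topic", "model", "lda"],
    ["word", "embedding", "glove"],
    ["topic", "model", "glove"],
]
toy_coherence = umass_coherence(["topic", "model"], toy_documents)
toy_similarity = cosine_similarity([1.0, 2.0], [2.0, 4.0])
```

Because cosine similarity ranges from -1 to 1, a negative value such as the one reported for the abstract dataset is possible under this definition.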
Item Type: | Thesis (Other) |
---|---|
Uncontrolled Keywords: | Topic Identification, Word Embedding, Topic Modeling, LSA, LDA, NMF |
Subjects: | T Technology > T Technology (General) > T57.5 Data Processing |
Divisions: | Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering > 55201-(S1) Undergraduate Thesis |
Depositing User: | David Fischer Simanjuntak |
Date Deposited: | 04 Feb 2025 06:54 |
Last Modified: | 04 Feb 2025 06:54 |
URI: | http://repository.its.ac.id/id/eprint/118124 |