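Aula, Bryan Khufa Rahmada (2021) Sistem Rekomendasi pada Forum Kesehatan dengan Pemeringkatan Pertanyaan Serupa Menggunakan Pendekatan Deep Learning [Recommendation System for a Health Forum with Similar-Question Ranking Using a Deep Learning Approach]. Undergraduate thesis, Institut Teknologi Sepuluh Nopember.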
Text: 05111740000071-Undergraduate_Thesis.pdf (Accepted Version, 3 MB). Restricted to repository staff only until 1 October 2023.
Abstract
Advances in information retrieval systems make it easier for people to find information on their own online. Information retrieval systems are common in most applications but still rare in the medical domain, where the meaning of a question depends heavily on its context and sentence semantics. An information retrieval system generally consists of three stages: building query and document representations, retrieving candidate documents, and re-ranking them. If any one stage fails, the final search results are suboptimal.
In this final project, a model is built to produce sentence embeddings. These embeddings are used in the retrieval stage, Semantic search, which finds the document vectors in the corpus closest to the query sentence vector according to a similarity measure. The candidate documents retrieved this way are then re-ranked by a Cross-encoder, which takes each (query, candidate document) pair as input and outputs a relevance score.
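The two-stage pipeline described above (embedding-based Semantic search followed by Cross-encoder re-ranking) can be sketched in Python as below. This is a minimal toy under stated assumptions, not the thesis implementation: `embed` is a hypothetical bag-of-words stand-in for the trained sentence-embedding model, and `cross_encoder_score` is a hypothetical token-overlap (Jaccard) stand-in for the Cross-encoder, which in practice scores the concatenated (query, document) pair with a neural network. All function names and the example corpus are illustrative.

```python
import math
from collections import Counter


def embed(text, vocab):
    """Hypothetical stand-in for the trained sentence-embedding model:
    a bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity: dot product divided by both vector norms."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot(u, v) / (nu * nv)


def semantic_search(query, corpus, vocab, top_k=3, score_fn=cosine):
    """Stage 1: indices of the top_k corpus entries whose embeddings
    score highest against the query embedding under score_fn."""
    q = embed(query, vocab)
    scored = [(score_fn(q, embed(doc, vocab)), i) for i, doc in enumerate(corpus)]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [i for _, i in scored[:top_k]]


def cross_encoder_score(query, doc):
    """Hypothetical stand-in for the Cross-encoder: Jaccard overlap of
    the token sets of one (query, document) pair."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def retrieve_and_rerank(query, corpus, top_k=3):
    vocab = sorted({w for s in corpus + [query] for w in s.lower().split()})
    candidates = semantic_search(query, corpus, vocab, top_k)
    # Stage 2: score each (query, candidate) pair and re-sort the candidates.
    return sorted(candidates,
                  key=lambda i: cross_encoder_score(query, corpus[i]),
                  reverse=True)


corpus = [
    "how to treat a fever at home",
    "what causes a persistent cough",
    "home remedies for a child with fever",
    "is a cough after a cold normal",
]
print(retrieve_and_rerank("child has a fever what to do at home", corpus))  # → [0, 2, 1]
```

Passing `score_fn=dot` instead of `cosine` gives the Dot product variant. The two measures produce the same ranking when the embeddings are L2-normalized; on unnormalized vectors, the dot product also favors documents whose vectors have larger norms.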
The experiments compared BM25 against Semantic search as the retrieval algorithm, Cosine similarity against Dot product as the similarity measure, and a Cross-encoder against a Siamese LSTM as the deep learning re-ranking model. The best results were obtained with Semantic search, Cosine similarity, and Cross-encoder re-ranking, which together achieved an average Precision@5 of 0.509 and a Mean Average Precision of 0.873 with an execution time of 0.4 seconds. Beyond the usual retrieval metrics, execution time is also an important consideration for an information retrieval system.
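Precision@5 and Mean Average Precision, the metrics reported above, can be computed as in the sketch below. The thesis does not state which Average Precision variant it used; this sketch assumes the common definition that divides by the total number of relevant documents for each query. Function names and the toy relevance judgments are illustrative.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def average_precision(ranked, relevant):
    """Sum of precision@i at every rank i holding a relevant document,
    divided by the total number of relevant documents (assumed variant)."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for i, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)


def mean_average_precision(runs):
    """runs: one (ranked_list, relevant_set) pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


runs = [
    (["d3", "d1", "d7", "d2", "d9"], {"d3", "d7"}),
    (["d4", "d5", "d6", "d8", "d0"], {"d5"}),
]
print([precision_at_k(r, rel, 5) for r, rel in runs])  # → [0.4, 0.2]
print(mean_average_precision(runs))
```

Averaging `precision_at_k(..., 5)` over all test queries gives the mean Precision@5. Because Average Precision rewards relevant documents that appear early, it can be high even when a query has fewer than five relevant documents and Precision@5 is therefore capped below 1.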
Item Type: | Thesis (Undergraduate) |
---|---|
Uncontrolled Keywords: | Cross-Encoder, Deep Learning, Information Retrieval, Semantic search, Sentence Embedding, Siamese LSTM |
Subjects: | Q Science > Q Science (General) > Q325.5 Machine learning. Support vector machines; Q Science > QA Mathematics > QA76.87 Neural networks (Computer Science) |
Divisions: | Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering > 55201-(S1) Undergraduate Thesis |
Depositing User: | Bryan Khufa Rahmada Aula |
Date Deposited: | 14 Aug 2021 05:15 |
Last Modified: | 14 Aug 2021 05:16 |
URI: | http://repository.its.ac.id/id/eprint/86429 |