Analysis of Dark Web Scraping Methods and Topic Modelling Algorithms to Support Medical Intelligence

Widiastara, Prananda Nur (2023) Analisis Metode Dark Web Scraping Dan Algoritma Topic Modelling Guna Mendukung Medical Intelligence. Other thesis, Institut Teknologi Sepuluh Nopember.

05311940000048-Undergraduate_Thesis.pdf - Accepted Version
Restricted to Repository staff only until 1 April 2025.

Abstract

The State Intelligence Agency (Badan Intelijen Negara, BIN) continues to develop its medical intelligence capacity and capability. The Head of BIN (Kabin), Police General (Ret.) Budi Gunawan, adopted this strategic policy not only in response to current urgency but also to prepare for future threats. BIN's medical intelligence development aims to lead in bioinformatics databases and to be capable of prevention and early detection. The European MedISys system was developed by collecting data from many news sources on the internet, mainly on the surface web. The MedISys journal article notes that more data from a wider range of sources could be added to complement surveillance. For this reason, this final project analyses topic modelling algorithms and dark web scraping methods to support the development of medical intelligence. Two dark web scraping methods are used: access through the TOR network and access through the I2P network, each followed by an HTML parser to extract data from the pages. Two topic modelling algorithms were selected: LDA and BERTopic. The research has six stages: (1) data collection from scraping and open sources, (2) evaluation of the dark web scraping methods, (3) data pre-processing, (4) topic modelling experiments, (5) evaluation of the topic modelling algorithms, and (6) implementation of medical intelligence.

In the collection stage, dark web scraping gathered 518 article texts. In the scraping evaluation stage, scraping over the TOR network achieved the best metric values. In the topic modelling experiments, the models were trained on the largest dataset, and BERTopic achieved the best metric values, with a topic coherence of 0.1591 and a topic diversity of 0.8326, whereas LDA reached a topic coherence of 0.0581 and a topic diversity of 0.7488. In the final stage, the author implemented medical intelligence using dark web scraping and topic modelling.
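
The abstract reports topic diversity scores but does not state how they were computed. A common definition is the proportion of unique words among the top-k words of all topics, and the minimal sketch below assumes that definition. The function name `topic_diversity`, the `top_k` parameter, and the toy topics are hypothetical and are not taken from the thesis.

```python
from typing import Sequence


def topic_diversity(topics: Sequence[Sequence[str]], top_k: int = 10) -> float:
    """Return the fraction of unique words among the top_k words of every topic.

    Assumed definition: unique top words divided by total top words.
    Each topic is a list of words ordered from most to least representative.
    """
    # Collect the first top_k words of each topic into one flat list.
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        raise ValueError("at least one non-empty topic is required")
    return len(set(top_words)) / len(top_words)


# Hypothetical toy topics, invented for illustration only.
toy_topics = [
    ["vaccine", "dose", "clinic", "virus"],
    ["virus", "outbreak", "fever", "clinic"],
    ["pharmacy", "opioid", "dose", "shipment"],
]

score = topic_diversity(toy_topics, top_k=4)
print(f"topic diversity: {score:.4f}")  # 9 unique words out of 12 -> 0.7500
```

Under this assumed definition, a score near 1 means the topics share few top words, so the reported 0.8326 for BERTopic against 0.7488 for LDA would indicate less overlap between BERTopic's topics.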

Item Type: Thesis (Other)
Uncontrolled Keywords: Medical Intelligence, Dark Web Scraping, Topic Modelling, BERTopic.
Subjects: T Technology > T Technology (General) > T57.5 Data Processing
Divisions: Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Information Technology > 59201-(S1) Undergraduate Thesis
Depositing User: Prananda Nur Widiastara
Date Deposited: 07 Feb 2023 02:20
Last Modified: 07 Feb 2023 02:20
URI: http://repository.its.ac.id/id/eprint/96322
