Implementasi Focused Web Crawling untuk Akuisisi Data pada Situs Web (Implementation of Focused Web Crawling for Data Acquisition on Websites)

Putra, Aldo Setyadi (2019) Implementasi Focused Web Crawling untuk Akuisisi Data pada Situs Web. Other thesis, Institut Teknologi Sepuluh Nopember.

Abstract

The internet now holds a very large and steadily growing collection of data. Data can be gathered in two ways: scraping and crawling. Collecting data from many links (URLs) or many websites is called Web Crawling, which makes it better suited to gathering data broadly than scraping, which focuses on a single website. One Web Crawling method is Focused Web Crawling, which uses keywords to search for and collect data. The method plays an important role in distributed search systems. A keyword or topic is submitted as a query to a search engine to obtain parent URLs (seed URLs). From each parent URL, branch URLs are collected, visited, and scored for relevance to the topic. Relevance is computed with the Classification method and the Weight Method. Classification computes the label probabilities for a web page. The Weight Method computes the total relevance probability from the products of the label probabilities and the label weights. Each website is assigned a predicate (Good, Not Good) based on its total relevance probability. Next, files (.pdf) are downloaded, and elements are selected and prepared as output data in Ms. Excel. The program was tested by counting successful downloads of files (.pdf) from web pages. The download accuracy is 90.1%.
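The relevance scoring described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the label names, the weights, the 0.5 threshold, and the use of a mean to combine page scores into a site score are all assumptions made for illustration, since the abstract does not specify them. The sketch reads the Weight Method as a sum of label probability times label weight.

```python
from typing import Dict, List


def relevance_score(label_probs: Dict[str, float],
                    label_weights: Dict[str, float]) -> float:
    """Weighted relevance of one page: sum of P(label) * weight(label).

    Labels missing from label_weights contribute nothing (weight 0.0).
    """
    return sum(prob * label_weights.get(label, 0.0)
               for label, prob in label_probs.items())


def site_predicate(page_scores: List[float], threshold: float = 0.5) -> str:
    """Assign "Good" or "Not Good" to a site from its page scores.

    Assumption: the site-level score is the mean of its page scores and
    the cut-off is 0.5; the abstract states neither.
    """
    if not page_scores:
        return "Not Good"
    site_score = sum(page_scores) / len(page_scores)
    return "Good" if site_score >= threshold else "Not Good"


# Hypothetical classifier output for two pages of one site.
weights = {"relevant": 1.0, "irrelevant": 0.0}
pages = [
    {"relevant": 0.8, "irrelevant": 0.2},
    {"relevant": 0.6, "irrelevant": 0.4},
]
scores = [relevance_score(p, weights) for p in pages]
print(site_predicate(scores))
```

In this reading, the classifier supplies the per-label probabilities and the weights express how strongly each label counts toward the topic, so a different weighting scheme changes only the `weights` dictionary.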

Item Type:	Thesis (Other)
Additional Information:	RSMa 005.72 Put i-1 2019
Uncontrolled Keywords:	Data Collection (Pengumpulan Data), Web Crawling, Focused Web Crawling
Subjects:	Q Science > QA Mathematics > QA76.9.D343 Data mining. Querying (Computer science)
Divisions:	Faculty of Science and Data Analytics (SCIENTICS) > Chemistry > 47201-(S1) Undergraduate Thesis
Depositing User:	Aldo Setyadi Putra
Date Deposited:	27 Sep 2024 10:33
Last Modified:	20 Jun 2025 07:20
URI:	http://repository.its.ac.id/id/eprint/67007
