PENGELOMPOKAN BIG DOCUMENT MENGGUNAKAN METODE K MEANS PADA KOMPUTASI TERDISTRIBUSI [Clustering Big Documents Using the K-Means Method on Distributed Computing]

HIKMAT, ILHAM LAEUR (2016) PENGELOMPOKAN BIG DOCUMENT MENGGUNAKAN METODE K MEANS PADA KOMPUTASI TERDISTRIBUSI. Undergraduate thesis, Institut Teknologi Sepuluh Nopember.

Text
2211100109-undergrduate thesis.pdf - Published Version
Restricted to Repository staff only

Abstract

Articles and news content are documents whose volume is growing rapidly thanks to the ease of using and accessing the internet. Data clustering is an unsupervised machine learning method that divides a data set into subgroups based on the similarity of the data's characteristics. This research implements clustering of a large volume of news content using the K-Means method on a distributed computing system. The system consists of two subsystems: a data preprocessing subsystem, which extracts features from the news text, and a data clustering subsystem, which divides the data into groups. The system's functions are written in the MapReduce programming model and run on a Hadoop-based computer cluster. In tests on data collected from news websites with predetermined categories, the clustering accuracy exceeded 83%. Processing time also improved by about 20% when the number of slave nodes was increased by 25%-50%.
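The idea of expressing K-Means in the MapReduce model can be illustrated with a minimal sketch. This is not the thesis implementation: it runs in a single Python process rather than on Hadoop, the function names (`map_assign`, `reduce_mean`, `kmeans_iteration`, `kmeans`) are hypothetical, and the use of Euclidean distance on dense feature vectors is an assumption, since the abstract does not state the distance measure or the feature representation.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+ (an assumed metric)


def map_assign(point, centroids):
    """Map step: emit (index of the nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def reduce_mean(points):
    """Reduce step: average all points that share one centroid index."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points, centroids):
    """One K-Means pass: map every point, group by key, reduce each group."""
    groups = defaultdict(list)
    for point in points:
        key, value = map_assign(point, centroids)
        groups[key].append(value)  # stands in for the shuffle phase
    # A centroid that attracted no points keeps its previous position.
    return [
        reduce_mean(groups[i]) if groups[i] else centroids[i]
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, max_iter=20):
    """Repeat passes until the centroids stop moving or max_iter is reached."""
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids
```

In a real Hadoop job the map calls would run in parallel on the slave nodes and each iteration would typically be a separate job, with the current centroids distributed to every mapper. The sketch only shows how the assignment and update steps of K-Means split into a map function and a reduce function.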

Item Type:	Thesis (Undergraduate)
Additional Information:	RSE 006.312 Hik p
Uncontrolled Keywords:	big data, distributed computation, hadoop, k-means clustering, text mining
Subjects:	T Technology > TK Electrical engineering. Electronics Nuclear engineering > TK5105 Data Transmission Systems
Divisions:	Faculty of Industrial Technology > Electrical Engineering > 20201-(S1) Undergraduate Thesis
Date Deposited:	13 Jan 2017 07:30
Last Modified:	04 Sep 2025 01:30
URI:	http://repository.its.ac.id/id/eprint/1533
