PENGELOMPOKAN BIG DOCUMENT MENGGUNAKAN METODE K MEANS PADA KOMPUTASI TERDISTRIBUSI (Clustering Big Documents Using the K-Means Method on Distributed Computing)

HIKMAT, ILHAM LAEUR (2016) PENGELOMPOKAN BIG DOCUMENT MENGGUNAKAN METODE K MEANS PADA KOMPUTASI TERDISTRIBUSI. Undergraduate thesis, Institut Teknologi Sepuluh Nopember.

2211100109-undergrduate thesis.pdf - Published Version

Abstract

News articles and other online content are a class of document that is growing rapidly thanks to the ease of using and accessing the internet. Data clustering is an unsupervised machine learning method that divides a data set into subgroups based on the similarity of the data's characteristics. This research implements clustering of a large volume of news content using the K-Means method on a distributed computing system. The designed system consists of two subsystems: a data preprocessing subsystem, which extracts features from the news text, and a data clustering subsystem, which divides the data into groups. The system's functions are written in the MapReduce programming model and run on a Hadoop-based computer cluster. In testing, the system achieved a clustering accuracy of more than 83% on data collected from news websites whose categories were already known. Processing time also improved by about 20% when the number of slave nodes was increased by 25%-50%.
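To illustrate how K-Means can be phrased in the MapReduce model mentioned in the abstract, here is a minimal single-machine sketch in Python. It is not the thesis's Hadoop code: the function names (kmeans_map, kmeans_reduce, kmeans_iteration, kmeans), the in-memory grouping that stands in for Hadoop's shuffle phase, and the use of Euclidean distance on plain numeric vectors are all assumptions made for illustration. The abstract does not say which text features or distance measure the thesis uses.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def kmeans_map(point, centroids):
    """Map step (hypothetical): emit (index of nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def kmeans_reduce(values):
    """Reduce step (hypothetical): average all points assigned to one centroid."""
    total = None
    count = 0
    for point, n in values:
        total = list(point) if total is None else [a + b for a, b in zip(total, point)]
        count += n
    return tuple(t / count for t in total)


def kmeans_iteration(points, centroids):
    """One K-Means pass: map every point, group by key, then reduce each group."""
    groups = defaultdict(list)  # stands in for the shuffle phase (assumption)
    for point in points:
        key, value = kmeans_map(point, centroids)
        groups[key].append(value)
    # A centroid that attracted no points keeps its previous position.
    return [
        kmeans_reduce(groups[i]) if i in groups else tuple(centroids[i])
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, max_iterations=20):
    """Repeat iterations until the centroids stop moving or the limit is hit."""
    for _ in range(max_iterations):
        updated = kmeans_iteration(points, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids
```

In a distributed setting the map calls would run in parallel on the slave nodes and each reduce call would receive the values for one centroid index. Here everything runs in one process so the data flow is easy to follow.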

Item Type: Thesis (Undergraduate)
Additional Information: RSE 006.312 Hik p
Uncontrolled Keywords: big data, distributed computation, hadoop, k-means clustering, text mining
Subjects: T Technology > TK Electrical engineering. Electronics Nuclear engineering > TK5105 Data Transmission Systems
Divisions: Faculty of Industrial Technology > Electrical Engineering > (S1) Undergraduate Theses
Date Deposited: 13 Jan 2017 07:30
Last Modified: 26 Dec 2018 08:39
URI: http://repository.its.ac.id/id/eprint/1533
