Text Clustering of Tweet Categories on the Official Account of PT. Transportasi Jakarta Using K-Means and Density-Based Spatial Clustering of Applications with Noise

Rachmat, Gabriella Varitie Sentosa (2019) Text Clustering Kategori Tweet pada Akun Resmi PT. Transportasi Jakarta dengan Metode K-Means dan Density-Based Spatial Clustering of Applications with Noise. Undergraduate thesis, Institut Teknologi Sepuluh Nopember.

06211540000063-Undergraduate_Theses.pdf - Accepted Version
Restricted to Repository staff only until 1 October 2022.


Abstract

Frequent traffic congestion in DKI Jakarta led the provincial government to establish TransJakarta. Many people submit questions, complaints, or suggestions to TransJakarta via Twitter. To make responding to each tweet easier and faster, tweet categories were formed from data obtained through the Twitter API. Text preprocessing was performed first (cleaning, case folding, stemming, word normalization, tokenizing, and stopword removal), followed by counting and weighting each word using Term Frequency–Inverse Document Frequency (TF–IDF). In addition, a Genetic Algorithm (GA) was proposed for feature selection. Clustering with K-means and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) was then performed to determine the tweet categories, which were visualized with word clouds. DBSCAN with GA feature selection was the best method because it produced a high silhouette coefficient with fewer noise points than clustering without feature selection. The clustering yielded four tweet categories: bus stop/route, bus facilities, bus cleanliness, and TransJakarta consistency.
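
The pipeline summarized in the abstract (preprocessing, TF–IDF weighting, DBSCAN clustering, silhouette evaluation) can be illustrated with a minimal standard-library sketch. This is not the thesis implementation: the toy English tweets, the stopword list, the cosine distance, and the `eps` and `min_pts` values are all assumptions chosen for illustration, and stemming, word normalization, GA feature selection, and K-means are omitted.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; the thesis would use an Indonesian list.
STOPWORDS = {"the", "is", "at", "a", "to", "on", "was", "and"}

def preprocess(text):
    """Case folding, cleaning (URLs, mentions), tokenizing, stopword removal."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return [t for t in re.findall(r"[a-z]+", text) if t not in STOPWORDS]

def tfidf(docs):
    """L2-normalized TF-IDF vectors as dicts, with idf = log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        vec = {t: (c / len(doc)) * math.log(n / df[t])
               for t, c in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def cosine_distance(u, v):
    """1 - cosine similarity; inputs are assumed to be L2-normalized."""
    return 1.0 - sum(w * v.get(t, 0.0) for t, w in u.items())

def dbscan(vectors, eps, min_pts):
    """Plain DBSCAN; returns one label per point, with -1 marking noise."""
    n = len(vectors)
    neighbors = [[j for j in range(n)
                  if cosine_distance(vectors[i], vectors[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:  # expand only from core points
                queue.extend(neighbors[j])
    return labels

def silhouette(vectors, labels):
    """Mean silhouette over non-noise points; needs at least two clusters."""
    clusters = {}
    for i, lab in enumerate(labels):
        if lab != -1:
            clusters.setdefault(lab, []).append(i)
    scores = []
    for lab, members in clusters.items():
        for i in members:
            if len(members) == 1:
                scores.append(0.0)
                continue
            a = sum(cosine_distance(vectors[i], vectors[j])
                    for j in members if j != i) / (len(members) - 1)
            b = min(sum(cosine_distance(vectors[i], vectors[j])
                        for j in other) / len(other)
                    for other_lab, other in clusters.items()
                    if other_lab != lab)
            denom = max(a, b)
            scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy tweets, invented for illustration only.
tweets = [
    "The bus stop at Harmoni is crowded @transjakarta",
    "Harmoni bus stop crowded again",
    "Crowded bus stop Harmoni today",
    "The seat was dirty and smelly",
    "Dirty smelly seat on corridor one",
    "Seat dirty smelly https://example.com",
    "Thank you for everything",
]
vectors = tfidf([preprocess(t) for t in tweets])
labels = dbscan(vectors, eps=0.6, min_pts=2)  # assumed parameter values
score = silhouette(vectors, labels)
print(labels, round(score, 3))
```

Here noise points (label -1) are excluded from the silhouette average, which is one common convention; whether the thesis treats noise the same way is not stated in the abstract.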

Item Type: Thesis (Undergraduate)
Additional Information: RSSt 519.53 Rac t-1 2019
Uncontrolled Keywords: DBSCAN, Genetic Algorithm, K-Means, Silhouette Coefficient, TransJakarta
Subjects: H Social Sciences > HA Statistics
H Social Sciences > HE Transportation and Communications > HE355 Traffic engineering
Q Science > QA Mathematics > QA278.55 Cluster analysis
Q Science > QA Mathematics > QA402.5 Genetic algorithms.
Q Science > QA Mathematics > QA76.9.D343 Data mining. Querying (Computer science)
Q Science > QA Mathematics > QA76.9.I52 Information visualization
Divisions: Faculty of Science and Data Analytics (SCIENTICS) > Statistics > 49201-(S1) Undergraduate Thesis
Depositing User: Rachmat Gabriella Varitie Sentosa
Date Deposited: 29 Dec 2021 05:10
Last Modified: 29 Dec 2021 05:10
URI: http://repository.its.ac.id/id/eprint/61724
