Clustering and Visualizing Surabaya Citizen Aspirations by Using Text Mining. Case Study: Media Center Surabaya

Jannah, Sa'idah Zahrotul (2018) Clustering and Visualizing Surabaya Citizen Aspirations by Using Text Mining. Case Study: Media Center Surabaya. Masters thesis, Institut Teknologi Sepuluh Nopember.

06211650010042-Master_Thesis.pdf - Accepted Version



This research aims to identify and visualize the topics of citizen opinion about Surabaya City, Indonesia. The data are Surabaya citizen opinions taken from Media Center Surabaya. The topics were obtained by clustering. Pre-processing to remove noise (basic operations and cleaning, stemming, and feature extraction) was the primary step toward this goal. The optimum number of clusters was determined by running K-Means clustering for 2 to 18 clusters and calculating the silhouette value and the Calinski-Harabasz Index (CHI) for each; the optimum was the number of clusters with the highest silhouette value and CHI. This research compared four pre-processing options: (1) basic cleaning and operations; (2) stemming with basic cleaning and operations; (3) LDA with basic cleaning and operations; and (4) stemming and LDA with basic cleaning and operations. The results showed that pre-processing with LDA as feature extraction performed best. Feature extraction improved the clustering result, whereas stemming appeared to make no significant difference. The proposed method yielded 15 clusters as the optimum number, representing the topics most often mentioned by Surabaya citizens. The clusters were visualized with word clouds to highlight the most frequent words. The topics are government service, trash, ID cards, illegal parking areas, government programs and information, streetlights, streets, computer training (BLC) by the Surabaya government, potholes, administration letters, the media center, online administration services, education, clean water distribution, and government service hours. The results inform the Surabaya government about which sectors citizens are most concerned about; these mostly relate to street problems, for instance traffic jams, road construction, illegal parking areas, streetlights, and potholes.
Moreover, the results encourage collaboration between the public and the government to address and solve these problems.
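
The cluster-number selection described in the abstract can be illustrated with a minimal sketch. It is not the thesis code: the data are synthetic 2-D points standing in for the extracted text features, the scan covers 2 to 6 clusters rather than 2 to 18, and K-Means is written in plain Python with a deterministic farthest-first initialization, which is an assumption made only to keep the example reproducible. All function and variable names are hypothetical.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def group(points, labels):
    """Map each cluster label to the list of points carrying it."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return clusters

def kmeans(points, k, iters=100):
    # Assumption: farthest-first initialization, chosen for determinism.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        # Assign each point to its nearest center, then recompute centers.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(centroid(members) if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return labels

def silhouette(points, labels):
    """Mean silhouette value; singleton clusters contribute 0."""
    clusters = group(points, labels)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def calinski_harabasz(points, labels):
    """Between-cluster over within-cluster dispersion, each scaled by its degrees of freedom."""
    clusters = group(points, labels)
    n, k = len(points), len(clusters)
    overall = centroid(points)
    between = sum(len(m) * dist(centroid(m), overall) ** 2 for m in clusters.values())
    within = sum(dist(p, centroid(m)) ** 2 for m in clusters.values() for p in m)
    if k < 2 or k >= n or within == 0:
        return 0.0
    return (between / (k - 1)) / (within / (n - k))

# Synthetic stand-in data: three well-separated blobs of 20 points each.
rng = random.Random(42)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
data = [[cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)]
        for cx, cy in blob_centers for _ in range(20)]

# Scan candidate cluster counts and record both indices for each.
scores = {}
for k in range(2, 7):
    labels = kmeans(data, k)
    scores[k] = (silhouette(data, labels), calinski_harabasz(data, labels))

best_sil = max(scores, key=lambda k: scores[k][0])
best_chi = max(scores, key=lambda k: scores[k][1])
print("best k by silhouette:", best_sil, "| best k by CHI:", best_chi)
```

Both indices reward compact, well-separated clusters, so on data with clear structure they tend to agree on the number of clusters; when they disagree, the choice becomes a judgment call between the two criteria.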

Item Type: Thesis (Masters)
Additional Information: RTSt 519.53 Jan c-1 2018
Uncontrolled Keywords: text clustering; topic identification; public opinion; LDA; K-Means
Subjects: Q Science > QA Mathematics > QA278.55 Cluster analysis
Q Science > QA Mathematics > QA76.9.D343 Data mining. Querying (Computer science)
Q Science > QA Mathematics > QA278 Cluster Analysis. Multivariate analysis. Correspondence analysis (Statistics)
Divisions: Faculty of Mathematics, Computation, and Data Science > Statistics > 49101-(S2) Master Thesis
Depositing User: Sa'idah Zahrotul Jannah
Date Deposited: 02 Aug 2021 23:36
Last Modified: 02 Aug 2021 23:36
