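Hanum, Rakhma Rufaida (2019) Merepresentasikan Makna Kata untuk Metode Klasifikasi Naive Bayes, Random Forest, dan Support Vector Machine dalam Studi Kasus Kemacetan di Surabaya [Representing Word Meaning for the Naive Bayes, Random Forest, and Support Vector Machine Classification Methods in a Case Study of Traffic Congestion in Surabaya]. Undergraduate thesis, Institut Teknologi Sepuluh Nopember.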
Text: 05111540000161-Undergraduate_Thesis.pdf - Accepted Version (1MB)
Abstract
Twitter is a microblogging-based social media platform that allows users to post short messages known as tweets. Users can write about events around them, for example traffic congestion. In the earlier final project that this work takes as its reference, term weighting for classification was computed with TF-IDF (Term Frequency-Inverse Document Frequency).
In this final project, word meanings are represented as vectors using Word2vec. The weight of each term is the proximity of its word vector to the vector of the word "macet" (Indonesian for "congested" or "jammed"), both taken from the trained Word2vec model. The higher the proximity score, the more similar the meanings of the two words. Traffic-congestion tweets in Surabaya are classified using a dataset of tweets collected from the Twitter accounts @LMSurabaya and @sits_dishubsby. The data are classified with the Naïve Bayes, Random Forest, and Linear Support Vector Machine (SVM) classification methods using PySpark.
The evaluation shows that Random Forest achieved the best accuracy, 84.74%. This is lower than the 95.90% accuracy obtained with TF-IDF.
Keywords: classification, naïve bayes, random forest, svm, traffic, twitter, word2vec.
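The term weighting described in the abstract can be illustrated with a minimal sketch. It assumes that "proximity" means cosine similarity, which is the usual similarity measure for Word2vec vectors; the abstract does not name the measure. The three-dimensional vectors, the example words "padat" and "lancar", and the helper names `cosine_similarity` and `term_weights` are hypothetical and are not taken from the thesis.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_u * norm_v)


def term_weights(tokens, vectors, anchor="macet"):
    """Weight each known token by its similarity to the anchor word.

    Tokens missing from the vocabulary are skipped; how the thesis
    handles them is not stated in the abstract.
    """
    anchor_vec = vectors[anchor]
    return {
        token: cosine_similarity(vectors[token], anchor_vec)
        for token in tokens
        if token in vectors
    }


# Hypothetical toy vectors standing in for a trained Word2vec model.
toy_vectors = {
    "macet": [0.9, 0.1, 0.0],
    "padat": [0.8, 0.2, 0.1],
    "lancar": [-0.7, 0.3, 0.1],
}

# "hujan" is deliberately absent from the toy vocabulary.
weights = term_weights(["padat", "lancar", "hujan"], toy_vectors)
for token, weight in weights.items():
    print(f"{token}: {weight:.4f}")
```

With these toy vectors, "padat" receives a higher weight than "lancar" because its vector points in nearly the same direction as the vector for "macet". In a real pipeline the vectors would come from a Word2vec model trained on the tweet corpus.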
Item Type: | Thesis (Undergraduate) |
---|---|
Additional Information: | RSIf 006.7 Han m-1 2019 |
Uncontrolled Keywords: | traffic congestion, classification, naïve bayes, random forest, svm, twitter, word2vec. |
Subjects: | T Technology > T Technology (General) > T57.5 Data Processing; T Technology > T Technology (General) > T58.5 Information technology. IT--Auditing; T Technology > T Technology (General) > T58.6 Management information systems; T Technology > TK Electrical engineering. Electronics Nuclear engineering > TK5105.88815 Semantic Web |
Divisions: | Faculty of Information and Communication Technology > Informatics > 55201-(S1) Undergraduate Thesis |
Depositing User: | Rakhma Rufaida Hanum |
Date Deposited: | 10 Jun 2021 07:29 |
Last Modified: | 14 Jun 2022 01:10 |
URI: | http://repository.its.ac.id/id/eprint/60445 |