Utomo, Erlangga Wahyu (2025) News Classification Using Ensemble Learning Approach. Other thesis, Institut Teknologi Sepuluh Nopember.
Text: 5025201118-Undergraduate_Thesis.pdf (6MB). Restricted to Repository staff only until 1 April 2027.
Abstract
The volume of information published online every day can be overwhelming, making it difficult to manage and categorize effectively. This research addresses the problem by combining multiple machine learning and deep learning models through ensemble learning. Using the HuffPost news dataset from Kaggle, the study integrates traditional models, namely Logistic Regression, Random Forest, and Support Vector Machines (SVM), with deep learning approaches, namely Long Short-Term Memory (LSTM) and Bidirectional Long Short-Term Memory (BiLSTM) with an attention mechanism and additional layers. Data preparation included cleaning the text, removing unnecessary characters, and converting the text into numerical form using Word2Vec embeddings. These features were then fed into an ensemble model that combined the predictions of the different classifiers by soft voting, improving the overall accuracy and reliability of the classification process. To validate the results, the model was tested with several splits of training and test data (80-20 ratio). The results indicate that the ensemble model achieved an overall accuracy of 60.19%, outperforming individual models. Among the individual models, BiLSTM with Attention achieved the highest accuracy at 63.9%, demonstrating its effectiveness in understanding sentence structure and capturing complex patterns in news content; the other models reached 58.2% (Logistic Regression), 57.5% (SVM), 54.7% (LSTM), and 48.3% (Random Forest). Although it differs little from the individual models, the ensemble still has the best accuracy because it combines all the individual models and combines the data preprocessing of the machine learning and deep learning pipelines. However, training the ensemble model required significant computational resources, taking 10 days to complete.
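The soft-voting step mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `soft_vote`, the equal default weights, and the toy probability values are assumptions made purely for illustration. The idea is that each classifier outputs per-class probabilities, the ensemble averages them, and the class with the highest averaged probability is predicted.

```python
def soft_vote(prob_sets, weights=None):
    """Combine per-class probabilities from several classifiers.

    prob_sets: one entry per model, each a list of per-sample
               probability lists (n_samples x n_classes).
    weights:   optional per-model weights (assumed equal if omitted).
    Returns (predicted_labels, averaged_probabilities).
    """
    if weights is None:
        weights = [1.0] * len(prob_sets)
    total = sum(weights)
    n_samples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])

    labels, averaged = [], []
    for i in range(n_samples):
        # Weighted mean of each class probability across all models.
        avg = [
            sum(w * probs[i][c] for w, probs in zip(weights, prob_sets)) / total
            for c in range(n_classes)
        ]
        averaged.append(avg)
        # Predict the class with the highest averaged probability.
        labels.append(max(range(n_classes), key=avg.__getitem__))
    return labels, averaged


# Toy probabilities (hypothetical): 3 models, 2 news items, 3 categories.
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
model_b = [[0.2, 0.5, 0.3], [0.1, 0.2, 0.7]]
model_c = [[0.5, 0.4, 0.1], [0.3, 0.3, 0.4]]

labels, averaged = soft_vote([model_a, model_b, model_c])
```

In this toy case the first item is assigned category 0 and the second category 2. Unlike hard (majority) voting, soft voting lets a confident model outweigh less certain ones, which is one reason it is commonly chosen when the base classifiers produce calibrated probabilities.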
Item Type: | Thesis (Other) |
---|---|
Uncontrolled Keywords: | News Categorization, Ensemble Learning, Text Classification, Machine Learning, Deep Learning, Huffpost. |
Subjects: | Q Science > Q Science (General) > Q325.5 Machine learning. Support vector machines; T Technology > T Technology (General) > T57.5 Data Processing; T Technology > T Technology (General) > T57.8 Nonlinear programming. Support vector machine. Wavelets. Hidden Markov models. |
Divisions: | Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering > 55201-(S1) Undergraduate Thesis |
Depositing User: | Erlangga Wahyu Utomo |
Date Deposited: | 05 Feb 2025 01:45 |
Last Modified: | 05 Feb 2025 01:45 |
URI: | http://repository.its.ac.id/id/eprint/117829 |