Person Re-identification on Multi-modal Data using a Swin Transformer Classifier

Husnah, Indiana Namaul (2024) Reidentifikasi Orang pada Data Multi-modal menggunakan Classifier Swin Transformer. Other thesis, Institut Teknologi Sepuluh Nopember.

Text: 5024201061-Undergraduate_Thesis.pdf - Accepted Version
Restricted to Repository staff only until 1 October 2026.


Abstract

The world today is becoming increasingly dangerous as the number of crimes rises. To anticipate criminal acts, surveillance cameras are typically installed in environments prone to criminal activity so that individuals can be monitored. With manual surveillance, however, operators cannot stay alert at all times, and it is often difficult to monitor subjects under conditions such as nighttime, fog, or smoke. Cameras offer several imaging modalities for these situations: RGB (Red, Green, Blue), Near Infrared (NI), and Thermal Infrared (TI). Artificial Intelligence remains a field in constant demand and under intensive research, and one of its recent advances in Computer Vision is the Transformer, an architecture that first showed strong performance in Natural Language Processing. The Transformer evolved into the Vision Transformer, which splits an image into patches much as text is split into tokens in NLP, and subsequent work produced the Swin Transformer, a hierarchical Vision Transformer built on Shifted Windows that performs well across computer vision tasks. This research uses the RGBNT201 dataset, which contains three modalities (RGB, NI, TI), and feeds it to a Swin Transformer model. The implementation experiments with several loss functions (Circle Loss, Contrastive Loss, Triplet Loss) combined with automatic augmentation, and achieves better results than the previous method: the Swin V1 AutoAugment model D reaches an mAP of 32.79% and a rank@1 of 63%, whereas the previous PFNet method achieved an mAP of 31.76% and a rank@1 of 54.59%. These results are expected to serve as a reference for further developments in computer vision technology and as a method that can be applied to crime prevention.
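As a companion to the abstract, the following is a minimal sketch (in Python, using PyTorch and timm) of how a Swin Transformer backbone can serve as a multi-modal re-identification feature extractor trained with an identity classifier and a triplet loss, in the spirit of the pipeline summarized above. The class name MultiModalSwinReID, the shared-backbone feature concatenation, the choice of timm backbone variant, the triplet margin, and the identity count are assumptions made for illustration; the thesis's exact architecture, fusion strategy, and loss weighting are not specified in this record.

# Minimal sketch of a multi-modal person re-ID model built on a Swin Transformer
# backbone (RGB/NI/TI inputs, Swin features, identity classification plus a
# triplet loss). Fusion strategy, names, and hyperparameters below are
# illustrative assumptions, not the thesis implementation.
import torch
import torch.nn as nn
import timm  # provides Swin Transformer backbones


class MultiModalSwinReID(nn.Module):  # hypothetical class name
    def __init__(self, num_identities: int,
                 backbone: str = "swin_tiny_patch4_window7_224"):
        super().__init__()
        # One shared Swin backbone applied to each modality; num_classes=0
        # makes timm return pooled feature vectors instead of logits.
        self.backbone = timm.create_model(backbone, pretrained=False, num_classes=0)
        feat_dim = self.backbone.num_features * 3  # RGB + NI + TI concatenated
        self.bnneck = nn.BatchNorm1d(feat_dim)     # common re-ID normalization trick
        self.classifier = nn.Linear(feat_dim, num_identities, bias=False)

    def forward(self, rgb, ni, ti):
        # Each input: (B, 3, 224, 224); NI/TI maps replicated to 3 channels.
        feats = torch.cat(
            [self.backbone(rgb), self.backbone(ni), self.backbone(ti)], dim=1)
        logits = self.classifier(self.bnneck(feats))
        return feats, logits


# One illustrative training step: identity cross-entropy on the logits plus a
# triplet loss on the concatenated embeddings.
model = MultiModalSwinReID(num_identities=201)  # placeholder identity count
ce_loss = nn.CrossEntropyLoss()
triplet = nn.TripletMarginLoss(margin=0.3)      # margin chosen for illustration

rgb = torch.randn(6, 3, 224, 224)
ni = torch.randn(6, 3, 224, 224)
ti = torch.randn(6, 3, 224, 224)
labels = torch.randint(0, 201, (6,))

feats, logits = model(rgb, ni, ti)
# Dummy anchor/positive/negative split just to show the call signature; a real
# pipeline would form triplets with an identity-balanced (PK) sampler.
loss = ce_loss(logits, labels) + triplet(feats[0:2], feats[2:4], feats[4:6])
loss.backward()

In an actual experiment, the triplets would come from an identity-balanced sampler, an automatic augmentation policy such as AutoAugment would be applied to the training images, and retrieval quality would be reported as mAP and rank@1 over a query/gallery split of RGBNT201, as in the figures quoted in the abstract.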

Item Type: Thesis (Other)
Uncontrolled Keywords: Computer Vision, Deep Learning, Vision Transformer
Subjects: T Technology > T Technology (General) > T58.5 Information technology. IT--Auditing
T Technology > T Technology (General) > T58.6 Management information systems
T Technology > T Technology (General) > T58.8 Productivity. Efficiency
Divisions: Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Computer Engineering > 90243-(S1) Undergraduate Thesis
Depositing User: Indiana Namaul Husnah
Date Deposited: 29 Jul 2024 03:27
Last Modified: 29 Jul 2024 03:27
URI: http://repository.its.ac.id/id/eprint/109264
