Implementation of an Encoder-Decoder Architecture Based on RemoteCLIP and GPT-2 for Text Generation from Remote Sensing Imagery of Hong Kong Urban Areas

Kurniawan, Richo Yudha (2025) Implementasi Arsitektur Encoder-Decoder Berbasis RemoteCLIP dan GPT-2 untuk Pembangkitan Teks Citra Penginderaan Jauh Kawasan Perkotaan Hong Kong. Project Report. [s.n.], [s.l.]. (Unpublished)

5025221242-Project_Report.pdf - Accepted Version
Restricted to Repository staff only

Abstract

Extracting information from remote sensing imagery quickly and accurately is a central challenge in image captioning. Manual analysis of remote sensing imagery is often inefficient and time-consuming. Remote Sensing Image Captioning (RSIC) addresses this by turning remote sensing imagery into informative text quickly and accurately. This internship project implements an encoder-decoder architecture that uses Remote Sensing Contrastive Language-Image Pretraining (RemoteCLIP) as the encoder and Generative Pre-trained Transformer 2 (GPT-2) as the decoder. The implementation covers training the model on the Satellite Image Caption Generation dataset and running inference to generate captions for a remote sensing imagery dataset of Hong Kong urban areas. Performance is analyzed with several evaluation metrics, including BLEU, CIDEr, METEOR, and ROUGE. The test results indicate that the RemoteCLIP-GPT2 architecture is reasonably robust and effective for the RSIC task.
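
As an illustration of the kind of metric named in the abstract, the following is a minimal sketch of sentence-level BLEU (clipped n-gram precision combined with a brevity penalty). It is not taken from the report, which does not state which evaluation tooling was used; the function names, the choice of `max_n=2`, and the example captions are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it appears in any single reference."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=2):
    """Sentence-level BLEU with uniform weights and no smoothing."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    c = len(candidate)
    # Reference length closest to the candidate length (ties: shorter one).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty * math.exp(log_mean)


# Hypothetical captions, not drawn from the report's datasets.
reference = "many buildings are near a river".split()
generated = "many buildings are near a road".split()
score = bleu(generated, [reference])  # sqrt(5/6 * 4/5), about 0.816
print(f"BLEU-2: {score:.3f}")
```

Established evaluation toolkits add smoothing and corpus-level aggregation, so scores from this sketch are not comparable with the figures reported in the project.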

Item Type: Monograph (Project Report)
Uncontrolled Keywords: RSIC, Remote Sensing Imagery, RemoteCLIP, GPT-2, Encoder-Decoder
Subjects: T Technology > T Technology (General) > T57.5 Data Processing
T Technology > T Technology (General) > T59.7 Human-machine systems.
Divisions: Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Informatics Engineering > 55201-(S1) Undergraduate Thesis
Depositing User: Richo Yudha Kurniawan
Date Deposited: 12 Jan 2026 02:20
Last Modified: 12 Jan 2026 02:20
URI: http://repository.its.ac.id/id/eprint/129311
