Hares, Yosua (2025) Pengembangan Model Computer Vision untuk Generalisasi dan Prediksi Rotasi Arm pada Task Pick-Up Object. Project Report. [s.n.], [s.l.]. (Unpublished)
Abstract
Advances in artificial intelligence (AI) and foundation models have improved the ability of robotic systems to understand their environment visually and to translate that understanding into more precise actions. Against this background, this internship project explores and analyzes the capabilities of models such as DINOv2, UAD, Gemini Robotics, RoboBrain, and the Robotic Transformer (RT) for affordance identification and basic trajectory representation in object manipulation tasks, particularly object pick-up. The work covers three activities: using pre-trained models for visual feature extraction and interaction-point prediction, building a perception pipeline that can be integrated with a motion planning module, and collecting human gesture data with MediaPipe as a basis for exploring rotation modeling. In addition, preliminary fine-tuning of RoboBrain 1.0 with LoRA was carried out to support the study of rotational motion representations in simple manipulation scenarios. Overall, the project gives a broad overview of how foundation models can serve as the basis for perception-to-action systems and outlines directions for integrating affordance and trajectory planning in future robotic systems.
| Item Type: | Monograph (Project Report) |
|---|---|
| Uncontrolled Keywords: | Computer Vision, Affordance, Trajectory Planning, Foundation Models, DINOv2, RoboBrain, Robotic Transformer |
| Subjects: | T Technology > T Technology (General) > T57.5 Data Processing; T Technology > T Technology (General) > T57.62 Simulation |
| Divisions: | Faculty of Information Technology > Informatics Engineering > 55201-(S1) Undergraduate Thesis |
| Depositing User: | Yosua Hares |
| Date Deposited: | 15 Dec 2025 04:32 |
| Last Modified: | 15 Dec 2025 04:32 |
| URI: | http://repository.its.ac.id/id/eprint/128954 |