Putra, Oddy Virgantara (2025) Self-Supervised Learning Framework for Noise-Resilient 3D Point Cloud Feature Representation. Doctoral thesis, Institut Teknologi Sepuluh Nopember.
Text
7022221024-Dissertation.pdf - Accepted Version (4MB). Restricted to Repository staff only until 1 April 2027.
Abstract
Object recognition in LiDAR-based point cloud data encounters significant challenges, including noise, clutter, irregular data structure, and complicated tasks such as segmentation, classification, and detection. Addressing these challenges requires a robust solution that enhances both data quality and feature representation. In response, we propose an integrated framework that combines denoising and self-supervised learning to improve point cloud recognition. The denoising module, comprising ScoreNet and the Guided Filter, effectively mitigates noise by isolating valuable information and refining critical details, creating a clearer foundation for subsequent processing. This refined data is then fed into a classifier based on a modified GDANet architecture, leveraging depthwise overparameterized convolution (DOConv) to capture complex features critical for accurate classification. To further enhance representation learning, we incorporate AdaCrossNet, a self-supervised learning framework designed to alleviate the manual annotation burden through dynamic intra-modal and cross-modal contrastive learning. AdaCrossNet jointly maximizes representation alignment between 3D point clouds and corresponding 2D images in a shared latent space, adaptively balancing intra-modal and cross-modal contributions with a dynamic weight adjustment mechanism. Stabilized through an exponentially weighted moving average (EWMA), this mechanism enables adaptive learning during training and ensures robust, transferable feature representations. Our integrated framework demonstrates state-of-the-art performance across benchmark datasets. The denoiser achieves a Hausdorff Distance score of 0.177, while the classifier attains 90.7% and 96.7% accuracy on ModelNet40-C and the Human Pose Dataset, respectively. 
AdaCrossNet further enhances results with 91.4% accuracy on ModelNet40, an mIoU of 85.1% on ShapeNetPart for segmentation tasks, and 82.1% accuracy on the ScanObjectNN dataset with the DGCNN backbone. Our denoising and self-supervised learning components offer a robust solution that improves data quality and representation generalizability across multiple point cloud recognition tasks.
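The dynamic weight adjustment described above can be illustrated with a small sketch. The abstract does not give the exact update rule, so everything below is an assumption made for illustration: the class name `EwmaLossBalancer`, the smoothing factor `beta`, and the choice to weight each loss term in proportion to its smoothed magnitude are hypothetical and may differ from what AdaCrossNet actually does.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class EwmaLossBalancer:
    """Hypothetical helper: balances two loss terms using EWMA-smoothed values."""

    beta: float = 0.9                   # smoothing factor (assumed value)
    ewma_intra: Optional[float] = None  # smoothed intra-modal loss
    ewma_cross: Optional[float] = None  # smoothed cross-modal loss

    def _smooth(self, previous: Optional[float], value: float) -> float:
        # The first observation initializes the average; later ones are blended in.
        if previous is None:
            return value
        return self.beta * previous + (1.0 - self.beta) * value

    def update(self, intra_loss: float, cross_loss: float) -> Tuple[float, float]:
        """Update both averages and return (w_intra, w_cross), which sum to 1."""
        self.ewma_intra = self._smooth(self.ewma_intra, intra_loss)
        self.ewma_cross = self._smooth(self.ewma_cross, cross_loss)
        total = self.ewma_intra + self.ewma_cross
        if total == 0.0:
            return 0.5, 0.5
        # Assumed rule: the term with the larger smoothed loss gets more weight.
        w_intra = self.ewma_intra / total
        return w_intra, 1.0 - w_intra

    def combine(self, intra_loss: float, cross_loss: float) -> float:
        """Return the weighted sum of the two loss terms for this step."""
        w_intra, w_cross = self.update(intra_loss, cross_loss)
        return w_intra * intra_loss + w_cross * cross_loss


balancer = EwmaLossBalancer(beta=0.5)
first_weights = balancer.update(2.0, 2.0)   # equal losses give equal weights
second_weights = balancer.update(4.0, 2.0)  # intra-modal loss rises, so its weight rises
```

Because the weights are computed from smoothed rather than raw losses, a single noisy batch shifts the balance only slightly, which is the stabilizing role the abstract attributes to the EWMA.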
==================================================================================================================================
Object recognition in LiDAR-based point cloud data faces significant challenges, including noise, clutter, irregular data structure, and complicated cases such as segmentation, classification, and detection. Overcoming these challenges requires a robust solution that can improve both data quality and feature representation. In this research, we propose an integrated framework that combines denoising and self-supervised learning to improve point cloud recognition. The denoising module, consisting of ScoreNet and the Guided Filter, effectively reduces noise by isolating valuable information and refining important details, creating a clearer foundation for subsequent processing. The refined data is then fed into a classifier based on a modified GDANet architecture, which uses overparameterized convolution (DOConv) to capture the complex features that are essential for accurate classification. To improve representation learning, we incorporate AdaCrossNet, a self-supervised learning framework designed to ease the manual annotation burden through dynamic intra-modal and cross-modal contrastive learning. AdaCrossNet jointly maximizes representation alignment between 3D point clouds and the corresponding 2D images in a spatial space, adaptively balancing intra-modal and cross-modal contributions with a dynamic weight adjustment mechanism. Stabilized through an exponentially weighted moving average (EWMA) computation, this mechanism enables adaptive learning during training and ensures robust, transferable feature representations. Our integrated framework shows good performance across benchmark datasets. The denoiser achieves a Hausdorff Distance score of 0.177, while the classifier reaches 90.7% and 96.7% accuracy on ModelNet40 and the Human Pose Dataset, respectively.
AdaCrossNet further improves results with 91.4% accuracy on ModelNet40, an mIoU of 85.1% on ShapeNetPart for segmentation tasks, and 82.1% accuracy on the ScanObjectNN dataset with the DGCNN backbone. Together, the denoising and self-supervised learning components offer a robust solution that improves data quality and representation generalization across point cloud recognition.
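For readers unfamiliar with the Hausdorff Distance used to score the denoiser, the following is a minimal sketch of the standard symmetric definition on two small point sets. It is an illustration only: the function names and sample points are hypothetical, and the thesis may apply a normalized or otherwise modified variant that this record does not describe.

```python
import math


def directed_hausdorff(source, target):
    """Largest distance from any source point to its nearest target point."""
    return max(min(math.dist(p, q) for q in target) for p in source)


def hausdorff_distance(points_a, points_b):
    """Symmetric Hausdorff distance: the worse of the two directed distances."""
    return max(
        directed_hausdorff(points_a, points_b),
        directed_hausdorff(points_b, points_a),
    )


# Hypothetical sample data: a clean cloud and a copy with one displaced point.
clean_cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
noisy_cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5)]
score = hausdorff_distance(clean_cloud, noisy_cloud)  # 0.5
```

A lower score means the denoised cloud stays closer to the reference in its worst-case point, so the metric is sensitive to individual outliers that an averaged distance would hide. The brute-force nearest-neighbor search here is quadratic and is meant for explanation, not for full-size scans.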
Item Type: | Thesis (Doctoral) |
---|---|
Uncontrolled Keywords: | Adaptive Weighting, Contrastive Learning, Deep Learning, Point Cloud Denoising, Point Cloud Understanding, Self-supervised Learning |
Subjects: | Q Science > Q Science (General) > Q325.5 Machine learning. Support vector machines; Q Science > QA Mathematics > QA336 Artificial Intelligence; Q Science > QA Mathematics > QA76.87 Neural networks (Computer Science) |
Divisions: | Faculty of Intelligent Electrical and Informatics Technology (ELECTICS) > Electrical Engineering > 20001-(S3) PhD Thesis |
Depositing User: | ODDY VIRGANTARA PUTRA |
Date Deposited: | 20 Jan 2025 06:44 |
Last Modified: | 20 Jan 2025 06:44 |
URI: | http://repository.its.ac.id/id/eprint/116446 |