Authors
- Zheng Zhang
(School of Information Science and Technology, North China University of Technology, Beijing 100144, China)
- Ruyu Xu
(School of Information Science and Technology, North China University of Technology, Beijing 100144, China)
- Qing Tian
(School of Information Science and Technology, North China University of Technology, Beijing 100144, China)
Abstract
In driverless systems (for example, subways, buses, and trucks), fusing multi-modal data such as light detection and ranging (LiDAR) point clouds and camera images is essential for accurate 3D object detection. Information interaction between the modalities is challenging during fusion because the sensors use different coordinate systems and the density of the collected data differs significantly. A fusion method must therefore account for the consistency and complementarity of multi-modal information, bridge the density gap between the data sources, and process the multi-source information jointly. To this end, this paper proposes PIDFusion, a Transformer-based multi-modal fusion model for 3D object detection. First, the method uses 2D instance segmentation results to generate dense 3D virtual points that augment the original sparse 3D point cloud. This addresses the problem that the nearest neighbor by Euclidean distance in 2D image space is not necessarily the nearest in 3D space. Second, a new cross-modal fusion architecture maintains separate per-modality features so that the unique characteristics of each modality can be exploited during 3D object detection. Finally, an instance-level fusion module enhances semantic consistency through cross-modal feature interaction. Experiments show that PIDFusion outperforms existing 3D object detection methods, especially for small and long-range objects, achieving 70.8 mAP and 73.5 NDS on the nuScenes test set.
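The virtual-point idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it shows a generic depth-borrowing scheme in which each pixel sampled inside a 2D instance mask takes its depth from the nearest projected LiDAR point of the same instance and is then back-projected into 3D. The pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the tuple layouts, and the function names `unproject` and `virtual_points` are all assumptions made for illustration.

```python
import math


def unproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) at a given depth into camera coordinates.

    Assumes an ideal pinhole camera; the intrinsics are illustrative.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def virtual_points(pixels, lidar_px, fx, fy, cx, cy):
    """Generate 3D virtual points from pixels sampled inside instance masks.

    pixels:   list of (u, v, instance_id) sampled inside 2D instance masks
    lidar_px: list of (u, v, depth, instance_id) LiDAR points projected
              into the image plane
    Hypothetical scheme: each pixel borrows the depth of the nearest
    projected LiDAR point that belongs to the same instance.
    """
    out = []
    for u, v, inst in pixels:
        # Only same-instance points are candidates, so a background point
        # that is close in 2D but far away in 3D is never used.
        candidates = [p for p in lidar_px if p[3] == inst]
        if not candidates:
            continue  # no LiDAR return on this instance, skip the pixel
        nearest = min(candidates, key=lambda p: math.hypot(p[0] - u, p[1] - v))
        out.append(unproject(u, v, nearest[2], fx, fy, cx, cy))
    return out


# Toy example: the background point (depth 40 m) is closer in the image
# to the sampled pixel than the object point (depth 5 m).
lidar = [(10.0, 10.0, 5.0, 1), (11.0, 10.0, 40.0, 0)]
points = virtual_points([(12.0, 10.0, 1)], lidar, 100.0, 100.0, 0.0, 0.0)
print(points)
```

In the toy example, the pixel's closest neighbor in the image belongs to the background, so a purely 2D nearest-neighbor lookup would place the virtual point far behind the object. Restricting candidates to the same instance is one simple way to avoid that mismatch.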
Suggested Citation
Zheng Zhang & Ruyu Xu & Qing Tian, 2023.
"PIDFusion: Fusing Dense LiDAR Points and Camera Images at Pixel-Instance Level for 3D Object Detection,"
Mathematics, MDPI, vol. 11(20), pages 1-15, October.
Handle:
RePEc:gam:jmathe:v:11:y:2023:i:20:p:4277-:d:1259245