A Camera-LiDAR Fusion Transformer (CLFT) for Semantic Segmentation in Autonomous Driving

This project proposes a vision-transformer-based network that performs camera-LiDAR fusion for semantic segmentation in autonomous driving. The network applies the progressive-assembly strategy of vision transformers in a double-direction architecture, then integrates the two streams through a cross-fusion strategy across the transformer decoder layers. At the time this page was created, this is the first open-source camera-LiDAR fusion transformer-based network that uses the camera-plane-projection strategy to process LiDAR point-cloud data during fusion.
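The camera-plane-projection step mentioned above can be illustrated with a standard pinhole-camera projection of LiDAR points into the image plane. This is a minimal sketch, not the project's actual code: the function name, the intrinsic matrix `K`, and the extrinsic transform `T_cam_from_lidar` are assumed placeholders.

```python
import numpy as np

def project_lidar_to_camera(points_xyz, K, T_cam_from_lidar):
    """Project LiDAR points (N, 3) onto the camera image plane.

    points_xyz: points in the LiDAR frame, shape (N, 3).
    K: 3x3 camera intrinsic matrix (assumed pinhole model).
    T_cam_from_lidar: 4x4 extrinsic transform, LiDAR frame -> camera frame.
    Returns pixel coordinates (M, 2) and depths (M,) for points in front
    of the camera.
    """
    n = points_xyz.shape[0]
    homo = np.hstack([points_xyz, np.ones((n, 1))])   # homogeneous coords (N, 4)
    cam = (T_cam_from_lidar @ homo.T).T[:, :3]        # points in camera frame
    in_front = cam[:, 2] > 0                          # discard points behind the camera
    cam = cam[in_front]
    uvw = (K @ cam.T).T                               # pinhole projection
    uv = uvw[:, :2] / uvw[:, 2:3]                     # divide by depth -> pixels
    return uv, cam[:, 2]
```

With identity extrinsics and a principal point at (64, 64), a point 5 m straight ahead of the camera projects to the principal point, as expected from the pinhole model.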

The dataset used to validate the network is the Waymo Open Dataset. The video above visualizes the segmentation results under different illumination and weather conditions. Please refer to the paper for details.