Papers by Shikhar Shikhar

Multi-sensor fusion has become a cornerstone of 3D perception in autonomous driving systems. However, existing
fusion frameworks commit to a fixed fusion stage—early, mid, or late—regardless of the runtime quality and semantic
consistency of incoming sensor streams. This rigid design fails under real-world adverse conditions such as camera lens
blur, LiDAR point-cloud dropout in heavy rain, or radar ghosting in urban canyons. In this paper, we present AdapFuse-SD,
an Adaptive Fusion Stage Selection framework driven by a Semantic Disagreement Score (SDS) computed online between
modality-specific feature representations. AdapFuse-SD dynamically routes each inference step to the most appropriate
fusion stage, selecting early fusion when modalities agree at the raw feature level, mid-level fusion when intermediate BEV
features are aligned, and late fusion when per-sensor outputs are more reliable than fused representations. We evaluate
AdapFuse-SD on the nuScenes benchmark and the KITTI-Rain synthetic degradation suite. Our approach achieves a mean
Average Precision (mAP) of 72.8% and a nuScenes Detection Score (NDS) of 75.6%, outperforming the best fixed-stage
baseline by 3.7 mAP points while reducing per-frame inference latency by 8.3% through selective computation. These results
demonstrate that semantic-consistency-aware dynamic fusion is a practical and principled solution to the brittle fixed-stage
fusion paradigm.
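
The routing rule described above can be illustrated with a minimal sketch. The abstract does not state how the SDS is computed or which thresholds are used, so everything here is an assumption made for illustration: the score is taken to be a rescaled cosine distance between two pooled feature vectors, and the function names `semantic_disagreement` and `select_fusion_stage`, as well as the threshold values, are hypothetical rather than part of AdapFuse-SD.

```python
import math
from typing import Sequence


def semantic_disagreement(feat_a: Sequence[float], feat_b: Sequence[float]) -> float:
    """Hypothetical SDS: cosine distance rescaled to [0, 1].

    0.0 means the two modality features point the same way,
    1.0 means they are opposed (or one stream is empty).
    This is an assumed stand-in, not the paper's definition.
    """
    if len(feat_a) != len(feat_b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(feat_a, feat_b))
    norm_a = math.sqrt(sum(x * x for x in feat_a))
    norm_b = math.sqrt(sum(y * y for y in feat_b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: an all-zero (dropped) stream counts as maximal disagreement.
        return 1.0
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return 0.5 * (1.0 - cos)


def select_fusion_stage(raw_sds: float, bev_sds: float,
                        early_thresh: float = 0.2,
                        mid_thresh: float = 0.4) -> str:
    """Pick a fusion stage from two disagreement scores.

    Thresholds are illustrative placeholders, not values from the paper.
    """
    if raw_sds <= early_thresh:
        return "early"   # modalities agree at the raw feature level
    if bev_sds <= mid_thresh:
        return "mid"     # intermediate BEV features are aligned
    return "late"        # fall back to per-sensor outputs


if __name__ == "__main__":
    camera_raw, lidar_raw = [1.0, 0.0, 0.5], [0.0, 1.0, 0.1]
    camera_bev, lidar_bev = [0.9, 0.1, 0.4], [1.0, 0.2, 0.5]
    raw = semantic_disagreement(camera_raw, lidar_raw)
    bev = semantic_disagreement(camera_bev, lidar_bev)
    print(select_fusion_stage(raw, bev))
```

In this toy setup the check runs from the cheapest stage to the most conservative one, which mirrors the ordering in the abstract; a real system would presumably compute the scores on learned features and calibrate the thresholds on validation data.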