CMDistill: Cross-Modal Distillation Framework for AAV Image Object Detection

As autonomous aerial vehicles (AAVs) become more intelligent and carry increasingly diverse mission payloads, target detection approaches fall mainly into two groups, single-modal and multimodal, with diverse and complex combinations. It is challenging to achieve an optimal tradeoff between multimodal detectors, whose expensive and complex models are difficult to deploy directly on AAVs, and single-modal detectors, whose detection accuracy is limited. To overcome this limitation, we developed a cross-modal target detector called CMDistill. Specifically, we designed an effective distillation loss based on three components. First, to reduce differences in feature knowledge across the intermediate layers of the two modalities, we designed a constraint based on the Pearson correlation coefficient to limit negative knowledge.

Second, we modeled the relational cues between features by computing the affinity matrices of the deeper semantic features of the teacher–student model to convey relational knowledge more accurately. Finally, the target bounding boxes and classification information predicted at the output of the teacher were passed to the student. Experimental results on the aerial vehicle detection dataset showed that CMDistill achieved optimal performance, with an average accuracy of 74% on the RGB-only target detection task, using fewer computational resources to reach a detection performance comparable with that of multimodal approaches.
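The first two loss components described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the (batch, channels, height, width) feature layout, the use of cosine similarity for the affinity matrix, and the mean-squared-error comparison are all assumptions made for illustration. The third component (passing the teacher's predicted boxes and class information to the student) is omitted because the abstract does not specify its form.

```python
import numpy as np


def pearson_feature_loss(f_student, f_teacher, eps=1e-8):
    """Hypothetical feature loss: 1 - Pearson correlation per sample.

    Both inputs are assumed to have the same shape (B, C, H, W).
    Pearson correlation ignores scale and offset, so the student is only
    asked to match the pattern of the teacher features, not their magnitude.
    """
    s = f_student.reshape(f_student.shape[0], -1)
    t = f_teacher.reshape(f_teacher.shape[0], -1)
    s = s - s.mean(axis=1, keepdims=True)  # center each sample
    t = t - t.mean(axis=1, keepdims=True)
    denom = np.linalg.norm(s, axis=1) * np.linalg.norm(t, axis=1) + eps
    r = (s * t).sum(axis=1) / denom        # correlation in [-1, 1]
    return float(np.mean(1.0 - r))


def affinity_matrix(feat, eps=1e-8):
    """Cosine affinity between spatial positions: (B, C, H, W) -> (B, HW, HW)."""
    b, c, h, w = feat.shape
    x = feat.reshape(b, c, h * w)
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)  # unit channel vectors
    return np.einsum("bci,bcj->bij", x, x)


def affinity_loss(f_student, f_teacher):
    """Hypothetical relational loss: MSE between student and teacher affinities.

    Only the spatial sizes must match; the channel counts may differ.
    """
    diff = affinity_matrix(f_student) - affinity_matrix(f_teacher)
    return float(np.mean(diff ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(2, 8, 4, 4))
    student = rng.normal(size=(2, 8, 4, 4))
    print("pearson loss:", pearson_feature_loss(student, teacher))
    print("affinity loss:", affinity_loss(student, teacher))
```

One consequence of this formulation is that the affinity term compares position-to-position relations rather than raw activations, so a student with fewer channels than the teacher could in principle still be supervised by it without an extra projection layer.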
