Hierarchical fusion network for joint segmentation and VQA in road pavement inspection

  • Zhejiang Sci-Tech University
  • University of Melbourne
  • Curtin University

Research output: Contribution to journal › Article › peer-review

Abstract

Conventional road pavement inspection methods typically rely on a single data modality and address defect segmentation and semantic assessment separately. Moreover, existing methods struggle to delineate cracks accurately under complex field conditions. To address these challenges, this paper proposes a unified architecture named HFSV-Net (Hierarchical Fusion Network for Joint Segmentation and Visual Question Answering). The proposed method adopts a multi-scale, multi-stage feature fusion strategy to build cross-modal feature representations. A Feature Pyramid Network-style single-head segmentation decoder equipped with a stripe attention mechanism is introduced to further enhance segmentation performance. Experiments on three benchmark datasets show that HFSV-Net outperforms the best competing methods by 2.32%, 1.10%, and 2.71% in mIoU, respectively. Ablation studies with feature visualization analysis further validate the effectiveness of the proposed modules. Overall, the work establishes a unified multimodal fusion paradigm for joint segmentation and VQA in road pavement inspection, achieving superior crack delineation performance under challenging conditions.

Original language: English
Article number: 106829
Number of pages: 25
Journal: Automation in Construction
Volume: 184
DOIs
Publication status: Published - Apr 2026

Keywords

  • Adaptive fusion
  • Crack segmentation
  • Deep feature fusion
  • Multi-scale feature fusion
  • Multimodal fusion
  • Multitask
