Abstract
Conventional road pavement inspection methods typically rely on a single data modality and address defect segmentation and semantic assessment separately. Moreover, existing methods struggle to delineate cracks accurately under complex field conditions. To address these challenges, this paper proposes a unified architecture named HFSV-Net (Hierarchical Fusion Network for Joint Segmentation and Visual Question Answering). The proposed method adopts a multi-scale, multi-stage feature fusion strategy to combine cross-modal feature representations. A Feature Pyramid Network-style single-head segmentation decoder equipped with a stripe attention mechanism is introduced to further improve segmentation performance. Experiments on three benchmark datasets show that HFSV-Net outperforms the best competing method on each dataset by 2.32%, 1.10%, and 2.71% in mIoU, respectively. Ablation studies with feature visualization analysis further validate the effectiveness of the proposed modules. Overall, this work establishes a unified multimodal fusion paradigm for joint segmentation and VQA in road pavement inspection and achieves superior crack delineation under challenging conditions.
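The segmentation results above are reported in mIoU (mean intersection over union). The sketch below is a minimal illustration that assumes the standard per-class definition, IoU = TP / (TP + FP + FN), averaged over classes. The abstract does not say whether the background class is included or how classes absent from both prediction and ground truth are handled, so those choices are assumptions here. The function names `confusion_matrix` and `mean_iou` are hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int], num_classes: int) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predicted classes."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        cm[t][p] += 1
    return cm


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Average per-class IoU = TP / (TP + FP + FN) over classes that occur.

    Assumption: a class absent from both prediction and ground truth is skipped
    rather than counted as IoU = 1 or 0; the paper's convention is not stated.
    """
    cm = confusion_matrix(pred, target, num_classes)
    ious = []
    for c in range(num_classes):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                                  # class c predicted as another class
        fp = sum(cm[r][c] for r in range(num_classes)) - tp   # other classes predicted as c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Flattened binary masks: 0 = background, 1 = crack (illustrative data only).
    target = [0, 0, 0, 0, 1, 1]
    pred = [0, 0, 0, 1, 1, 0]
    print(round(mean_iou(pred, target, num_classes=2), 4))  # prints 0.4667
```

In this toy example the background IoU is 3/5 and the crack IoU is 1/3. This shows why thin-structure classes such as cracks tend to dominate mIoU differences: a single mislabelled crack pixel costs far more IoU than a mislabelled background pixel.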
| Original language | English |
|---|---|
| Article number | 106829 |
| Number of pages | 25 |
| Journal | Automation in Construction |
| Volume | 184 |
| DOIs | |
| Publication status | Published - Apr 2026 |
Keywords
- Adaptive fusion
- Crack segmentation
- Deep feature fusion
- Multi-scale feature fusion
- Multimodal fusion
- Multitask
Fingerprint
Dive into the research topics of 'Hierarchical fusion network for joint segmentation and VQA in road pavement inspection'. Together they form a unique fingerprint.