Dataset and benchmark for captioning images depicting complex construction activities

  • Zebang Liu
  • Huan Liu
  • Chao Mao
  • Yulu Huang
  • Jun Wang

Research output: Contribution to journal › Article › peer-review

Abstract

Purpose – This study aims to develop and publicly release a large-scale and diverse dataset that can assist industry professionals in automating safety reports, construction logs, and other project documentation, while providing researchers with reliable data for developing and benchmarking vision–language models in construction-related applications.

Design/methodology/approach – An extensive dataset was created by collecting over 13,000 images depicting complex construction activities across various weather conditions, construction phases, and viewpoints. Each image was annotated with detailed captions, resulting in over 65,000 descriptions. The dataset’s diversity and richness were statistically analyzed. Additionally, three widely used deep learning-based image captioning models were evaluated on the dataset to assess its applicability and effectiveness.

Findings – The dataset demonstrates substantial variety in both image content and descriptive detail, making it a valuable resource for training and benchmarking image captioning models in the construction domain. Experimental results confirm the dataset’s suitability for different model architectures, highlighting its potential to improve automated construction site image understanding.

Originality/value – The dataset uses sentence-level annotations instead of single words or short phrases, allowing richer semantic and contextual descriptions of construction scenes. It includes a larger and more diverse set of images that capture both workers and machinery engaged in complex construction activities rather than focusing on a single object type. Safety-related elements such as helmets and reflective vests are also explicitly included to support safety analysis.
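The statistical analysis of caption diversity described in the abstract could be approached along the lines of the sketch below. This is a minimal illustration, not the authors' procedure. It assumes that annotations are available as (image_id, caption) pairs. The three sample captions, the simple lowercase tokenizer, and the SAFETY_TERMS keyword list are all hypothetical. The sketch reports captions per image, mean caption length, vocabulary size, and how many captions mention a safety-related item.

```python
import re
from collections import Counter

# Hypothetical annotations as (image_id, caption) pairs; the dataset's real
# file schema is not described in the abstract.
annotations = [
    (1, "A worker wearing a yellow helmet operates an excavator."),
    (1, "An excavator digs soil while a worker in a reflective vest watches."),
    (2, "Two workers tie rebar on a concrete slab."),
]

# Hypothetical keyword list for safety-related elements.
SAFETY_TERMS = ("helmet", "vest")


def tokenize(caption: str) -> list[str]:
    """Lowercase the caption and keep alphabetic word tokens only."""
    return re.findall(r"[a-z]+", caption.lower())


def caption_statistics(pairs):
    """Compute simple diversity statistics over (image_id, caption) pairs."""
    token_lists = [tokenize(caption) for _, caption in pairs]
    vocab = Counter(tok for toks in token_lists for tok in toks)
    images = {image_id for image_id, _ in pairs}
    # Substring match, so plural forms such as "helmets" are also counted.
    safety_captions = sum(
        any(term in caption.lower() for term in SAFETY_TERMS)
        for _, caption in pairs
    )
    return {
        "num_images": len(images),
        "num_captions": len(pairs),
        "captions_per_image": len(pairs) / len(images),
        "mean_length": sum(len(t) for t in token_lists) / len(token_lists),
        "vocab_size": len(vocab),
        "safety_captions": safety_captions,
    }


if __name__ == "__main__":
    for key, value in caption_statistics(annotations).items():
        print(f"{key}: {value}")
```

Given the reported totals (over 65,000 captions for over 13,000 images), captions_per_image on the full dataset would be expected to land at roughly five.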

Original language: English
Pages (from-to): 1–23
Number of pages: 23
Journal: Engineering, Construction and Architectural Management
Publication status: E-pub ahead of print (In Press), 2026

Bibliographical note

Publisher Copyright:
© 2026 Emerald Publishing Limited

Keywords

  • Complex construction activities
  • Computer vision
  • Dataset
  • Deep learning
  • Image captioning
