Abstract
Purpose – This study aims to develop and publicly release a large-scale and diverse dataset that can assist industry professionals in automating safety reports, construction logs, and other project documentation, while providing researchers with reliable data for developing and benchmarking vision–language models in construction-related applications.

Design/methodology/approach – An extensive dataset was created by collecting over 13,000 images depicting complex construction activities across various weather conditions, construction phases, and viewpoints. Each image was annotated with detailed captions, resulting in over 65,000 descriptions. The dataset’s diversity and richness were statistically analyzed. Additionally, three widely used deep learning-based image captioning models were evaluated on the dataset to assess its applicability and effectiveness.

Findings – The dataset demonstrates substantial variety in both image content and descriptive detail, making it a valuable resource for training and benchmarking image captioning models in the construction domain. Experimental results confirm the dataset’s suitability for different model architectures, highlighting its potential to improve automated construction site image understanding.

Originality/value – The dataset uses sentence-level annotations instead of single words or short phrases, allowing richer semantic and contextual descriptions of construction scenes. It includes a larger and more diverse set of images that capture both workers and machinery engaged in complex construction activities rather than focusing on a single object type. Safety-related elements such as helmets and reflective vests are also explicitly included to support safety analysis.
| Original language | English |
|---|---|
| Pages (from-to) | 1-23 |
| Number of pages | 23 |
| Journal | Engineering, Construction and Architectural Management |
| Publication status | E-pub ahead of print (In Press) - 2026 |
Bibliographical note
Publisher Copyright: © 2026 Emerald Publishing Limited
Keywords
- Complex construction activities
- Computer vision
- Dataset
- Deep learning
- Image captioning
Fingerprint
Dive into the research topics of 'Dataset and benchmark for captioning images depicting complex construction activities'. Together they form a unique fingerprint.