TY - JOUR
T1 - Leveraging vision-language embeddings for zero-shot learning in histopathology images
AU - Rahaman, Md Mamunur
AU - Millar, Ewan K.A.
AU - Meijering, Erik
PY - 2025
Y1 - 2025
AB - Zero-shot learning (ZSL) offers tremendous potential for histopathology image analysis, enabling models to generalize to unseen classes without extensive labeled data. Recent vision-language model (VLM) advancements have expanded ZSL capabilities, allowing task performance without task-specific fine-tuning. However, applying VLMs to histopathology presents considerable challenges due to the complexity of histopathological imagery and the nuanced nature of diagnostic tasks. We propose Multi-Resolution Prompt-guided Hybrid Embedding (MR-PHE), a novel framework for zero-shot histopathology image classification. MR-PHE mimics pathologists' workflow through multiresolution patch extraction to capture key cellular and tissue features. It introduces a hybrid embedding strategy that integrates global image embeddings with weighted patch embeddings, effectively combining local and global contextual information. Additionally, we develop a comprehensive prompt generation and selection framework, enriching class descriptions with domain-specific synonyms and clinically relevant features to enhance semantic understanding. A similarity-based patch weighting mechanism assigns attention-like weights to patches based on their relevance to class embeddings, emphasizing diagnostically important regions during classification. Experimental results demonstrate MR-PHE significantly improves zero-shot classification performance on histopathology datasets, often surpassing fully supervised models, showing its effectiveness and potential to advance computational pathology.
KW - Computational Pathology
KW - Histopathology
KW - Hybrid Embedding
KW - Prompt Generation
KW - Vision-Language Models (VLMs)
KW - Zero-Shot Learning
UR - http://www.scopus.com/inward/record.url?scp=105009979565&partnerID=8YFLogxK
U2 - 10.1109/JBHI.2025.3584802
DO - 10.1109/JBHI.2025.3584802
M3 - Article
AN - SCOPUS:105009979565
SN - 2168-2194
JO - IEEE Journal of Biomedical and Health Informatics
JF - IEEE Journal of Biomedical and Health Informatics
ER -