TY - JOUR
T1 - Fine-tuning large language models for effective nutrition support in residential aged care: a domain expertise approach
AU - Alkhalaf, Mohammad
AU - Vithanage, Dinithi
AU - Shen, Jun
AU - Chang, Hui Chen
AU - Deng, Chao
AU - Yu, Ping
PY - 2025/10
Y1 - 2025/10
N2 - Background: Malnutrition is a serious health concern among older adults in residential aged care (RAC), and timely identification is critical for effective intervention. Recent advancements in transformer-based large language models (LLMs), such as RoBERTa, provide context-aware embeddings that improve predictive performance in clinical tasks. Fine-tuning these models on domain-specific corpora, such as nursing progress notes, can further enhance their applicability in healthcare. Methodology: We developed a RAC domain-specific LLM by training RoBERTa on 500,000 nursing progress notes from RAC electronic health records (EHRs). The model’s embeddings were used for two downstream tasks: malnutrition note identification and malnutrition prediction. Long sequences were truncated to 1536 tokens and processed in 512-token segments to fit RoBERTa’s input limit. Performance was compared against Bag of Words, GloVe, baseline RoBERTa, BlueBERT, ClinicalBERT, BioClinicalBERT, and PubMed models. Results: Using 5-fold cross-validation, the RAC domain-specific LLM outperformed other models. For malnutrition note identification, it achieved an F1-score of 0.966, and for malnutrition prediction, it achieved an F1-score of 0.687. Conclusions: This approach demonstrates the feasibility of developing specialised LLMs for identifying and predicting malnutrition among older adults in RAC. Future work includes further optimisation of prediction performance and integration with clinical workflows to support early intervention.
AB - Background: Malnutrition is a serious health concern among older adults in residential aged care (RAC), and timely identification is critical for effective intervention. Recent advancements in transformer-based large language models (LLMs), such as RoBERTa, provide context-aware embeddings that improve predictive performance in clinical tasks. Fine-tuning these models on domain-specific corpora, such as nursing progress notes, can further enhance their applicability in healthcare. Methodology: We developed a RAC domain-specific LLM by training RoBERTa on 500,000 nursing progress notes from RAC electronic health records (EHRs). The model’s embeddings were used for two downstream tasks: malnutrition note identification and malnutrition prediction. Long sequences were truncated to 1536 tokens and processed in 512-token segments to fit RoBERTa’s input limit. Performance was compared against Bag of Words, GloVe, baseline RoBERTa, BlueBERT, ClinicalBERT, BioClinicalBERT, and PubMed models. Results: Using 5-fold cross-validation, the RAC domain-specific LLM outperformed other models. For malnutrition note identification, it achieved an F1-score of 0.966, and for malnutrition prediction, it achieved an F1-score of 0.687. Conclusions: This approach demonstrates the feasibility of developing specialised LLMs for identifying and predicting malnutrition among older adults in RAC. Future work includes further optimisation of prediction performance and integration with clinical workflows to support early intervention.
KW - domain-specific fine-tuning
KW - large language model
KW - malnutrition
KW - nursing notes
KW - prediction
KW - RoBERTa
KW - unstructured EHR
UR - http://www.scopus.com/inward/record.url?scp=105020187136&partnerID=8YFLogxK
U2 - 10.3390/healthcare13202614
DO - 10.3390/healthcare13202614
M3 - Article
AN - SCOPUS:105020187136
SN - 2227-9032
VL - 13
JO - Healthcare (Switzerland)
JF - Healthcare (Switzerland)
IS - 20
M1 - 2614
ER -