TY - CHAP
T1 - Classification of kinetic-related injury in hospital triage data using NLP
AU - Shyam, Midhun
AU - Basilakis, Jim
AU - Luken, Kieran
AU - Thomas, Steven
AU - Crozier, John
AU - Middleton, Paul M.
AU - Wang, X. Rosalind
PY - 2026
Y1 - 2026
N2 - Triage notes, created at the start of a patient’s hospital visit, contain a wealth of information that can help medical staff and researchers understand Emergency Department patient epidemiology and the degree of time-dependent illness or injury. Unfortunately, applying modern Natural Language Processing and Machine Learning techniques to analyse triage data faces some challenges: firstly, hospital data contains highly sensitive information that is subject to privacy regulation and thus needs to be analysed on site; secondly, most hospitals and medical facilities lack the necessary hardware to fine-tune a Large Language Model (LLM), much less train one from scratch; lastly, to identify the records of interest, expert input is needed to manually label the datasets, which can be time-consuming and costly. We present in this paper a pipeline that enables the classification of triage data using an LLM and limited compute resources. We first fine-tuned a pre-trained LLM with a classifier using a small (2k) open-source dataset on a GPU, and then further fine-tuned the model with a hospital-specific dataset of 1000 samples on a CPU. We demonstrated that by carefully curating the datasets and leveraging existing models and open-source data, we can successfully classify triage data with limited compute resources.
AB - Triage notes, created at the start of a patient’s hospital visit, contain a wealth of information that can help medical staff and researchers understand Emergency Department patient epidemiology and the degree of time-dependent illness or injury. Unfortunately, applying modern Natural Language Processing and Machine Learning techniques to analyse triage data faces some challenges: firstly, hospital data contains highly sensitive information that is subject to privacy regulation and thus needs to be analysed on site; secondly, most hospitals and medical facilities lack the necessary hardware to fine-tune a Large Language Model (LLM), much less train one from scratch; lastly, to identify the records of interest, expert input is needed to manually label the datasets, which can be time-consuming and costly. We present in this paper a pipeline that enables the classification of triage data using an LLM and limited compute resources. We first fine-tuned a pre-trained LLM with a classifier using a small (2k) open-source dataset on a GPU, and then further fine-tuned the model with a hospital-specific dataset of 1000 samples on a CPU. We demonstrated that by carefully curating the datasets and leveraging existing models and open-source data, we can successfully classify triage data with limited compute resources.
KW - Bio-Clinical BERT
KW - Classification
KW - Clinical Notes
KW - Electronic Health Record
KW - NLP
KW - Triage Notes
UR - https://www.scopus.com/pages/publications/105020663501
UR - https://go.openathens.net/redirector/westernsydney.edu.au?url=https://doi.org/10.1007/978-981-95-3459-3_16
U2 - 10.1007/978-981-95-3459-3_16
DO - 10.1007/978-981-95-3459-3_16
M3 - Chapter
AN - SCOPUS:105020663501
SN - 9789819534586
T3 - Lecture Notes in Computer Science
SP - 209
EP - 216
BT - Advanced Data Mining and Applications: 21st International Conference, ADMA 2025, Kyoto, Japan, October 22-24, 2025, Proceedings, Part III
A2 - Yoshikawa, Masatoshi
A2 - Meng, Xiaofeng
A2 - Cao, Yang
A2 - Xiao, Chuan
A2 - Chen, Weitong
A2 - Wang, Yanda
PB - Springer
CY - Singapore
T2 - 21st International Conference on Advanced Data Mining and Applications, ADMA 2025
Y2 - 22 October 2025 through 24 October 2025
ER -