TY - JOUR
T1 - Extracting social determinants of health from dental clinical notes
AU - Pethani, Farhana
AU - Chapman, Alec
AU - Conway, Mike
AU - Dai, Xiang
AU - Bishay, Demiana
AU - Choh, Victor
AU - He, Alexander
AU - Lim, Su Elle
AU - Ng, Huey Ying
AU - Mahony, Tanya
AU - Yaacoub, Albert
AU - Karimi, Sarvnaz
AU - Spallek, Heiko
AU - Dunn, Adam G.
PY - 2025/8/1
Y1 - 2025/8/1
N2 - Objectives: In dentistry, social determinants of health (SDoH) are potentially recorded in the clinical notes of electronic dental records. The objective of this study was to examine the availability of SDoH data in dental clinical notes and to evaluate natural language processing methods for extracting SDoH from those notes. Methods: A set of 1,000 dental clinical notes was sampled from a dataset of 105,311 patient visits to a dental clinic and manually annotated for information pertaining to sugar, tobacco, alcohol, methamphetamine, housing, and employment. Annotations included temporality, dose, type, duration, and frequency where appropriate. Experiments compared extraction using fine-tuned pretrained language models (PLMs) with a rule-based approach. Performance was measured by F1-score. Results: For identifying SDoH, the best-performing PLM method produced F1-scores of 0.75 (sugar), 0.69 (tobacco), 0.67 (alcohol), 0.42 (housing), and 0 (employment). The rule-based method produced F1-scores of 0.70 (sugar), 0.69 (tobacco), 0.53 (alcohol), 0.44 (housing), and 0 (employment). The overall difference in F1-score between PLMs and rule-based methods was 0.04 (95% confidence interval −0.01, 0.09). SDoH were relatively rare in dental clinical notes: sugar (9.1%), tobacco (3.9%), alcohol (1.2%), housing (1.2%), employment (0.2%), and methamphetamine use (0%). Conclusion: The main challenges in extracting SDoH information from dental clinical notes were how rarely it is recorded and how brief and inconsistent the entries are when it is recorded. Improved surveillance likely needs new ways to standardize how SDoH are reported in dental clinical notes.
KW - dentistry
KW - electronic dental records
KW - information extraction
KW - natural language processing
KW - social determinants of health
UR - http://www.scopus.com/inward/record.url?scp=105017812739&partnerID=8YFLogxK
U2 - 10.1055/a-2616-9858
DO - 10.1055/a-2616-9858
M3 - Article
C2 - 40398852
AN - SCOPUS:105017812739
SN - 1869-0327
VL - 16
SP - 1281
EP - 1291
JO - Applied Clinical Informatics - ACI
JF - Applied Clinical Informatics - ACI
IS - 4
ER -