Managing data uncertainty in automatic mapping of clinical classification systems

Research output: Chapter in Book / Conference Paper › Chapter › peer-review

Abstract

Mapping clinical classification systems, such as the International Classification of Diseases (ICD), across different versions and to other external clinical classification systems is challenging and is often done manually by trained professionals. Among other difficulties, variation in the code descriptions used for the same clinical condition in different versions poses a unique challenge to implementing automated mapping systems. We call this data uncertainty. Existing lexical methods attempt to solve this problem by generating alternative terms using synonyms. This work addresses data uncertainty by learning a probabilistic embedding for each code description using similar terms and paraphrases. A valid code pair must exhibit proximity in the embedding space and have a comparable distribution. Additionally, we propose a new evaluation metric that considers the hierarchical structure of ICD to evaluate the performance of an automated mapping system. We demonstrate the effectiveness of our approach by mapping between ICD-9-CM (Clinical Modification) and ICD-10-CM, and between ICD-10-AM (Australian Modification) and ICD-11, in both directions. The source code will be available at: https://github.com/Xujan24/wt-KL
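
The validity criterion stated in the abstract (proximity in the embedding space plus a comparable distribution) can be illustrated with a minimal sketch. The abstract does not specify the distribution family, the distance, or the divergence, so everything here is an assumption made for illustration: each code description is represented as a diagonal Gaussian (`GaussianEmbedding`), proximity is measured by cosine similarity of the means, comparability by a symmetric KL divergence, and the thresholds `min_cosine` and `max_kl` are hypothetical values. This is not the authors' implementation.

```python
import math
from typing import NamedTuple, Sequence


class GaussianEmbedding(NamedTuple):
    """Hypothetical probabilistic embedding: a diagonal Gaussian."""
    mean: Sequence[float]
    var: Sequence[float]  # per-dimension variances, all assumed > 0


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def kl_divergence(p: GaussianEmbedding, q: GaussianEmbedding) -> float:
    """KL(p || q) for two diagonal Gaussians (closed form)."""
    total = 0.0
    for mp, vp, mq, vq in zip(p.mean, p.var, q.mean, q.var):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def symmetric_kl(p: GaussianEmbedding, q: GaussianEmbedding) -> float:
    """Average of KL(p || q) and KL(q || p), so the order of codes does not matter."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def is_candidate_pair(
    src: GaussianEmbedding,
    tgt: GaussianEmbedding,
    min_cosine: float = 0.8,  # hypothetical threshold
    max_kl: float = 1.0,      # hypothetical threshold
) -> bool:
    """Accept a pair only if the means are close AND the distributions are comparable."""
    close = cosine_similarity(src.mean, tgt.mean) >= min_cosine
    comparable = symmetric_kl(src, tgt) <= max_kl
    return close and comparable


# Usage: two made-up embeddings for a source and a target code description.
source_code = GaussianEmbedding(mean=[1.0, 2.0], var=[1.0, 1.0])
target_code = GaussianEmbedding(mean=[1.1, 1.9], var=[1.2, 0.9])
print(is_candidate_pair(source_code, target_code))
```

Requiring both conditions means that two descriptions whose means coincide but whose variances differ widely are rejected, which is one way to read the abstract's requirement that a valid pair also have a comparable distribution.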

Original language: English
Title of host publication: Data Science: Foundations and Applications: 29th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2025, Sydney, Australia, June 10-13, 2025, Proceedings, Part VII
Editors: Xintao Wu, Myra Spiliopoulou, Can Wang, Vipin Kumar, Longbing Cao, Xiangmin Zhou, Guansong Pang, Joao Gama
Place of Publication: Singapore
Publisher: Springer
Pages: 284-295
Number of pages: 12
ISBN (Electronic): 9789819682980
ISBN (Print): 9789819682973
Publication status: Published - 2025
Event: Pacific-Asia Conference on Knowledge Discovery and Data Mining - Sydney, Australia
Duration: 10 Jun 2025 → 13 Jun 2025
Conference number: 29th

Publication series

Name: Lecture Notes in Computer Science
Volume: 15876
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: Pacific-Asia Conference on Knowledge Discovery and Data Mining
Abbreviated title: PAKDD
Country/Territory: Australia
City: Sydney
Period: 10/06/25 → 13/06/25

Keywords

  • Data Uncertainty
  • International Classification of Disease
  • Mapping Tables
  • Probabilistic Embedding
