Abstract
Mapping clinical classification systems, such as the International Classification of Diseases (ICD), across versions and to other external clinical classification systems is challenging and is often done manually by trained professionals. One particular obstacle to automated mapping is that different versions can describe the same clinical condition with different code descriptions. We call this data uncertainty. Existing lexical methods attempt to solve this problem by generating alternative terms using synonyms. This work addresses data uncertainty by learning a probabilistic embedding for each code description using similar terms and paraphrases. A valid code pair must exhibit proximity in the embedding space and have a comparable distribution. Additionally, we propose a new evaluation metric that considers the hierarchical structure of ICD to evaluate the performance of an automated mapping system. We demonstrate the effectiveness of our approach by mapping between ICD-9-CM (Clinical Modification) and ICD-10-CM, and between ICD-10-AM (Australian Modification) and ICD-11, in both directions. The source code will be available at: https://github.com/Xujan24/wt-KL
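The pair-validity criterion in the abstract (proximity in the embedding space plus a comparable distribution) can be illustrated with a minimal sketch. The abstract does not specify the embedding family, the distance, the divergence, or any thresholds, so everything here is an assumption made for illustration: each code description is represented as a diagonal Gaussian `(mean, variance)`, proximity is Euclidean distance between means, distribution similarity is a symmetric KL divergence, and `max_dist` and `max_div` are hypothetical cut-offs. It is not the authors' implementation.

```python
import math
from typing import Sequence, Tuple

# Assumed representation: a code description is a diagonal Gaussian (mean, variance).
Gaussian = Tuple[Sequence[float], Sequence[float]]


def euclidean(mu_a: Sequence[float], mu_b: Sequence[float]) -> float:
    """Euclidean distance between two mean vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mu_a, mu_b)))


def kl_diag_gaussian(p: Gaussian, q: Gaussian) -> float:
    """KL(P || Q) for two Gaussians with diagonal covariance."""
    (mu_p, var_p), (mu_q, var_q) = p, q
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


def symmetric_kl(p: Gaussian, q: Gaussian) -> float:
    """Average of KL(P || Q) and KL(Q || P), so the order of codes does not matter."""
    return 0.5 * (kl_diag_gaussian(p, q) + kl_diag_gaussian(q, p))


def is_candidate_pair(
    src: Gaussian,
    tgt: Gaussian,
    max_dist: float = 1.0,  # hypothetical threshold, not from the paper
    max_div: float = 1.0,   # hypothetical threshold, not from the paper
) -> bool:
    """Accept a pair only if the means are close AND the distributions are similar."""
    return euclidean(src[0], tgt[0]) <= max_dist and symmetric_kl(src, tgt) <= max_div


if __name__ == "__main__":
    source_code = ([0.0, 0.0], [1.0, 1.0])
    near_target = ([0.1, -0.1], [1.1, 0.9])
    far_target = ([3.0, 3.0], [1.0, 1.0])
    print(is_candidate_pair(source_code, near_target))
    print(is_candidate_pair(source_code, far_target))
```

Requiring both conditions means that two descriptions whose means happen to lie close together are still rejected when their variances differ sharply, which is one plausible reading of "comparable distribution".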
| Original language | English |
|---|---|
| Title of host publication | Data Science: Foundations and Applications: 29th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2025, Sydney, Australia, June 10-13, 2025, Proceedings, Part VII |
| Editors | Xintao Wu, Myra Spiliopoulou, Can Wang, Vipin Kumar, Longbing Cao, Xiangmin Zhou, Guansong Pang, Joao Gama |
| Place of Publication | Singapore |
| Publisher | Springer |
| Pages | 284-295 |
| Number of pages | 12 |
| ISBN (Electronic) | 9789819682980 |
| ISBN (Print) | 9789819682973 |
| Publication status | Published - 2025 |
| Event | Pacific-Asia Conference on Knowledge Discovery and Data Mining - Sydney, Australia; Duration: 10 Jun 2025 → 13 Jun 2025; Conference number: 29th |
Publication series
| Name | Lecture Notes in Computer Science |
|---|---|
| Volume | 15876 |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Conference
| Conference | Pacific-Asia Conference on Knowledge Discovery and Data Mining |
|---|---|
| Abbreviated title | PAKDD |
| Country/Territory | Australia |
| City | Sydney |
| Period | 10/06/25 → 13/06/25 |
Keywords
- Data Uncertainty
- International Classification of Diseases
- Mapping Tables
- Probabilistic Embedding