A comparative analysis of selected set of natural language processing (NLP) and machine learning (ML) algorithms for clinical coding using clinical classification standards

  • Rajvir Kaur

Western Sydney University thesis: Master's thesis

Abstract

In Australia, hospital discharge summaries created at the end of an episode of care contain patient information such as demographic data, medical history, diagnoses, interventions carried out, and medications and drug therapies provided to the patient. These discharge summaries not only serve as a record of the episode of care, but are also later converted into a set of clinical codes for statistical analysis. Clinical coding is the process of assigning alphanumeric codes to discharge summaries. In Australia, clinical coding is done using the International Classification of Diseases, version 10, Australian Modification (ICD-10-AM) and the Australian Classification of Health Interventions (ACHI), as per the Australian Coding Standards (ACS), in acute and subacute care settings in both public and private hospitals. Clinical coding and the subsequent analysis facilitate funding, insurance claims processing and research. Assigning codes to an episode of care is a manual process. This poses challenges because of the ever-increasing set of codes in ICD-10-AM and ACHI, changing coding standards in ACS, the complexity of care episodes, and the large training and recruitment costs associated with clinical coders. In addition, the manual clinical coding process is time consuming and prone to errors, leading to financial losses. Natural Language Processing (NLP) and Machine Learning (ML) techniques are considered a solution to these problems. In this thesis, four approaches, namely pattern matching, rule-based, machine learning and hybrid, are compared to identify the most efficient algorithm for clinical coding. ICD-10-AM and ACHI consist of 22 chapters based on human body organs, where each chapter describes the diseases and interventions of a body system. The NLP and ML comparison is carried out on only two chapters, namely diseases of the respiratory system and diseases of the digestive system.
Initially, the dataset contained 190 clinical records from the two chapters and was named Data190. Because the number of clinical records was limited, another 45 records were added to the existing dataset, and the resulting dataset was named Data235. The clinical records were cleaned in the pre-processing stage to extract useful information, including principal diagnosis, additional diagnosis, diabetes condition, principal procedure, additional procedure and anaesthesia details. During data pre-processing, various NLP techniques such as tokenisation, stop word removal, spelling error detection and correction, negation detection and abbreviation expansion were applied. In the pattern matching approach, each text string was matched character by character against the ICD-10-AM and ACHI coding guide using regular expressions. If a match was found, codes were assigned. In the rule-based approach, 409 rules were defined to avoid coding wrong patterns. In the machine learning approach, once the unwanted information had been removed from the clinical records, the text was represented in vector form for feature extraction using the Bag of Words (BoW) representation (Manning, Raghavan, & Schütze, 2008, p. 117) and the Term Frequency-Inverse Document Frequency (TF-IDF) vectoriser (Manning et al., 2008, p. 118). After feature extraction, classification was done using seven classifiers, namely Support Vector Machine (SVM) (Cortes & Vapnik, 1995), Naïve Bayes (Manning et al., 2008, p. 258), Decision Tree (Kumar, Assistant, & Sahni, 2011), Random Forest (Breiman, 2001), AdaBoost (Freund & Schapire, 1999), Multi Layer Perceptron (MLP) (Naraei, Abhari, & Sadeghian, 2016) and k-Nearest Neighbour (kNN) (Manning et al., 2008, p. 297). A set of standard metrics, namely Precision (P), Recall (R), F-score, Accuracy, Hamming Loss (HL) and Jaccard Similarity (JS) (Dalianis, 2018; Aldrees & Chikh, 2016), was used to measure the efficiency of these NLP and ML algorithms on the two datasets.
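The pattern matching and rule-based approaches described above can be illustrated with a minimal sketch. It is not the thesis implementation: the `CODING_GUIDE` dictionary, the code strings in it, the `NEGATION_CUES` list and the single negation rule are all hypothetical stand-ins for the ICD-10-AM/ACHI coding guide and the 409 rules, which the abstract does not list.

```python
import re

# Hypothetical excerpt of a coding guide (term -> code). The entries are
# illustrative only and are not taken from the thesis or verified
# against ICD-10-AM.
CODING_GUIDE = {
    "pneumonia": "J18.9",
    "asthma": "J45.9",
    "acute appendicitis": "K35.8",
}

# Assumed negation cues; the thesis does not list the ones it used.
NEGATION_CUES = {"no", "not", "denies", "without"}


def tokenise(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_negated(text, start, window=3):
    """Return True if a negation cue occurs among the `window` tokens
    that precede position `start` within the same sentence."""
    sentence_start = text.rfind(".", 0, start) + 1
    preceding = tokenise(text[sentence_start:start])[-window:]
    return any(token in NEGATION_CUES for token in preceding)


def assign_codes(text, use_rules=True):
    """Match every guide term with a regular expression and collect its
    code. With use_rules=True, negated mentions are skipped."""
    codes = set()
    for term, code in CODING_GUIDE.items():
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            if use_rules and is_negated(text, match.start()):
                continue
            codes.add(code)
    return sorted(codes)


summary = ("Admitted with acute appendicitis. History of asthma. "
           "No evidence of pneumonia.")

pattern_only = assign_codes(summary, use_rules=False)
with_rules = assign_codes(summary, use_rules=True)
print(pattern_only)
print(with_rules)
```

In this sketch, pure pattern matching also codes the negated mention of pneumonia, while the single rule suppresses it. This is the kind of wrong pattern a rule set is meant to filter out.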
For both datasets (Data190 and Data235), the machine learning and hybrid approaches performed well in comparison to the pattern matching and rule-based approaches. Among all the classifiers, AdaBoost performed best, followed by Decision Tree and then the other classifiers. In the machine learning approach, Decision Tree performed better than all the other classifiers using a 4-gram feature set, achieving an F-score of 0.87, a JS of 0.7453 and an HL of 0.0877. Similarly, on Data235, AdaBoost performed best, achieving an F-score of 0.91, a JS of 0.8294 and an HL of 0.0945.
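To make the reported metrics concrete, the following sketch computes Hamming Loss, Jaccard Similarity and a micro-averaged F-score for a toy multi-label coding task. The averaging scheme (per-record Jaccard, micro-averaged precision and recall) is an assumption, since the abstract does not state which variant was used, and the label sets are invented for illustration.

```python
def hamming_loss(true_sets, pred_sets, label_space):
    """Fraction of (record, label) slots that are predicted wrongly."""
    wrong = sum(len(t ^ p) for t, p in zip(true_sets, pred_sets))
    return wrong / (len(true_sets) * len(label_space))


def jaccard_similarity(true_sets, pred_sets):
    """Mean over records of |true AND pred| / |true OR pred|."""
    scores = []
    for t, p in zip(true_sets, pred_sets):
        union = t | p
        # Two empty code sets are treated as a perfect match (assumption).
        scores.append(len(t & p) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


def micro_prf(true_sets, pred_sets):
    """Micro-averaged precision, recall and F-score over all records."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    fp = sum(len(p - t) for t, p in zip(true_sets, pred_sets))
    fn = sum(len(t - p) for t, p in zip(true_sets, pred_sets))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Invented example: four possible codes, two records.
label_space = ["J18.9", "J44.9", "J45.9", "K35.8"]
true_sets = [{"J18.9", "J45.9"}, {"K35.8"}]
pred_sets = [{"J18.9"}, {"K35.8", "J44.9"}]

hl = hamming_loss(true_sets, pred_sets, label_space)
js = jaccard_similarity(true_sets, pred_sets)
precision, recall, f_score = micro_prf(true_sets, pred_sets)
print(hl, js, precision, recall, f_score)
```

A lower Hamming Loss and a higher Jaccard Similarity both indicate that the predicted code set is closer to the set assigned by the clinical coder.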
Date of Award: 2018
Original language: English

Keywords

  • diagnosis related groups
  • evaluation
  • nosology
  • data processing
  • code numbers
  • algorithms
  • machine learning
  • natural language processing (computer science)
  • medical records
  • Australia
