An algorithm for augmenting cancer registry data for epidemiological research applied to oesophageal cancers

Western Sydney University thesis: Doctoral thesis

Abstract

Oesophageal cancer is an important cancer with short survival, but the relationship between pre-diagnosis health behaviour and post-diagnosis survival remains poorly understood. Cancer registries can provide a high-quality census of cancer cases but do not record pre-diagnosis exposures. The aim of this thesis is to document the relationships between pre-diagnosis health behaviours and post-diagnosis survival times in oesophageal cancer, developing new methods as required. A systematic review and meta-analysis was conducted in 2014, and updated in 2021, to investigate the association between pre-diagnosis health behaviours and oesophageal cancer. Visualising health behaviour variables as part of the cancer registry data set, with 100% missing data, led to the development of new approaches for augmenting US oesophageal cancer registry data with health behaviour data from a US national health survey. Firstly, the health survey data were used to create logistic regression models of the probability of each behaviour relative to demographic characteristics, and these models were then applied to cancer cases to estimate their probability of each behaviour. Secondly, cold-deck imputation was used, such that two randomly selected but demographically similar health survey respondents both donated their health behaviour to the matching cancer case. The agreement between these two imputed values was used as an estimate of the misclassification, which was corrected for during the analyses. The logistic regression imputation-based analyses returned accurate point estimates, with wide confidence intervals, if the behaviour occurred in more than approximately 5% of cases. Our reviews and analyses confirmed that pre-diagnosis smoking decreased survival in oesophageal cancer (hazard ratio (HR) 1.08, 95% confidence interval (CI) 1.00-1.17), particularly in squamous cell carcinoma when comparing highest to lowest lifetime exposure (HR 1.55, 95%CI 1.25-1.94), with similar associations for alcohol consumption.
Pre-diagnosis leisure time physical activity was found to be associated with reduced hazard (HR 0.25, 95%CI 0.03-0.81) overall. Findings from these analyses can assist in modelling the impact of current changes in community health behaviour, as well as informing prognosis and treatment decisions at the individual level. This novel method of augmenting cancer registry data with pre-diagnosis variables appears to be effective and will benefit from further validation. This thesis has significantly progressed both issues and identified future opportunities for research and development.
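
The two-donor cold-deck step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the matching variables (sex and age group), the toy records, and the choice to draw two distinct donors without replacement are all assumptions made for illustration. The sketch stops at the raw agreement proportion and does not attempt the misclassification correction applied in the analyses.

```python
import random
from collections import defaultdict

# Hypothetical helper names and toy data; none of this comes from the thesis.

def build_donor_pools(survey_rows, keys):
    """Group survey respondents by their demographic profile."""
    pools = defaultdict(list)
    for row in survey_rows:
        pools[tuple(row[k] for k in keys)].append(row)
    return pools

def impute_two_donors(cases, pools, keys, behaviour, rng):
    """Give each cancer case two behaviour values from two matching donors.

    Assumption: the two donors are distinct respondents drawn at random
    from the pool that shares the case's demographic profile.
    """
    imputed = []
    for case in cases:
        pool = pools.get(tuple(case[k] for k in keys), [])
        if len(pool) < 2:
            imputed.append((None, None))  # no usable match; left missing here
            continue
        first, second = rng.sample(pool, 2)
        imputed.append((first[behaviour], second[behaviour]))
    return imputed

def agreement(imputed):
    """Proportion of matched cases whose two imputed values coincide."""
    pairs = [p for p in imputed if p[0] is not None]
    if not pairs:
        return None
    return sum(a == b for a, b in pairs) / len(pairs)

KEYS = ("sex", "age_group")

# Toy survey respondents (behaviour observed) and cancer cases (behaviour missing).
survey = [
    {"sex": "M", "age_group": "60-69", "smoker": True},
    {"sex": "M", "age_group": "60-69", "smoker": True},
    {"sex": "M", "age_group": "60-69", "smoker": False},
    {"sex": "F", "age_group": "50-59", "smoker": False},
    {"sex": "F", "age_group": "50-59", "smoker": False},
]
cases = [
    {"sex": "M", "age_group": "60-69"},
    {"sex": "F", "age_group": "50-59"},
    {"sex": "F", "age_group": "70-79"},  # no matching respondents in the toy survey
]

pools = build_donor_pools(survey, KEYS)
imputed = impute_two_donors(cases, pools, KEYS, "smoker", random.Random(0))
print(imputed, agreement(imputed))
```

Low agreement between the two donated values would suggest that the matching variables predict the behaviour poorly, which is the situation in which a misclassification correction matters most.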
Date of Award: 2022
Original language: English

Keywords

  • health behavior
  • esophagus
  • cancer
