TY - JOUR
T1 - Benchmarking the diagnostic test accuracy of certified AI products for screening pulmonary tuberculosis in digital chest radiographs: preliminary evidence from a rapid review and meta-analysis
AU - Hua, D.
AU - Nguyen, K.
AU - Petrina, N.
AU - Young, Noel
AU - Cho, J. G.
AU - Yap, A.
AU - Poon, S. K.
PY - 2023/9
Y1 - 2023/9
N2 - Background and objective: The global market for AI systems used in lung tuberculosis (TB) detection has expanded significantly in recent years. Verifying their performance across diverse settings is crucial before medical organisations can invest in them and pursue safe, wide-scale deployment. The goal of this research was to synthesise the clinical evidence for the diagnostic accuracy of certified AI products designed for screening TB in chest X-rays (CXRs) compared to a microbiological reference standard. Methods: Four databases were searched between June and September 2022. Data concerning study methodology, system characteristics, and diagnostic accuracy metrics were extracted and summarised. Study bias was evaluated using QUADAS-2 and by examining sources of funding. Forest plots for diagnostic odds ratio (DOR) and summary receiver operating characteristic (SROC) curves were constructed for the AI products individually and collectively. Results: 10 out of 3642 studies satisfied the review criteria; however, only 8 were included in the meta-analysis following bias assessment. Three AI products were evaluated, producing the following pooled estimates (with 95% confidence intervals), listed in order of accuracy ranking: qXR v2 (sensitivity of 0.944 [0.887–0.973], specificity of 0.692 [0.549–0.805], DOR of 3.63 [3.17–4.09]), Lunit INSIGHT CXR v3.1 (sensitivity of 0.853 [0.787–0.901], specificity of 0.646 [0.627–0.665], DOR of 2.37 [1.96–2.78]), and CAD4TB v3.07 (sensitivity of 0.917 [0.848–0.956], specificity of 0.371 [0.336–0.408], DOR of 1.91 [1.4–2.47]). Overall, the products had a sensitivity of 0.903 (0.859–0.934), specificity of 0.526 (0.409–0.641), and DOR of 2.31 (1.78–2.84). Conclusion: Current publicly available evidence indicates considerable variability in the diagnostic accuracy of available AI products, although overall they have high sensitivity and modest specificity, which is improving with time.
These preliminary results are limited by the small number of studies and poor coverage of low-TB-burden settings. More research is needed to expand the clinical evidence base for the performance of AI products.
AB - Background and objective: The global market for AI systems used in lung tuberculosis (TB) detection has expanded significantly in recent years. Verifying their performance across diverse settings is crucial before medical organisations can invest in them and pursue safe, wide-scale deployment. The goal of this research was to synthesise the clinical evidence for the diagnostic accuracy of certified AI products designed for screening TB in chest X-rays (CXRs) compared to a microbiological reference standard. Methods: Four databases were searched between June and September 2022. Data concerning study methodology, system characteristics, and diagnostic accuracy metrics were extracted and summarised. Study bias was evaluated using QUADAS-2 and by examining sources of funding. Forest plots for diagnostic odds ratio (DOR) and summary receiver operating characteristic (SROC) curves were constructed for the AI products individually and collectively. Results: 10 out of 3642 studies satisfied the review criteria; however, only 8 were included in the meta-analysis following bias assessment. Three AI products were evaluated, producing the following pooled estimates (with 95% confidence intervals), listed in order of accuracy ranking: qXR v2 (sensitivity of 0.944 [0.887–0.973], specificity of 0.692 [0.549–0.805], DOR of 3.63 [3.17–4.09]), Lunit INSIGHT CXR v3.1 (sensitivity of 0.853 [0.787–0.901], specificity of 0.646 [0.627–0.665], DOR of 2.37 [1.96–2.78]), and CAD4TB v3.07 (sensitivity of 0.917 [0.848–0.956], specificity of 0.371 [0.336–0.408], DOR of 1.91 [1.4–2.47]). Overall, the products had a sensitivity of 0.903 (0.859–0.934), specificity of 0.526 (0.409–0.641), and DOR of 2.31 (1.78–2.84). Conclusion: Current publicly available evidence indicates considerable variability in the diagnostic accuracy of available AI products, although overall they have high sensitivity and modest specificity, which is improving with time.
These preliminary results are limited by the small number of studies and poor coverage of low-TB-burden settings. More research is needed to expand the clinical evidence base for the performance of AI products.
UR - https://hdl.handle.net/1959.7/uws:76542
U2 - 10.1016/j.ijmedinf.2023.105159
DO - 10.1016/j.ijmedinf.2023.105159
M3 - Article
SN - 1386-5056
VL - 177
JO - International Journal of Medical Informatics
JF - International Journal of Medical Informatics
M1 - 105159
ER -