TY - JOUR
T1 - Comparison of the validity and reliability of two image classification systems for the assessment of mammogram quality
AU - Moreira, Conrad
AU - Svoboda, Kate
AU - Poulos, Ann
AU - Taylor, Richard
AU - Page, Andrew
AU - Rickard, Mary
PY - 2005
Y1 - 2005
N2 - Objective: To compare the reliability and validity of two classification systems used to evaluate the quality of mammograms: PGMI ('perfect', 'good', 'moderate' and 'inadequate') and EAR ('excellent', 'acceptable' and 'repeat'). Setting: New South Wales (Australia) population-based mammography screening programme (BreastScreen NSW). Methods: Thirty sets of mammograms were rated by 21 radiographers and an expert panel. PGMI and EAR criteria were used to assign ratings to the medio-lateral oblique (MLO) and cranio-caudal (CC) views for each set of films. Inter-observer reliability and criterion validity (compared with expert panel ratings) were assessed using mean weighted observed agreement and kappa statistics. Results: Reliability. Kappa values for both classification systems were low (0.01-0.17). PGMI produced significantly higher values than EAR. Agreement between raters was higher using PGMI than EAR for the MLO view (77% versus 74%, P<0.05), but was similar for the CC view. Dichotomized ratings ('acceptable' or 'needs repeating') did not improve reliability estimates. Validity. Kappa values between raters and the reference standard were low for both classification systems (0.05-0.15). Agreement between raters and the reference standard was higher using PGMI than EAR for the MLO view (74% versus 63%), but was similar for the CC view. Dichotomized ratings of the MLO view showed slightly higher observer agreement. Conclusions: Both PGMI and EAR have poor reliability and validity in evaluating mammogram quality. EAR is not a suitable alternative to PGMI, which must be improved if it is to be useful.
UR - http://www.scopus.com/inward/record.url?scp=15944405051&partnerID=8YFLogxK
U2 - 10.1258/0969141053279149
DO - 10.1258/0969141053279149
M3 - Article
C2 - 15814018
AN - SCOPUS:15944405051
SN - 0969-1413
VL - 12
SP - 38
EP - 42
JO - Journal of Medical Screening
JF - Journal of Medical Screening
IS - 1
ER -