Inter-reader Variability in the Use of BI-RADS Descriptors for Suspicious Findings on Diagnostic Mammography: A Multi-institution Study of 10 Academic Radiologists

Amie Y. Lee, Dorota J. Wisner, Shadi Aminololama-Shakeri, Vignesh A. Arasu, Stephen A. Feig, Jonathan B. Hargreaves, Haydee Ojeda-Fournier, Lawrence W. Bassett, Colin J. Wells, Jade De Guzman, Chris I. Flowers, Joan E. Campbell, Sarah L. Elson, Hanna Retallack, Bonnie N. Joe

Research output: Contribution to journal › Article › peer-review

30 Scopus citations


Rationale and Objectives: The study aimed to determine the inter-observer agreement among academic breast radiologists when using the Breast Imaging Reporting and Data System (BI-RADS) lesion descriptors for suspicious findings on diagnostic mammography.

Materials and Methods: Ten experienced academic breast radiologists across five medical centers independently reviewed 250 de-identified diagnostic mammographic cases that were previously assessed as BI-RADS 4 or 5 with subsequent pathologic diagnosis by percutaneous or surgical biopsy. Each radiologist assessed the presence of the following suspicious mammographic findings: mass, asymmetry (one view), focal asymmetry (two views), architectural distortion, and calcifications. For any identified calcifications, the radiologist also described the morphology and distribution. Inter-observer agreement was determined with Fleiss kappa statistic. Agreement was also calculated by years of experience.

Results: Of the 250 lesions, 156 (62%) were benign and 94 (38%) were malignant. Agreement among the 10 readers was strongest for recognizing the presence of calcifications (k = 0.82). There was substantial agreement among the readers for the identification of a mass (k = 0.67), whereas agreement was fair for the presence of a focal asymmetry (k = 0.21) or architectural distortion (k = 0.28). Agreement for asymmetries (one view) was slight (k = 0.09). Among the categories of calcification morphology and distribution, reader agreement was moderate (k = 0.51 and k = 0.60, respectively). Readers with more experience (10 or more years in clinical practice) did not demonstrate higher levels of agreement compared to those with less experience.

Conclusions: Strength of agreement varies widely for different types of mammographic findings, even among dedicated academic breast radiologists. More subtle findings such as asymmetries and architectural distortion demonstrated the weakest agreement. Studies that seek to evaluate the predictive value of certain mammographic features for malignancy should take into consideration the inherent interpretive variability for these findings.
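
For readers unfamiliar with the statistic named in the methods, the following is a minimal Python sketch of how a Fleiss kappa can be computed for a fixed panel of readers. It is not the authors' code. The function names `fleiss_kappa` and `landis_koch_label` and the input layout (one list of reader labels per case) are illustrative assumptions. The verbal scale in `landis_koch_label` is the widely used Landis and Koch convention; the abstract does not say which scale the authors applied, although its wording (slight, fair, moderate, substantial) matches these cut-offs for the reported values.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of cases, each a list of one label per reader.

    Hypothetical helper: assumes every case was rated by the same number
    of readers (at least two), as in a fixed reader panel.
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(case) != n_raters for case in ratings):
        raise ValueError("each case needs the same number (>= 2) of ratings")

    categories = sorted({label for case in ratings for label in case})
    counts = [Counter(case) for case in ratings]

    # Observed agreement: per-case proportion of agreeing reader pairs, averaged.
    per_case = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_case) / n_cases

    # Chance agreement: sum of squared overall category proportions.
    proportions = [
        sum(c[cat] for c in counts) / (n_cases * n_raters) for cat in categories
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:
        # Only one category was ever used, so kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


def landis_koch_label(kappa):
    """Verbal label on the Landis and Koch scale (assumed, not stated in the abstract)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Toy data (invented for illustration): 4 cases, 2 readers, 1 = finding present.
toy_ratings = [[1, 1], [0, 0], [1, 0], [1, 1]]
toy_kappa = fleiss_kappa(toy_ratings)
toy_label = landis_koch_label(toy_kappa)
```

Applied to the values reported in the abstract, `landis_koch_label` would return "slight" for 0.09, "fair" for 0.21 and 0.28, "moderate" for 0.51 and 0.60, and "substantial" for 0.67, matching the abstract's wording. The 0.82 reported for calcifications falls in the "almost perfect" band of this scale.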

Original language: English (US)
Pages (from-to): 60-66
Number of pages: 7
Journal: Academic Radiology
Issue number: 1
State: Published - Jan 1 2017


  • Breast Imaging
  • Mammography

ASJC Scopus subject areas

  • Radiology, Nuclear Medicine and Imaging

